• Home
  • blog
  • Make a Voice-Controlled Audio Player with the Web Speech API

Make a Voice-Controlled Audio Player with the Web Speech API

The Web Speech API is a JavaScript API that enables web developers to incorporate speech recognition and synthesis into their web pages.

There are many reasons to do this. For example, to enhance the experience of people with disabilities (particularly users with sight problems, or users with limited ability to move their hands), or to allows users to interact with a web app while performing a different task (such as driving).

If you have never heard of the Web Speech API, or you would like a quick primer, then it might be a good idea to read Aurelio De Rosa’s articles Introducing the Web Speech API, Speech Sythesis API and the Talking Form .

Browser Support


Browsers vendors have only recently started implementing both the Speech Recognition API and the Speech Synthesis API. As you can see, support for these is still far from perfect, so if you are following along with this tutorial, please use an appropriate browser.

In addition, the speech recognition API currently requires an Internet connection, as the speech gets passed through the wire and the results are returned to the browser. If the connection uses HTTP, the user has to permit a site to use their microphone on every request. If the connection uses HTTPS, then this is only necessary once.

Speech Recognition Libraries

Libraries can help us manage complexity and can ensure we stay forward compatible. For example when another browser starts supporting the Speech Recognition API, we would not have to worry about adding vendor prefixes.

One such library is Annyang, which is incredibly easy to work with. Tell me more.

To initialize Annyang, we add their script to our website:

<script src="//"></script>

We can check if the API is supported like so:

if (annyang) { /*logic */ }

And add commands using an object with the command names as keys and the callbacks as methods. :

var commands = {
  'show divs': function() {
  'show forms': function() {

Finally, we just add them and start the speech recognition using:


Voice-controlled Audio Player

In this article, we will be building a voice-controlled audio player. We will be using both the Speech Synthesis API (to inform users which song is beginning, or that a command was not recognized) and the Speech Recognition API (to convert voice commands to strings which will trigger different app logic).

The great thing about an audio player that uses the Web Speech API is that users will be able to surf to other pages in their browser or minimize the browser and do something else while still being able to switch between songs. If we have a lot of songs in the playlist, we could even request a particular song without searching for it manually (if we know its name or singer, of course).

We will not be relying on a third-party library for the speech recognition as we want to show how to work with the API without adding extra dependencies in our projects. The voice-controlled audio player will only be supporting browsers that support the interimResults attribute. The latest version of Chrome should be a safe bet.

As ever, you can find the complete code on GitHub, and a demo on CodePen.

Getting Started — a Playlist

Let’s start with a static playlist. It consists of an object with different songs in an array. Each song is a new object containing the path to the file, the singer’s name and the name of the song:

var data = {
  "songs": [
      "fileName": "",
      "singer" : "Jason Shaw",
      "songName" : "Running Waters"

We should be able to add a new objects to the songs array and have the new song automatically included into our audio player.

The Audio Player

Now we come to the player itself. This will be an object containing the following things:

  • some setup data
  • methods pertaining to the UI (e.g. populating the list of songs)
  • methods pertaining to the Speech API (e.g. recognizing and processing commands)
  • methods pertaining to the manipulation of audio (e.g. play, pause, stop, prev, next)

Setup Data

This is relatively straight forward.

var audioPlayer = {
  audioData: {
    currentSong: -1,
    songs: []

The currentSong property refers to the index of the song that the user is currently on. This is useful, for example, when we have to play the next/previous song, or stop/pause the song.

The songs array contains all the songs that the user has listened to. This means that the next time the user listens to the same song, we can load it from the array and not have to download it.

You can see the full code here.

Continue reading %Make a Voice-Controlled Audio Player with the Web Speech API%