
Voice controlled robot using Raspberry Pi | Speech Recognition Javascript API


This article describes how to build a voice controlled robot using a Raspberry Pi. The project involves creating a web application hosted on a web server running on the Raspberry Pi. Try this live demo of the web application to understand how it works.

The GUI of the application is written in HTML and Javascript, and the backend processing (including hardware control) is done in PHP.

Figure: Voice controlled Raspberry Pi robot, speech recognition using Javascript

The voice commands are converted to text by a Javascript API called the "Web Speech API", which runs in the browser. The converted text is sent to the server (running on the robot), which processes it to control the robot.

Any spoken word or phrase is converted to text and passed to the robot. However, a valid voice command must begin with the word 'robot', followed by the action to be performed. The valid commands are:

- robot forward

- robot backward

- robot stop

- robot lights on

- robot lights off

- robot right

- robot left
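
The command list above is small enough to check with a simple lookup. The sketch below is illustrative only: `VALID_COMMANDS` and `parseCommand` are hypothetical names that do not appear in the project's source, and the project itself validates commands on the server in PHP. This version accepts only the exact phrases listed above.

```javascript
// Hypothetical helper: the set mirrors the list of valid commands above.
const VALID_COMMANDS = new Set([
  "robot forward",
  "robot backward",
  "robot stop",
  "robot lights on",
  "robot lights off",
  "robot right",
  "robot left",
]);

// Returns the action part ("forward", "lights on", ...) of a valid command,
// or null when the phrase is not in the list.
function parseCommand(transcript) {
  const phrase = String(transcript).trim().toLowerCase();
  if (!VALID_COMMANDS.has(phrase)) {
    return null;
  }
  return phrase.slice("robot ".length);
}

console.log(parseCommand("Robot lights on")); // prints: lights on
console.log(parseCommand("lights on"));       // prints: null
```

A phrase that omits the leading word 'robot' is rejected, which matches the rule described above.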

 

The article covers the following topics:

  • What is the Web Speech API?
  • Web Application Components
  • Code Walkthrough

 Let's build it.

 

What is the Web Speech API?

 

The Web Speech API is a Javascript API that lets a web page perform the following tasks in the browser:

1. Speech Recognition (Speech to Text)

2. Speech Synthesis (Text to Speech)
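
Browsers do not all expose both halves of the API, so it can help to check for them before use. The following is a minimal sketch, assuming a hypothetical helper named `detectSpeechSupport`; the global object is passed in as a parameter so that the check can also be run outside a browser.

```javascript
// Reports which parts of the Web Speech API a global object exposes.
// In a browser, call it as detectSpeechSupport(window).
function detectSpeechSupport(globalObj) {
  const Recognition =
    globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition;
  return {
    // Speech to text: a constructor, prefixed as webkitSpeechRecognition in some browsers.
    recognition: typeof Recognition === "function",
    // Text to speech: exposed as the speechSynthesis object.
    synthesis:
      typeof globalObj.speechSynthesis === "object" &&
      globalObj.speechSynthesis !== null,
  };
}

const support = detectSpeechSupport(globalThis);
console.log(support);
```

Outside a browser both flags are expected to be false, because neither global exists there.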

A complete guide to this API is available at:

https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API

 

The example demo from the MDN documentation of the Web Speech API can be accessed through this link:

https://mdn.github.io/web-speech-api/speech-color-changer/

 

Also try this example:

https://www.google.com/intl/en/chrome/demos/speech.html


An overview of the API is shown below.

Figure: Voice recognition using the Web Speech API

 

The part of the API relevant to this project is highlighted in yellow. As shown in the picture above, the Speech Recognition module of the API has multiple interfaces. Only the 'SpeechRecognition' interface is used to build the voice controlled robot; the rest are not used.

The 'SpeechRecognition' interface is responsible for the voice recognition service and converts voice commands to readable text. It has three methods (start(), stop() and abort()) and multiple properties with associated event handlers. Some of these methods and properties are used in the application for the voice controlled robot.
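
To give a feel for the interface, here is a hedged sketch of how its properties and methods fit together. `createRecognizer` and `FakeRecognition` are hypothetical names used only for illustration, and the language value "en-US" is an assumption. The constructor is injected so that the call sequence can be tried without a browser; in a browser it would be window.SpeechRecognition || window.webkitSpeechRecognition.

```javascript
// Builds and configures a recognizer from the given constructor.
function createRecognizer(RecognitionCtor, onTranscript) {
  const recognition = new RecognitionCtor();
  recognition.lang = "en-US";          // assumed language; adjust as needed
  recognition.continuous = true;       // keep returning results, not just one
  recognition.interimResults = false;  // deliver final results only
  recognition.maxAlternatives = 1;     // one candidate transcript per result

  recognition.onresult = function (event) {
    const result = event.results[event.resultIndex];
    onTranscript(result[0].transcript);
  };
  return recognition;
}

// Stand-in with the three methods of the real interface, for illustration only.
class FakeRecognition {
  start() { this.running = true; }   // begin listening
  stop() { this.running = false; }   // stop listening, pending results still arrive
  abort() { this.running = false; }  // stop listening, pending results are dropped
}

const recognizer = createRecognizer(FakeRecognition, function (text) {
  console.log("heard:", text);
});
recognizer.start();
recognizer.onresult({ resultIndex: 0, results: [[{ transcript: "robot stop" }]] });
recognizer.stop();
```

Running the sketch prints "heard: robot stop", because the handler is invoked by hand with a fake event.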

 

Web Application Components for Voice Controlled Robot

 

The application captures voice input from the user, converts the voice to text, sends the text to the server and uses the text to control the robot. These tasks are implemented in the following files:

| Task | File Name |
| --- | --- |
| Creation of GUI | index.html |
| Voice recognition, converting speech to text, sending the converted text to server | script.js |
| Processing the text to identify valid commands, controlling the GPIO pins of Raspberry Pi to actuate robotic motion | action.php |

 

The various sections of the GUI are annotated in the picture below.

Figure: GUI of the voice controlled robot using Raspberry Pi

The following Javascript files are included in the GUI:

 <script src="jquery.min.js"></script>
 <script src="script.js"></script>

'jquery.min.js' provides the ajax functionality used to send the converted text to the server. In the example above, the converted text, "robot forward", is transmitted to the server.

You can test the functionality with the live demo linked in the introduction above.

 

Code Walkthrough

 

script.js

When the GUI is loaded, the Javascript files associated with it come into action.

The Javascript file 'script.js' begins by initialising the 'SpeechRecognition' object and other variables. If the browser doesn't support the Web Speech API, it displays an error message.

The 'continuous' property is set to true so that the service keeps returning results instead of ending after the first one. This allows longer silences while giving voice input.

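try {
  var SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
  var recognition = new SpeechRecognition();
  // Set inside the try block: 'recognition' is undefined if the constructor is missing.
  recognition.continuous = true;
  $('.no-browser-support').hide();
}
catch(e) {
  // Reached when the browser does not provide the Web Speech API.
  console.error(e);
  $('.no-browser-support').show();
}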

After initialising the variables, the script waits for events.

When the 'Start Voice Recognition' button is pressed, the following code is executed:

$('#start-record-btn').on('click', function(e) {
  console.log("start recog");
 
  recognition.start();
  btn_start.css("background-color", "green");
});

The above code calls the start() method, which starts the speech recognition service in the background.

Once the service has started, the 'onstart' event handler will execute. It is used for displaying a text message about the activation of the service.

recognition.onstart = function() { 
  instructions.text('Voice recognition activated');
}
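
Besides 'onstart', the interface also provides 'onerror' and 'onend' handlers, which are useful for telling the user why recognition stopped. The sketch below is an assumption about how they could be used, not code from the project: `describeRecognitionError`, `attachLifecycleHandlers` and the message wording are hypothetical, while the error codes are values of the error event's `error` property.

```javascript
// Maps error codes to user-facing messages (message wording is made up here).
const ERROR_MESSAGES = new Map([
  ["no-speech", "No speech was detected. Please try again."],
  ["audio-capture", "No microphone could be found."],
  ["not-allowed", "Permission to use the microphone was denied."],
  ["network", "A network error interrupted voice recognition."],
]);

function describeRecognitionError(code) {
  return ERROR_MESSAGES.get(code) || "Voice recognition error: " + code;
}

// `showMessage` is a hypothetical display callback; on the article's page it
// could be function (msg) { instructions.text(msg); }.
function attachLifecycleHandlers(recognition, showMessage) {
  recognition.onerror = function (event) {
    showMessage(describeRecognitionError(event.error));
  };
  recognition.onend = function () {
    showMessage("Voice recognition stopped");
  };
}

// Demonstration with a plain object standing in for the recognizer.
const fakeRecognition = {};
attachLifecycleHandlers(fakeRecognition, function (msg) { console.log(msg); });
fakeRecognition.onerror({ error: "not-allowed" });
fakeRecognition.onend();
```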

At this stage, the service is listening to incoming audio (voice commands). Any spoken word/phrase will be captured through the laptop's or mobile phone's microphone. An event handler 'onresult' will execute when the speech recognition service returns a result (a word or phrase) corresponding to the audio.

This event handler displays the result (the text corresponding to the speech) and passes it to the server:

recognition.onresult = function(event) {
  // event is a SpeechRecognitionEvent object. It holds all the lines we have captured so far. We only need the current one.
  var current = event.resultIndex;

  // Get a transcript of voice.
  var transcript = event.results[current][0].transcript;

  // Display the transcript.
  noteTextarea.html(transcript);

  // Invoke action().
  action(transcript);
};
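
The handler above reads a single result at 'event.resultIndex'. A variation, sketched here as an assumption rather than as the project's behaviour, walks every result from that index onward and keeps only those flagged as final. `newFinalTranscripts` and `fakeResult` are hypothetical names; the fake event only imitates the shape of the real event object.

```javascript
// Collects the transcripts of all results that are new in this event
// (from event.resultIndex onward), skipping results that are not final yet.
function newFinalTranscripts(event) {
  const transcripts = [];
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    if (result.isFinal) {
      // Index 0 is the first (most likely) alternative.
      transcripts.push(result[0].transcript.trim());
    }
  }
  return transcripts;
}

// Builds an array-like result with an isFinal flag, for demonstration only.
function fakeResult(transcript, isFinal) {
  return Object.assign([{ transcript: transcript }], { isFinal: isFinal });
}

const event = {
  resultIndex: 1,
  results: [
    fakeResult("robot forward", true), // already reported by an earlier event
    fakeResult(" robot left", true),
    fakeResult("robot ri", false),     // interim result, ignored
  ],
};
console.log(newFinalTranscripts(event)); // logs an array holding only "robot left"
```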

The function 'action(text)' sends the transcript to a PHP file (action.php) on the server as a POST parameter through a background ajax call. It then updates the element with id 'response' with the data received from the server.

function action(text) {
  console.log("text:" + text);
  // POST the transcript as the 'txt' parameter and show the server's reply.
  $.post("action.php", { txt: text }, function(data, status) {
    document.getElementById("response").innerHTML = data;
  });
}
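
For readers who prefer not to load jQuery, the same request could be made with the browser's fetch function. This is a sketch under assumptions: `buildCommandBody` and `sendCommand` are hypothetical names, and the fetch implementation is passed in as a parameter so the function can be tried without a server. Only the file name "action.php" and the parameter name "txt" come from the article.

```javascript
// Encodes the transcript as a form body, using the parameter name "txt"
// that action.php reads from $_POST.
function buildCommandBody(text) {
  return new URLSearchParams({ txt: text }).toString();
}

// Sends the transcript and resolves with the server's reply text.
// In a browser, pass the global fetch as `fetchImpl`.
function sendCommand(text, fetchImpl) {
  return fetchImpl("action.php", {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: buildCommandBody(text),
  }).then(function (response) {
    if (!response.ok) {
      throw new Error("Server returned status " + response.status);
    }
    return response.text();
  });
}

console.log(buildCommandBody("robot forward")); // prints: txt=robot+forward
```

On the page, the resolved text could be written into the 'response' element just as the jQuery callback does.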

 

action.php

The file 'action.php' is responsible for the hardware control of the voice controlled robot.

The transcript sent by the 'action()' function in 'script.js' is received by this PHP file through the superglobal variable $_POST. The text is then trimmed to remove leading and trailing whitespace and converted to lowercase.

$text=$_POST["txt"];

$text=trim($text);
$text=strtolower($text);

A valid command has a minimum of two words and a maximum of three words. The script counts the words in the command using this code:

$word=explode(" ", $text);
$sz=sizeof($word);
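
One detail of splitting on a single space is that repeated spaces produce empty entries, which changes the word count. The sketch below shows the effect in Javascript, whose split(" ") behaves like explode(" ", ...) in this respect. `splitWords` is a hypothetical helper, and whether the recognition service ever returns text with repeated spaces is not established in this article.

```javascript
// Splitting on a single space keeps an empty string between repeated spaces.
const naive = "robot  lights on".split(" ");
console.log(naive.length); // prints: 4

// Splitting on runs of whitespace counts only the actual words.
function splitWords(text) {
  const trimmed = text.trim().toLowerCase();
  return trimmed === "" ? [] : trimmed.split(/\s+/);
}

console.log(splitWords("robot  lights on").length); // prints: 3
```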

If the number of words is in the valid range and the first word is 'robot', the script goes on to check the second word. If the second word is one of the pre-defined keywords, the corresponding function to control the hardware is executed.

if($word[0] == "robot" and $sz>1 and $sz<=3){

	if($word[1] == "forward"){
		forward(); //move the robot in forward direction
		echo"[$time]: moving forward<br>"; //this text is sent back to the client as an acknowledgement
	}
	elseif($word[1] == "backward"){
		back(); //move the robot in backward direction
		echo"[$time]: moving backwards<br>"; //this text is sent back to the client as an acknowledgement
	}
	.
        .
        .
        .
}
else{
	echo"Invalid command<br>";
}
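
The if/elseif chain above can also be expressed as a lookup table. The sketch below shows the idea in Javascript for illustration only, since the project's server side is PHP. `makeDispatcher` is a hypothetical name and the handlers are stand-ins for the PHP functions. Only the two branches shown above are included; the elided branches are left out. Returning "Invalid command" for an unknown second word is an assumption of this sketch.

```javascript
// Builds a dispatcher from a table: second word -> handler and reply text.
function makeDispatcher(handlers) {
  const table = new Map([
    ["forward", { run: handlers.forward, reply: "moving forward" }],
    ["backward", { run: handlers.back, reply: "moving backwards" }],
  ]);

  return function dispatch(words) {
    // Same checks as the PHP code: two or three words, starting with "robot".
    const inRange = words.length > 1 && words.length <= 3;
    const entry = inRange && words[0] === "robot" ? table.get(words[1]) : undefined;
    if (!entry) {
      return "Invalid command";
    }
    entry.run();
    return entry.reply;
  };
}

const dispatch = makeDispatcher({
  forward: function () { console.log("forward() called"); },
  back: function () { console.log("back() called"); },
});
console.log(dispatch(["robot", "forward"]));
console.log(dispatch(["hello", "forward"]));
```

Adding a command then means adding one table entry instead of another elseif branch.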

The functions forward(), back(), left(), right() etc. control the GPIO pins of the Raspberry Pi. These functions are defined in a file called 'vars.php', which is included at the beginning of 'action.php'.

The complete source code of the voice controlled robot project can be downloaded through this GitHub link.


Comments

  • from Anil K Chikkam , 1 year ago

    Hi!

    Very nice project. As a further enhancement, can we build the entire concept in RaspPi itself, without the need of a Laptop/PC/Mobile. I mean using a USB microphone, etc. To operate the Robot in a room.

    Best regards,

    Anil K Chikkam.

  • from Rushi , 3 years ago
    How much range this robot can control
    • from Spark , 3 years ago

      the range depends upon your wifi network range.

  • from Jairo Josy , 3 years ago
    connections from motor driver to raspberry pi
    • from Spark , 3 years ago

      connection details are given here:-

      https://helloworld.co.in/article/basic-robotics-make-robot-raspberry-pi-web-controls

  • from Imed , 3 years ago
    Your Browser Doesn't Support the Web Speech API ????
    • from Spark , 3 years ago

      Try chrome. Also, install SSL certificates on Raspberry Pi webserver.

      OR.. use these instructions to get all dependencies installed in automated way

      https://helloworld.co.in/earthrover/
