Speech to Text in Python

Yash Maheshwari
Jun 16, 2020

Updated: Apr 11, 2022

I built an AI-based application that asks the user to state a question and listens to the user's reply. The application works out what the user said and prints it on the screen.

How to convert speech into text with a Python program

What does this program do?

This program asks the user to state a question and listens to the user's reply. It then works out what the user said and prints it.

Steps to Understanding Speech to Text

Some people think that converting speech to text is very difficult and time-consuming. However, truth be told, it is a simple and straightforward process.

At a high level, below is the list of steps to convert speech to text:

i. Ask the user a question

ii. Give them a turn to talk and record the sounds

iii. Use a function to make out what they are saying

iv. Print what we think they are saying

Step #1: Install Packages

These instructions are meant for Python 3.8. If you have an older version, you can upgrade or try to follow along, but the code won’t be exactly the same.

Before you start writing Python code, let’s install some packages.

Go to the terminal and type “pip install” followed by the package name. For example, type “pip install pyaudio” and press the “Enter” key.

Similarly, install all the packages listed in the table below, entering each command on its own line in the terminal:

If any of these packages is already installed on your computer, pip prints a message that starts with “Requirement already satisfied”.

If the package installation works, the output ends with a line that starts with “Successfully installed”.

If there is an error, pip prints a red message such as “ERROR: No matching distribution”. Fix the error by using the correct package name, then install the package again until all the packages are installed.

Once all the above packages are successfully installed, it’s time to import them into the program and start coding.
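Before moving on, it can help to confirm that the packages can actually be found by Python. The sketch below is one way to do that. The module names are assumptions: pyaudio is named in this article, while speech_recognition is inferred from the recognize_google function used in Step #6. The helper name missing_modules is made up for this example.

```python
import importlib.util

# Assumed module names (see the note above); adjust to match your own list.
REQUIRED_MODULES = ["speech_recognition", "pyaudio"]


def missing_modules(names):
    """Return the module names that Python cannot find on this computer."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Still need to install:", ", ".join(missing))
else:
    print("All packages are installed")
```

If a name shows up as missing, go back to the terminal and install that package again with pip.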

Step #2: Import Packages

Import the packages below into the Python program so that you can use them. What each package does is described in Step #1.
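As a rough sketch, the import step might look like the code below. It assumes the speech library is the SpeechRecognition package, which is imported under the name speech_recognition (inferred from recognize_google in Step #6). The try and except around the import is an addition for illustration, so that a missing package gives a friendly message.

```python
# Assumption: the SpeechRecognition package, imported as speech_recognition.
# The short alias "sr" is a common convention, not a requirement.
try:
    import speech_recognition as sr
except ImportError:
    sr = None
    print("speech_recognition is not installed yet; go back to Step #1")
else:
    print("speech_recognition imported successfully")
```

pyaudio usually does not need its own import line here, because SpeechRecognition loads it by itself when the microphone is opened.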

Step #3: Add the Recognizer

In this step you will add a recognizer. A recognizer is an object that listens to audio and recognizes the words in it.
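A minimal sketch of this step, again assuming the SpeechRecognition package. The function name make_recognizer is made up for this example; in a short script the single line inside it is all you need.

```python
def make_recognizer():
    """Create the recognizer object used in the later steps."""
    # Assumption: SpeechRecognition package, imported as speech_recognition.
    import speech_recognition as sr

    return sr.Recognizer()
```

Calling make_recognizer() once at the start of the program is enough, because the same recognizer can be reused for every question.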

Step #4: Add a forever loop

In this step you will add a loop that runs forever, so the program keeps asking for a new question until you stop it.
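Here is a small sketch of a forever loop. The helper run_forever and its step argument are made up for this example, and stopping cleanly on Ctrl+C is an addition that this article does not describe.

```python
def run_forever(step):
    """Call step() again and again until the user presses Ctrl+C."""
    rounds = 0
    try:
        while True:  # this condition is always true, so the loop never ends by itself
            step()
            rounds += 1
    except KeyboardInterrupt:
        # Ctrl+C raises KeyboardInterrupt, which breaks out of the loop
        print("Stopping.")
    return rounds
```

In the real program, step would be the code from the next two steps: listen to the user, then print what they said.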

Step #5: Listen to what the user says

In this step you will use a microphone to listen to what the user is saying. You save the microphone in a variable called source and print “Please state your question”. Then the user starts talking and the microphone picks up the sounds that the user is making. While that is happening, the recognizer stores the sounds that the microphone passes to it.
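The listening step can be sketched as below. It assumes recognizer is an sr.Recognizer() and microphone is an sr.Microphone() from the SpeechRecognition package. The function name listen_for_question is made up for this example.

```python
def listen_for_question(recognizer, microphone):
    """Prompt the user, then record from the microphone until they pause."""
    with microphone as source:  # the open microphone is saved in `source`
        print("Please state your question")
        # listen() waits while the user talks and returns the recorded audio
        audio = recognizer.listen(source)
    return audio
```

The with block closes the microphone automatically once the recording is done, so it is free to be opened again on the next trip around the loop.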

Step #6: Understanding what the user said

In this step you will use a try and except block in case there is an error. Inside the try, you will use recognize_google to work out what the user said and print it. If there is an error, you will set the text of what the user said to “error”.
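A sketch of this step, assuming recognizer is an sr.Recognizer() and audio is the recording from Step #5. The function name understand is made up for this example. Note that recognize_google sends the audio to Google's speech service, so it needs an internet connection.

```python
def understand(recognizer, audio):
    """Turn recorded audio into text, or "error" if that is not possible."""
    try:
        text = recognizer.recognize_google(audio)
    except Exception:
        # SpeechRecognition raises UnknownValueError when the speech is unclear
        # and RequestError when the service cannot be reached; this sketch
        # treats every failure the same way.
        text = "error"
    print(text)
    return text
```

Catching every exception keeps the example short. A longer program could catch the two errors separately and tell the user which one happened.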

Below is the complete source code
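The listing below is a minimal sketch of how Steps #2 to #6 could fit together, not a definitive implementation. It assumes the SpeechRecognition package (imported as speech_recognition) with pyaudio installed for microphone access. Wrapping the program in a function called main is a choice made for this example.

```python
def main():
    # Step #2: import the package (assumed to be SpeechRecognition)
    import speech_recognition as sr

    # Step #3: add the recognizer
    recognizer = sr.Recognizer()

    # Step #4: loop forever (press Ctrl+C in the terminal to stop)
    while True:
        # Step #5: listen to what the user says
        with sr.Microphone() as source:
            print("Please state your question")
            audio = recognizer.listen(source)

        # Step #6: understand what the user said
        try:
            said = recognizer.recognize_google(audio)
        except (sr.UnknownValueError, sr.RequestError):
            said = "error"
        print(said)
```

Add a call to main() as the last line of the file to start the program. It needs a working microphone and an internet connection.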

Thanks for staying till the end. If you enjoyed this article, please follow me for updates on new articles. Medium is where I want to give back to the community and help others learn coding. I welcome responses related to this article and its views. Please leave a response on what else you would like to learn and I’ll try to teach that at some point.