In this tutorial, we will walk through building a voice bot with neural network technology in Python. The bot recognizes human speech in real time from your device's microphone (for example, a laptop microphone) and speaks back meaningful responses produced by the neural network.
The bot consists of two main parts: the dictionary-processing part and the voice-assistant part.
You can do all of the development in the PyCharm IDE, which can be downloaded from the official JetBrains website.
All required libraries can be installed from PyPI right in the PyCharm console; the installation command for each library is listed on its page on the official website.
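As a starting point, the packages imported later in this tutorial can be installed in one go (PyAudio is needed by SpeechRecognition for microphone access; exact versions are up to you):

```shell
pip install SpeechRecognition gTTS playsound scikit-learn PyAudio
```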
Dataset
A dataset is a collection of data prepared for analysis. In our case, it is a text file containing lines in the form question \ answer.
All lines of the text are iterated over with a for loop, and unnecessary characters are stripped from each line using the mask of allowed characters in the alphabet variable. Each question/answer pair is then appended to the dataset array.
After processing the text, all its values are converted into vectors using the Scikit-learn machine learning library; this example uses the CountVectorizer() class. Each vector is then assigned a class (its answer) by the LogisticRegression() classifier.
When a message comes from the user, it is also converted into a vector, and the model looks for the most similar question vector in the dataset; once it is found, we receive the corresponding answer.
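The vectorize-and-classify idea above can be sketched in miniature with a tiny in-memory dataset (the questions and answers here are illustrative, not the tutorial's full file):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

questions = ["how are you", "who are you", "what are you doing"]
answers = ["everything is fine", "i am a bot", "talking to you"]

# Each question becomes a word-count vector.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(questions)

# Each vector is labeled with its answer string as the class.
clf = LogisticRegression()
clf.fit(X, answers)

# A new message is vectorized the same way; the classifier returns
# the answer whose training questions look most similar.
incoming = vectorizer.transform(["who are you"])
print(clf.predict(incoming)[0])
```

This is the same pipeline the full code below builds, just on three hand-written pairs.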
Voice Assistant
The SpeechRecognition library is used for voice recognition, and gTTS for voicing the bot's responses. The system waits in an endless loop for a question to arrive, in our case a voice from the microphone, then converts it into text and sends it to the neural network for processing. The text response it receives back is converted into speech; the recording is saved in the project folder and deleted after playback. It's that simple! For convenience, all messages are also printed to the console.
With the default settings, the response time was quite long (sometimes 15-30 seconds), and the slightest noise was picked up as a question. The following settings helped:
voice_recognizer.dynamic_energy_threshold = False
voice_recognizer.energy_threshold = 1000
voice_recognizer.pause_threshold = 0.5
along with timeout=None and phrase_time_limit=2 in the listen() function.
After that, the bot began to respond with minimal delay.
Other values may work better for you. Descriptions of these and other settings can be found in the SpeechRecognition section on the same PyPI website, but for some reason I did not find the phrase_time_limit setting there; I stumbled upon it by accident on Stack Overflow.
Data set text
This is a small sample of the text. Of course, a real dataset should contain many more questions and answers.
Hi \ Hi
how are you \ everything is fine
how are you \ thanks great
who are you \ I am a bot
what are you doing \ talking to you
Python code
import speech_recognition as sr
from gtts import gTTS
import playsound
import os
import random
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
# Vocabulary
def clean_str(r):
    # Lowercase the string and keep only characters listed in alphabet.
    r = r.lower()
    r = [c for c in r if c in alphabet]
    return ''.join(r)
alphabet = ' abcdefghijklmnopqrstuvwxyz'  # characters kept by clean_str
with open('dialogues.txt', encoding='utf-8') as f:
    content = f.read()

blocks = content.split('\n')
dataset = []

for block in blocks:
    # Each line holds a "question \ answer" pair separated by a backslash.
    replicas = block.split('\\')[:2]
    if len(replicas) == 2:
        pair = [clean_str(replicas[0]), clean_str(replicas[1])]
        if pair[0] and pair[1]:
            dataset.append(pair)
X_text = []
y = []

for question, answer in dataset[:10000]:
    X_text.append(question)
    y.append(answer)

# Turn the questions into word-count vectors and fit the classifier,
# using each answer string as the class label.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(X_text)

clf = LogisticRegression()
clf.fit(X, y)
def get_generative_replica(text):
    # Vectorize the incoming message and predict the closest answer.
    text_vector = vectorizer.transform([text]).toarray()[0]
    answer = clf.predict([text_vector])[0]
    return answer
# Voice Assistant
def listen():
    voice_recognizer = sr.Recognizer()
    voice_recognizer.dynamic_energy_threshold = False
    voice_recognizer.energy_threshold = 1000
    voice_recognizer.pause_threshold = 0.5

    with sr.Microphone() as source:
        print("speak 🎤")
        audio = voice_recognizer.listen(source, timeout=None, phrase_time_limit=2)

    try:
        voice_text = voice_recognizer.recognize_google(audio, language="en")
        print(f"You said: {voice_text}")
        return voice_text
    except sr.UnknownValueError:
        return "Recognition error"
    except sr.RequestError:
        return "Connection error"
def say(text):
    # Synthesize the reply, play it, then delete the temporary file.
    voice = gTTS(text, lang="en")
    unique_file = "audio_" + str(random.randint(0, 10000)) + ".mp3"
    voice.save(unique_file)
    playsound.playsound(unique_file)
    os.remove(unique_file)
    print(f"Bot: {text}")
def handle_command(command):
    command = command.lower()
    reply = get_generative_replica(command)
    say(reply)

def stop():
    say("Bye")
def start():
    print("Bot launch...")
    while True:
        command = listen()
        handle_command(command)

try:
    start()
except KeyboardInterrupt:
    stop()