“A voice-enabled machine that reads books and answers questions.
In this project I built a voice-enabled teachable machine that can scan text from book pages or any other text source and turn it into a context. Users can then ask questions about that context, and the machine answers using only the context. I have always wanted to build this kind of edge device: one that is easy to deploy and can be taught a new context effortlessly, without needing an internet connection.
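The flow described above can be sketched as a simple chain of three stages. This is only an illustration of the data flow, not the project's actual code: the function name and the `ocr`, `speech_to_text`, and `answer` callables are hypothetical placeholders for whatever components do each job.

```python
def answer_spoken_question(page_image, question_audio, ocr, speech_to_text, answer):
    """Chain the three stages: page -> context, audio -> question, then answer."""
    context = ocr(page_image)                  # scanned text becomes the context
    question = speech_to_text(question_audio)  # spoken question becomes text
    return answer(question, context)           # answer is drawn only from the context
```

Passing the stages in as arguments keeps the sketch independent of any particular OCR, speech, or question-answering library.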
Machine Learning Models used in the Application
Three machine learning models are used:
1. Tesseract OCR (LSTM based model)
Tesseract is an OCR engine with Unicode support and the ability to recognize more than 100 languages out of the box. It can also be trained to recognize other languages.
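As a rough illustration of how a scanned page could become text, the sketch below shells out to the `tesseract` command-line tool. The helper names and the image path are assumptions made for this example, not taken from the project. `--oem 1` selects the LSTM engine and `-l` selects the language.

```python
import subprocess


def build_tesseract_command(image_path, lang="eng"):
    """Build a tesseract CLI call that prints recognized text to stdout."""
    # --oem 1 selects the LSTM engine; --psm 3 is fully automatic page segmentation.
    return ["tesseract", image_path, "stdout", "-l", lang, "--oem", "1", "--psm", "3"]


def ocr_image(image_path, lang="eng"):
    """Run tesseract on one image and return the recognized text."""
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True, text=True, check=True,
    )
    # Collapse line breaks so the page reads as one context passage.
    return " ".join(result.stdout.split())
```

Calling `ocr_image("page.png")` (a hypothetical file name) requires the `tesseract` binary and the chosen language data to be installed.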
2. DeepSpeech (TensorFlow Lite model)
DeepSpeech is an open source Speech-To-Text engine that uses a model trained by machine learning techniques. It relies on Google’s TensorFlow to make the implementation easier.
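A minimal sketch of transcribing a recorded question with the DeepSpeech Python package might look like the following. It assumes a recent release (0.7 or later) where `Model` takes only the model path; the file paths, helper names, and the 16 kHz default are assumptions for illustration. DeepSpeech expects 16-bit mono audio at the model's sample rate, so the WAV file is validated first.

```python
import wave


def read_pcm16_mono(wav_path, expected_rate=16000):
    """Return raw 16-bit mono PCM bytes from a WAV file, validating its format."""
    with wave.open(wav_path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono audio")
        if wav.getframerate() != expected_rate:
            raise ValueError(f"expected {expected_rate} Hz audio")
        return wav.readframes(wav.getnframes())


def transcribe(model_path, wav_path):
    """Transcribe a recorded question with a DeepSpeech model file."""
    # Imported lazily so the WAV helper works without these packages installed.
    import numpy as np
    from deepspeech import Model

    model = Model(model_path)  # hypothetical path, e.g. to a .tflite model file
    pcm = read_pcm16_mono(wav_path, expected_rate=model.sampleRate())
    return model.stt(np.frombuffer(pcm, dtype=np.int16))
```

Checking the format up front gives a clear error for a wrongly recorded clip instead of a silently poor transcription.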
3. BERT
BERT is a language representation model which stands for Bidirectional Encoder Representations from Transformers. The pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.
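For question answering, the additional output layer typically produces a start score and an end score for every token in the context, and the answer is the span with the highest combined score. The sketch below shows only that span-selection step, with made-up scores standing in for real model output. The function name and the `max_answer_len` limit are assumptions, not details of this project.

```python
def best_answer_span(start_scores, end_scores, max_answer_len=15):
    """Return (start, end) token indices maximizing start score + end score."""
    best = (0, 0)
    best_score = float("-inf")
    for start, start_score in enumerate(start_scores):
        # Only consider ends at or after the start, within the length limit.
        last = min(start + max_answer_len, len(end_scores))
        for end in range(start, last):
            score = start_score + end_scores[end]
            if score > best_score:
                best_score = score
                best = (start, end)
    return best


tokens = ["the", "pi", "runs", "tesseract", "and", "deepspeech"]
start_scores = [0.1, 0.2, 0.1, 2.5, 0.3, 0.4]  # made-up values
end_scores = [0.0, 0.1, 0.2, 0.9, 0.3, 2.8]    # made-up values
start, end = best_answer_span(start_scores, end_scores)
answer = " ".join(tokens[start:end + 1])  # "tesseract and deepspeech"
```

Restricting the end index to come at or after the start index rules out invalid spans, which is why the answer is always a contiguous piece of the context.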
The first two models run on the Raspberry Pi 4, and the last one runs on the Intel Neural Compute Stick 2 using the OpenVINO Toolkit.”