We continue exploring Machine Learning on the giant new tiny device of the Seeed XIAO family, the ESP32S3 Sense.

Welcome back to our ongoing series on Tiny Machine Learning (TinyML)! Having already delved into Image Classification, Motion Classification, and Anomaly Detection in our two previous tutorials, we’re now shifting our focus to the realm of voice-activated applications with a project on Keyword Spotting (KWS) using the XIAO ESP32S3 board.

Keyword Spotting (KWS) is integral to many voice recognition systems, enabling devices to respond to specific words or phrases. While this technology underpins popular devices like Google Assistant or Amazon Alexa, it’s equally applicable and achievable on smaller, low-power devices. This tutorial will guide you through implementing a KWS system using TinyML on the XIAO ESP32S3 microcontroller board.

As we learned, the XIAO ESP32S3, equipped with Espressif’s ESP32-S3 chip, is a compact and potent microcontroller offering a dual-core Xtensa LX7 processor, integrated Wi-Fi, and Bluetooth. Its balance of computational power, energy efficiency, and versatile connectivity makes it a fantastic platform for TinyML applications. Also, with its expansion board, we will have access to the “sense” part of the device, which has a 1600x1200 OV2640 camera, an SD card slot, and a digital microphone. The integrated microphone and the SD card will be essential in this project.

As in the previous tutorials of this series, we will use the Edge Impulse Studio, a powerful, user-friendly platform that simplifies creating and deploying machine learning models onto edge devices. We’ll go step by step through training a KWS model, then optimizing and deploying it onto the XIAO ESP32S3 Sense.

Our model will be designed to recognize keywords that can trigger device wake-up or specific actions (in the case of “YES”), bringing your projects to life with voice-activated commands.

Leveraging our experience with TensorFlow Lite for Microcontrollers (the engine “under the hood” on the EI Studio) from previous tutorials, we’ll create a KWS system capable of real-time machine learning on the device.

As we progress through the tutorial, we’ll break down each process stage - from data collection and preparation to model training and deployment - to provide a comprehensive understanding of implementing a KWS system on a microcontroller.

So, let’s continue our journey into the exciting world of TinyML with Keyword Spotting on the XIAO ESP32S3 using Edge Impulse Studio!

How does a voice assistant work?
The introduction explained that Keyword Spotting (KWS) is critical to many voice assistants, enabling devices to respond to specific words or phrases.

To start, it is essential to realize that voice assistants on the market, like Google Home or Amazon Echo Dot, only react to humans when they are “woken up” by particular keywords, such as “Hey Google” on the former and “Alexa” on the latter.
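As a rough illustration of this wake-up behavior, here is a minimal Python sketch. It assumes speech has already been turned into text phrases, which a real assistant does with an on-device KWS model and not with string matching. The function name `gate_by_wake_word` and the one-command-per-wake-up rule are hypothetical simplifications, not how any commercial product is implemented.

```python
from typing import Iterable, List

# Wake words named in the text; a real device detects them in audio, not in text.
WAKE_WORDS = {"hey google", "alexa"}


def gate_by_wake_word(utterances: Iterable[str]) -> List[str]:
    """Return only the phrases spoken right after a wake word (hypothetical helper)."""
    awake = False
    handled = []
    for text in utterances:
        phrase = text.strip().lower()
        if awake:
            # The device is awake: treat this phrase as the command.
            handled.append(phrase)
            awake = False  # go back to listening for the keyword only
        elif phrase in WAKE_WORDS:
            awake = True
        # Anything else heard while "asleep" is ignored.
    return handled


if __name__ == "__main__":
    heard = ["what time is it", "Alexa", "play music", "stop", "Hey Google", "weather"]
    print(gate_by_wake_word(heard))
```

In this sketch, phrases heard before a wake word are dropped, which mirrors the idea that the assistant stays idle until the keyword is spotted.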
