TinyML under the hood: Spectral Analysis

Introduction
I have written several tutorials on Embedded Machine Learning projects in recent years. Several of them used devices that capture data from accelerometers, such as Sensor DataLogger, TinyML Made Easy: Anomaly Detection & Motion Classification, TinyML Made Easy: Gesture Recognition, and TinyML - Motion Recognition Using Raspberry Pi Pico.

What did all those projects have in common? The data came from accelerometers, and the projects were built with Edge Impulse Studio, a leading development platform for machine learning on edge devices. With the Studio, it was possible to collect time-series datasets and preprocess them into input features for training a Machine Learning model.

Data preprocessing is a challenging area for embedded machine learning. Still, Edge Impulse helps overcome this with its digital signal processing (DSP) preprocessing step and, more specifically, the Spectral Features Block for accelerometers.

But how does it work under the hood? Let’s dig into it.

Extracting Features Review
Extracting features from a dataset captured with accelerometers involves processing and analyzing the raw data. Accelerometers measure the acceleration of an object along one or more axes (typically three, denoted as X, Y, and Z). These measurements can be used to understand various aspects of the object’s motion, such as movement patterns and vibrations. Here’s a high-level overview of the process:

Data collection: First, we need to gather data from the accelerometers. Depending on the application, data may be collected at different sampling rates. It’s essential to ensure that the sampling rate is high enough to capture the relevant dynamics of the studied motion: it should be at least double the maximum relevant frequency present in the signal (the Nyquist criterion).
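
To see why the sampling rate matters, here is a minimal sketch using NumPy. The helper names (sample_sine, dominant_frequency) and the chosen frequencies are only illustrative assumptions. It samples the same 30 Hz sine wave at 100 Hz and at 50 Hz and then looks for the strongest frequency in each recording.

```python
import numpy as np

def sample_sine(freq_hz, sampling_rate_hz, duration_s=1.0):
    """Sample a unit-amplitude sine wave (hypothetical test signal)."""
    n_samples = int(round(sampling_rate_hz * duration_s))
    t = np.arange(n_samples) / sampling_rate_hz
    return np.sin(2 * np.pi * freq_hz * t)

def dominant_frequency(signal, sampling_rate_hz):
    """Return the frequency (Hz) of the largest peak in the magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sampling_rate_hz)
    return float(freqs[np.argmax(spectrum)])

# 100 Hz is more than double 30 Hz, so the peak lands where we expect it.
well_sampled = dominant_frequency(sample_sine(30, 100), 100)

# 50 Hz is less than double 30 Hz, so the peak is aliased to a lower frequency.
under_sampled = dominant_frequency(sample_sine(30, 50), 50)

print(well_sampled, under_sampled)
```

With the 50 Hz sampling rate, the 30 Hz motion shows up as a 20 Hz component, which is exactly the kind of misleading information we want to avoid feeding to a model.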

Data preprocessing: Raw accelerometer data can be noisy and contain errors or irrelevant information. Preprocessing steps, such as filtering and normalization, can help clean and standardize the data, making it more suitable for feature extraction. The Studio does not perform standardization, so when working with Sensor Fusion, it may be necessary to perform this step before uploading data to the Studio. See Shawn Hymel’s great tutorial Data Curation and Feature Scaling with Edge Impulse to learn more about it.
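
As a rough illustration of standardization done before upload, the sketch below rescales each sensor column to zero mean and unit standard deviation with NumPy. The function name standardize and the sample values are assumptions made for this example; this is not something the Studio provides.

```python
import numpy as np

def standardize(samples):
    """Z-score each column (sensor axis) of a (n_samples, n_axes) array."""
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean(axis=0)
    std = samples.std(axis=0)
    # A constant column has std == 0; divide by 1 instead to avoid NaNs.
    std = np.where(std == 0, 1.0, std)
    return (samples - mean) / std

# Hypothetical sensor-fusion data: two columns with very different scales.
raw = np.array([
    [0.02, 1013.0],
    [0.05, 1012.0],
    [-0.01, 1015.0],
    [0.03, 1014.0],
])
scaled = standardize(raw)
print(scaled.mean(axis=0), scaled.std(axis=0))
```

In a real project, the mean and standard deviation would normally be computed once on the training data and then reused for any new data, including on the device.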

Segmentation: Depending on the nature of the data and the application, dividing the data into smaller segments or windows may be necessary. This can help focus on specific events or activities within the dataset, making feature extraction more manageable and meaningful. The choice of window size and window increase (which sets the overlap between consecutive windows) depends on the application and the frequency of the events of interest. As a rule of thumb, we should try to capture a couple of “cycles of data”.
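
The windowing idea can be sketched in a few lines of NumPy. Here, window size and window increase are expressed in samples, and the function name sliding_windows is an assumption made for this example.

```python
import numpy as np

def sliding_windows(samples, window_size, window_increase):
    """Split a (n_samples, n_axes) array into overlapping windows.

    Returns an array of shape (n_windows, window_size, n_axes).
    """
    samples = np.asarray(samples)
    if len(samples) < window_size:
        raise ValueError("recording is shorter than one window")
    starts = range(0, len(samples) - window_size + 1, window_increase)
    return np.stack([samples[s:s + window_size] for s in starts])

# Hypothetical recording: 10 samples of 3-axis data (X, Y, Z).
recording = np.arange(30).reshape(10, 3)

# Windows of 4 samples that advance by 2 samples, so neighbors overlap by 2.
windows = sliding_windows(recording, window_size=4, window_increase=2)
print(windows.shape)
```

A window increase smaller than the window size produces overlapping windows, which yields more training examples from the same recording.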

Feature extraction: Once the data is preprocessed and segmented, you can extract features that describe the motion’s characteristics. Some typical features extracted from accelerometer data include:

Time-domain features describe the data’s statistical properties within each segment, such as mean, median, standard deviation, skewness, kurtosis, and zero-crossing rate.
Frequency-domain features are obtained by transforming the data into the frequency domain using techniques like the Fast Fourier Transform (FFT). Some typical frequency-domain features include the power spectrum, spectral energy, dominant frequencies (amplitude and frequency), and spectral entropy.
Time-frequency domain features combine time and frequency domain information and are obtained with techniques such as the Short-Time Fourier Transform (STFT) or the Discrete Wavelet Transform (DWT). They can provide a more detailed understanding of how the signal’s frequency content changes over time.
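
To make the first two feature families more concrete, here is a minimal NumPy sketch that computes a few time-domain and frequency-domain features for one axis of one window. The function names, the exact feature definitions, and the test signal are assumptions made for illustration; this is not the code behind the Spectral Features Block.

```python
import numpy as np

def time_domain_features(x):
    """Basic statistics of a 1-D segment."""
    x = np.asarray(x, dtype=float)
    mean, std = x.mean(), x.std()
    centered = x - mean
    # Skewness and excess kurtosis; defined as 0 for a constant segment.
    skew = np.mean(centered ** 3) / std ** 3 if std > 0 else 0.0
    kurt = np.mean(centered ** 4) / std ** 4 - 3.0 if std > 0 else 0.0
    # Fraction of consecutive sample pairs where the centered signal changes sign.
    signs = np.signbit(centered)
    zcr = np.count_nonzero(signs[1:] != signs[:-1]) / (len(x) - 1)
    return {"mean": mean, "std": std, "skewness": skew,
            "kurtosis": kurt, "zero_crossing_rate": zcr}

def frequency_domain_features(x, sampling_rate_hz):
    """Features taken from the one-sided FFT power spectrum."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # remove the DC offset (e.g., gravity)
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sampling_rate_hz)
    total = power.sum()
    peak = int(np.argmax(power))
    # Spectral entropy: Shannon entropy of the normalized power spectrum.
    p = power / total if total > 0 else np.zeros_like(power)
    nonzero = p[p > 0]
    entropy = float(-np.sum(nonzero * np.log2(nonzero)))
    return {"dominant_freq_hz": float(freqs[peak]),
            "dominant_amplitude": float(2 * np.sqrt(power[peak]) / len(x)),
            "spectral_energy": float(total / len(x)),  # one-sided spectrum
            "spectral_entropy": entropy}

# Hypothetical segment: 2 s of a 5 Hz vibration, amplitude 2, sampled at 100 Hz.
fs = 100
t = np.arange(2 * fs) / fs
segment = 2.0 * np.sin(2 * np.pi * 5 * t + 0.3)

td = time_domain_features(segment)
fd = frequency_domain_features(segment, fs)
print(td)
print(fd)
```

For this clean sine wave, nearly all the power sits in a single frequency bin, so the spectral entropy is close to zero. A noisier motion would spread the power over many bins and raise it.
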
In many cases, the number of extracted features can be quite large, which may lead to overfitting or increased computational complexity. Feature selection techniques, such as mutual information, correlation-based methods, or principal component analysis (PCA), can help identify the most relevant features for a given application and reduce the dimensionality of the dataset. The Studio can help with such feature importance calculations.
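
As one example of reducing dimensionality, the sketch below implements PCA with NumPy’s singular value decomposition. The function name pca_reduce and the synthetic feature matrix are assumptions made for this example. Note that PCA builds new features as combinations of the original ones rather than picking a subset of them.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project a (n_samples, n_features) matrix onto its top principal components.

    Returns the projected data and the fraction of variance each kept
    component explains.
    """
    X = np.asarray(features, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by decreasing variance.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    total = np.sum(S ** 2)
    explained = S ** 2 / total if total > 0 else np.zeros_like(S)
    projected = X_centered @ Vt[:n_components].T
    return projected, explained[:n_components]

# Hypothetical feature matrix: 100 windows, 5 features, where the last
# three features are noisy combinations of the first two.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 2))
mix = base @ rng.normal(size=(2, 3)) + 0.01 * rng.normal(size=(100, 3))
feature_matrix = np.hstack([base, mix])

reduced, explained_ratio = pca_reduce(feature_matrix, n_components=2)
print(reduced.shape, explained_ratio.sum())
```

Because the five features here carry essentially two independent pieces of information, two components keep almost all of the variance.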

Let’s explore one of my TinyML Motion Classification projects in more detail.
