ML for Biosignals: Features, PCA & Classifiers
Prince E. Adjei
Kwame Nkrumah University of Science and Technology
Topic: ML for Biosignals Module 4: Advanced Topics
Biosignal Processing and Analysis (BME 366)
© 2025 Prince E. Adjei
Topics
(1). Feature Extraction from Biosignals
(2). Principal Component Analysis (PCA)
(3). k-Nearest Neighbors (k-NN)
(4). Support Vector Machines (SVM)
(5). Classifier Evaluation
ML for Biosignals
Learning Objectives
Explain the steps in a typical machine learning (ML) pipeline for
biosignal processing.
Extract relevant features (e.g., RMS, MF, entropy) from biosignal data
and structure them into feature vectors.
Apply Principal Component Analysis (PCA) for dimensionality
reduction and noise suppression in multi-channel signals.
Compare and evaluate classifiers such as k-NN and SVM in terms of
decision boundaries, accuracy, and generalization.
Interpret model evaluation metrics, including the confusion matrix,
precision, recall, and F1 score.
What is a Feature Vector?
A feature vector is a list of numerical features (descriptors) extracted
from a signal segment that summarize its properties in a compact
form.
Common Features in Biosignal Processing: RMS, median frequency (MF), heart rate variability (HRV), and entropy.
Feature Vector
Why Use a Feature Vector?
Reduces large signals to compact, interpretable numbers
Enables machine learning, classification, or pattern recognition
Makes analysis scalable across time and subjects
N × d Table Explained:
N = number of signal segments (e.g., one per 5 s window)
d = number of features per segment (e.g., 4 features → RMS, MF, HRV, entropy)
Feature Vector
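To make the N × d idea concrete, here is a minimal sketch (not from the slides) that builds such a feature matrix with NumPy/SciPy, assuming an array emg sampled at fs Hz and non-overlapping 5 s windows; the median-frequency and spectral-entropy helpers are illustrative choices, not prescribed definitions.

```python
import numpy as np
from scipy.signal import welch

def extract_features(segment, fs):
    """Return a small feature vector [RMS, median frequency, spectral entropy] for one window."""
    rms = np.sqrt(np.mean(segment ** 2))                      # time-domain amplitude
    f, pxx = welch(segment, fs=fs, nperseg=min(256, len(segment)))
    cdf = np.cumsum(pxx) / np.sum(pxx)
    mf = f[np.searchsorted(cdf, 0.5)]                         # median frequency of the PSD
    p = pxx / np.sum(pxx)
    entropy = -np.sum(p * np.log2(p + 1e-12))                 # spectral entropy (bits)
    return np.array([rms, mf, entropy])

fs = 1000                                   # assumed sampling rate (Hz)
emg = np.random.randn(30 * fs)              # placeholder for a real 30 s recording
win = 5 * fs                                # 5 s windows -> N segments
X = np.vstack([extract_features(emg[i:i + win], fs)
               for i in range(0, len(emg) - win + 1, win)])
print(X.shape)                              # (N, d) feature matrix
```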
What is Scaling?
Why? PCA is sensitive to scale: features with large ranges dominate the variance.
Common Methods:
Standardization: z = (x − μ) / σ (zero mean, unit variance per feature)
Min-Max Scaling: x′ = (x − x_min) / (x_max − x_min), rescaling each feature to [0, 1]
Scaling & PCA
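A minimal sketch of both scaling methods using scikit-learn's StandardScaler and MinMaxScaler on a toy N × d matrix; the feature ranges are invented purely to show the effect.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X = np.random.rand(20, 4) * [1, 100, 0.01, 10]   # toy N x d features with very different ranges

X_std = StandardScaler().fit_transform(X)        # z = (x - mean) / std, per feature
X_mm = MinMaxScaler().fit_transform(X)           # rescales each feature to [0, 1]

print(X_std.mean(axis=0).round(2), X_std.std(axis=0).round(2))   # ~0 and ~1
print(X_mm.min(axis=0), X_mm.max(axis=0))                        # 0 and 1
```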
PCA is a technique that:
Reduces the number of features in your data
Keeps the features that carry the most important information
Helps you remove noise and visualize data in 2D or 3D
How it works:
Compute the covariance matrix of scaled data.
Find eigenvectors (directions) and eigenvalues (variance captured).
Project data onto the top k components.
What is PCA (Principal Component Analysis)?
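A minimal NumPy sketch of the three steps above (covariance, eigendecomposition, projection), using random standardized data as a stand-in for real features.

```python
import numpy as np

rng = np.random.default_rng(0)
X_std = rng.standard_normal((100, 4))            # pretend these are standardized features

# 1. Covariance matrix of the scaled data
C = np.cov(X_std, rowvar=False)

# 2. Eigenvectors (directions) and eigenvalues (variance captured)
eigvals, eigvecs = np.linalg.eigh(C)             # eigh: C is symmetric
order = np.argsort(eigvals)[::-1]                # sort by descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 3. Project data onto the top k components
k = 2
Z = X_std @ eigvecs[:, :k]                       # (N, k) scores
explained = eigvals[:k].sum() / eigvals.sum()
print(f"Top {k} PCs capture {explained:.0%} of the variance")
```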
Variance capture tells you how much of the original data's information is
kept when you reduce dimensions using PCA.
The first principal component captures the most variance (i.e., the most
important pattern in your data).
The second component captures the next most, and so on.
Total variance = the sum of all eigenvalues; the share explained by each component is its eigenvalue divided by this total.
A scree plot shows how many components capture how much variance.
Retain enough PCs to capture roughly 90-95% of the variance.
High variance → signal; low variance → likely noise.
Variance Capture
Noise = small variance, scattered directions.
PCA filters noise by:
Discarding components with low eigenvalues.
Keeping only directions with structured variation.
Analogy:
PCA is like tuning a radio: remove the static (noise) and amplify the music (signal).
Noise Suppression
Multi-channel EMG signals have correlated and redundant channels,
so we use PCA to reduce dimensionality and suppress noise while
keeping core signal features.
PCA: Multi-Channel EMG → PCA → Reconstruction
Multi-Channel EMG Data:
Example: 8 channels from a forearm muscle array.
Raw EMG contains redundancy, artifacts, and noise.
Apply PCA:
Covariance analysis: Finds principal components (PCs) across channels.
Projection: EMG signals transformed to PC space.
Variance ranking: the first few PCs capture most of the muscular activity.
Pipeline Overview
Reconstruction:
Reconstruct the EMG using only the top k PCs (e.g., top 3).
Inverse transform back to channel space: reconstructed X ≈ Z_k · W_kᵀ + mean, where Z_k holds the PC scores and W_k the top-k eigenvectors.
Pipeline Overview
Key Benefits:
Reduces noise, keeps essential signal features.
Pipeline Overview
PCA on Simulated Multi-channel EMG
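A minimal sketch of what such a simulation might look like: eight channels generated from three shared latent sources plus noise, projected onto the top 3 PCs and inverse-transformed back to channel space with scikit-learn. The mixing model and noise level are assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
fs, T, n_ch = 1000, 5, 8
t = np.arange(T * fs) / fs

# Simulate 8 correlated channels: 3 latent "muscle" sources mixed into the array, plus noise
sources = rng.standard_normal((len(t), 3)) * np.array([3.0, 2.0, 1.0])
mixing = rng.standard_normal((3, n_ch))
emg = sources @ mixing + 0.5 * rng.standard_normal((len(t), n_ch))

pca = PCA(n_components=3)
scores = pca.fit_transform(emg)                 # project onto top 3 PCs
emg_denoised = pca.inverse_transform(scores)    # reconstruct back in channel space

print(pca.explained_variance_ratio_.sum())      # fraction of variance kept
print(emg.shape, emg_denoised.shape)            # both (samples, 8)
```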
k-Nearest Neighbors (k-NN)
What is k-NN?
A non-parametric, instance-based learning algorithm.
Classifies a new point based on the majority class among its k
nearest neighbors in the training set.
How it Works:
Choose k (number of neighbors).
Compute distances from the test point to all training points.
Select the k nearest points.
Assign the majority label among them.
Decision Boundaries
k-NN creates non-linear, flexible decision boundaries.
Small k: captures fine details but may overfit (noisy boundary).
Large k: smooths boundary, better generalization, but may underfit.
k-Nearest Neighbors (k-NN)
Most common: Euclidean distance, d(x, y) = √Σ(xᵢ − yᵢ)²
Others:
Manhattan distance: d(x, y) = Σ|xᵢ − yᵢ|
Cosine distance, Minkowski, etc.
The choice of distance affects the results; this matters especially in high-dimensional or non-numeric data.
Distance Metric (How 'near' is defined)
k-Nearest Neighbors (k-NN)
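A minimal sketch of the four steps above written directly in NumPy, so the distance computation and majority vote are explicit; the toy 2-D data and the knn_predict helper are illustrative, not a library API.

```python
import numpy as np

def knn_predict(X_train, y_train, x_test, k=3, metric="euclidean"):
    """Classify one test point by majority vote among its k nearest training points."""
    diff = X_train - x_test
    if metric == "euclidean":
        dist = np.sqrt((diff ** 2).sum(axis=1))
    else:                                        # "manhattan"
        dist = np.abs(diff).sum(axis=1)
    nearest = np.argsort(dist)[:k]               # indices of the k closest points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]             # majority label

# Toy 2-D feature data: class 0 around (0, 0), class 1 around (3, 3)
rng = np.random.default_rng(2)
X_train = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
y_train = np.array([0] * 20 + [1] * 20)
print(knn_predict(X_train, y_train, np.array([2.5, 2.8]), k=5))   # likely 1
```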
Questions
1. What do the rows and columns in a feature matrix (e.g., RMS, MF, HRV) represent?
2. Why do we scale data before applying PCA?
3. What is lost when we reconstruct EMG using only the top 3 principal components?
4. How does increasing k affect the decision boundary?
SVM Overview: Margin & Kernel Trick.
What is an SVM?
Support Vector Machine is a supervised learning model for classification
(and regression).
Finds the optimal hyperplane that best separates classes with maximum
margin.
The margin is the distance between the hyperplane and the closest data
points from each class (support vectors).
SVM maximizes this margin to improve generalization.
Equation of the decision boundary: wᵀx + b = 0 (class margins at wᵀx + b = ±1).
Support vectors: data points that lie exactly on the margin boundary.
Why Margin Matters?
"Wider margin = better robustness to new, unseen data."
SVM Overview: Margin & Kernel Trick.
Real-world data is often not linearly separable.
The kernel trick allows SVM to operate in a higher-dimensional space
without explicitly computing the transformation.
Common Kernels: linear, polynomial, RBF (Gaussian).
The kernel computes K(xᵢ, xⱼ) = φ(xᵢ) · φ(xⱼ)
→ without explicitly calculating φ!
Kernel Trick (Nonlinear Separation)
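A minimal scikit-learn sketch contrasting a linear kernel with an RBF kernel on data that is not linearly separable (two concentric rings); the dataset and hyperparameters are chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not separable by any straight line
X, y = make_circles(n_samples=200, factor=0.4, noise=0.1, random_state=0)

linear_svm = SVC(kernel="linear", C=1.0).fit(X, y)
rbf_svm = SVC(kernel="rbf", gamma=2.0, C=1.0).fit(X, y)

print("linear kernel accuracy:", linear_svm.score(X, y))   # near chance
print("RBF kernel accuracy:   ", rbf_svm.score(X, y))       # close to 1.0
print("support vectors (RBF): ", rbf_svm.n_support_)        # points defining the margin
```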
Classifier Comparison: Bias-Variance & Cross-Validation
Bias: Error due to overly simple assumptions (e.g., linear model).
Underfitting: model misses important trends.
Variance: Error due to sensitivity to small training data changes.
Overfitting: model fits noise instead of signal.
Model Types & Tradeoff
Cross-Validation (CV).
Why? Prevents evaluation bias from one train-test split.
What? Repeatedly train/test on different data subsets.
k-Fold CV:
Split data into k folds.
Train on k−1 folds, test on the remaining fold.
Repeat k times, average the accuracy.
Gives reliable estimate of true performance.
k-NN vs SVM on EMG features.
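A minimal sketch of the comparison this slide points to, using 5-fold cross-validation on a simulated feature matrix standing in for EMG features; the dataset, pipelines, and hyperparameters are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Simulated stand-in for an N x d EMG feature matrix with two classes
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=1, random_state=0)

models = {
    "k-NN (k=5)": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "SVM (RBF)":  make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)       # 5-fold CV accuracy
    print(f"{name}: {scores.mean():.2f} +/- {scores.std():.2f}")
```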
Confusion Matrix (Binary Classification)
Rows = actual class, columns = predicted class:
                 | Predicted Positive | Predicted Negative
Actual Positive  |        TP          |        FN
Actual Negative  |        FP          |        TN
True Positive (TP): Correctly predicted a positive case
False Negative (FN): Actual positive predicted as negative
False Positive (FP): Actual negative predicted as positive
True Negative (TN): Correctly predicted a negative case
This matrix helps you see not just how many predictions are right or wrong,
but the type of errors being made.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Measures the overall correctness of the model.
Precision = TP / (TP + FP)
Out of all predicted positives, how many were correct?
Key Evaluation Metrics
Recall (a.k.a. Sensitivity) = TP / (TP + FN)
Out of all actual positives, how many did the model detect?
F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
Harmonic mean of precision and recall; balances both in one score.
Key Evaluation Metrics
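A minimal sketch computing the confusion matrix and the four metrics above with scikit-learn; the label vectors are invented for illustration.

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, f1_score)

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])   # actual labels
y_pred = np.array([1, 1, 0, 1, 0, 0, 1, 0, 0, 0])   # model predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print("accuracy :", accuracy_score(y_true, y_pred))    # (TP + TN) / all
print("precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("recall   :", recall_score(y_true, y_pred))      # TP / (TP + FN)
print("F1 score :", f1_score(y_true, y_pred))          # harmonic mean of P and R
```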
Confusion Matrix (Binary Classification)
[Figure: confusion matrix with TP, FP, FN, TN labeled, alongside a precision-recall tradeoff curve.]
Questions
1. What does the margin represent in SVM, and why is it important?
2. How does cross-validation help when comparing classifiers?
3. Why might SVM perform better than k-NN on EMG features?
4. When is the F1 score more useful than accuracy?
Summary
Biosignal features like RMS, MF, HRV, and entropy form an N×d feature matrix.
PCA reduces redundancy, captures key variance, and suppresses noise,
especially in multi-channel signals like EMG.
k-NN relies on distance metrics and local voting; decision boundaries depend
on k.
SVM finds the widest margin for class separation and can handle nonlinear
patterns using kernel tricks.
Summary
Cross-validation ensures fair comparison and guards against overfitting.
Classifier performance reflects the bias-variance tradeoff (e.g., k-NN = low bias, high variance).
Confusion matrix gives a full picture beyond accuracy.
Precision, recall, and F1 score are crucial for imbalanced or high-risk
decisions.