ML for Biosignals: Features, PCA & Classifiers
Prince E. Adjei
Kwame Nkrumah University of Science and Technology
Topic: ML for Biosignals Module 4: Advanced Topics
Biosignal Processing and Analysis (BME 366)
© 2025 Prince E. Adjei
Topics
(1). Feature Extraction from Biosignals
(2). Principal Component Analysis (PCA)
(3). k-Nearest Neighbors (k-NN)
(4). Support Vector Machines (SVM)
(5). Classifier Evaluation
ML for Biosignals
Learning Objectives
Explain the steps in a typical machine learning (ML) pipeline for
biosignal processing.
Extract relevant features (e.g., RMS, MF, entropy) from biosignal data
and structure them into feature vectors.
Apply Principal Component Analysis (PCA) for dimensionality
reduction and noise suppression in multi-channel signals.
Compare and evaluate classifiers such as k-NN and SVM in terms of
decision boundaries, accuracy, and generalization.
Interpret model evaluation metrics, including the confusion matrix,
precision, recall, and F1 score.
What is a Feature Vector?
A feature vector is a list of numerical features (descriptors) extracted
from a signal segment that summarize its properties in a compact
form.
Common Features in Biosignal Processing: RMS, median frequency (MF), heart rate variability (HRV), and entropy.
Feature Vector
Why Use a Feature Vector?
Reduces large signals to compact, interpretable numbers
Enables machine learning, classification, or pattern recognition
Makes analysis scalable across time and subjects
N × d Table Explained:
N = number of signal segments (e.g., one per 5 s window)
d = number of features per segment (e.g., 4 features → RMS, MF, HRV, entropy)
Feature Vector
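To make the N × d idea concrete, here is a minimal sketch (not from the slides) that builds such a feature matrix with NumPy/SciPy, assuming an array emg sampled at fs Hz and non-overlapping 5 s windows; the median-frequency and spectral-entropy helpers are illustrative choices, not prescribed definitions.

```python
import numpy as np
from scipy.signal import welch

def extract_features(segment, fs):
    """Return a small feature vector [RMS, median frequency, spectral entropy] for one window."""
    rms = np.sqrt(np.mean(segment ** 2))                      # time-domain amplitude
    f, pxx = welch(segment, fs=fs, nperseg=min(256, len(segment)))
    cdf = np.cumsum(pxx) / np.sum(pxx)
    mf = f[np.searchsorted(cdf, 0.5)]                         # median frequency of the PSD
    p = pxx / np.sum(pxx)
    entropy = -np.sum(p * np.log2(p + 1e-12))                 # spectral entropy (bits)
    return np.array([rms, mf, entropy])

fs = 1000                                   # assumed sampling rate (Hz)
emg = np.random.randn(30 * fs)              # placeholder for a real 30 s recording
win = 5 * fs                                # 5 s windows -> N segments
X = np.vstack([extract_features(emg[i:i + win], fs)
               for i in range(0, len(emg) - win + 1, win)])
print(X.shape)                              # (N, d) feature matrix
```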
What is Scaling?
Why? PCA is sensitive to scale: features with large ranges dominate the variance.
Common Methods:
Standardization: z = (x − μ) / σ (zero mean, unit variance per feature)
Min-Max Scaling: x′ = (x − x_min) / (x_max − x_min), rescaling each feature to [0, 1]
Scaling & PCA
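A minimal sketch of both scaling methods using scikit-learn's StandardScaler and MinMaxScaler on a toy N × d matrix; the feature ranges are invented purely to show the effect.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X = np.random.rand(20, 4) * [1, 100, 0.01, 10]   # toy N x d features with very different ranges

X_std = StandardScaler().fit_transform(X)        # z = (x - mean) / std, per feature
X_mm = MinMaxScaler().fit_transform(X)           # rescales each feature to [0, 1]

print(X_std.mean(axis=0).round(2), X_std.std(axis=0).round(2))   # ~0 and ~1
print(X_mm.min(axis=0), X_mm.max(axis=0))                        # 0 and 1
```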
PCA is a technique that:
Reduces the number of features in your data
Keeps the features that carry the most important information
Helps you remove noise and visualize data in 2D or 3D
How it works:
Compute the covariance matrix of scaled data.
Find eigenvectors (directions) and eigenvalues (variance captured).
Project data onto the top k components.
What is PCA (Principal Component Analysis)?
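A minimal NumPy sketch of the three steps above (covariance, eigendecomposition, projection), using random standardized data as a stand-in for real features.

```python
import numpy as np

rng = np.random.default_rng(0)
X_std = rng.standard_normal((100, 4))            # pretend these are standardized features

# 1. Covariance matrix of the scaled data
C = np.cov(X_std, rowvar=False)

# 2. Eigenvectors (directions) and eigenvalues (variance captured)
eigvals, eigvecs = np.linalg.eigh(C)             # eigh: C is symmetric
order = np.argsort(eigvals)[::-1]                # sort by descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 3. Project data onto the top k components
k = 2
Z = X_std @ eigvecs[:, :k]                       # (N, k) scores
explained = eigvals[:k].sum() / eigvals.sum()
print(f"Top {k} PCs capture {explained:.0%} of the variance")
```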
Variance capture tells you how much of the original data's information is
kept when you reduce dimensions using PCA.
The first principal component captures the most variance (i.e., the most
important pattern in your data).
The second component captures the next most, and so on.
Total variance = the sum of all eigenvalues; the share explained by each component is its eigenvalue divided by this total.
A scree plot shows how many components capture how much variance.
Retain enough PCs to capture roughly 90-95% of the variance.
High variance → signal; low variance → likely noise.
Variance Capture
Noise = small variance, scattered directions.
PCA filters noise by:
Discarding components with low eigenvalues.
Keeping only directions with structured variation.
Analogy:
PCA is like tuning a radio: remove the static (noise) and amplify the music (signal).
Noise Suppression
Multi-channel EMG signals have correlated and redundant channels,
so we use PCA to reduce dimensionality and suppress noise while
keeping core signal features.
PCA: Multi-Channel EMG → PCA → Reconstruction
Multi-Channel EMG Data:
Example: 8 channels from a forearm muscle array.
Raw EMG contains redundancy, artifacts, and noise.
Apply PCA:
Covariance analysis: Finds principal components (PCs) across channels.
Projection: EMG signals transformed to PC space.
Variance ranking: the first few PCs capture most of the muscular activity.
Pipeline Overview
Reconstruction:
Reconstruct the EMG using only the top k PCs (e.g., top 3).
Inverse transform back to channel space: reconstructed X ≈ Z_k · W_kᵀ + mean, where Z_k holds the PC scores and W_k the top-k eigenvectors.
Pipeline Overview
Key Benefits:
Reduces noise, keeps essential signal features.
Pipeline Overview
PCA on Simulated Multi-channel EMG
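A minimal sketch of what such a simulation might look like: eight channels generated from three shared latent sources plus noise, projected onto the top 3 PCs and inverse-transformed back to channel space with scikit-learn. The mixing model and noise level are assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
fs, T, n_ch = 1000, 5, 8
t = np.arange(T * fs) / fs

# Simulate 8 correlated channels: 3 latent "muscle" sources mixed into the array, plus noise
sources = rng.standard_normal((len(t), 3)) * np.array([3.0, 2.0, 1.0])
mixing = rng.standard_normal((3, n_ch))
emg = sources @ mixing + 0.5 * rng.standard_normal((len(t), n_ch))

pca = PCA(n_components=3)
scores = pca.fit_transform(emg)                 # project onto top 3 PCs
emg_denoised = pca.inverse_transform(scores)    # reconstruct back in channel space

print(pca.explained_variance_ratio_.sum())      # fraction of variance kept
print(emg.shape, emg_denoised.shape)            # both (samples, 8)
```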
k-Nearest Neighbors (k-NN)
What is k-NN?
A non-parametric, instance-based learning algorithm.
Classifies a new point based on the majority class among its k
nearest neighbors in the training set.
How it Works:
Choose k (number of neighbors).
Compute distances from the test point to all training points.
Select the k nearest points.
Assign the majority label among them.
Decision Boundaries
k-NN creates non-linear, flexible decision boundaries.
Small k: captures fine details but may overfit (noisy boundary).
Large k: smooths boundary, better generalization, but may underfit.
k-Nearest Neighbors (k-NN)
Most common: Euclidean distance, d(x, y) = √Σ(xᵢ − yᵢ)²
Others:
Manhattan distance: d(x, y) = Σ|xᵢ − yᵢ|
Cosine distance, Minkowski, etc.
The choice of distance affects the results; this matters especially in high-dimensional or non-numeric data.
Distance Metric (How 'near' is defined)
k-Nearest Neighbors (k-NN)
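A minimal sketch of the four steps above written directly in NumPy, so the distance computation and majority vote are explicit; the toy 2-D data and the knn_predict helper are illustrative, not a library API.

```python
import numpy as np

def knn_predict(X_train, y_train, x_test, k=3, metric="euclidean"):
    """Classify one test point by majority vote among its k nearest training points."""
    diff = X_train - x_test
    if metric == "euclidean":
        dist = np.sqrt((diff ** 2).sum(axis=1))
    else:                                        # "manhattan"
        dist = np.abs(diff).sum(axis=1)
    nearest = np.argsort(dist)[:k]               # indices of the k closest points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]             # majority label

# Toy 2-D feature data: class 0 around (0, 0), class 1 around (3, 3)
rng = np.random.default_rng(2)
X_train = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
y_train = np.array([0] * 20 + [1] * 20)
print(knn_predict(X_train, y_train, np.array([2.5, 2.8]), k=5))   # likely 1
```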
Questions
1. What do the rows and columns in a feature matrix (e.g., RMS, MF, HRV) represent?
2. Why do we scale data before applying PCA?
3. What is lost when we reconstruct EMG using only the top 3 principal components?
4. How does increasing k affect the decision boundary?
SVM Overview: Margin & Kernel Trick.
What is an SVM?
Support Vector Machine is a supervised learning model for classification
(and regression).
Finds the optimal hyperplane that best separates classes with maximum
margin.
The margin is the distance between the hyperplane and the closest data
points from each class (support vectors).
SVM maximizes this margin to improve generalization.
Equation of the decision boundary: wᵀx + b = 0 (class margins at wᵀx + b = ±1).
Support vectors: data points that lie exactly on the margin boundary.
Why Margin Matters?
"Wider margin = better robustness to new, unseen data."
SVM Overview: Margin & Kernel Trick.
Real-world data is often not linearly separable.
The kernel trick allows SVM to operate in a higher-dimensional space
without explicitly computing the transformation.
Common Kernels: linear, polynomial, RBF (Gaussian).
The kernel computes K(xᵢ, xⱼ) = φ(xᵢ) · φ(xⱼ)
→ without explicitly calculating φ!
Kernel Trick (Nonlinear Separation)
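A minimal scikit-learn sketch contrasting a linear kernel with an RBF kernel on data that is not linearly separable (two concentric rings); the dataset and hyperparameters are chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not separable by any straight line
X, y = make_circles(n_samples=200, factor=0.4, noise=0.1, random_state=0)

linear_svm = SVC(kernel="linear", C=1.0).fit(X, y)
rbf_svm = SVC(kernel="rbf", gamma=2.0, C=1.0).fit(X, y)

print("linear kernel accuracy:", linear_svm.score(X, y))   # near chance
print("RBF kernel accuracy:   ", rbf_svm.score(X, y))       # close to 1.0
print("support vectors (RBF): ", rbf_svm.n_support_)        # points defining the margin
```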
Classifier Comparison: Bias-Variance & Cross-Validation
Bias: Error due to overly simple assumptions (e.g., linear model).
Underfitting: model misses important trends.
Variance: Error due to sensitivity to small training data changes.
Overfitting: model fits noise instead of signal.
Model Types & Tradeoff
Cross-Validation (CV).
Why? Prevents evaluation bias from one train-test split.
What? Repeatedly train/test on different data subsets.
k-Fold CV:
Split data into k folds.
Train on k−1 folds, test on the remaining fold.
Repeat k times, average the accuracy.
Gives reliable estimate of true performance.
k-NN vs SVM on EMG features.
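A minimal sketch of the comparison this slide points to, using 5-fold cross-validation on a simulated feature matrix standing in for EMG features; the dataset, pipelines, and hyperparameters are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Simulated stand-in for an N x d EMG feature matrix with two classes
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=1, random_state=0)

models = {
    "k-NN (k=5)": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "SVM (RBF)":  make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)       # 5-fold CV accuracy
    print(f"{name}: {scores.mean():.2f} +/- {scores.std():.2f}")
```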
Confusion Matrix (Binary Classification)
Rows = actual class, columns = predicted class:
                 | Predicted Positive | Predicted Negative
Actual Positive  |        TP          |        FN
Actual Negative  |        FP          |        TN
True Positive (TP): Correctly predicted a positive case
False Negative (FN): Actual positive predicted as negative
False Positive (FP): Actual negative predicted as positive
True Negative (TN): Correctly predicted a negative case
This matrix helps you see not just how many predictions are right or wrong,
but the type of errors being made.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Measures the overall correctness of the model.
Precision = TP / (TP + FP)
Out of all predicted positives, how many were correct?
Key Evaluation Metrics
Recall (a.k.a. Sensitivity) = TP / (TP + FN)
Out of all actual positives, how many did the model detect?
F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
Harmonic mean of precision and recall; balances both in one score.
Key Evaluation Metrics
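A minimal sketch computing the confusion matrix and the four metrics above with scikit-learn; the label vectors are invented for illustration.

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, f1_score)

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])   # actual labels
y_pred = np.array([1, 1, 0, 1, 0, 0, 1, 0, 0, 0])   # model predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print("accuracy :", accuracy_score(y_true, y_pred))    # (TP + TN) / all
print("precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("recall   :", recall_score(y_true, y_pred))      # TP / (TP + FN)
print("F1 score :", f1_score(y_true, y_pred))          # harmonic mean of P and R
```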
Confusion Matrix (Binary Classification)
[Figure: confusion matrix with TP, FP, FN, TN labeled, alongside a precision-recall tradeoff curve.]
Questions
1. What does the margin represent in SVM, and why is it important?
2. How does cross-validation help when comparing classifiers?
3. Why might SVM perform better than k-NN on EMG features?
4. When is the F1 score more useful than accuracy?
Summary
Biosignal features like RMS, MF, HRV, and entropy form an N×d feature matrix.
PCA reduces redundancy, captures key variance, and suppresses noise,
especially in multi-channel signals like EMG.
k-NN relies on distance metrics and local voting; decision boundaries depend
on k.
SVM finds the widest margin for class separation and can handle nonlinear
patterns using kernel tricks.
Summary
Cross-validation ensures fair comparison and guards against overfitting.
Classifier performance reflects the bias-variance tradeoff (e.g., k-NN = low bias, high variance).
Confusion matrix gives a full picture beyond accuracy.
Precision, recall, and F1 score are crucial for imbalanced or high-risk
decisions.