EEG Validation of SleepStats Sleep/Wake Classification
The Signal Solutions analysis software, SleepStats Data Explorer™, uses statistical methods to classify the data collected from the piezo sensor into two classes (events): ‘Sleep’ and ‘Wake’. The classification relies on a set of features extracted from the data, and each two-second interval is labeled as either ‘Sleep’ or ‘Wake’.
Accuracy & Scoring
The accuracy of our classification was tested by performing simultaneous electroencephalography (EEG) and piezo recordings in mice. EEG and electromyogram (EMG) electrodes were surgically implanted into mice, and a piezo sensor was placed on the floor of each cage. Twenty-four hours of EEG, EMG, and piezo data were simultaneously collected from twenty mice.
Two trained human sleep scorers used the EEG and EMG data to label the 24 hours of data as ‘Sleep’ or ‘Wake’. Human scoring of EEG is still considered the gold standard measurement of sleep in rodents. Table 1 shows the predictive performance of our classifier with respect to the human-scored sleep.
Table 1 is a confusion matrix together with its performance measures. In the table, the sum of the first two values in the first row (341,503 + 32,139) equals the number of instances labeled as ‘Sleep’ by the human scorer (EEG). Of these EEG-scored sleep instances, 341,503 were correctly predicted as sleep by the classifier (true positive, TP) and 32,139 were incorrectly predicted as wake (false negative, FN).
Similarly, the sum of the first two values in the second row (32,279 + 389,900) equals the number of instances labeled as ‘Wake’ by the human scorer (EEG). Of these EEG-scored wake instances, 389,900 were correctly predicted as wake by the classifier (true positive for wake) and 32,279 were incorrectly predicted as sleep (false negative for wake, which is equivalently a false positive for sleep).
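The tallying described above can be sketched in a few lines of Python. This is only an illustration, not the SleepStats implementation: the function name confusion_counts and the short label lists are hypothetical, and ‘Sleep’ is assumed to be the positive class, so that a correctly predicted wake instance is counted as a true negative (TN).

```python
from collections import Counter

def confusion_counts(eeg_labels, piezo_labels):
    """Tally paired labels, one pair per two-second interval.

    eeg_labels: reference labels from the human scorer (rows of the table).
    piezo_labels: labels predicted by the classifier (columns of the table).
    'Sleep' is treated as the positive class (an assumption of this sketch).
    """
    if len(eeg_labels) != len(piezo_labels):
        raise ValueError("both label sequences must cover the same intervals")
    pairs = Counter(zip(eeg_labels, piezo_labels))
    return {
        "TP": pairs[("Sleep", "Sleep")],  # EEG sleep, predicted sleep
        "FN": pairs[("Sleep", "Wake")],   # EEG sleep, predicted wake
        "FP": pairs[("Wake", "Sleep")],   # EEG wake, predicted sleep
        "TN": pairs[("Wake", "Wake")],    # EEG wake, predicted wake
    }

# Made-up labels for six intervals, for illustration only.
eeg = ["Sleep", "Sleep", "Wake", "Wake", "Sleep", "Wake"]
piezo = ["Sleep", "Wake", "Wake", "Sleep", "Sleep", "Wake"]
counts = confusion_counts(eeg, piezo)
print(counts)
```

The first row of the confusion matrix corresponds to TP and FN, and the second row to FP and TN, so each row sum gives the number of intervals the human scorer assigned to that class.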
Precision and Recall
The predictive measure ‘Precision’ represents the positive predictive value, that is, the fraction of the predicted (or retrieved) instances that are relevant (or true). ‘Recall’ represents the sensitivity, that is, the fraction of the relevant (or true) instances that get predicted. Table 1 shows that 91.4% of sleep and 92.3% of wake were correctly predicted (or retrieved) by our classifier. The two human scorers, on average, show about 95% agreement in the labeling of the EEG data.
Table 1. SleepStats vs. Human labeling (TP: true positive, FP: false positive, FN: false negative). Rows represent instances called by EEG. Columns represent calls by the classifier.
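As a rough check, precision and recall can be recomputed from the four counts quoted in the text. The sketch below is illustrative only: the helper precision_recall is a hypothetical name, and ‘Sleep’ is taken as the positive class when naming the counts. The computed values may differ from the published figures in the last digit, depending on how they were rounded.

```python
# Counts quoted in the text (rows: EEG label, columns: classifier label).
TP, FN = 341_503, 32_139   # EEG sleep: predicted sleep, predicted wake
FP, TN = 32_279, 389_900   # EEG wake:  predicted sleep, predicted wake

def precision_recall(tp, fp, fn):
    """Return (precision, recall) for one class.

    precision = tp / (tp + fp): fraction of predicted instances that are true.
    recall    = tp / (tp + fn): fraction of true instances that get predicted.
    """
    return tp / (tp + fp), tp / (tp + fn)

# Sleep as the class of interest.
sleep_precision, sleep_recall = precision_recall(TP, FP, FN)

# Wake as the class of interest: the roles of the counts swap, so
# correctly predicted wake (TN above) acts as the true-positive count.
wake_precision, wake_recall = precision_recall(TN, FN, FP)

print(f"Sleep: precision {sleep_precision:.3f}, recall {sleep_recall:.3f}")
print(f"Wake:  precision {wake_precision:.3f}, recall {wake_recall:.3f}")
```

Computing the measures per class, rather than reporting a single overall accuracy, shows whether the classifier favors one state over the other.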