Real-Time Drowsiness Detection Using Edge Device

Mrinank Purwar and Siddhi Vinayak
Aug 16, 2021

Introduction

If you have ever driven long distances, you have probably been drowsy at the wheel at some point. It's not something we like to admit, but it is an important problem with serious consequences, and it needs to be addressed. 1 in 4 vehicle accidents is caused by drowsy driving, and 1 in 25 adult drivers worldwide who usually drive long distances at night report having fallen asleep at the wheel in the past 30 days. The scariest part is that drowsy driving isn't just falling asleep while driving: it can be as brief as a momentary state of unconsciousness in which the driver is not paying full attention to the road.

Additionally, we believe that drowsiness can negatively affect people working with heavy machinery, as well as people in classroom environments.

Given how serious this problem is, we believe it is important to develop a drowsiness detection solution that works in the early stages, before drowsiness leads to accidents. Our solution is a detection system that identifies key attributes of drowsiness and triggers an alert when someone is drowsy, before it is too late.

Data Source and Preprocessing

For our training and test data, we used the Real-Life Drowsiness Dataset, created by a research team from the University of Texas at Arlington specifically for detecting multi-stage drowsiness. The end goal is for our system to detect not only extreme and visible cases of drowsiness but also softer signals of drowsiness.

The dataset consists of around 30 hours of video from 60 unique participants, with each video about 10 minutes long. From the dataset, we were able to extract facial landmarks from 162 videos of 51 participants, which gave us a sufficient amount of data for both the alert and the drowsy state. For each video, we used OpenCV to extract 1 frame per second from the start of the video up to the 2-minute mark, collecting a total of about 20,000 frames.
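The sampling step can be sketched as follows. This is a minimal illustration rather than the exact code we ran: the helper names sample_frame_indices and extract_frames are hypothetical, and we assume OpenCV's cv2.VideoCapture with frame-index seeking. The index calculation is kept in its own function so it can be checked without a video file.

```python
def sample_frame_indices(fps: float, total_frames: int,
                         max_seconds: int = 120) -> list[int]:
    """Return the index of the first frame of each second, up to max_seconds."""
    step = max(1, round(fps))  # frames per second, rounded to a whole step
    limit = min(total_frames, step * max_seconds)
    return list(range(0, limit, step))


def extract_frames(video_path: str, max_seconds: int = 120) -> list:
    """Read one frame per second from the first max_seconds of a video with OpenCV."""
    import cv2  # imported here so sample_frame_indices works without OpenCV

    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for idx in sample_frame_indices(fps, total, max_seconds):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the target frame
        ok, frame = cap.read()
        if not ok:
            break  # stop at the first frame that cannot be decoded
        frames.append(frame)
    cap.release()
    return frames
```

With exactly 120 frames per video, the 162 videos yield 162 × 120 = 19,440 frames, in line with the roughly 20,000 frames reported above. Seeking with CAP_PROP_POS_FRAMES can be imprecise for some codecs; reading frames sequentially and keeping every step-th one is slower but safer.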

We used mlxtend's extract_face_landmarks, which relies on dlib's face detector and 68-point landmark model. It first detects the face, then marks 68 facial points on it and returns their coordinates. Of the 68 landmarks per frame, we kept only those for the eyes and mouth (points 37–68 in 1-based numbering, i.e. indices 36–67), since these were the data points we used to extract the features for our model. There were no missing values: whenever a face was detected, all 68 facial coordinates were returned, and if a point could not be mapped onto the face, a negative value was returned instead.
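A sketch of the landmark step, assuming mlxtend.image.extract_face_landmarks (which needs dlib installed) returns a (68, 2) array of x-y coordinates for a frame. The helpers eye_mouth_landmarks and landmarks_for_frame are hypothetical names, and we guard against a None result in case no face is detected in a frame.

```python
import numpy as np

# Points 37-68 in 1-based numbering are rows 36..67 (both eyes and the mouth).
EYE_MOUTH = slice(36, 68)


def eye_mouth_landmarks(landmarks):
    """Keep only the eye and mouth rows of a (68, 2) landmark array.

    Passes None through unchanged, so frames without a detected face can be skipped.
    """
    if landmarks is None:
        return None
    points = np.asarray(landmarks)
    if points.shape != (68, 2):
        raise ValueError(f"expected shape (68, 2), got {points.shape}")
    return points[EYE_MOUTH]


def landmarks_for_frame(frame):
    """Detect 68 landmarks with mlxtend (requires dlib) and keep points 37-68."""
    from mlxtend.image import extract_face_landmarks

    return eye_mouth_landmarks(extract_face_landmarks(frame))
```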

Figure: 68 Facial Points

Initially, the dataset contained only a landmarks column (stored as strings) and the corresponding label of each frame extracted from the video. During preprocessing, we converted landmarks into a new column, landmarks1, which stores the coordinates as a list.

From the landmarks1 list, we extracted the coordinates for the left eye (36, 42), right eye (42, 48), and mouth (48, 68), where each pair is a half-open index range (6 points per eye, 20 for the mouth), and made a separate column for each in the DataFrame.
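The conversion from landmarks to landmarks1 and the split into regions might look like this. We assume each string was stored in Python list syntax such as '[[x1, y1], [x2, y2], ...]' so that ast.literal_eval can parse it; parse_landmarks, split_regions and REGIONS are hypothetical names.

```python
import ast

# Half-open index ranges from the text: 6 points per eye, 20 for the mouth.
REGIONS = {
    "left_eye": slice(36, 42),
    "right_eye": slice(42, 48),
    "mouth": slice(48, 68),
}


def parse_landmarks(text: str) -> list:
    """Turn a stored string like '[[x1, y1], [x2, y2], ...]' back into a list."""
    return ast.literal_eval(text)


def split_regions(points: list) -> dict:
    """Split a 68-point landmark list into left-eye, right-eye and mouth lists."""
    return {name: points[region] for name, region in REGIONS.items()}
```

With pandas, the same functions can be applied column by column, for example df["landmarks1"] = df["landmarks"].apply(parse_landmarks), followed by one column per region built from split_regions.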

Data Cleaning

Image Rotation:

The videos came in different file formats and were recorded on different devices, so while extracting frames we ran into an image-rotation problem: some frames were rotated by 90, 180, or 270 degrees, and in those frames mlxtend's extract_face_landmarks was unable to detect the face and map the points onto it. We therefore rotated these frames back to an upright orientation before extracting the landmarks.
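One way to handle rotated frames, sketched here as an assumption rather than the exact functions we used, is to try each of the four orientations and keep the first one in which the landmark detector finds a face. The detector is passed in as a parameter (for example mlxtend's extract_face_landmarks), and upright_landmarks is a hypothetical name.

```python
import numpy as np


def upright_landmarks(frame, detect):
    """Try the frame at 0, 90, 180 and 270 degrees and return the first hit.

    detect maps an image to landmarks, or to None when no face is found
    (for example mlxtend's extract_face_landmarks). Returns (landmarks, k),
    where k is the number of 90-degree counter-clockwise turns applied,
    or (None, None) when no orientation yields a face.
    """
    for k in range(4):
        # np.rot90 rotates in the (height, width) plane and returns a view;
        # make it contiguous, since image libraries such as dlib may not
        # accept strided views.
        rotated = np.ascontiguousarray(np.rot90(frame, k))
        landmarks = detect(rotated)
        if landmarks is not None:
            return landmarks, k
    return None, None
```

Trying every orientation costs up to four detector calls per frame, so if a whole video shares one rotation, it is cheaper to find k on the first frame and reuse it for the rest.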

Feature Extraction

As briefly alluded to earlier, we used the facial landmarks extracted from the video frames to develop suitable features for our classification model. While we hypothesized and tested several features, the four core features we settled on for our final models were eye aspect ratio (EAR), mouth aspect ratio (MAR), pupil circularity (PUC), and, finally, mouth aspect ratio over eye aspect ratio (MOE).

Eye Aspect Ratio (EAR)

The Eye Aspect Ratio is the ratio of the eye's height to its width. The height is the average of two vertical distances, between p2 and p6 and between p3 and p5, and the width is the horizontal distance across the eye, between p1 and p4, as illustrated in the figure below.

Figure: Eye Aspect Ratio (EAR)

When an individual is drowsy, their eyes are likely to narrow, and the vertical distances between the points tend toward zero. So if an individual's eye aspect ratio declines over successive frames, meaning their eyes are closing more or they are blinking faster, we can use our model to predict that they are drowsy.

EAR = mean (dist (p2, p6), dist (p3, p5)) / dist (p1, p4)

We calculated the EAR for each eye and then took the mean of the left-eye and right-eye EARs.
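The EAR formula translates directly into code. This sketch assumes each eye is given as six (x, y) points ordered p1 to p6 as in the figure (indices 36-41 and 42-47 in the 68-point scheme); eye_aspect_ratio and mean_ear are hypothetical names.

```python
import numpy as np


def eye_aspect_ratio(eye) -> float:
    """EAR for six eye points p1..p6 given as rows 0..5 of (x, y) pairs."""
    p = np.asarray(eye, dtype=float)
    # mean(dist(p2, p6), dist(p3, p5))
    vertical = (np.linalg.norm(p[1] - p[5]) + np.linalg.norm(p[2] - p[4])) / 2
    horizontal = np.linalg.norm(p[0] - p[3])  # dist(p1, p4)
    return float(vertical / horizontal)


def mean_ear(left_eye, right_eye) -> float:
    """Average the EAR of the left and right eye."""
    return (eye_aspect_ratio(left_eye) + eye_aspect_ratio(right_eye)) / 2
```

For a toy eye with corners at (0, 0) and (4, 0) and lids 2 units apart, both vertical distances are 2 and the width is 4, so the EAR is 0.5; as the lids close, it falls toward 0.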

Mouth Aspect Ratio (MAR)

This follows a similar mathematical approach to EAR: as you would expect, it measures the ratio of the mouth's height (how far it is open) to its width. The height is the average of the distances between p2 and p8, p3 and p7, and p4 and p6, and the width is the distance between p1 and p5:

MAR = mean (dist (p2, p8), dist (p3, p7), dist (p4, p6)) / dist (p1, p5)

Our hypothesis was that as an individual becomes drowsy, they are likely to yawn and lose control over their mouth, making their MAR higher than usual in this state.

Figure: Mouth Aspect Ratio (MAR)
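A corresponding sketch for MAR, assuming the eight mouth points p1 to p8 are ordered with p1 and p5 at the corners, p2 to p4 along the upper lip and p6 to p8 back along the lower lip. Which of the 20 mouth landmarks these are isn't stated; we assume the inner-lip points (indices 60-67, i.e. mouth[12:20] of the 20-point mouth slice), which fit that ordering. mouth_aspect_ratio is a hypothetical name.

```python
import numpy as np


def mouth_aspect_ratio(mouth) -> float:
    """MAR for eight mouth points p1..p8 given as rows 0..7 of (x, y) pairs.

    Assumed ordering: p1 and p5 are the corners, p2-p4 run along the upper
    lip and p6-p8 run back along the lower lip (e.g. inner-lip indices 60-67).
    """
    p = np.asarray(mouth, dtype=float)
    vertical = (np.linalg.norm(p[1] - p[7])      # dist(p2, p8)
                + np.linalg.norm(p[2] - p[6])    # dist(p3, p7)
                + np.linalg.norm(p[3] - p[5])) / 3  # dist(p4, p6)
    horizontal = np.linalg.norm(p[0] - p[4])  # dist(p1, p5)
    return float(vertical / horizontal)
```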

Pupil Circularity (PUC)

PUC is a measure complementary to EAR, but it places a greater emphasis on the pupil instead of the entire eye.

Figure: Pupil Circularity (PUC)

For example, someone whose eyes are half-open or almost closed will have a much lower pupil circularity than someone whose eyes are fully open, because of the squared term in the denominator. As with the EAR, we expected an individual's pupil circularity to decline when they are drowsy.
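The PUC formula is not written out in the text. The sketch below assumes a common formulation that is consistent with the squared term in the denominator: PUC = 4π · Area / Perimeter², where the pupil area is approximated as a circle whose diameter is the distance between p2 and p5, and the perimeter is the sum of the distances around the six eye points. Both this definition and the name pupil_circularity are our assumptions.

```python
import math

import numpy as np


def pupil_circularity(eye) -> float:
    """Assumed PUC = 4*pi*area / perimeter**2 for eye points p1..p6 (rows 0..5).

    area: circle whose diameter is dist(p2, p5).
    perimeter: closed polygon p1 -> p2 -> ... -> p6 -> p1.
    """
    p = np.asarray(eye, dtype=float)
    diameter = np.linalg.norm(p[1] - p[4])
    area = math.pi * (diameter / 2) ** 2
    # np.roll pairs each point with the next one, closing the loop p6 -> p1.
    perimeter = np.linalg.norm(p - np.roll(p, -1, axis=0), axis=1).sum()
    return float(4 * math.pi * area / perimeter ** 2)
```

For a regular hexagon with circumradius 1 (perimeter 6, p2 to p5 distance 2), this gives π²/9 ≈ 1.10; flattening the same hexagon vertically to 20% of its height lowers the value to about 0.65, matching the expectation that narrowing eyes reduce PUC.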

Mouth Aspect Ratio over Eye Aspect Ratio (MOE)

MOE is simply the ratio of the MAR to the EAR.

Figure: Mouth Aspect Ratio over Eye Aspect Ratio (MOE)

The advantage of this feature is that EAR and MAR are expected to move in opposite directions when the individual's state changes. Because the MOE takes MAR as the numerator and EAR as the denominator, the two effects reinforce each other, and our theory was that the MOE would increase as the individual gets drowsy.
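MOE is a one-line computation on top of MAR and EAR. The only addition in this sketch is the hypothetical eps floor, our assumption to keep the ratio finite when the eyes are fully closed and the EAR reaches zero.

```python
def mouth_over_eye(mar: float, ear: float, eps: float = 1e-6) -> float:
    """MOE = MAR / EAR, with EAR floored at eps so closed eyes don't divide by zero."""
    return mar / max(ear, eps)
```

With illustrative (not measured) values, an alert frame with MAR = 0.3 and EAR = 0.3 gives an MOE of 1.0, while a drowsy frame with MAR = 0.6 and EAR = 0.15 gives 4.0: the mouth ratio doubling and the eye ratio halving combine into a fourfold increase.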

Model and Training:

EAR_threshold: if the eye aspect ratio falls below this threshold, it may indicate drowsiness.

MAR_threshold: if the mouth aspect ratio rises above this threshold, it may indicate drowsiness (only when the person is yawning).

So, if EAR < EAR_threshold, we labeled the frame 1 for drowsiness; if MAR > MAR_threshold (in case of yawning), we also labeled it as drowsiness, as we did when both conditions held at once. Otherwise, the person is in the Alert State (label 0).

Model

We used this data to develop and validate a support vector machine (SVM) model that classifies each sample into one of two classes: Class I (the person is in the Alert State) and Class II (the person is drowsy). We also used the SVM models to select the set of features that gave the best classification of individuals into these categories.

Hyperparameters

kernel - rbf

degree - 3 (only used by a polynomial kernel, so it has no effect with rbf)

gamma - scale

C (regularization parameter) - 1

shrinking - true

Class weight - balanced
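The hyperparameters above match the argument names of scikit-learn's SVC, so we assume that is the library in question; apart from class_weight='balanced', every listed value is SVC's default. The StandardScaler step is our addition, not part of the listed setup: an RBF kernel works on raw distances between feature vectors, so features on different scales (EAR, MAR, PUC, MOE) could otherwise dominate one another. SVM_PARAMS and build_classifier are hypothetical names.

```python
# The hyperparameters listed above, as keyword arguments for scikit-learn's SVC.
SVM_PARAMS = {
    "kernel": "rbf",
    "degree": 3,           # only used by kernel="poly"; ignored with rbf
    "gamma": "scale",
    "C": 1.0,
    "shrinking": True,
    "class_weight": "balanced",  # weights classes inversely to their frequency
}


def build_classifier():
    """Pipeline that standardizes the features, then applies an RBF SVC with SVM_PARAMS.

    The StandardScaler step is an assumption, not part of the listed setup.
    """
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    return make_pipeline(StandardScaler(), SVC(**SVM_PARAMS))
```

The pipeline is then trained as usual with model.fit(X_train, y_train), where each row of X_train holds a frame's EAR, MAR, PUC and MOE and y_train is 0 (alert) or 1 (drowsy).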

Figure: Real-Time Drowsiness Detection

Error Metrics

This is a binary classification problem, since every sample belongs to one of two classes: Alert State or Drowsiness. A confusion matrix, as the name suggests, is a matrix that compares the model's predictions with the true classes, giving a complete picture of the model's performance on each class.

Figure: Confusion Matrix

Testing our model on 6,655 samples gave the following results:

True Positives (TP): cases in which we predicted Drowsiness and the actual class was Drowsiness. Count: 2241

True Negatives (TN): cases in which we predicted Alert State and the actual class was Alert State. Count: 3657

False Positives (FP): cases in which we predicted Drowsiness and the actual class was Alert State. Count: 380

False Negatives (FN): cases in which we predicted Alert State and the actual class was Drowsiness. Count: 377

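Precision = TP / (TP + FP) = 2241 / 2621 ≈ 0.855

Recall = TP / (TP + FN) = 2241 / 2618 ≈ 0.856

F1 Score = 2 × Precision × Recall / (Precision + Recall) ≈ 0.856, i.e. about 85.6 %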
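These metrics can be recomputed directly from the confusion-matrix counts. This small sketch (classification_metrics is a hypothetical name) also reports accuracy, which the text does not list.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


metrics = classification_metrics(tp=2241, tn=3657, fp=380, fn=377)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

It prints precision 0.855, recall 0.856, F1 0.856 and accuracy 0.886. Because precision and recall are nearly equal, the model's errors are split almost evenly between false positives (380) and false negatives (377).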

Future Scope and Improvements:

We can reduce the number of false negatives by checking the dataset for class imbalance and correcting it. Adding more frames to the dataset could also improve the classifier's predictions.

We can optimize the model's speed so that it keeps up with the frame rate of real-time video.

A hardcoded threshold does not fit the general case: someone whose eyes are naturally narrow has a low eye aspect ratio and would be classified as drowsy even when they are not. This could be addressed by using each person's initial frames as a reference for setting EAR_threshold and MAR_threshold.
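One way to implement this calibration idea, sketched under assumptions: take the median EAR and MAR over a person's initial frames as their baseline, then scale it by hypothetical factors (ear_ratio, mar_ratio) to obtain personal thresholds. calibrate_thresholds and both factor values are illustrative, not tuned.

```python
import statistics


def calibrate_thresholds(initial_ears, initial_mars,
                         ear_ratio=0.8, mar_ratio=1.5):
    """Derive per-person (EAR_threshold, MAR_threshold) from initial alert frames.

    ear_ratio and mar_ratio are hypothetical tuning factors: eyes count as
    closing below 80% of the baseline EAR, and yawning above 150% of the
    baseline MAR.
    """
    baseline_ear = statistics.median(initial_ears)  # robust to an occasional blink
    baseline_mar = statistics.median(initial_mars)
    return baseline_ear * ear_ratio, baseline_mar * mar_ratio
```

The median keeps a blink during calibration from dragging the baseline down: for initial EARs of 0.30, 0.32, 0.28, 0.05 and 0.31, the baseline is still 0.30, giving EAR_threshold = 0.24 with ear_ratio = 0.8.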

This model can be deployed on edge devices, such as mobile phones or a Raspberry Pi with a camera module, mounted in front of the driver's seat. It can also be applied in any organisation to check whether workers are attentive while doing their jobs.

Thank you for reading, we hope you find our article interesting!

To integrate Real-Time Drowsiness Detection into your application, contact us at contact@godatainsights.com

Authors:

https://www.linkedin.com/in/mrinank-purwar-711847184

https://www.linkedin.com/in/siddhi-vinayak-tripathi-687853168