Abstract:
Owing to the rapid growth of multimedia content, users now demand video
summaries that represent video content precisely and compactly according to
their needs. Conventionally, video summaries have been produced using
low-level image, audio, and textual features, which are unaware of the
viewer's requirements and therefore suffer from a semantic gap. Video content
evokes certain emotions in a viewer, which can be measured and serve as a
strong source of information for generating summaries that meet the viewer's
expectations. In this research, a personalized video summarization framework
is presented that classifies the viewer's emotions, based on his/her facial
expressions and electroencephalography (EEG) signals recorded while watching
a video, in order to extract keyframes.
The first contribution of this thesis is a new strategy for recognizing facial
expressions. For this purpose, the stationary wavelet transform is used to extract
features for facial expression recognition owing to its good localization
characteristics in both the spectral and spatial domains. More specifically, a
combination of the horizontal and vertical sub-bands of the stationary wavelet
transform is used, as these sub-bands contain muscle-movement information for the
majority of facial expressions. Feature dimensionality is reduced by applying the
discrete cosine transform to these sub-bands. The resulting features are then fed to
a feed-forward neural network, trained with the back-propagation algorithm, to
recognize facial expressions.
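The following is a minimal sketch of the sub-band and DCT feature pipeline described above, assuming a cropped grayscale face image and illustrative choices for the wavelet, the number of retained coefficients, and the network size; it is not the thesis implementation.

```python
# Sketch: SWT horizontal/vertical sub-bands + DCT features for expression recognition.
import numpy as np
import pywt
from scipy.fft import dctn
from sklearn.neural_network import MLPClassifier

def swt_dct_features(face_img, wavelet="haar", keep=16):
    """face_img: grayscale face crop with sides divisible by 2 (e.g. 96x96, assumed)."""
    cA, (cH, cV, cD) = pywt.swt2(face_img, wavelet, level=1)[0]
    feats = []
    for band in (cH, cV):                           # horizontal and vertical details
        coeffs = dctn(band, norm="ortho")           # 2-D DCT for dimensionality reduction
        feats.append(coeffs[:keep, :keep].ravel())  # keep the low-frequency block
    return np.concatenate(feats)

# X: list of face images, y: expression labels (e.g. from JAFFE / CK+); names are illustrative.
# clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)  # backprop-trained feed-forward net
# clf.fit(np.vstack([swt_dct_features(img) for img in X]), y)
```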
The second contribution of this thesis is the generation of personal video summaries
using the proposed facial expression recognition scheme. A video is shown to the
viewer while his/her facial expressions are recorded simultaneously with a Microsoft
Kinect device. The frames at which different facial expressions of the viewer are
recognized are selected as keyframes of the video. The third and final contribution
of this research is a new personalized video summarization technique based on human
emotion classification using EEG signals. A video is shown to the viewer while
electrical brain activity is recorded simultaneously with EEG electrodes. Features
are extracted in the time, frequency, and wavelet domains to classify the viewer's
emotion as happy, love, sad, anger, surprise, or neutral. The frames at which these
emotions of the viewer are evoked are selected as keyframes of the video.
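The sketch below illustrates the hybrid time-, frequency-, and wavelet-domain EEG feature extraction and SVM-based emotion classification outlined above; the sampling rate, frequency bands, wavelet, and statistical descriptors are assumptions for illustration, not the thesis settings.

```python
# Sketch: hybrid time / frequency / wavelet-domain EEG features fed to an SVM.
import numpy as np
import pywt
from scipy.signal import welch
from sklearn.svm import SVC

def eeg_features(epoch, fs=128):
    """epoch: one channel of an EEG segment sampled at fs Hz (assumed)."""
    # Time domain: simple statistical descriptors
    time_feats = [epoch.mean(), epoch.std(), np.ptp(epoch)]
    # Frequency domain: band power from the Welch power spectral density
    f, psd = welch(epoch, fs=fs, nperseg=fs)
    bands = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}
    freq_feats = [psd[(f >= lo) & (f < hi)].sum() for lo, hi in bands.values()]
    # Wavelet domain: energy of each discrete wavelet decomposition level
    wav_feats = [np.sum(c ** 2) for c in pywt.wavedec(epoch, "db4", level=4)]
    return np.array(time_feats + freq_feats + wav_feats)

# X: EEG epochs aligned with video segments, y: emotion labels; names are illustrative.
# clf = SVC(kernel="rbf").fit(np.vstack([eeg_features(e) for e in X]), y)
# Frames whose epochs are classified as non-neutral emotions could then be kept as keyframes.
```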
According to the experimental results, the proposed facial expression recognition
scheme using the stationary wavelet transform achieves accuracies of 98.8%, 96.61%,
and 94.28% on the Japanese Female Facial Expressions (JAFFE), Extended Cohn-Kanade
(CK+), and Microsoft Kinect (MS-Kinect) datasets, respectively. Furthermore, the
results show that personalized video summarization using the proposed facial
expression recognition scheme generates personal video summaries with high precision,
recall, F-measure, and accuracy, and a low error rate, hence reducing the semantic gap.
For emotion recognition using EEG signals, a classification accuracy of up to 92.83%
is achieved with a support vector machine classifier when time-, frequency-, and
wavelet-domain features are combined. Experimental results also demonstrate that the
proposed EEG-based personal video summarization framework outperforms state-of-the-art
video summarization methods.
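As a small illustration of the keyframe-level metrics named above, the following sketch computes precision, recall, and F-measure for a generated summary against a reference summary, both assumed to be sets of frame indices; the error-rate definition shown is one plausible choice, not necessarily the one used in the thesis.

```python
# Sketch: precision / recall / F-measure for keyframe-based summaries (illustrative).
def summary_metrics(generated, reference):
    generated, reference = set(generated), set(reference)
    tp = len(generated & reference)                      # correctly selected keyframes
    precision = tp / len(generated) if generated else 0.0
    recall = tp / len(reference) if reference else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    # Assumed error rate: fraction of frames on which the two summaries disagree
    error_rate = len(generated ^ reference) / len(generated | reference)
    return precision, recall, f_measure, error_rate
```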