dc.description.abstract |
Image clustering deals with the optimal partitioning of images into groups. Under the
linear discriminant analysis (LDA) criterion, the optimal partitioning is obtained by
maximizing the ratio of the between-class scatter matrix (Sb) to the within-class scatter
matrix (Sw). In global learning based clustering models, the scatter matrices (Sb and Sw)
are evaluated on the whole image dataset, so nonlinear manifold structure in the data may
not be handled effectively. To capture manifold structure, local learning based clustering
models use the local neighborhood information of the data objects. Further, for high-dimensional
data, Sw is singular, which is the under-sampled or small-sample-size (SSS) problem
of LDA. For this reason, almost all global learning and local learning based clustering
models are based on regularized discriminant analysis (RDA), a variant of LDA. In RDA,
the singularity of Sw is resolved by perturbing it with a regularization parameter
λ > 0. However, the value of λ must then be tuned. Further, to obtain optimal
clustering performance with existing state-of-the-art local learning based clustering models,
one has to tune a number of clustering parameters over a large candidate set.
In this thesis, we propose a novel local learning based image clustering model. The
proposed model is inspired by exponential discriminant analysis (EDA). EDA is
another variant of LDA, in which the SSS problem of LDA is handled using properties of the
matrix exponential. As a result, EDA has fewer parameters than RDA. The number
of nearest neighbor images, k, is the only clustering parameter in the proposed
model, in contrast to existing state-of-the-art local learning based clustering models.
Image clustering performance on 12 benchmark image datasets is comparable to that of the
closest competing RDA based image clustering model. Performance is comparable because no
discriminant information of LDA is lost in EDA. However, well separated images may not
be obtained at the local level for image datasets that contain images with pose, illumination, or
occlusion variations. Consequently, local learning based image clustering models may be
limited under such variations. To address this problem, various clustering models have been
proposed that use both global learning and local learning approaches. However, almost
all existing local and global learning based clustering models are based on RDA, and
therefore require extensive tuning of clustering parameters.
We propose novel local and global learning based image clustering models that are inspired
by EDA. The proposed image clustering models have fewer parameters and are computationally
efficient, and their clustering performance is comparable with that of existing local
and global learning based clustering models. However, the performance of all state-of-the-art
clustering models is not optimal for challenging image datasets that contain images with
illumination and occlusion changes. We explore the challenges of the image clustering problem.
We show that the variation from one image to another within a class (within-class variation)
of an image dataset may range from nominal to significant because of images with different
facial expressions, pose, illumination, or occlusion changes. Using pixel intensity values as
image features, we obtain a histogram of within-class variation for each image dataset. On
the basis of this histogram, we categorize image datasets as Gaussian-like or multimodal. We
show that the clustering performance of state-of-the-art clustering models is optimal
for Gaussian-like image datasets and degrades significantly for multimodal image datasets.
We achieve a significant overall performance improvement on 13 benchmark image datasets
by employing optimal image descriptors with our proposed clustering model. Our study
shows that there is no direct correlation between image clustering performance and local
neighborhood structure. However, image clustering performance is correlated with the
distribution of within-class variation in image datasets. |
en_US |