Abstract:
This thesis addresses artificial intelligence, a fundamental branch of computer engineering that
strives to give computer systems human-like capabilities and intelligence. More specifically, it deals
with computer vision, which has attracted considerable research attention owing to its wide
applicability in everyday tasks such as view generation, synthesis of animations and videos from
static images, surveillance, medical imaging, tracking, and object recognition and classification.
This thesis
investigates three problem areas: image synthesis, object recognition and object categorization. It
studies how to generate images at novel, arbitrary and unconstrained viewpoints, covering both
interpolation and extrapolation, from a sparse set of basis images of a real scene. This image
generation methodology is then incorporated into models for object recognition and categorization.
First, an image synthesis strategy is presented that generates virtual views at arbitrary viewpoints
using interpolation and extrapolation from a sparse set of images. Traditional work on view
synthesis by interpolation is extended, and view extrapolation is shown to be as straightforward as
interpolation. Moreover, certain scenarios are identified, such as planar or multi-planar scenes and
pure rotational camera motion during image capture, that allow direct retrieval of the underlying
mapping function between the images and hence lead to even simpler image extrapolation. The major
factors affecting the accuracy of generation are explored, and suggestions are presented for
improving virtual view quality.
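For planar scenes and for pure camera rotation, the mapping between two images is commonly modelled as a 3x3 homography. The sketch below is only an illustration of that standard model and is not taken from the thesis: the function names, the focal length `f`, the assumption that the principal point lies at the image origin, and the rotation sign convention are all assumptions made for the example.

```python
import math


def apply_homography(H, point):
    """Map an (x, y) pixel through a 3x3 homography given as nested lists."""
    x, y = point
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    return (xh / w, yh / w)


def rotation_homography(f, theta):
    """Homography K R K^-1 for a camera rotating by theta about its y axis.

    Assumes K = diag(f, f, 1), i.e. square pixels and the principal point
    at the origin (a simplification chosen for this sketch).
    """
    c, s = math.cos(theta), math.sin(theta)
    return [
        [c, 0.0, f * s],
        [0.0, 1.0, 0.0],
        [-s / f, 0.0, c],
    ]
```

Under these assumptions, extrapolating a view amounts to choosing a rotation angle outside the range spanned by the captured images and warping every pixel with `apply_homography(rotation_homography(f, theta), (x, y))`.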
Next, an approach is presented for building a model for multi-view object recognition. A
view-centered model is generated from either a video sequence or a sparse set of images captured
around the object along an arbitrary, unconstrained camera trajectory. It requires no prior
knowledge of the camera parameters or of the position or motion of the object or camera. The model
thus generated is dense and contains many redundant images, so the virtual view generation strategy
is applied to identify and remove them. The result is a model that is computationally economical in
both space and time. For recognition, the model is used in conjunction with a test video sequence,
which provides multiple views of the object and thus increases the confidence of the results. The
model is robust in that it captures
the topological structure of the objects from multiple viewpoints, allowing a video sequence rather
than a single test image to be used for object recognition. No constraint is placed on camera or
object motion while capturing the video.
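One way to picture the redundancy removal described above is a greedy pass over an ordered sequence of views: a view is dropped when it can be predicted from its neighbours to within a tolerance. The sketch below is a toy illustration, not the procedure used in the thesis. Views are flat lists of pixel values, and `midpoint_synthesis`, `mean_abs_error` and `threshold` are hypothetical stand-ins for the actual view synthesis algorithm and its error measure.

```python
def mean_abs_error(a, b):
    """Mean absolute difference between two equally sized pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def midpoint_synthesis(left, right):
    """Hypothetical stand-in for view synthesis: average the two views."""
    return [(l + r) / 2 for l, r in zip(left, right)]


def prune_redundant(views, threshold, synthesize=midpoint_synthesis):
    """Return the indices of the views to keep; the end views are always kept.

    An interior view is treated as redundant when the view synthesized from
    the last kept view and the next view matches it within `threshold`.
    """
    if len(views) <= 2:
        return list(range(len(views)))
    kept = [0]
    for i in range(1, len(views) - 1):
        predicted = synthesize(views[kept[-1]], views[i + 1])
        if mean_abs_error(predicted, views[i]) > threshold:
            kept.append(i)
    kept.append(len(views) - 1)
    return kept
```

With a real synthesis routine passed as `synthesize`, the same loop would retain only the views that the remaining ones cannot reproduce, which is the sense in which the pruned model saves space and time.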
Next, an approach for video-based multi-view object classification is presented. For each object
instance of a particular category, a neighborhood graph-based model is generated from the set of
input images, which are arranged so as to highlight the underlying topological structure. Again, no
constraint is placed on the motion or placement of the object and camera during image capture, and
no prior knowledge of the camera position or parameters is required. The view synthesis algorithm is
used to identify and remove the redundant images in the model, yielding a model that is
computationally economical in space and training time. The independent graphs of
the different instances of the object category are then merged by automatically identifying the
corresponding viewpoints across them. One strength of this approach is that it allows object
categorization from multiple viewpoints while eliminating the need for manual alignment of common
viewing angles across object instances. Another is that video sequences, rather than single images,
are used for object classification, which increases the precision of the results.
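The benefit of classifying from a video sequence rather than a single image can be illustrated by pooling per-frame evidence. The sketch below is a generic score-summing scheme offered only as an illustration; the thesis does not specify its aggregation rule here, and `classify_video`, the score dictionaries and the confidence definition are assumptions made for the example.

```python
from collections import defaultdict


def classify_video(frame_scores):
    """Pool per-frame category scores into one decision.

    frame_scores: list of dicts mapping a category label to a non-negative
    score, one dict per video frame (an assumed input format).
    Returns (best_label, confidence), where confidence is the winning
    category's share of the total accumulated score.
    """
    totals = defaultdict(float)
    for scores in frame_scores:
        for label, score in scores.items():
            totals[label] += score
    if not totals:
        raise ValueError("no frame scores supplied")
    grand_total = sum(totals.values())
    best = max(totals, key=totals.get)
    confidence = totals[best] / grand_total if grand_total > 0 else 0.0
    return best, confidence
```

In this scheme a single ambiguous frame, for instance one taken from an uninformative viewpoint, is outweighed by the other frames, which is the intuition behind the higher precision reported for video input.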