INTELLIGENT VIEW EXTRAPOLATION FOR DYNAMIC SCENES

NOOR, HUMERA

URI: http://142.54.178.187:9060/xmlui/handle/123456789/2933

Date: 2008

Abstract:

This thesis addresses Artificial Intelligence, a fundamental branch of Computer Engineering that strives to give computer systems human-like capabilities and intelligence. More specifically, it deals with computer vision, which has attracted considerable research attention because of its wide applicability to everyday tasks such as view generation, synthesis of animations and videos from static images, surveillance, medical imaging, tracking, and object recognition and classification. The thesis investigates three problem areas: image synthesis, object recognition and object categorization. It studies how to generate images at novel, arbitrary and unconstrained viewpoints, covering both interpolation and extrapolation, from a sparse set of basis images of a real scene. This image generation methodology is then incorporated into models for object recognition and categorization.

First, an image synthesis strategy is presented that generates virtual views at arbitrary points by interpolation and extrapolation from a sparse set of images. Traditional work on view synthesis by interpolation is extended, and it is shown that view extrapolation can be done as easily as interpolation. Moreover, certain scenarios are identified, such as planar or multi-planar scenes and image capture under pure rotational camera motion, that allow direct retrieval of the underlying mapping function between the images and hence lead to even simpler image extrapolation. The major issues and factors affecting the accuracy of generation are explored, and suggestions are presented for improving virtual view quality.

Next, an approach is presented for generating a model for multi-view object recognition. A view-centered model is generated from either a video sequence or a sparse set of images captured around the object along an arbitrary and unconstrained camera trajectory. It requires no prior knowledge of the camera parameters, or of the position or motion of the object or camera. The model generated this way is dense and contains many redundant images, so the virtual view generation strategy is applied to identify and remove them. The result is a model that is economical in both space and time. For testing or recognition, the model is used in conjunction with a video sequence, which provides information from multiple views of the object and thus increases the confidence of the results. The model is robust in that it captures the topological structure of the objects from multiple viewpoints, allowing a video sequence rather than a single test image to be used for object recognition. No constraint is placed on camera or object motion while capturing the video.

Finally, an approach for video-based multi-view object classification is presented. For each object instance of a particular category, a neighborhood graph-based model is generated from the set of input images, arranged in a manner that highlights the underlying topological structure. Again, no constraint is placed on the motion and placement of the object and camera during image capture, and no prior knowledge of the camera's position or parameters is required. The view synthesis algorithm is used to identify and remove the redundant images in the model, giving a model that is economical in space and training time. The independent graphs of the different instances of the object category are then merged by automatically identifying the corresponding viewpoints across them. The strength of this approach is that it allows object categorization from multiple viewpoints while eliminating the need for manual alignment of common viewing angles across object instances. Another strength is that video sequences, rather than single images, are used for object classification, which increases the precision of the results.
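
The abstract notes that planar scenes and pure rotational camera motion allow the mapping function between images to be retrieved directly. In those cases the mapping is commonly modelled as a 3x3 homography. The sketch below is not taken from the thesis: it assumes NumPy, uses the standard direct linear transform without coordinate normalization, and the names `estimate_homography` and `apply_homography` are hypothetical.

```python
import numpy as np


def estimate_homography(src, dst):
    """Estimate a 3x3 matrix H with dst ~ H @ src (direct linear transform).

    src, dst: arrays of shape (N, 2) holding N >= 4 matching (x, y) points.
    Assumption: the bottom-right entry of H is non-zero, so H can be scaled
    to make it 1.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if src.shape != dst.shape or src.ndim != 2 or src.shape[1] != 2 or len(src) < 4:
        raise ValueError("need at least four matching (x, y) pairs")

    # Each correspondence (x, y) -> (u, v) gives two linear equations in the
    # nine entries of H.
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])

    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]


def apply_homography(h, pts):
    """Map (N, 2) points through H and return the (N, 2) image points."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coordinates
    mapped = homog @ h.T
    return mapped[:, :2] / mapped[:, 2:3]  # divide out the projective scale
```

With the matrix estimated from a few matched points, `apply_homography` can map any pixel position from one view into the other, which works the same way whether the target view lies between the source views or beyond them. A practical implementation would normally add point normalization and outlier rejection, both omitted here for brevity.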
