Abstract:
Conventional digital arithmetic circuits are designed to operate on a specified range of operand
magnitudes. The architecture of these circuits is typically developed to improve the area-time
characteristics and obtain an energy-efficient implementation operating at the desired
throughput. Moreover, these circuits are required to provide full precision over the full dynamic
range of operands and to operate at worst-case speed. Although the correctness of results is
ensured, the input data statistics are not taken into account. As a result, redundant
computations are performed since the information content of input data sequence in certain
applications, such as image and video processing, is known to be much less than the simple
binary descriptions used by conventional arithmetic circuits. However, run-time
identification and exploitation of the latent redundancy in computations is nontrivial owing to
the diverse statistical nature of input data. Although efforts have been made in the past to
design low power architectures by exploiting certain patterns in the magnitudes of the
operands, few attempts have been made to improve logic area and processing time efficiency by
harnessing the statistical properties of the inputs. It therefore appears worthwhile to design
application-specific hardware that explicitly incorporates data statistics while computing and
consequently saves precious computation cycles and/or logic resources that are otherwise wasted in
redundant computations. This thesis proposes hardware design approaches that utilize the
inherent redundancy in input operands of image and video processing applications in the
implementation of arithmetic circuits to achieve the most economical trade-offs between logic
resources, processing time and result precision. Specifically, modifications and enhancements
to conventional arithmetic hardware design approaches, namely Distributed Arithmetic,
Sub-Expression Sharing, Fast FIR parallel filtering and Approximate Processing, are reported that
further enhance their efficiency. The conventional Distributed Arithmetic approach to trading off logic
area with processing speed has been modified to include memoization-based Look Up Tables for
storage of partial results from past computations. This modification harnesses bit-level
redundancies in input operands and decreases the average processing time while
requiring a proportionately smaller increase in hardware resource requirements. The proposed
approach has been used to implement a Color Space Conversion module for incorporation as
an Instruction Set Architecture enhancement in the open-source Intellectual Property core for the
OR1200 32-bit processor. Similarly, conventional Sub-Expression Sharing and Fast FIR parallel filtering
techniques to implement low complexity hardware structures have been modified to identify
low entropy portions of operands for processing with reduced precision. The ensuing low
complexity hardware structures exhibit negligible loss in output precision when used to process
low entropy image data while yielding proportionately larger savings in logic resources. The merits of
incorporating input data statistics in the hardware design process are further illustrated
through low precision implementations of statistics-inspired circuits for computing the Sum of
Squared Error and Normalized Cross Correlation as error metrics in template matching applications
such as Motion Estimation and Disparity Estimation. The proposed low complexity designs
process low and high entropy portions of the operands with appropriate logic effort and
perform better than conventional approximate processing circuits, which use brute-force
quantization of operands as well as output results to reduce hardware complexity. The
statistics-inspired hardware designs proposed in this thesis either achieve lower processing
time using comparable logic resources or lower hardware cost with minimal impact on
precision when compared with the conventional approaches. The efficacy of employing signal
statistics information in the hardware design process has been shown through demonstrations of logic
resource savings in Field Programmable Gate Array based implementations and performance
in several real-world applications. Hardware savings of up to 70% with minimal impact on output
precision in the design of an Approximate Squarer, and a 50% reduction in computation time at full
precision for Modified Distributed Arithmetic circuits, have been achieved through application of
the statistics-inspired hardware design approach.
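
As an illustration of the memoization idea behind the modified Distributed Arithmetic approach, the
following Python sketch models a bit-serial inner product in which partial sums are cached the first
time a bit pattern is seen and reused afterwards. It is a software model under stated assumptions,
not the hardware design proposed in this thesis: the function name, the dictionary used as the cache,
the restriction to unsigned operands and the example coefficients are all hypothetical.

```python
def da_inner_product(coeffs, xs, bits, cache):
    """Bit-serial distributed-arithmetic inner product for unsigned operands.

    Hypothetical software model: `cache` plays the role of a memoization
    table that stores partial sums from past computations.
    Returns (result, number_of_cache_hits).
    """
    acc = 0
    hits = 0
    for j in range(bits):
        # Address = bit j of every operand, packed into one integer.
        addr = 0
        for k, x in enumerate(xs):
            addr |= ((x >> j) & 1) << k
        if addr in cache:
            # Bit pattern seen before: reuse the stored partial sum.
            partial = cache[addr]
            hits += 1
        else:
            # New bit pattern: add the coefficients selected by the address.
            partial = sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            cache[addr] = partial
        # Weight the partial sum by the bit position.
        acc += partial << j
    return acc, hits


# Illustrative (assumed) integer weights and a grey pixel with equal channels.
coeffs = [66, 129, 25]
pixel = [200, 200, 200]
cache = {}
result, hits = da_inner_product(coeffs, pixel, bits=8, cache=cache)
direct = sum(c * x for c, x in zip(coeffs, pixel))
print(result == direct, hits, len(cache))
```

When the operands share most of their bit patterns, as in smooth image regions, few distinct
addresses occur and most bit positions are served from the cache, which is the kind of bit-level
redundancy the abstract refers to.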