Abstract:
Energy-Efficient Task and Data Scheduling for Large-Scale Computing Systems
With the recent advancements in integration technologies and aggressive transistor scaling, dozens or even hundreds of processing cores can be integrated onto a single chip changing the landscape for large-scale high performance computing systems. The current trend clearly indicates the dominance of multicore processors in the forthcoming personal and household devices, such as mobile phones, tablets, and laptops. Similarly, it is predicted that majority of the future high performance computing devices and embedded systems will be based on multicore architecture. In the meantime, Network-on-Chip (NoC) has emerged as a scalable and modular packet-switched interconnect for multicore architectures. NoC based multicore architectures are considered suitable for both real-time embedded systems and large data centers that support cloud computing services. Traditionally, performance enhancement by reducing workflow execution time (makespan) has been the primary optimization goal for large-scale computing systems. Nevertheless, in the past decade, energy consumption has also joined the set of desired optimization metric as a priority design constraint. Energy management is even more critical for multicore systems because it not only affects the operating cost but also influence the life-time and reliability of the system. However, it is not trivial to achieve both energy efficiency and makespan minimization simultaneously. Similarly, communication overhead, incurred due to data movement among the interdependent tasks, has profound effect on application throughput and energy consumption.
In this work, the problem of developing efficient task scheduling and data allocation techniques is studied for NoC based multicore systems considering multiple optimization constraints, particularly, energy consumption, network overhead, and makespan. To this end, we first investigated the problem of jointly optimizing energy consumption and communication overhead through efficient task and core mapping in NoC based multicores for non-DAG (Directed Acyclic Graph) workflows. We approached the optimization problem of energy and communication-aware mapping by introducing a new class of bin packing problem, the Variable bEnefit Bin pAcking
xi
Problem (VEBAP) and transformed VEBAP into a variation of knapsack optimization problem termed as Variable Benefit Knapsack Problem (VBKP). To solve VBKP, we developed a dynamic programming based bin packing algorithm. Knapsack based bin packing algorithm for workload consolidation places tasks in such a manner that utilization of processing cores is maximized while network overhead is minimized. We also proposed a task swapping algorithm that attempts to further optimize the task placement to reduce inter-core communication. Moreover, several core mapping heuristics are proposed for bin to core mapping. In addition, we also applied a Pareto-efficient algorithm, on top of the bin packing algorithms, attempting to explore the solution space simultaneously in two dimensions, i.e., energy consumption and communication overhead. Next, we shifted our focus to unified task scheduling and data allocation for DAG based workflow applications. In this particular work, we adopted a NoC based multicore architecture with multi-level cache hierarchy that are the basis for Single-Chip Cloud Computer (SCC). Developing scheduling algorithms to improve energy efficiency as well as reduce data access latency is a major challenge for SCCs with deep memory hierarchies. We proposed three scheduling algorithms that make judicial tradeoffs between energy consumption and makespan. Lastly, we explored the dynamic task mapping for NoC based multicores and carried out extensive evaluation of selected algorithms. Moreover, we proposed an extension to the communication-aware packing based nearest neighbor (CPNN) algorithm to reduce communication overhead among the interdependent tasks. We also conducted formal verification and modeling of proposed technique using high level Petri nets. Extensive simulations have been conducted to evaluate the performance of proposed techniques. According to the experimental results, our proposed strategies have shown to be promising for both DAG and non-DAG based application workflows compared with state-of-the-art techniques. Algorithms presented in this dissertation can serve as initial stepping stone towards analyzing the impact of task and data scheduling over the performance requirements, such as energy efficiency and application throughput of large-scale computing systems based on single-chip cloud computer architecture.