Posts
Jan, 29
Consolidating Applications for Energy Efficiency in Heterogeneous Computing Systems
By scheduling multiple applications with complementary resource requirements on a smaller number of compute nodes, we aim to improve performance, resource utilization, and energy efficiency while reducing energy consumption. In addition to our naive consolidation approach, which already achieves the aforementioned goals, we propose a new energy efficiency-aware (EEA) scheduling policy and compare its performance with […]
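As an illustrative sketch only (not the authors' naive or EEA policy), the idea of consolidating applications with complementary resource demands can be shown as greedy pairing of CPU-bound and memory-bound workloads onto shared nodes; the profiles below are assumed for the example:

```python
# Toy consolidation sketch: put a CPU-bound app and a memory-bound app
# on the same node so their dominant resource demands do not overlap.
# The (cpu_share, mem_bw_share) profiles are illustrative assumptions,
# not the scheduling policy from the post.

def consolidate(apps):
    """apps: list of (name, cpu_share, mem_bw_share) tuples in [0, 1].
    Returns a list of per-node tuples of application names."""
    cpu_bound = sorted((a for a in apps if a[1] >= a[2]), key=lambda a: -a[1])
    mem_bound = sorted((a for a in apps if a[1] < a[2]), key=lambda a: -a[2])
    nodes = []
    # Pair the most CPU-hungry app with the most bandwidth-hungry one.
    while cpu_bound and mem_bound:
        nodes.append((cpu_bound.pop(0)[0], mem_bound.pop(0)[0]))
    # Any leftover apps each get their own node.
    nodes.extend((a[0],) for a in cpu_bound + mem_bound)
    return nodes
```

With three applications, two complementary ones share a node and the remainder runs alone, so three apps need only two nodes instead of three.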
Jan, 29
Wideband Channelization for Software-Defined Radio via Mobile Graphics Processors
Wideband channelization is a computationally intensive task within software-defined radio (SDR). To support this task, the underlying hardware should provide high performance and allow flexible implementations. Traditional solutions use field-programmable gate arrays (FPGAs) to satisfy these requirements. While FPGAs allow for flexible implementations, realizing an FPGA implementation is a difficult and time-consuming process. On the […]
Jan, 29
On the Programmability and Performance of Heterogeneous Platforms
General-purpose computing on an ever-broadening array of parallel devices has led to an increasingly complex and multi-dimensional landscape with respect to programmability and performance optimization. The growing diversity of parallel architectures presents many challenges to the domain scientist, including device selection, programming model, and level of investment in optimization. All of these choices influence the […]
Jan, 29
A Performance Criteria for parallel Computation on basis of block size using CUDA Architecture
A GPU based on the CUDA architecture developed by NVIDIA is a high-performance computing device. Multiplication of large matrices can be computed in a few seconds on such a GPU. A modern GPU consists of 16 highly threaded streaming multiprocessors (SMs); the Fermi GPU consists of 32 SMs. These are compute-intensive devices. […]
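As a hedged sketch of the kind of block-size choice the abstract alludes to (the limits below match common NVIDIA hardware but are assumptions, not the paper's criteria), a CUDA launch configuration for an N x N matrix multiply picks a square tile so that each block stays within the per-block thread budget and is a multiple of the warp size:

```python
# Illustrative grid/block sizing for an n x n matrix multiply on a GPU.
# 1024 threads per block and a warp size of 32 are common NVIDIA limits,
# used here as assumptions; the paper's actual performance criteria may differ.

def launch_config(n, tile=16, max_threads_per_block=1024, warp=32):
    """Return ((grid_x, grid_y), (block_x, block_y)) for an n x n multiply."""
    threads = tile * tile
    assert threads <= max_threads_per_block, "tile too large for one block"
    assert threads % warp == 0, "block size should be a multiple of the warp"
    blocks = (n + tile - 1) // tile  # ceil(n / tile) blocks per dimension
    return (blocks, blocks), (tile, tile)
```

For a 1000 x 1000 multiply with a 16 x 16 tile this yields a 63 x 63 grid of 256-thread blocks, the edge blocks handling the ragged boundary.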
Jan, 29
Impact of communication times on mixed CPU/GPU applications scheduling using KAAPI
High-performance computing machines increasingly use graphics processing units, as they are very efficient for homogeneous computation such as matrix operations. However, before using these accelerators, one has to transfer data from the processor to them, and such a transfer can be slow. In this report, our aim is to study the impact of […]
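As a back-of-the-envelope sketch of why transfer times matter for scheduling (the bandwidth figure is an assumption, and this is not the KAAPI model from the report), offloading a task only pays off when the transfer cost plus GPU compute time beats staying on the CPU:

```python
# Simple offload model: the GPU wins only if moving the data plus
# computing on the GPU is faster than computing on the CPU.
# The default 8 GB/s link bandwidth is an illustrative assumption.

def offload_wins(bytes_moved, cpu_time_s, gpu_time_s, bandwidth_gbps=8.0):
    """True if cpu->gpu transfer + GPU compute beats CPU-only execution."""
    transfer_s = bytes_moved / (bandwidth_gbps * 1e9)
    return transfer_s + gpu_time_s < cpu_time_s
```

Under this toy model, moving 8 GB over an 8 GB/s link adds a full second, so a 4x GPU speedup on a 2-second CPU task barely wins, and a smaller speedup loses outright.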
Jan, 28
Scheduling on Manycore and Heterogeneous Graphics Processors
Through custom software schedulers that distribute work differently than built-in hardware schedulers, data-parallel and heterogeneous architectures can be retargeted towards irregular task-parallel graphics workloads. This dissertation examines the role of a GPU scheduler and how it may schedule complicated workloads onto the GPU for efficient parallel processing. This dissertation examines the scheduler through three different […]
Jan, 28
Automatic Resource-Constrained Static Task Parallelization
This thesis intends to show how to efficiently exploit the parallelism present in applications in order to enjoy the performance benefits that multiprocessors can provide, using a new automatic task parallelization methodology for compilers. The key characteristics we focus on are resource constraints and static scheduling. This methodology includes the techniques required to decompose applications […]
Jan, 28
GPU-Qin: A Methodology for Evaluating the Error Resilience of GPGPU Applications
While graphics processing units (GPUs) have gained wide adoption as accelerators for general-purpose applications (GPGPU), the end-to-end reliability implications of their use have not been quantified. Fault injection is a widely used method for evaluating the reliability of applications. However, building a fault injector for GPGPU applications is challenging due to their massive parallelism, which […]
Jan, 28
Performance-Correctness Challenges in Emerging Heterogeneous Multicore Processors
We are witnessing a tremendous amount of change in the design of the modern microprocessor. With dozens of CPU cores on-chip in recent multicore processors, the search for thread-level parallelism (TLP) is more significant than ever. In parallel, a very different processor architecture has emerged that aims to extract parallelism at an entirely different scale. Originally […]
Jan, 28
Autotuning Programs with Algorithmic Choice
The process of optimizing programs and libraries, both for performance and quality of service, can be viewed as a search problem over the space of implementation choices. This search is traditionally manually conducted by the programmer and often must be repeated when systems, tools, or requirements change. The overriding goal of this work is to […]
Jan, 26
gem5-gpu: A Heterogeneous CPU-GPU Simulator
gem5-gpu is a new simulator that models tightly integrated CPU-GPU systems. It builds on gem5, a modular full-system CPU simulator, and GPGPU-Sim, a detailed GPGPU simulator. gem5-gpu routes most memory accesses through Ruby, which is a highly configurable memory system in gem5. By doing this, it is able to simulate many system configurations, ranging from […]
Jan, 26
A Dynamic Offload Scheduler for spatial multitasking on Intel Xeon Phi Coprocessor
The Intel Xeon Phi coprocessor fully supports multitasking, but multitasking alone does not ensure high performance. A conventional task-level resource allocation scheduler could be used, but processor utilization of the Xeon Phi remains low because of idle time on the coprocessor. In this paper, we propose a […]