Posts
Oct, 1
Interactive Soft Tissue for Surgical Simulation
Medical simulation has the potential to revolutionise the training of medical practitioners. Advantages include reduced risk to patients, increased access to rare scenarios and virtually unlimited repeatability. However, to fulfil this potential, medical simulators require techniques that provide realistic user interaction with the simulated patient. Specifically, compelling real-time simulations that allow the trainee […]
Oct, 1
Image registration on GPU
Image registration is a fundamental step in many applications involving image analysis. It consists of optimizing a similarity metric to find a spatial transformation that matches two images (in 3D). It has applications in medical imaging, such as building atlases (registering a population) or aligning a patient to a template to detect pathologies. The main […]
Sep, 30
Exploring The Latency and Bandwidth Tolerance of CUDA Applications
CUDA applications represent a new body of parallel programs. Although several paradigms exist for programming distributed systems and many-core processors, many users struggle to achieve a program that is scalable across systems with different hardware characteristics. This paper explores the scalability of CUDA applications on systems with varying interconnect latencies, hiding a hardware detail from […]
Sep, 30
Architecture-Aware Mapping and Optimization on Heterogeneous Computing Systems
The emergence of scientific applications embedded with multiple modes of parallelism has made heterogeneous computing systems indispensable in high performance computing. The popularity of such systems is evident from the fact that three out of the top five fastest supercomputers in the world employ heterogeneous computing, i.e., they use dissimilar computational units. A closer look […]
Sep, 30
Real-Time Handling of GPU Interrupts in LITMUS^RT
Graphics processing units (GPUs) are becoming increasingly important in today’s platforms as their increased generality allows them to be used as powerful co-processors. However, unlike standard CPUs, GPUs are treated as I/O devices and require the use of interrupts to facilitate communication with the CPU. Interrupts cause delays in the execution of real-time tasks, […]
Sep, 30
Enhancing Data Locality for Dynamic Simulations through Asynchronous Data Transformations and Adaptive Control
Many dynamic simulation programs contain complex, irregular memory reference patterns, and require runtime optimizations to enhance data locality. Current approaches periodically stop the execution of an application to reorder the computation or data based on the current program state to improve the data locality for the next period of execution. In this work, we examine […]
Sep, 30
Stack-less SIMT reconvergence at low cost
Parallel architectures following the SIMT model such as GPUs benefit from application regularity by issuing concurrent threads running in lockstep on SIMD units. As threads take different paths across the control-flow graph, lockstep execution is partially lost, and must be regained whenever possible in order to maximize the occupancy of SIMD units. In this paper, […]
Sep, 30
A PTX Code Generator for LLVM
Today’s GPGPU architectures and corresponding high-level programming languages like CUDA replace the traditionally restricted GPU pipelines. Proprietary compilers translate these languages into native GPU assembly. Unfortunately, these compilers are non-customizable and restricted to static compilation. High-performance applications currently require particular manual optimizations. To overcome these cumbersome manual optimizations, this thesis develops […]
Sep, 30
Adaptable Two-Dimension Sliding Windows on NVIDIA GPUs with Runtime Compilation
For some classes of problems, NVIDIA CUDA abstraction and hardware properties combine with problem characteristics to limit the specific problem instances that can be effectively accelerated. As a real-world example, a two-dimensional correlation-based template-matching MATLAB application is considered. While this problem has a well-known solution for the common case of linear image filtering – small fixed […]
Sep, 30
CBench: Analyzing Compute Performance for Modern NVIDIA and AMD GPUs
General-purpose GPU computation is a fast-growing field with a variety of applications. For maximum performance, though, mapping high-level parallel algorithms to vendor hardware requires a solid grasp of both the algorithm’s computational requirements and the microarchitectural limitations of the GPU. This work aims to explore the performance of high and low arithmetic intensity […]
Sep, 30
FATSEA: An Architectural Simulator for General Purpose Computing on GPUs
We present FATSEA, a functional and performance evaluation simulator written in C++ to handle kernels written in the CUDA programming language and aimed at GPGPU computing. FATSEA takes Parallel Thread eXecution (PTX) code as input, which is a device-independent code format generated by the NVIDIA CUDA compiler, to validate results and estimate performance […]
Sep, 30
Translating GPU binaries to tiered SIMD architectures with Ocelot
Parallel Thread Execution ISA (PTX) is a virtual instruction set used by NVIDIA GPUs that explicitly expresses hierarchical MIMD- and SIMD-style parallelism in an application. In such a programming model, the programmer and compiler are left with the nontrivial, but not impossible, task of composing applications from parallel algorithms and data structures. Once […]