Posts
Jun, 24
GPU Implementation of the Particle Filter
This thesis analyses the obstacles faced when adapting the particle filtering algorithm to run on massively parallel compute architectures. Graphics processing units are one example of such architectures, allowing the developer to distribute computational load over hundreds or thousands of processor cores. This thesis studies an implementation written for NVIDIA […]
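The thesis itself is not reproduced here, but the kind of step it parallelizes is easy to sketch: the per-particle weight update is embarrassingly parallel and maps naturally to one GPU thread per particle (the resampling step, which needs a global cumulative sum of the weights, is typically the harder part to port). The CUDA kernel below is only an illustrative sketch under that assumption; its names and its 1-D Gaussian likelihood are not taken from the thesis.

// Illustrative sketch, not from the thesis: per-particle weight update for a
// 1-D state, one thread per particle. The Gaussian measurement likelihood and
// all names are assumptions for illustration.
__global__ void updateWeights(const float* particles, float* weights,
                              float measurement, float noiseVar, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float diff = measurement - particles[i];
        // Multiply the particle's weight by the (unnormalized) likelihood of
        // the measurement given the particle's state.
        weights[i] *= expf(-0.5f * diff * diff / noiseVar);
    }
}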
Jun, 24
Integrating Two-Way Interaction Between Fluids and Rigid Bodies in the Real-Time Particle Systems Library
In the last 15 years, video games have become a dominant form of entertainment. The popularity of video games means children are spending more of their free time playing video games. Usually, the time spent on homework or studying is decreased to allow for the extended time spent on video games. In an effort to […]
Jun, 24
A Visual Approach to Investigating Shared and Global Memory Behavior of CUDA Kernels
We present an approach to investigate the memory behavior of a parallel kernel executing on thousands of threads simultaneously within the CUDA architecture. Our top-down approach allows for quickly identifying any significant differences between the execution of the many blocks and warps. As interesting warps are identified, we allow further investigation of memory behavior by […]
Jun, 24
An Energy Efficient GPGPU Memory Hierarchy with Tiny Incoherent Caches
With each successive generation and the ever-increasing promise of computing power, GPGPUs have been quickly growing in size, and at the same time energy consumption has become a major bottleneck for them. The first level data cache and the scratchpad memory are critical to the performance of a GPGPU, but they are extremely energy inefficient due […]
Jun, 24
Provably Efficient GPU Algorithms
In this paper we present an abstract model for algorithm design on GPUs by extending the parallel external memory (PEM) model with computations in internal memory (commonly known as shared memory in GPU literature) defined in the presence of memory banks and bank conflicts. We also present a framework for designing bank-conflict-free algorithms […]
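The paper's framework is not reproduced here, but the classic shared-memory padding trick below shows what "bank conflict free" means in practice on CUDA hardware: with 32 banks, a 32x32 tile places an entire column in a single bank, and one extra padding column skews the mapping so column accesses hit distinct banks. The tiled transpose kernel is a standard illustration, not code from the paper.

// Standard illustration of bank-conflict-free shared-memory access via padding;
// not taken from the paper. Launch with 32x32 thread blocks.
#define TILE 32

__global__ void transpose(const float* in, float* out, int width, int height)
{
    __shared__ float tile[TILE][TILE + 1];   // +1 column of padding avoids bank conflicts

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < width && y < height)
        tile[threadIdx.y][threadIdx.x] = in[y * width + x];

    __syncthreads();

    // Read the tile transposed; walking down a column stays conflict-free
    // only because of the padding above.
    int tx = blockIdx.y * TILE + threadIdx.x;
    int ty = blockIdx.x * TILE + threadIdx.y;
    if (tx < height && ty < width)
        out[ty * height + tx] = tile[threadIdx.x][threadIdx.y];
}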
Jun, 23
The 22nd High Performance Computing Symposium, HPC 2014
The 2014 Spring Simulation Multiconference will feature the 22nd High Performance Computing Symposium (HPC 2014), devoted to the impact of high performance computing and communications on computer simulations. Advances in multicore and many-core architectures, networking, high end computers, large data stores, and middleware capabilities are ushering in a new era of high performance parallel and […]
Jun, 23
Workshop on GPU Programming for Molecular Modeling
The GPU Programming for Molecular Modeling workshop will extend GPU programming techniques to the field of molecular modeling, including subjects such as particle-grid algorithms (electrostatics, molecular surfaces, density maps, and molecular orbitals), particle-particle algorithms with an emphasis on non-bonded force calculations, radial distribution functions in GPU histogramming, single-node multi-GPU algorithms, and GPU clusters. Specific examples […]
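One of the listed topics, GPU histogramming (as used for radial distribution functions), reduces to a simple pattern: each thread bins one sample into a block-local histogram in shared memory using atomics, and the partial histograms are then merged into global memory. The kernel below is a generic sketch of that pattern, not workshop material; the bin count and all names are assumptions.

// Generic GPU histogramming sketch (not workshop material); bin count and
// names are illustrative assumptions.
#define NUM_BINS 256

__global__ void histogram(const float* samples, unsigned int* bins,
                          int n, float binWidth)
{
    __shared__ unsigned int local[NUM_BINS];
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        local[b] = 0;                       // clear the block-local histogram
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Clamp the bin index so out-of-range samples land in the edge bins.
        int b = min(max((int)(samples[i] / binWidth), 0), NUM_BINS - 1);
        atomicAdd(&local[b], 1u);           // contention stays within the block
    }
    __syncthreads();

    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        atomicAdd(&bins[b], local[b]);      // merge into the global histogram
}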
Jun, 23
Non-Uniformly Partitioned Block Convolution on Graphics Processing Units
Real-time convolution has many applications, among them simulating room reverberation in audio processing. Non-uniformly partitioned filters can satisfy both desired features of an efficient convolution: low latency and low computational complexity. However, distributing the computation so that it places a uniform demand on the Central Processing Unit (CPU) is still challenging. Moreover, computational […]
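The paper's scheme is not reproduced here, but the idea behind non-uniform partitioning is easy to illustrate: the impulse response is split into segments whose length grows toward the tail, so the short head segments keep the input-to-output latency low while the long tail segments amortize the per-sample cost. The host-side sketch below prints one such schedule; the segment-doubling rule, block size, and impulse-response length are illustrative assumptions, not values from the paper.

// Illustrative partitioning schedule only, not the paper's scheme: two
// segments per size, doubling the segment length toward the tail.
#include <cstdio>

int main()
{
    const int irLength  = 48000;   // e.g. a one-second impulse response at 48 kHz (assumed)
    const int blockSize = 128;     // audio I/O block size, sets the latency floor (assumed)

    int offset = 0, segLen = blockSize;
    while (offset < irLength) {
        for (int rep = 0; rep < 2 && offset < irLength; ++rep) {
            int len = (offset + segLen <= irLength) ? segLen : irLength - offset;
            printf("segment at %6d, length %6d\n", offset, len);
            offset += len;
        }
        segLen *= 2;               // later segments are longer, hence cheaper per sample
    }
    return 0;
}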
Jun, 23
GPU Implementation of the DP code
The main goal of this PRACE project was to evaluate how GPUs could speed up the DP code, a linear response TDDFT code. Profiling analysis of the code was performed to identify the computational bottlenecks to be delegated to the GPU. In order to speed up this code using GPUs, two different strategies have been […]
Jun, 22
CUDA Enhanced Simulated Annealing for Chip Layout Problem
This paper introduces an implementation of a parallel solution for the chip layout problem on the NVIDIA CUDA framework. The experiment allows for varying chip sizes, interconnecting signals, and three chip transformations: rotate, swap, and translate. Total signal distance is minimized as the system converges toward an optimal solution using simulated annealing. Lee’s maze routing […]
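The paper's CUDA implementation is not shown here; the self-contained host-side sketch below only illustrates the Metropolis acceptance rule that drives simulated annealing, using a toy one-dimensional "layout" and a swap move as hypothetical stand-ins for the paper's chip layout, signal-distance cost, and three transformations.

// Toy simulated-annealing loop illustrating the Metropolis acceptance rule.
// The vector-of-positions "layout" and the wirelength cost are hypothetical
// stand-ins, not the paper's data structures.
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

static double wirelength(const std::vector<double>& x)
{
    double d = 0.0;
    for (size_t i = 1; i < x.size(); ++i) d += std::fabs(x[i] - x[i - 1]);
    return d;
}

int main()
{
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> uni(0.0, 1.0);

    std::vector<double> layout(64);
    for (auto& v : layout) v = uni(rng) * 100.0;          // random initial placement

    double cost = wirelength(layout);
    for (double temp = 10.0; temp > 1e-3; temp *= 0.999) {
        // Propose a "swap" move, one of the transformations the abstract lists.
        size_t a = rng() % layout.size(), b = rng() % layout.size();
        std::swap(layout[a], layout[b]);
        double delta = wirelength(layout) - cost;

        // Metropolis rule: always accept improvements; accept uphill moves
        // with probability exp(-delta / temperature).
        if (delta <= 0.0 || uni(rng) < std::exp(-delta / temp))
            cost += delta;
        else
            std::swap(layout[a], layout[b]);              // undo the rejected move
    }
    printf("final wirelength: %f\n", cost);
    return 0;
}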
Jun, 22
Exploring GPGPUs Workload Characteristics and Power Consumption
While general purpose computing on GPUs continues to enjoy higher computing performance with every new generation, the high power consumption of GPUs is an increasingly important concern. To create power-efficient GPUs, it is important to thoroughly study their power consumption. The power consumption of GPUs varies significantly with workloads. Therefore, in this work we study […]
Jun, 22
Virtualization and Migration with GPGPUs
Recently, cloud computing providers have started to offer virtual machines specifically for high performance computing as a service (HPCaaS). The cloud computing providers usually employ virtualization as an abstraction layer between the application software and the underlying hardware. Virtualization allows flexible migration between physical systems, which is a requirement for many load balancing techniques. In […]