Posts
Oct, 16
Hard-Sphere Collision Simulations with Multiple GPUs, PCIe Extension Buses and GPU-GPU Communications
Simulating particle collisions is an important application for physics calculations as well as for various effects in computer games and movie animations. Increasing demand for physical correctness, and hence visual realism, calls for higher-order time-integration methods and more sophisticated collision management algorithms. We report on the use of single and multiple Graphical Processing Units (GPUs) […]
Oct, 16
Bit-Packed Damaged Lattice Potts Model Simulations with CUDA and GPUs
Models such as the Ising and Potts systems lend themselves well to simulating the phase transitions that commonly arise in materials science. A particularly interesting variation is when the material being modelled has lattice defects, dislocations or broken bonds and the material experiences a Griffiths phase. The damaged Potts system consists of a set of […]
Oct, 16
Asynchronous Communication for Finite-Difference Simulations on GPU Clusters using CUDA and MPI
Graphical Processing Units (GPUs) are finding widespread use as accelerators in computer clusters. It is not yet trivial to program applications that use multiple GPU-enabled cluster nodes efficiently. A key aspect of this is managing effective communication between GPU memory on separate devices on separate nodes. We develop an algorithmic framework for Finite-Difference numerical simulations […]
Oct, 16
High performance finite difference PDE solvers on GPUs
We show how to implement highly efficient GPU solvers for one-dimensional PDEs based on finite difference schemes. The typical use case is to price a large number of similar or related derivatives in parallel. Application scenarios include market making, real-time pricing, and risk management. The tridiagonal systems in the backward propagation of a […]
Oct, 16
An Auto-tuned Method for Solving Large Tridiagonal Systems on the GPU
We present a multi-stage method for solving large tridiagonal systems on the GPU. Previously, large tridiagonal systems could not be solved efficiently because of the limited size of on-chip shared memory. We tackle this problem by splitting the systems into smaller ones and then solving them on-chip. The multi-stage characteristic of our method, together with various […]
Oct, 16
GPU-to-CPU callbacks
We present GPU-to-CPU callbacks, a new mechanism and abstraction for GPUs that offers them more independence in a heterogeneous computing environment. Specifically, we provide a method for GPUs to issue callback requests to the CPU. These requests serve as a tool for ease-of-use, future proofing of code, and new functionality. We classify the types of […]
Oct, 16
The Anatomy of High-Performance 2D Similarity Calculations
Similarity measures based on the comparison of dense bit vectors of two-dimensional chemical features are a dominant method in chemical informatics. For large-scale problems, including compound selection and machine learning, computing the intersection between two dense bit vectors is the overwhelming bottleneck. We describe efficient implementations of this primitive as well as example applications using […]
Oct, 16
Data Layout Pruning on GPU
This work is based on the NVIDIA GTX 280 using CUDA (Compute Unified Device Architecture). We classify the datasets to be transferred into the CUDA memory hierarchy as SW (shared and must write) or SR (shared but only read), and existing memory spaces (including shared memory, constant memory, texture memory and global memory) supported on CUDA-enabled GPU memory […]
Oct, 16
Skeleton Programming for Heterogeneous GPU-based Systems
In this thesis, we address issues associated with programming modern heterogeneous systems, focusing on a special kind of heterogeneous system that includes multicore CPUs and one or more GPUs, called GPU-based systems. We consider the skeleton programming approach to achieve high-level abstraction for efficient and portable programming of these GPU-based systems and present our work […]
Oct, 16
Scaling-up spatially-explicit ecological models using graphics processors
How the properties of ecosystems relate to spatial scale is a prominent topic in current ecosystem research. Despite this, spatially explicit models typically include only a limited range of spatial scales, mostly because of computing limitations. Here, we describe the use of graphics processors to efficiently solve spatially explicit ecological models at large spatial scale […]
Oct, 15
Physis: An Implicitly Parallel Programming Model for Stencil Computations on Large-Scale GPU-Accelerated Supercomputers
This paper proposes a compiler-based programming framework that automatically translates user-written structured grid code into scalable parallel implementation code for GPU-equipped clusters. To enable such automatic translations, we design a small set of declarative constructs that allow the user to express stencil computations in a portable and implicitly parallel manner. Our framework translates the user-written […]
Oct, 15
Operating Systems Challenges for GPU Resource Management
The graphics processing unit (GPU) is becoming a very powerful platform to accelerate graphics and data-parallel compute-intensive applications. It significantly outperforms traditional multi-core processors in performance and energy efficiency. Its application domains range widely, from embedded systems to high-performance computing systems. However, operating system support is not adequate, lacking models, designs, and implementation efforts […]