
Posts

Oct, 17

Programming with Explicit Dependencies. A Framework for Portable Parallel Programming

Computational devices are rapidly evolving into massively parallel systems. Multicore processors are already standard; high performance processors such as the Cell/BE processor, graphics processing units (GPUs) featuring hundreds of on-chip processors, and reconfigurable devices such as FPGAs are all developed to deliver high computing power. They make parallelism commonplace, not only the privilege of expensive […]
Oct, 17

A High Performance Parallel Sparse Linear Equation Solver Using CUDA

The management of electric power systems requires continuously computing the powerflow of a power system in real-time. For large power systems, this task is often beyond the capabilities of modern CPUs. Concurrent computation is an attractive approach to accelerating it. However, the powerflow computation requires solving a large system of sparse linear equations. This problem […]
Oct, 16

Hard-Sphere Collision Simulations with Multiple GPUs, PCIe Extension Buses and GPU-GPU Communications

Simulating particle collisions is an important application for physics calculations as well as for various effects in computer games and movie animations. The increasing demand for physical correctness, and hence visual realism, calls for higher-order time-integration methods and more sophisticated collision-management algorithms. We report on the use of single and multiple Graphics Processing Units (GPUs) […]
Oct, 16

Bit-Packed Damaged Lattice Potts Model Simulations with CUDA and GPUs

Models such as the Ising and Potts systems lend themselves well to simulating the phase transitions that commonly arise in materials science. A particularly interesting variation is when the material being modelled has lattice defects, dislocations or broken bonds and the material experiences a Griffiths phase. The damaged Potts system consists of a set of […]
Oct, 16

Asynchronous Communication for Finite-Difference Simulations on GPU Clusters using CUDA and MPI

Graphics Processing Units (GPUs) are finding widespread use as accelerators in computer clusters. It is not yet trivial to program applications that use multiple GPU-enabled cluster nodes efficiently. A key aspect of this is managing effective communication between the memories of GPUs on separate devices and separate nodes. We develop an algorithmic framework for finite-difference numerical simulations […]
Oct, 16

High performance finite difference PDE solvers on GPUs

We show how to implement highly efficient GPU solvers for one-dimensional PDEs based on finite-difference schemes. The typical use case is to price a large number of similar or related derivatives in parallel. Application scenarios include market making, real-time pricing, and risk management. The tridiagonal systems in the backward propagation of a […]
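The teaser mentions tridiagonal systems arising in the backward propagation of a finite-difference scheme. As a rough illustration only (the teaser does not describe the authors' GPU solver), the standard sequential Thomas algorithm for a single such system can be sketched in Python. The function name `thomas_solve` and the assumption that the matrix is diagonally dominant, so no pivoting is needed, are illustrative choices rather than details from the paper.

```python
from typing import List, Sequence


def thomas_solve(a: Sequence[float], b: Sequence[float],
                 c: Sequence[float], d: Sequence[float]) -> List[float]:
    """Solve a tridiagonal system with the Thomas algorithm (hypothetical helper).

    a: sub-diagonal (a[0] is unused), b: main diagonal,
    c: super-diagonal (c[-1] is unused), d: right-hand side.
    Assumes diagonal dominance, so no pivoting is performed.
    """
    n = len(b)
    cp = [0.0] * n  # modified super-diagonal coefficients
    dp = [0.0] * n  # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward sweep: eliminate the sub-diagonal row by row.
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    # Back substitution: recover x from the last unknown upwards.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


if __name__ == "__main__":
    # 3x3 system with diagonal 2 and off-diagonals -1; exact solution is [1, 2, 3].
    print(thomas_solve([0, -1, -1], [2, 2, 2], [-1, -1, 0], [0, 0, 4]))
```

Each system is solved independently, so a batch of pricing problems maps naturally onto a data-parallel loop (for example, one system per GPU thread). The forward and backward sweeps inside a single system are inherently sequential; that is the part GPU-oriented methods such as cyclic reduction aim to parallelize.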
Oct, 16

An Auto-tuned Method for Solving Large Tridiagonal Systems on the GPU

We present a multi-stage method for solving large tridiagonal systems on the GPU. Previously, large tridiagonal systems could not be solved efficiently because of the limited size of on-chip shared memory. We tackle this problem by splitting the systems into smaller ones and then solving them on-chip. The multi-stage characteristic of our method, together with various […]
Oct, 16

GPU-to-CPU callbacks

We present GPU-to-CPU callbacks, a new mechanism and abstraction for GPUs that offers them more independence in a heterogeneous computing environment. Specifically, we provide a method for GPUs to issue callback requests to the CPU. These requests serve as a tool for ease-of-use, future proofing of code, and new functionality. We classify the types of […]
Oct, 16

The Anatomy of High-Performance 2D Similarity Calculations

Similarity measures based on the comparison of dense bit vectors of two-dimensional chemical features are a dominant method in chemical informatics. For large-scale problems, including compound selection and machine learning, computing the intersection between two dense bit vectors is the overwhelming bottleneck. We describe efficient implementations of this primitive as well as example applications using […]
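The teaser does not name the similarity measure. The Tanimoto (Jaccard) coefficient is a common choice for chemical fingerprints, and it illustrates why the intersection dominates the cost: once each fingerprint's bit count has been computed, each pair needs only the popcount of the bitwise AND. A minimal Python sketch, assuming fingerprints stored as equal-length lists of 64-bit words; the function names are hypothetical and this is not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def popcount(word: int) -> int:
    # Number of set bits in one word; GPUs expose this as a hardware
    # instruction (for example CUDA's __popcll for 64-bit integers).
    return bin(word).count("1")


def bit_count(fp: Sequence[int]) -> int:
    # Total number of set bits in a fingerprint.
    return sum(popcount(w) for w in fp)


def tanimoto(fp_a: Sequence[int], count_a: int,
             fp_b: Sequence[int], count_b: int) -> float:
    # Only the intersection is computed per pair; the union follows from
    # the precomputed counts: |A or B| = |A| + |B| - |A and B|.
    common = sum(popcount(x & y) for x, y in zip(fp_a, fp_b))
    union = count_a + count_b - common
    return common / union if union else 1.0


def rank_database(query: Sequence[int],
                  database: Sequence[Sequence[int]]) -> List[Tuple[int, float]]:
    """Return (index, similarity) pairs sorted by decreasing similarity."""
    q_count = bit_count(query)
    counts = [bit_count(fp) for fp in database]  # computed once, reused per query
    scores = [(i, tanimoto(query, q_count, fp, c))
              for i, (fp, c) in enumerate(zip(database, counts))]
    return sorted(scores, key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    print(rank_database([0b1011], [[0b0110], [0b1011], [0b0000]]))
```

In a large-scale screen, every (query, database) pair is independent, so the inner AND-and-popcount loop is the natural unit to distribute across GPU threads.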
Oct, 16

Data Layout Pruning on GPU

This work is based on the NVIDIA GTX 280 and CUDA (Compute Unified Device Architecture). We classify the datasets to be transferred into the CUDA memory hierarchy as SW (shared and must write) and SR (shared but only read), and existing memory spaces (including shared memory, constant memory, texture memory and global memory) supported on CUDA-enabled GPU memory […]
Oct, 16

Skeleton Programming for Heterogeneous GPU-based Systems

In this thesis, we address issues associated with programming modern heterogeneous systems, focusing on a special kind of heterogeneous system that includes multicore CPUs and one or more GPUs, called GPU-based systems. We consider the skeleton programming approach to achieve high-level abstraction for efficient and portable programming of these GPU-based systems and present our work […]
Oct, 16

Scaling-up spatially-explicit ecological models using graphics processors

How the properties of ecosystems relate to spatial scale is a prominent topic in current ecosystem research. Despite this, spatially explicit models typically include only a limited range of spatial scales, mostly because of computing limitations. Here, we describe the use of graphics processors to efficiently solve spatially explicit ecological models at large spatial scale […]


HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
