
Posts

Oct, 14

Accelerating Large Scale Image Analyses on Parallel CPU-GPU Equipped Systems

General-purpose graphical processing units (GPGPUs) have transformed high-performance computing over the past decade. Making great computational power available with reduced cost and power consumption overheads, heterogeneous CPU-GPU-equipped systems have helped to make possible the emerging class of exascale data-intensive applications. Although the theoretical performance achieved by these hybrid systems is impressive, taking practical advantage of […]
Oct, 14

CudaDMA: Optimizing GPU Memory Bandwidth via Warp Specialization

As the computational power of GPUs continues to scale with Moore’s Law, an increasing number of applications are becoming limited by memory bandwidth. We propose an approach for programming GPUs with tightly-coupled specialized DMA warps for performing memory transfers between on-chip and off-chip memories. Separate DMA warps improve memory bandwidth utilization by better exploiting available […]
Oct, 14

OptiML: An implicitly parallel domain-specific language for machine learning

As the size of datasets continues to grow, machine learning applications are becoming increasingly limited by the amount of available computational power. Taking advantage of modern hardware requires using multiple parallel programming models targeted at different devices (e.g. CPUs and GPUs). However, programming these devices to run efficiently and correctly is difficult, error-prone, and results […]
Oct, 14

Liszt: A Domain Specific Language for Building Portable Mesh-based PDE Solvers

Heterogeneous computers with processors and accelerators are becoming widespread in scientific computing. However, it is difficult to program hybrid architectures and there is no commonly accepted programming model. Ideally, applications should be written in a way that is portable to many platforms, but providing this portability for general programs is a hard problem. By restricting […]
Oct, 14

GPU Computing Gems: Jade Edition

This is the second volume of Morgan Kaufmann’s GPU Computing Gems, offering an all-new set of insights, ideas, and practical "hands-on" skills from researchers and developers worldwide. Each chapter gives you a window into the work being performed across a variety of application domains, and the opportunity to witness the impact of parallel GPU computing […]
Oct, 14

Towards scalar synchronization in SIMT architectures

Graphics processing units (GPUs) are an important class of compute accelerators. Popular programming models for non-graphics computation on GPUs, such as CUDA and OpenCL, provide an abstraction of many parallel scalar threads. Contemporary GPU hardware groups 32 to 64 scalar threads as a single warp or wavefront and executes this group of scalar threads in […]
Oct, 14

A Heterogeneous Parallel Framework for Domain-Specific Languages

Computing systems are becoming increasingly parallel and heterogeneous, and therefore new applications must be capable of exploiting parallelism in order to continue achieving high performance. However, targeting these emerging devices often requires using multiple disparate programming models and making decisions that can limit forward scalability. In previous work we proposed the use of domain-specific languages […]
Oct, 14

Fast Multipole Method vs. Spectral Method for the Simulation of Isotropic Turbulence on GPUs

This paper presents calculations of homogeneous isotropic turbulence at Re_λ = 100 using both a pseudo-spectral method and a fast multipole vortex method on a 256^3 grid. For the vortex method, both algorithmic and hardware acceleration are applied using a highly parallel fast multipole method (FMM) on GPUs. The spectral method uses the FFTW library […]
Oct, 13

Benchmarking Across Platforms: European Option Pricing

Using a popular Monte Carlo workload that implements European option pricing, we tested a variety of architectures, including NVIDIA and AMD GPUs, a ClearSpeed accelerator, and multi-core processors, as well as different programming approaches. We conclude that this particular workload seems best suited to GPU-type architectures compared to other alternatives such as CPU or […]
Oct, 13

Firepile: Run-time Compilation for GPUs in Scala

Recent advances have enabled GPUs to be used as general-purpose parallel processors on commodity hardware for little cost. However, the ability to program these devices has not kept up with their performance. The programming model for GPUs has a number of restrictions that make it difficult to program. For example, software running on the GPU […]
Oct, 13

A rendering method for simulated emission nebulae

Emission nebulae are some of the most beautiful stellar phenomena. The newly formed hot stars inside the nebulae ionize the surrounding gas, making it glow in a variety of colors. The focus of this work is to find a method for interactive rendering of simulated emission nebulae. A rendering program has been developed to render and […]
Oct, 13

Introduction to GPU Radix Sort

Radix sort is one of the fastest sorting algorithms, especially for large problem sizes. It is not a comparison sort but is built on counting sort. To sort n-bit keys, 2^n counters are prepared, one for each possible key value.
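The counting idea can be sketched on the CPU in a few lines of Python. This is only an illustrative, sequential sketch and not the GPU implementation from the post: the function names `counting_pass` and `radix_sort`, and the defaults of 32-bit keys processed 4 bits at a time, are assumptions chosen for the example. Each pass allocates 2^bits counters, one per possible digit value, and a prefix sum over the counters gives every key its output position.

```python
def counting_pass(keys, shift, bits):
    """Stable counting sort on the `bits`-wide digit that starts at bit `shift`."""
    radix = 1 << bits              # 2^bits counters, one per possible digit value
    mask = radix - 1

    # Histogram: count how many keys have each digit value.
    counts = [0] * radix
    for k in keys:
        counts[(k >> shift) & mask] += 1

    # Exclusive prefix sum: counts become starting offsets in the output.
    offsets = [0] * radix
    total = 0
    for d in range(radix):
        offsets[d] = total
        total += counts[d]

    # Scatter: place each key at its offset, preserving input order (stability).
    out = [0] * len(keys)
    for k in keys:
        d = (k >> shift) & mask
        out[offsets[d]] = k
        offsets[d] += 1
    return out


def radix_sort(keys, key_bits=32, bits_per_pass=4):
    """Least-significant-digit radix sort for non-negative integers below 2**key_bits."""
    keys = list(keys)
    for shift in range(0, key_bits, bits_per_pass):
        keys = counting_pass(keys, shift, bits_per_pass)
    return keys


if __name__ == "__main__":
    data = [170, 45, 75, 90, 802, 24, 2, 66]
    print(radix_sort(data))
```

Processing a few bits per pass keeps the counter array small (16 counters for 4 bits) at the cost of more passes; sorting a full n-bit key in a single pass would need all 2^n counters at once.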

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
