high performance computing on graphics processing units: hgpu.org

Posts

Feb, 24

Dense Matrix Algebra on the GPU

Perhaps the most important innovation of the latest generation of programmable graphics processors (GPUs) is their capability to work with floating point color data. Previous generations of GPUs have worked with up to a byte of integer data per color channel. Developers working on graphics engines with advanced lighting effects often complained about banding artifacts, […]

Feb, 23

GPU-Assisted Malware

Malware writers constantly seek new methods to obfuscate their code so as to evade detection by virus scanners. Two code-armoring techniques that pose significant challenges to existing malicious-code detection and analysis systems are unpacking and run-time polymorphism. In this paper, we demonstrate how malware can increase its robustness against detection by taking advantage of the […]

CUDA

Feb, 23

GPU Coprocessing for Wireless Network Simulation

Site-specific modeling of wireless communications channels has historically been too computationally intensive to incorporate into commodity network simulators. Simulation cannot accurately predict the behavior of wireless networks in real-world environments without modeling the physical channel realistically. Realistic models typically involve large amounts of floating point computation, to which modern GPUs are well suited. In this […]

CUDA

Feb, 23

Architectural Comparisons for a Quantum Monte Carlo Application

Recent technological advances have led to a number of emerging platforms such as multi-cores, reconfigurable computing, and graphics processing units. We present a comparative study of multi-cores, field-programmable gate arrays, and graphics processing units for a Quantum Monte Carlo chemistry application. The speedups of these implementations are measured relative to a multi-core implementation and the […]

CUDA

Feb, 23

Flexible Hardware Mapping for Finite Element Simulations on Hybrid CPU / GPU Clusters

The ever increasing peak floating-point performance and memory bandwidth of GPUs is making them increasingly ubiquitous in the high performance computing community. With increasing adoption of GPUs in cluster environments, applications that cannot take advantage of this hardware will be at a distinct disadvantage. For the class of applications that can achieve massive speedups of […]

CUDA

Feb, 23

Probing biomolecular machines with graphics processors

GPU acceleration and other computer performance increases will offer critical benefits to biomedical science. Computer simulation has become an integral part of the study of the structure and function of biological molecules. For years, parallel computers have been used to conduct these computationally demanding simulations and to analyze their results. These simulations function as a […]

CUDA

•

OpenCL

Feb, 22

GPU Acceleration of the Generalized Interpolation Material Point Method

This paper describes our experience rewriting a sequential particle-in-cell code so that its key computations are executed on a GPU. This code is well-suited to GPU acceleration, as it performs data-parallel operations on a regular grid. Key performance challenges are the need for global synchronization in mapping particles to grid nodes, and managing memory bandwidth […]

CUDA

Feb, 22

Accelerating Energy Minimization using Graphics Processors

Energy minimization is an important step in molecular modeling, with applications in molecular docking and in mapping binding sites. Minimization involves repeated evaluation of various bonded and non-bonded energies of a protein complex. It is a computationally expensive process, with runtimes typically being many hours on a desktop system. In the current article, we present […]

CUDA

Feb, 22

Accelerating the ANSYS Direct Sparse Solver with GPUs

As hardware accelerators and especially GPUs become more and more popular to accelerate the compute intensive parts of an algorithm, standard high performance computing packages are starting to benefit from this trend. We present the first GPU acceleration of the ANSYS direct sparse solver. We explain how such a multifrontal solver may be accelerated using […]

CUDA

Feb, 22

Production Floating Point Applications on FPGAs

While FPGAs have only one fifth the raw floating point capability of GPUs, other attributes allow them to be surprisingly competitive with respect to a number of critical floating point intensive applications. In the first part we review these FPGA attributes. The bulk of this extended abstract then provides an overview of efficient FPGA implementations […]

CUDA

Feb, 22

Real-Time Stereo on GPGPU using Progressive Multi-Resolution Adaptive Windows

We introduce a new GPGPU-based real-time dense stereo matching algorithm. The algorithm is based on a progressive multi-resolution pipeline which includes background modeling and dense matching with adaptive windows. For applications in which only moving objects are of interest, this approach effectively reduces the overall computation cost quite significantly, and preserves the high definition details. […]

Feb, 22

Energy efficiency of mixed precision iterative refinement methods using hybrid hardware platforms

In this paper we evaluate the possibility of using mixed precision algorithms on different hardware platforms to obtain energy-efficient solvers for linear systems of equations. Our test-cases arise in the context of computational fluid dynamics. Therefore, we analyze the energy efficiency of common cluster nodes and a hybrid, GPU-accelerated cluster node, when applying a linear […]

high performance computing on graphics processing units: hgpu.org

Posts

Dense Matrix Algebra on the GPU

GPU-Assisted Malware

GPU Coprocessing for Wireless Network Simulation

Architectural Comparisons for a Quantum Monte Carlo Application

Flexible Hardware Mapping for Finite Element Simulations on Hybrid CPU / GPU Clusters

Probing biomolecular machines with graphics processors

GPU Acceleration of the Generalized Interpolation Material Point Method

Accelerating Energy Minimization using Graphics Processors

Accelerating the ANSYS Direct Sparse Solver with GPUs

Production Floating Point Applications on FPGAs

Real-Time Stereo on GPGPU using Progressive Multi-Resolution Adaptive Windows

Energy efficiency of mixed precision iterative refinement methods using hybrid hardware platforms

Recent source codes

CudaForge: An Agent Framework with Hardware Feedback for CUDA Kernel Optimization

LC Framework

pplx-garden: Perplexity open source garden for inference technology

Atlas CLI: Machine Learning (ML) Lifecycle & Transparency Manager

transformers_tvm: Implementation of Encoder Decoder transformer on TVM

OpScanner

INT v.s. FP: A framework to compare low-bit integer and float-point formats

AutoDock-GPU: AutoDock for GPUs and other accelerators

NCCLX: collective communication framework

Tutoring LLM into a Better CUDA Optimizer

Most viewed papers (last 30 days)