high performance computing on graphics processing units: hgpu.org

Posts

Nov, 1

Quantum chemical many-body theory on heterogeneous nodes

The iterative solution of the coupled-cluster with single and double excitations (CCSD) equations is a very time-consuming component of the "gold standard" in quantum chemistry, the CCSD(T) method. In an effort to accelerate accurate quantum mechanical calculations, we explore two implementation strategies for the iterative solution of the CC equations on graphics processing units (GPUs). […]

CUDA

Nov, 1

An Implementation of Differential Evolution for Independent Tasks Scheduling on GPU

Differential evolution is an efficient meta-heuristic optimization method with solid record of real world applications. In this paper, we present a simple and efficient implementation of the differential evolution using the massively parallel CUDA architecture. We demonstrate the speedup and improvements obtained by the parallelization of this intelligent algorithm on the problem of scheduling of […]

CUDA

Nov, 1

A Real-Time Computer Vision Library for Heterogeneous Processing Environments

With a variety of processing technologies available today, using a combination of different technologies often provides the best performance for a particular task. However, unifying multiple processors with different instruction sets can be a very ad hoc and difficult process. The Open Component Portability Infrastructure (OpenCPI) provides a platform that simplifies programming heterogeneous processing applications […]

Nov, 1

Fast computation of scattering maps of nanostructures using graphical processing units

Scattering maps from strained or disordered nanostructures around a Bragg reflection can be either computed quickly using approximations and a (fast) Fourier transform or obtained using individual atomic positions. In this article, it is shown that it is possible to compute up to 4*10^10 reflections*atoms*s^-1 using a single graphics card, and the manner in which […]

CUDA

Nov, 1

Architecture Comparisons between Nvidia and ATI GPUs: Computation Parallelism and Data Communications

In recent years, modern graphics processing units have been widely adopted in high performance computing areas to solve large scale computation problems. The leading GPU manufacturers Nvidia and ATI have introduced series of products to the market. While sharing many similar design concepts, GPUs from these two manufacturers differ in several aspects on processor cores […]

OpenCL

Nov, 1

Extremely large scale simulation of a Kardar-Parisi-Zhang model using graphics cards

The octahedron model introduced recently has been implemented onto graphics cards, which permits extremely large scale simulations via binary lattice gases and bit coded algorithms. We confirm scaling behavior belonging to the 2d Kardar-Parisi-Zhang universality class and find a surface growth exponent: beta=0.2415(15) on 2^17 x 2^17 systems, ruling out beta=1/4 suggested by field theory. […]

CUDA

Oct, 31

Parallel Smoothers for Matrix-based Multigrid Methods on Unstructured Meshes Using Multicore CPUs and GPUs

Multigrid methods are efficient and fast solvers for problems typically modeled by partial differential equations of elliptic type. For problems with complex geometries and local singularities stencil-type discrete operators on equidistant Cartesian grids need to be replaced by more flexible concepts for unstructured meshes in order to properly resolve all problem-inherent specifics and for maintaining […]

CUDA

Oct, 31

Optimal Control of the Process Systems Using Graphic Processing Unit

In this paper the Graphic Processing Unit (GPU) is applied in order to improve the computational performance of process systems optimal control calculations. To apply GPU massive parallel architecture, a simplified version of interior point optimisation algorithm was selected and modified to fulfil special hardware requirements of GPU architecture. In this algorithm, a damped nonlinear […]

CUDA

Oct, 31

SIMD Re-Convergence At Thread Frontiers

Hardware and compiler techniques for mapping data-parallel programs with divergent control flow to SIMD architectures have recently enabled the emergence of new GPGPU programming models such as CUDA, OpenCL, and DirectX Compute. The impact of branch divergence can be quite different depending upon whether the program’s control flow is structured or unstructured. In this paper, […]

CUDA

Oct, 31

Rapid Performance of a Generalized Distance Calculation

The ever-increasing size of data sets and the need for real-time processing drive the need for high-speed analysis. Since traditional CPUs are designed to execute a small number of sequential processes, they are ill-suited to keep pace with this growth and exploit the massive parallelism inherent in these problem spaces. In the last several […]

CUDA

Oct, 31

Low Latency Complex Event Processing on Parallel Hardware

Several application domains involve observing events, processing them, and reacting. This asks for a Complex Event Processing (CEP) engine in charge of interpreting, filtering, and combining primitive events that occur in the external environment, to identify higher level composite events, according to a set of rules written in an ad-hoc rule definition language. A key […]

CUDA

Oct, 31

Fast Speaker Diarization Using a High-Level Scripting Language

Most current speaker diarization systems use agglomerative clustering of Gaussian Mixture Models (GMMs) to determine "who spoke when" in an audio recording. While state-of-the-art in accuracy, this method is computationally costly, mostly due to the GMM training, and thus limits the performance of current approaches to be roughly real-time. Increased sizes of current datasets require […]

CUDA

Recent source codes

OpScanner

Atlas CLI: Machine Learning (ML) Lifecycle & Transparency Manager

transformers_tvm: Implementation of Encoder Decoder transformer on TVM

INT v.s. FP: A framework to compare low-bit integer and float-point formats

AutoDock-GPU: AutoDock for GPUs and other accelerators

NCCLX: collective communication framework

Tutoring LLM into a Better CUDA Optimizer

Adaptivity in AdaptiveCpp: Optimizing Performance by Leveraging Runtime Information During JIT-Compilation

Kernel Library for LLM Serving

Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs
