high performance computing on graphics processing units: hgpu.org

Posts

Nov, 9

A molecular docking system using CUDA

A Molecular Docking System enables biologists to check whether two molecular models can be combined at a specific position and remain in their stable states by simulation. It can be used in developing new materials and designing new drugs. Since the docking simulation consists of several complicated computations at the level of atoms, it requires […]

Nov, 9

A framework for simulating and estimating the state and functional topology of complex dynamic geometric networks

We present a framework for simulating signal propagation in geometric networks (i.e. networks that can be mapped to geometric graphs in some space) and for developing algorithms that estimate (i.e. map) the state and functional topology of complex dynamic geometric networks. Within the framework we define the key features typically present in such networks […]

CUDA

Nov, 9

HONEI: A collection of libraries for numerical computations targeting multiple processor architectures

We present HONEI, an open-source collection of libraries offering a hardware oriented approach to numerical calculations. HONEI abstracts the hardware, and applications written on top of HONEI can be executed on a wide range of computer architectures such as CPUs, GPUs and the Cell processor. We demonstrate the flexibility and performance of our approach with […]

CUDA

Nov, 9

Fast Calculation of the Lomb-Scargle Periodogram Using Graphics Processing Units

I introduce a new code for fast calculation of the Lomb-Scargle periodogram that leverages the computing power of graphics processing units (GPUs). After establishing a background to the newly emergent field of GPU computing, I discuss the code design and narrate the key parts of the source. Benchmarking calculations indicate no significant differences in accuracy […]

CUDA

Nov, 9

The Chamomile Scheme: An Optimized Algorithm for N-body simulations on Programmable Graphics Processing Units

We present an algorithm named “Chamomile Scheme”. The scheme is fully optimized for calculating gravitational interactions on the latest programmable Graphics Processing Unit (GPU), NVIDIA GeForce8800GTX, which has (a) small but fast shared memories (16 K Bytes * 16) with no broadcasting mechanism and (b) floating point arithmetic hardware of 500 Gflop/s but only for […]

CUDA

Nov, 9

Spherical harmonic transform with GPUs

We describe an algorithm for computing an inverse spherical harmonic transform suitable for graphic processing units (GPU). We use CUDA and base our implementation on a Fortran90 routine included in a publicly available parallel package, S2HAT. We focus our attention on the two major sequential steps involved in the transforms computation, retaining the efficient parallel […]

CUDA

Nov, 9

Nodal Discontinuous Galerkin Methods on Graphics Processors

Discontinuous Galerkin (DG) methods for the numerical solution of partial differential equations have enjoyed considerable success because they are both flexible and robust: They allow arbitrary unstructured geometries and easy control of accuracy without compromising simulation stability. Lately, another property of DG has been growing in importance: The majority of a DG operator is applied […]

CUDA

Nov, 9

SIML: A Fast SIMD Algorithm for Calculating LINGO Chemical Similarities on GPUs and CPUs

LINGOs are a holographic measure of chemical similarity based on text comparison of SMILES strings. We present a new algorithm for calculating LINGO similarities amenable to parallelization on SIMD architectures (such as GPUs and vector units of modern CPUs). We show that it is nearly 3x as fast as existing algorithms on a CPU, and […]

CUDA

Nov, 9

An exploration of CUDA and CBEA for a gravitational wave source-modelling application

In this paper, we accelerate a gravitational physics numerical modelling application using hardware accelerators — Cell processor and Tesla CUDA GPU. We describe these new technologies and our approach in detail, and then present our final performance results. We obtain well over an order-of-magnitude performance gain in our application by making use of these many-core […]

CUDA

Nov, 9

Accelerating Scientific Computations with Mixed Precision Algorithms

On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. The approach presented here can […]
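
The idea in the excerpt above can be illustrated with iterative refinement, one common mixed-precision scheme; the truncated abstract does not say which algorithms the paper covers, so this is a minimal sketch under that assumption. The helper names (`f32`, `solve_f32`, `refine`) are hypothetical, 32-bit arithmetic is emulated by rounding through `struct`, and the system is re-solved on every pass for brevity, whereas a practical code would factor the matrix once and reuse the factors.

```python
import struct


def f32(x):
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def solve_f32(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting,
    rounding every intermediate result to 32 bits (emulated single precision)."""
    n = len(b)
    # Augmented matrix [A | b], stored in emulated single precision.
    M = [[f32(v) for v in row] + [f32(rhs)] for row, rhs in zip(A, b)]
    for k in range(n):
        # Partial pivoting: bring the largest entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = f32(M[i][k] / M[k][k])
            for j in range(k, n + 1):
                M[i][j] = f32(M[i][j] - f32(m * M[k][j]))
    # Back substitution, also in emulated single precision.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n]
        for j in range(i + 1, n):
            s = f32(s - f32(M[i][j] * x[j]))
        x[i] = f32(s / M[i][i])
    return x


def refine(A, b, iters=5):
    """Mixed-precision iterative refinement: cheap 32-bit solves,
    with the residual and the solution update kept in 64 bits."""
    x = solve_f32(A, b)
    for _ in range(iters):
        # Residual r = b - A x, computed in full 64-bit precision.
        r = [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]
        d = solve_f32(A, r)  # correction from a low-precision solve
        x = [xi + di for xi, di in zip(x, d)]  # 64-bit update
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0, 2.0], [1.0, 5.0, 1.0], [2.0, 1.0, 6.0]]
    b = [1.0, 2.0, 3.0]
    print("32-bit only:", solve_f32(A, b))
    print("refined:    ", refine(A, b))
```

For a well-conditioned matrix such as the one in the demo, a few refinement passes are typically enough to bring the error down to the level of a full 64-bit solve, while the expensive elimination steps run at the lower precision.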

Nov, 9

Teraflop per second gravitational lensing ray-shooting using graphics processing units

Gravitational lensing calculation using a direct inverse ray-shooting approach is a computationally expensive way to determine magnification maps, caustic patterns, and light-curves (e.g. as a function of source profile and size). However, as an easily parallelisable calculation, gravitational ray-shooting can be accelerated using programmable graphics processing units (GPUs). We present our implementation of inverse ray-shooting […]

CUDA

Nov, 9

High Performance Direct Gravitational N-body Simulations on Graphics Processing Units: An implementation in CUDA (thesis)

At the end of 2006, NVIDIA introduced a new generation of graphics processing units (GPUs), the so-called G80 architecture. These GPUs are more powerful than any of the GPUs released before; they offer up to 350 billion floating-point operations per second (GFLOP/s) in certain situations. With the introduction of this hardware NVIDIA released a […]

CUDA

* * *

Recent source codes

XaaS containers

microSYCL: SYCL micro-benchmarks repository

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management