high performance computing on graphics processing units: hgpu.org

Posts

Dec, 30

Faster Dark Matter Calculations Using the GPU

We have investigated the use of the graphical processing unit to accelerate the software package DarkSUSY. DarkSUSY is, among other things, used for calculating the dark matter relic density, a measurable quantity, given the supersymmetric neutralino, $\tilde{\chi}$, as a dark matter candidate. Supersymmetric theories have many free parameters and we want to calculate […]

CUDA

Dec, 30

Building Human Brain Network in 3D Coefficient Map Determined by X-ray Microtomography

X-ray microtomography can visualize 3D structures of biological soft tissues at cellular to subcellular resolution. Such 3D structures are composed of a great number of cells and extracellular matrices that should be assigned separately as tissue constituents. Here, we report a method for building a skeletonized model of the human brain network in a 3D […]

CUDA

Dec, 30

Deep Shadow Maps from Volumetric Data on the GPU

A method of generating Deep Shadow Maps from a 3D data set is presented. This method uses ray tracing on the GPU to accumulate opacity and store it in a deep shadow map. The deep shadow map is then sampled based on view direction to determine how much light reaches a particular fragment. The […]

Dec, 29

Resource-Aware Compiler Prefetching for Fine-Grained Many-Cores

Super-scalar, out-of-order processors that can have tens of read and write requests in the execution window place significant demands on Memory Level Parallelism (MLP). Multi- and many-cores with shared parallel caches further increase MLP demand. Current cache hierarchies, however, have been unable to keep up with this trend, with modern designs allowing only 4-16 concurrent […]

Dec, 29

Acceleration of PIC Simulation with GPU

Particle-in-cell (PIC) is a simulation technique for plasma physics. The large number of particles in high-resolution plasma simulation increases the volume of computation required, making it vital to increase computation speed. In this study, we attempt to accelerate computation speed on graphics processing units (GPUs) using KEMPO, a PIC simulation code package [H. Matsumoto and Y. […]

CUDA

Dec, 29

Scalable Fast Multipole Methods on Distributed Heterogeneous Architectures

We fundamentally reconsider implementation of the Fast Multipole Method (FMM) on a computing node with a heterogeneous CPU-GPU architecture with multicore CPU(s) and one or more GPU accelerators, as well as on an interconnected cluster of such nodes. The FMM is a divide-and-conquer algorithm that performs a fast N-body sum using a spatial decomposition and […]

CUDA

Dec, 29

Parallelism, Patterns, and Performance in Iterative MRI Reconstruction

Magnetic Resonance Imaging (MRI) is a non-invasive and highly flexible medical imaging modality that does not expose patients to ionizing radiation. MR image acquisitions can be designed by varying a large number of contrast-generation parameters, and many clinical diagnostic applications exist. However, imaging speed is a fundamental limitation to many potential applications. Traditionally, MRI data have […]

CUDA

Dec, 29

Visualization assisted by parallel processing

This paper discusses the experimental results of our visualization model for data extracted from sensors. The objective of this paper is to find a computationally efficient method to produce a real-time rendering visualization for a large amount of data. We develop a visualization method to monitor temperature variance of a data center. Sensors are placed […]

OpenCL

Dec, 29

Engineering Concurrent Software Guided by Statistical Performance Analysis

This paper introduces the ADVANCE approach to engineering concurrent systems using a new component-based approach. A cost-directed tool-chain maps concurrent programs onto emerging hardware architectures, where costs are expressed in terms of programmer annotations for the throughput, latency and jitter of components. These are then synthesized using advanced statistical analysis techniques to give overall cost […]

CUDA

Dec, 29

Parallel computing system for the efficient calculation of molecular similarity based on negative electrostatic potential

This document proposes an alternative method for the comparison of molecular electrostatic potential (MEP), based on parallel computing algorithms on graphics cards using the NVIDIA CUDA platform and kernel methods for pattern recognition. The proposed solution optimizes the construction process of a particular representation of MEP, presents options for improving this representation, and offers 11 kernel […]

CUDA

Dec, 29

Parallel Quadtree Coding of Large-Scale Raster Geospatial Data on Multicore CPUs and GPGPUs

Global remote sensing and large-scale environmental modeling have generated huge amounts of raster geospatial data. While the inherent data parallelism of large-scale raster geospatial data allows straightforward coarse-grained parallelization at the chunk level on CPUs, it is largely unclear how to effectively exploit such data parallelism on massively parallel General Purpose Graphics Processing Units (GPGPUs) […]

CUDA

Dec, 29

Accelerating NBODY6 with Graphics Processing Units

We describe the use of Graphics Processing Units (GPUs) for speeding up the code NBODY6, which is widely used for direct N-body simulations. Over the years, the N^2 nature of the direct force calculation has proved a barrier for extending the particle number. Following an early introduction of force polynomials and individual time-steps, the […]

CUDA

Recent source codes

OpScanner

Atlas CLI: Machine Learning (ML) Lifecycle & Transparency Manager

transformers_tvm: Implementation of Encoder Decoder transformer on TVM

INT v.s. FP: A framework to compare low-bit integer and float-point formats

AutoDock-GPU: AutoDock for GPUs and other accelerators

NCCLX: collective communication framework

Tutoring LLM into a Better CUDA Optimizer

Adaptivity in AdaptiveCpp: Optimizing Performance by Leveraging Runtime Information During JIT-Compilation

Kernel Library for LLM Serving

Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs

Most viewed papers (last 30 days)