high performance computing on graphics processing units: hgpu.org

Posts

Dec, 29

Accelerating NBODY6 with Graphics Processing Units

We describe the use of Graphics Processing Units (GPUs) for speeding up the code NBODY 6 which is widely used for direct N-body simulations. Over the years, the N^2 nature of the direct force calculation has proved a barrier for extending the particle number. Following an early introduction of force polynomials and individual time-steps, the […]

CUDA

Dec, 28

Improving the speed of neural networks on CPUs

Recent advances in deep learning have made the use of large, deep neural networks with tens of millions of parameters suitable for a number of applications that require real-time processing. The sheer size of these networks can represent a challenging computational burden, even for modern CPUs. For this reason, GPUs are routinely used instead to […]

CUDA

Dec, 28

Multilevel Tile Load Map on Massive Terrain Visualization

This paper analyzed the architectural features needed for efficient LOD visualization of massive terrain, and found that the CPU can hardly select tiles from massive terrain effectively, which restricts how far the terrain size can grow. Yacine Amara presented the Tile Load Map (TLM). This paper presents the Multilevel Tile Load Map (MTLM) algorithm, which extends TLM for tile selection. MTLM uses 2d […]

OpenGL

Dec, 28

Speeding Up Particle Trajectory Simulations under Moving Force Fields using GPUs

In this paper, we introduce a GPU-based framework for simulating particle trajectories under both static and dynamic force fields. By exploiting the highly parallel nature of the problem and making efficient use of the available hardware, our simulator exhibits a significant speedup over its CPU-based analog. We apply our framework to a specific experimental simulation: […]

CUDA

Dec, 28

BOPM implemented on a GPU-architecture

We used the Binomial Options Pricing Model (BOPM) implemented on a Graphics Processing Unit (GPU) to calculate the value of European and American options, of both put and call type. The advantage of using a GPU over a CPU is that a GPU has many more processing-cores than a CPU and can perform more calculations […]

CUDA
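For readers unfamiliar with the model named in this abstract, here is a minimal CPU-side sketch of a binomial options pricing tree. It assumes the Cox-Ross-Rubinstein parameterization and a non-dividend-paying underlying; the excerpt does not say which variant the authors use, and the function name `binomial_option_price` and all of its parameters are hypothetical. It is not the authors' GPU implementation.

```python
import math


def binomial_option_price(spot, strike, rate, volatility, maturity, steps,
                          is_call=True, is_american=False):
    """Price an option on a recombining binomial tree (CRR parameters assumed)."""
    dt = maturity / steps
    up = math.exp(volatility * math.sqrt(dt))        # up-move factor
    down = 1.0 / up                                  # down-move factor
    discount = math.exp(-rate * dt)                  # one-step discount factor
    p_up = (math.exp(rate * dt) - down) / (up - down)  # risk-neutral probability
    sign = 1.0 if is_call else -1.0

    def payoff(price):
        return max(sign * (price - strike), 0.0)

    # Option values at expiry; node j has seen j up-moves.
    values = [payoff(spot * up ** j * down ** (steps - j))
              for j in range(steps + 1)]

    # Roll back through the tree one time step at a time.
    for step in range(steps - 1, -1, -1):
        for j in range(step + 1):
            value = discount * (p_up * values[j + 1] + (1.0 - p_up) * values[j])
            if is_american:
                # An American option may be exercised early at any node.
                price = spot * up ** j * down ** (step - j)
                value = max(value, payoff(price))
            values[j] = value
    return values[0]


european_call = binomial_option_price(100.0, 100.0, 0.05, 0.2, 1.0, 200)
american_put = binomial_option_price(100.0, 100.0, 0.05, 0.2, 1.0, 200,
                                     is_call=False, is_american=True)
```

Within one time step, every node depends only on two nodes of the following step, so the inner loop over `j` is the part that could be spread across many cores.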

Dec, 28

GPU-Based Global Illumination Using Lightcuts

Global illumination aims to generate high-quality images, but because of its high requirements it is usually quite slow. The research documented in this thesis was intended to offer a combined hardware and software acceleration solution for global illumination. The GPU (using CUDA) was the hardware part of the whole method that applied parallelism to increase […]

CUDA

Dec, 28

A Fast Algorithm for Constructing Inverted Files on Heterogeneous Platforms

Given a collection of documents residing on a disk, we develop a new strategy for processing these documents and building the inverted files extremely fast. Our approach is tailored for a heterogeneous platform consisting of a multicore CPU and a highly multithreaded GPU. Our algorithm is based on a number of novel techniques including: (i) […]

CUDA

Dec, 28

Monitoring Multiple Streams with Dynamic Time Warping using Graphic Processors

In this paper, we present an approach for efficiently monitoring multiple data streams using graphic processor units (GPUs). Given reference patterns, similar subsequences in streams are matched under the dynamic time warping (DTW) distance and reported continuously. DTW distance is adopted since it offers scaling and shifting flexibility in the time axis. However, it suffers […]

CUDA
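As background for the DTW distance mentioned in this abstract, here is a minimal sketch of the classic dynamic-programming formulation for two complete sequences. It uses the absolute difference as the local cost, which is an assumption, and the function name `dtw_distance` is hypothetical. It does not cover the subsequence matching over streams or the GPU scheme that the paper describes.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the best of: stretch b, stretch a, or advance both.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


# A repeated sample is absorbed by the warping, so the distance stays zero.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

The example shows the flexibility in the time axis: the second sequence lingers on the value 2 for one extra sample, and the warping path absorbs that at no cost.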

Dec, 28

Precomputed compressive sensing for light transport acquisition

In this article, we propose an efficient and accurate compressive-sensing-based method for estimating the light transport characteristics of real-world scenes. Although compressive sensing allows the efficient estimation of a high-dimensional signal with a sparse or near-to-sparse representation from a small number of samples, the computational cost of the compressive sensing in estimating the light transport […]

CUDA

Dec, 28

Efficient parallel lists intersection and index compression algorithms using graphics processing units

Major web search engines answer thousands of queries per second requesting information about billions of web pages. The data sizes and query loads are growing at an exponential rate. To manage the heavy workload, we consider techniques for utilizing a Graphics Processing Unit (GPU). We investigate new approaches to improve two important operations of search […]

CUDA

Dec, 28

Real-Time Rendering of Temporal Volumetric Data on a GPU

Real-time rendering of static volumetric data is generally known to be a memory- and computation-intensive process. With advances in graphics hardware, especially GPUs, it is now possible to do this on desktop computers. However, with the evolution of real-time CT and MRI technologies, volumetric rendering is an even bigger challenge. The first one […]

Dec, 27

PFAC Library: GPU-based string matching algorithm

The PFAC algorithm efficiently exploits the parallelism of the Aho-Corasick algorithm by creating an individual thread for each byte of an input stream to identify any pattern starting at the thread’s starting position. The number of threads created by the PFAC algorithm is equal to the length of an input stream.

CUDA
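The one-thread-per-byte idea described above can be sketched sequentially in Python. This is only an illustration under stated assumptions: the helper names `build_trie`, `pfac_thread`, and `pfac_match` are hypothetical and are not the PFAC library's API, the automaton is a plain trie without failure transitions, and each loop iteration stands in for what would be one GPU thread.

```python
def build_trie(patterns):
    """Build a trie with no failure links; returns (transitions, outputs)."""
    transitions = [{}]   # transitions[state][byte] -> next state
    outputs = [None]     # outputs[state] -> pattern index if a pattern ends here
    for index, pattern in enumerate(patterns):
        state = 0
        for byte in pattern:
            if byte not in transitions[state]:
                transitions.append({})
                outputs.append(None)
                transitions[state][byte] = len(transitions) - 1
            state = transitions[state][byte]
        outputs[state] = index
    return transitions, outputs


def pfac_thread(transitions, outputs, data, start):
    """Work of one logical thread: find patterns that begin exactly at `start`."""
    matches = []
    state = 0
    for pos in range(start, len(data)):
        state = transitions[state].get(data[pos])
        if state is None:
            break        # no valid transition, so this thread terminates
        if outputs[state] is not None:
            matches.append(outputs[state])
    return matches


def pfac_match(patterns, data):
    """Run one logical thread per input byte and collect non-empty results."""
    transitions, outputs = build_trie(patterns)
    results = {}
    for start in range(len(data)):   # on a GPU these iterations run concurrently
        found = pfac_thread(transitions, outputs, data, start)
        if found:
            results[start] = found
    return results


print(pfac_match([b"ab", b"abc", b"bc"], b"xabcab"))
# {1: [0, 1], 2: [2], 4: [0]}
```

Because each thread only looks for patterns that begin at its own position, a thread can stop as soon as it has no valid transition, and most threads stop after only a few bytes.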
