high performance computing on graphics processing units: hgpu.org

Posts

Nov, 14

Load Balanced Parallel GPU Out-of-Core for Continuous LOD Model Visualization

Rendering massive 3D models has been recognized as a challenging task. Due to the limited size of GPU memory, a massive model containing hundreds of millions of primitives cannot fit into most modern GPUs. By applying parallel level-of-detail (LOD), as proposed in [1], only a portion of primitives instead of the whole are necessary […]

CUDA • OpenGL

Nov, 14

G-SNPM – A GPU-based SNP mapping tool

MOTIVATION AND OBJECTIVES: In genotyping analysis, researchers often need to merge genetic datasets coming from different genotyping platforms that use different sets of Single Nucleotide Polymorphisms (SNPs) to represent genetic polymorphisms. In order to do this, it is necessary to know the exact position of a SNP in a chromosome and update this information […]

CUDA

Nov, 14

Performance modeling of atomic additions on GPU scratchpad memory

GPU implementations that use scatter approaches suffer write contention from atomic updates whenever an output element results from more than one input element. Colliding threads are serialized, seriously harming performance. Dealing with these issues requires a proper understanding of the behavior of the scratchpad or shared memory under conflicting accesses […]

CUDA

Nov, 14

A simple method to accelerate fringe analysis algorithms based on graphics processing unit and MATLAB

With its fast development during the past few years, multicore has become a revolutionary technique for improving the performance of computing devices, ranging from supercomputers to cell phones. Among multicore processors, the graphics processing unit (GPU) is outstanding because of its huge computational performance and comparatively low cost. It can be used as a coprocessor […]

CUDA

Nov, 14

Correctly rounding elementary functions on GPU

The IEEE 754-2008 standard recommends the correct rounding of elementary functions. This requires solving the Table Maker’s Dilemma, which implies a huge amount of CPU computation time. In this paper we consider accelerating such computations, namely Lefèvre's algorithm, on Graphics Processing Units (GPU), which are massively parallel architectures with a partial SIMD execution (Single […]

CUDA

Nov, 14

Efficient similarity search on multimedia databases

Manipulating and retrieving multimedia data has received increasing attention with the advent of cloud storage facilities. The ability to query by similarity over large data collections is essential for improving storage and user interfaces. However, these operations are expensive to perform on the CPU alone; thus, it is convenient to take into account High […]

CUDA

Nov, 14

An architecture for real time fluid simulation using multiple GPUs

Simulating natural phenomena such as water and smoke is important for increasing scene realism in video games and other real-time simulations. However, this kind of simulation requires numerically solving the Navier-Stokes equations, which is a computationally expensive task. Additionally, to achieve a more immersive simulation, interaction between the flow […]

CUDA

Nov, 14

Real-Time Scheduling Using GPUs – Advanced and More Accurate Proof of Feasibility

This paper reports our evaluation of OpenCL as a platform for hard real-time scheduling. In particular, we evaluated which types of tasks run faster on a GPGPU than on a CPU. We investigated computational tasks, memory-intensive tasks (especially tasks using low-latency GDDR memory), and disk-intensive tasks. This study is part […]

OpenCL

Nov, 14

Kernel Weaver: Automatically Fusing Database Primitives for Efficient GPU Computation

Data warehousing applications represent an emerging application arena that requires the processing of relational queries and computations over massive amounts of data. Modern general purpose GPUs are high bandwidth architectures that potentially offer substantial improvements in throughput for these applications. However, there are significant challenges that arise due to the overheads of data movement through […]

CUDA

Nov, 14

Real-Time Surface Extraction and Visualization of Medical Images using OpenCL and GPUs

Marching Cubes (MC) is an algorithm that extracts surfaces from volumetric scalar data. It is used extensively in visualization and analysis of medical data from modalities like CT and MR, usually after a 3D segmentation of the structures of interest has been performed. Implementations of MC on CPUs are slow, using several seconds (even minutes) […]

OpenCL

Nov, 11

A parallel method for tuning Fuzzy TSK Systems with CUDA

This paper studies an option for offloading some types of AI processing to the Graphics Processing Unit (GPU), by proposing the parallelization of the Batch Least Squares (BLS) method for tuning consequent parameters and the gradient method for tuning input fuzzy sets in a Takagi-Sugeno-Kang Fuzzy Inference System using the Compute Unified Device Architecture (CUDA). […]

CUDA

Nov, 11

Analysis of periodic anisotropic media by means of split-field FDTD method and GPU computing

The implementation of the Split-Field Finite Difference Time-Domain (SP-FDTD) method on Graphics Processing Units is described in this work. This formalism is applied to light wave propagation through periodic media with arbitrary anisotropy. The anisotropic media are modeled by means of a permittivity tensor with non-diagonal elements, and absorbing boundary conditions are also considered. […]

CUDA

Recent source codes

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?
