high performance computing on graphics processing units: hgpu.org

Posts

Nov, 4

OpenCUDA+MPI: A Framework for Heterogeneous GP-GPU Distributed Computing

The introduction and rise of General Purpose Graphics Computing has significantly impacted parallel and high-performance computing. It has introduced challenges when it comes to distributed computing with GPUs. Current solutions target specifics: specific hardware, specific network topology, a specific level of processing. Those restrictions on GPU computing limit scientists and researchers in various ways. The […]

CUDA

Nov, 3

PAKCK: Performance and Power Analysis of Key Computational Kernels on CPUs and GPUs

Recent projections suggest that applications and architectures will need to attain 75 GFLOPS/W in order to support future DoD missions. Meeting this goal requires a deeper understanding of kernel and application performance as a function of power and architecture. As part of the PAKCK study, a set of DoD application areas, including signal and image processing […]

CUDA

Nov, 3

Parallel CPU and GPU computations to solve the job shop scheduling problem with blocking

In this paper, we study the parallelization of an exact method for solving the job shop scheduling problem with blocking (JSB). We use a graph-theoretic model based on alternative graphs. We propose an original parallelization technique that performs parallel computation in the various branches of the search tree. This technique […]

CUDA

Nov, 3

A Fast and Secure Way to Prevent SQL Injection Attacks using Bitslice Technique and GPU Support

Most web applications use a database as a back-end, so they are exposed to SQL injection attacks (SQLIA). Although SQLIA is among the top ten attacks according to the Open Web Application Security Project (OWASP), existing approaches still do not give a proper solution to this problem. A number of measures are also […]

CUDA

Nov, 3

A Computational Comparison of Basis Updating Schemes for the Simplex Algorithm on a CPU-GPU System

The computation of the basis inverse is the most time-consuming step in simplex-type algorithms. This inverse does not have to be computed from scratch at each iteration; instead, updating schemes can be applied to accelerate the calculation. In this paper, we perform a computational comparison in which the basis inverse is computed with five […]

CUDA

Nov, 3

Fast 3D Salient Region Detection in Medical Images using GPUs

Automated detection of visually salient regions is an active area of research in computer vision. Salient regions can serve as inputs for object detectors as well as inputs for region-based registration algorithms. In this paper we consider the problem of speeding up computationally intensive bottom-up salient region detection in 3D medical volumes. The method uses […]

CUDA

Nov, 3

Towards GPU-Accelerated Large-Scale Graph Processing in the Cloud

Recently, cloud providers have started to offer heterogeneous computing environments. There has been wide interest, in both clusters and clouds, in adopting graphics processors (GPUs) as accelerators for various applications. At the same time, large-scale processing is important for many data-intensive applications in the cloud. In this paper, we propose to leverage […]

CUDA

Nov, 3

Datalog for GPUs

Datalog is a language based on first-order logic that was investigated as a data model for relational databases in the 1980s. It has recently been used in various new application areas, prompting proposals to run Datalog programs on new platforms such as Graphics Processing Units (GPUs) and MapReduce. Both then and now, interest in […]

CUDA

Nov, 3

Accelerated rescaling of single Monte Carlo simulation runs with the Graphics Processing Unit (GPU)

To interpret fiber-based and camera-based measurements of remitted light from biological tissues, researchers typically use analytical models, such as the diffusion approximation to light transport theory, or stochastic models, such as Monte Carlo modeling. To achieve rapid (ideally real-time) measurement of tissue optical properties, especially in clinical situations, there is a critical need to accelerate […]

CUDA

Nov, 3

Accelerating Inclusion-based Pointer Analysis on Heterogeneous CPU-GPU Systems

This paper describes the first implementation of Andersen’s inclusion-based pointer analysis for C programs on a heterogeneous CPU-GPU system, where both its CPU and GPU cores are used. As an important graph algorithm, Andersen’s analysis is difficult to parallelise because it makes extensive modifications to the structure of the underlying graph, in a way that […]

CUDA

Nov, 3

N-Body Simulation Using GP-GPU: Evaluating Host/Device Memory Transference Overhead

N-Body simulation algorithms are amongst the most commonly used within the field of scientific computing. Especially in computational astrophysics, they are used to simulate gravitational scenarios for solar systems or galactic collisions. Parallel versions of such N-Body algorithms have been extensively designed and optimized for multicore and distributed computing schemes. However, N-Body algorithms are still […]

CUDA

Nov, 2

Computer Tomography and Ultrasonography Image Registration Based on the Cooperation of GPU and CPU

Image registration is widely used in biomedical imaging, but biomedical images contain too much texture and noise to allow precise registration. Achieving excellent registration performance requires more complex image processing, which incurs a high computational cost. For the real-time issue, this paper […]

CUDA


