high performance computing on graphics processing units: hgpu.org

Posts

Jun, 16

E-MOGA: A General Purpose Platform for Multi Objective Genetic Algorithm running on CUDA

This paper introduces an Enhanced Multi Objective Genetic Algorithm (E-MOGA) running on Compute Unified Device Architecture (CUDA) hardware, as a general purpose tool that can solve conflict optimization problems. The tool demonstrates significant speed gains using affordable, scalable and commercially available hardware. The objectives of this research are: to enhance the general purpose Multi Objective […]

CUDA

Jun, 16

Accelerating Lambert’s Problem on the GPU in MATLAB

The challenges and benefits of using the GPU to compute solutions to Lambert’s Problem are discussed. Three algorithms (Universal Variables, Gooding’s algorithm, and Izzo’s algorithm) were adapted for GPU computation directly within MATLAB. The robustness of each algorithm was considered, along with the speed at which it could be computed on each of three computers. […]

CUDA

Jun, 16

Parallel Primitives based Spatial Join of Geospatial Data on GPGPUs

Modern GPU architectures closely resemble supercomputers. Commodity GPUs already installed in personal and cluster computers can be used to boost the performance of spatial databases and GIS. In this study, we report our preliminary work on designing and implementing a spatial join algorithm on GPUs by using generic parallel primitives that have […]

CUDA

Jun, 16

GiST Scan Acceleration using Coprocessors

Efficient lookups in huge, possibly multi-dimensional datasets are crucial for the performance of numerous use cases that generate multiple search operations at the same time, like point queries in ray tracing or spatial joins in collision detection of interactive 3D applications. These applications greatly benefit from index structures that quickly filter relevant candidates for further […]

CUDA

Jun, 15

Energy Efficiency Analysis of GPUs

In the last few years, Graphics Processing Units (GPUs) have become a great tool for massively parallel computing. GPUs are specifically designed for throughput and face several design challenges, especially what is known as the Power and Memory Walls. In these devices, available resources should be used to enhance performance and throughput, as the performance […]

CUDA

Jun, 14

SAGA: SystemC Acceleration on GPU Architectures

SystemC is a widespread language for HW/SW system simulation and design exploration, and thus a key development platform in embedded system design. However, the growing complexity of SoC designs is having an impact on simulation performance, leading to limited SoC exploration potential, which in turn affects development and verification schedules and time-to-market for new designs. […]

CUDA

Jun, 14

Performance Gains in Conjugate Gradient Computation with Linearly Connected GPU Multiprocessors

Conjugate gradient is an important iterative method used for solving least squares problems. It is compute-bound and generally involves only simple matrix computations. One would expect that we could fully parallelize such computation on the GPU architecture with multiple Streaming Multiprocessors (SMs), each consisting of many SIMD processing units. While implementing a conjugate gradient method […]

CUDA
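For readers unfamiliar with the method named in the entry above, the following is a minimal sequential sketch of the textbook conjugate gradient iteration for a symmetric positive definite system A x = b. It is not the paper's GPU implementation; the function names (`dot`, `matvec`, `conjugate_gradient`) and the tolerance are assumptions chosen for illustration. It shows that each iteration needs only one matrix-vector product and a few vector updates.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Dense matrix-vector product; A is a list of rows."""
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A (illustrative sketch)."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with x = 0 initially
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break        # residual norm is small enough
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)                        # step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                             # direction update weight
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

A least squares problem min ||M x - y|| could be passed to this sketch through its normal equations, with A = MᵀM and b = Mᵀy, although dedicated variants handle that case with better numerical behaviour.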

Jun, 14

Exploiting Unexploited Computing Resources for Computational Logics

We present an investigation of the use of GPGPU techniques to parallelize the execution of a satisfiability solver, based on the traditional DPLL procedure – which, in spite of its simplicity, still represents the core of the most competitive solvers. The investigation tackles some interesting problems, including the use of a predominantly data-parallel architecture, like […]

CUDA
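As background for the entry above, here is a minimal sequential sketch of the traditional DPLL procedure: unit propagation followed by branching on a literal. It is not the GPGPU solver the paper investigates. The clause encoding (lists of nonzero integers, negative meaning negated, as in the DIMACS convention) and the helper name `simplify` are assumptions made for illustration.

```python
def simplify(clauses, literal):
    """Assume `literal` is true: drop satisfied clauses, strip its negation."""
    result = []
    for clause in clauses:
        if literal in clause:
            continue                                   # clause already satisfied
        result.append([l for l in clause if l != -literal])
    return result


def dpll(clauses, assignment=None):
    """Return a satisfying assignment (dict: variable -> bool) or None."""
    assignment = dict(assignment or {})
    # Unit propagation: a one-literal clause forces that literal to be true.
    while True:
        if any(len(c) == 0 for c in clauses):
            return None                                # empty clause: conflict
        units = [c[0] for c in clauses if len(c) == 1]
        if not units:
            break
        lit = units[0]
        assignment[abs(lit)] = lit > 0
        clauses = simplify(clauses, lit)
    if not clauses:
        return assignment                              # every clause satisfied
    # Branch: try the first literal of the first clause, then its negation.
    lit = clauses[0][0]
    for choice in (lit, -lit):
        extended = {**assignment, abs(choice): choice > 0}
        result = dpll(simplify(clauses, choice), extended)
        if result is not None:
            return result
    return None
```

Variables that never appear in the returned dictionary are unconstrained and may take either value.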

Jun, 14

Parakeet: A Just-In-Time Parallel Accelerator for Python

High level productivity languages such as Python or Matlab enable the use of computational resources by nonexpert programmers. However, these languages often sacrifice program speed for ease of use. This paper proposes Parakeet, a library which provides a just-in-time (JIT) parallel accelerator for Python. Parakeet bridges the gap between the usability of Python and the […]

CUDA

Jun, 13

Using Fermi architecture knowledge to speed up CUDA and OpenCL programs

The NVIDIA graphics processing units (GPUs) are playing an important role as general purpose programming devices. The implementation of parallel codes to exploit the GPU hardware architecture is a task for experienced programmers. The threadblock size and shape choice is one of the most important user decisions when a parallel problem is coded. The threadblock […]

CUDA • OpenCL

Jun, 13

A Consumer Application for GPGPUs: Desktop Search

To date, the GPGPU approach has been mainly utilized for academic and scientific computing, for example, for genetic algorithms, image analysis, cryptography, or password cracking. Though video cards supporting GPGPU have become pervasive, there do not appear to be any applications utilizing GPGPU for a household user. In this paper, one consumer application for GPGPU […]

OpenCL

Jun, 13

Experiences with High-Level Programming Directives for Porting Applications to GPUs

HPC systems now exploit GPUs within their compute nodes to accelerate program performance. As a result, high-end application development has become extremely complex at the node level. In addition to restructuring the node code to exploit the cores and specialized devices, the programmer may need to choose a programming model such as OpenMP or CPU […]

CUDA • OpenCL

* * *

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container
