high performance computing on graphics processing units: hgpu.org

Posts

Dec, 21

Generating SU(Nc) pure gauge lattice QCD configurations on GPUs with CUDA and OpenMP

The starting point of any lattice QCD computation is the generation of a Markov chain of gauge field configurations. Because of the large number of lattice links and the matrix multiplications involved, generating SU(Nc) lattice QCD configurations is a highly demanding computational task, requiring advanced parallel computer architectures such as clusters of several Central […]

CUDA
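The cost mentioned above comes from multiplying small Nc x Nc complex link matrices at every lattice site. Here is a minimal pure-Python illustration, assuming Nc = 2 and a two-dimensional periodic lattice (simplifications chosen here for brevity, not the paper's setup; all function names are hypothetical). It measures the average plaquette, the simplest gauge observable. A cold start (every link equal to the identity) gives 1, and a hot start (Haar-random links) gives roughly 0. The Markov-chain update itself (heat bath or Metropolis) is not shown.

```python
import math
import random

# A 2x2 complex matrix is stored as a tuple of rows: ((a, b), (c, d)).


def su2(a0, a1, a2, a3):
    """U = a0*I + i*(a1*s1 + a2*s2 + a3*s3); in SU(2) when a0^2 + a1^2 + a2^2 + a3^2 = 1."""
    return ((complex(a0, a3), complex(a2, a1)),
            (complex(-a2, a1), complex(a0, -a3)))


def random_su2(rng):
    """Haar-random SU(2) element: a uniformly distributed point on the 3-sphere."""
    a = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in a))
    return su2(*(c / norm for c in a))


def mul(A, B):
    """2x2 complex matrix product A @ B."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))


def dag(A):
    """Conjugate transpose A^dagger."""
    return tuple(tuple(A[j][i].conjugate() for j in range(2)) for i in range(2))


def make_links(L, hot, seed=0):
    """Links U_mu(x, y) on a periodic L x L lattice, keyed by (x, y, mu) with mu in {0, 1}."""
    rng = random.Random(seed)
    identity = su2(1.0, 0.0, 0.0, 0.0)
    return {(x, y, mu): random_su2(rng) if hot else identity
            for x in range(L) for y in range(L) for mu in range(2)}


def average_plaquette(links, L):
    """Mean of (1/2) Re Tr[U_0(x) U_1(x+0) U_0(x+1)^dag U_1(x)^dag] over all sites."""
    total = 0.0
    for x in range(L):
        for y in range(L):
            u0 = links[(x, y, 0)]
            u1_shifted = links[((x + 1) % L, y, 1)]  # U_1 one step along direction 0
            u0_shifted = links[(x, (y + 1) % L, 0)]  # U_0 one step along direction 1
            u1 = links[(x, y, 1)]
            p = mul(mul(u0, u1_shifted), mul(dag(u0_shifted), dag(u1)))
            total += 0.5 * (p[0][0] + p[1][1]).real
    return total / (L * L)
```

Each plaquette costs three 2x2 matrix products. For SU(3) in four dimensions, the same loop covers six planes per site with 3x3 products. A common GPU strategy is to assign one thread per site.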

Dec, 21

Implementation of a Parallel Tree Method on a GPU

The kd-tree is a fundamental tool in computer science. Among its many uses, kd-tree search (the tree method) is highly important for the fast evaluation of particle interactions and for neighbor search, since it reduces the computational complexity of these problems from O(N^2) for a brute-force method to O(N log N) for […]

OpenCL
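The complexity reduction described above comes from pruning. A query descends into the far side of a splitting plane only if that plane is closer than the best match found so far. The following is a minimal serial Python sketch, not the paper's GPU implementation, and its names are hypothetical. It builds a kd-tree by median splits and answers nearest-neighbor queries, alongside the O(N)-per-query brute-force search it replaces.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class KDNode:
    point: Point
    axis: int                 # coordinate used to split at this node
    left: Optional[KDNode]    # points with smaller coordinate on `axis`
    right: Optional[KDNode]   # points with larger or equal coordinate on `axis`


def build_kdtree(points: List[Point], depth: int = 0) -> Optional[KDNode]:
    """Median split, cycling through the coordinate axes level by level."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return KDNode(pts[mid], axis,
                  build_kdtree(pts[:mid], depth + 1),
                  build_kdtree(pts[mid + 1:], depth + 1))


def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node: Optional[KDNode], target: Point,
            best: Optional[Tuple[float, Point]] = None) -> Optional[Tuple[float, Point]]:
    """Return (squared distance, point) of the nearest stored point."""
    if node is None:
        return best
    d2 = dist2(node.point, target)
    if best is None or d2 < best[0]:
        best = (d2, node.point)
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    if diff * diff < best[0]:  # the splitting plane is closer than the best match
        best = nearest(far, target, best)
    return best


def brute_force_nearest(points: List[Point], target: Point) -> Tuple[float, Point]:
    """O(N) per query; O(N^2) when every point queries its neighbor."""
    return min((dist2(p, target), p) for p in points)
```

This build sorts at every level, so it costs O(N log^2 N). Presorting or linear-time median selection brings it down to O(N log N). Query cost is typically logarithmic for well-spread low-dimensional points, but it can degrade in high dimensions.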

Dec, 20

Performance and Quality of Random Number Generators

Random number generation continues to be a critical component in much of computational science, and the tradeoff between quality and computational performance is a key issue for many numerical simulations. We review the performance and statistical quality of some well-known algorithms for generating pseudorandom numbers. Graphics Processing Units (GPUs) are a powerful platform […]

CUDA
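To make the speed/quality tradeoff concrete, here is Marsaglia's 32-bit xorshift generator with the shift triple (13, 17, 5). It was chosen here as an example of a very cheap generator; the excerpt does not say whether the paper reviews this exact variant. Each step is only three shifts and three XORs, but the state is a single 32-bit word. The period is therefore 2^32 - 1, which is short for large simulations.

```python
class XorShift32:
    """Marsaglia's 32-bit xorshift generator with the (13, 17, 5) shift triple.

    The state is one nonzero 32-bit word; the period is 2**32 - 1.
    """

    MASK = 0xFFFFFFFF

    def __init__(self, seed: int) -> None:
        if seed & self.MASK == 0:
            raise ValueError("seed must be nonzero modulo 2**32")
        self.state = seed & self.MASK

    def next_u32(self) -> int:
        x = self.state
        x ^= (x << 13) & self.MASK  # mask emulates 32-bit unsigned overflow
        x ^= x >> 17
        x ^= (x << 5) & self.MASK
        self.state = x
        return x

    def next_float(self) -> float:
        """Uniform float in [0, 1)."""
        return self.next_u32() / 2**32
```

On a GPU, each thread would hold its own state word. Choosing seeds that give statistically independent per-thread streams is a separate problem that this sketch does not address.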

Dec, 20

GPU Accelerated PK-means Algorithm for Gene Clustering

In this paper, a novel GPU-accelerated scheme for the PK-means gene clustering algorithm is proposed. Exploiting the native particle-pair structure of the PK-means algorithm, a fragment shader program is tailor-made to process a pair of particles in one pass for the computation-intensive portion. As the output channel of a fragment consisting of 4 […]

CUDA

Dec, 20

A Framework for Genetic Algorithms in Parallel Environments

In this research, we developed a framework to execute genetic algorithms (GA) in various parallel environments. GA researchers can prepare implementations of GA operators and fitness functions using this framework. We have prepared several types of communication libraries for various parallel environments. By combining GA implementations with our libraries, GA researchers can benefit from parallel processing […]

CUDA
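The framework described above separates user-supplied operators and fitness functions from an exchangeable parallel backend. A minimal way to sketch that separation is to route fitness evaluation through a pluggable `map` function. Everything here (`run_ga`, the OneMax operators, passing an executor's `map`) is a hypothetical illustration, not the framework's API.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Genome = List[int]


def run_ga(fitness: Callable[[Genome], float],
           crossover: Callable[[Genome, Genome, random.Random], Genome],
           mutate: Callable[[Genome, random.Random], Genome],
           init: Callable[[random.Random], Genome],
           pop_size: int = 30, generations: int = 40,
           parallel_map: Callable = map, seed: int = 0) -> Tuple[Genome, List[float]]:
    """Generational GA with truncation selection and elitism.

    Only fitness evaluation goes through parallel_map, so swapping the backend
    (built-in map, a thread pool, a process pool, ...) leaves the GA logic unchanged.
    """
    rng = random.Random(seed)
    population = [init(rng) for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        scores = list(parallel_map(fitness, population))
        history.append(max(scores))
        order = sorted(range(pop_size), key=lambda idx: scores[idx], reverse=True)
        ranked = [population[idx] for idx in order]
        parents = ranked[: pop_size // 2]
        children = [ranked[0]]  # elitism: the best genome survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    scores = list(parallel_map(fitness, population))
    history.append(max(scores))
    best = population[max(range(pop_size), key=lambda idx: scores[idx])]
    return best, history


# Example user-supplied pieces for OneMax (maximize the number of 1 bits).
def onemax(g: Genome) -> float:
    return float(sum(g))


def one_point(a: Genome, b: Genome, rng: random.Random) -> Genome:
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def flip(g: Genome, rng: random.Random, rate: float = 0.02) -> Genome:
    return [bit ^ 1 if rng.random() < rate else bit for bit in g]


def random_bits(rng: random.Random, length: int = 32) -> Genome:
    return [rng.randint(0, 1) for _ in range(length)]


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        best, history = run_ga(onemax, one_point, flip, random_bits, parallel_map=pool.map)
    print(history[0], "->", history[-1])
```

All random choices happen in the calling thread, so the serial and threaded runs produce identical results for the same seed. Threads do not speed up CPU-bound pure-Python fitness functions. A `ProcessPoolExecutor` (with a picklable fitness function) is the drop-in choice for real CPU parallelism.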

Dec, 20

Parallel Contour-Buildup Algorithm for the Molecular Surface

Molecular Dynamics simulations are an essential tool for many applications. The simulation of large molecules, such as proteins, over long trajectories is highly important, e.g. for pharmaceutical, biochemical and medical research. For analyzing these data sets, interactive visualization plays a crucial role, as details of the interactions of molecules are often affected […]

CUDA

Dec, 20

Analysis of GPGPU Platforms Efficiency in General-Purpose Computations

Using graphics processing units (GPUs) for general-purpose computing (GPGPU) is becoming increasingly widespread. The goal of this paper is to analyze the efficiency of computing with the GPGPU technique, depending on several factors. The paper analyzes differences in performance and platform organization between widespread […]

OpenCL

Dec, 20

Can CUDA be exposed through web services?

Massively parallel programming is a rapidly growing field since the recent introduction of general-purpose GPU computing. Modern graphics processors from NVIDIA and AMD have massively parallel architectures that can be used for 3D rendering, financial analysis, physics simulations, and biomedical analysis. These massively parallel systems are exposed to programmers through interfaces such as NVIDIA's […]

CUDA • OpenCL

Dec, 20

Parsing in Parallel on Multiple Cores and GPUs

This paper examines the ways in which parallelism can be used to speed up the parsing of dense PCFGs. We focus on two kinds of parallelism: Symmetric Multi-Processing (SMP) parallelism on shared-memory multicore CPUs, and Single-Instruction Multiple-Thread (SIMT) parallelism on GPUs. We describe how to achieve speed-ups over an already very efficient baseline parser using […]

CUDA
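Dense PCFG parsing is usually built on the CKY chart. The following is a minimal serial Viterbi CKY sketch, assuming a grammar in Chomsky normal form; the function and the toy grammar are hypothetical, not the paper's parser. It shows the two loops that expose parallelism. Cells with the same span length are independent of each other. Within a cell, the loop over binary rules is uniform, data-parallel work.

```python
import math
from typing import Dict, List, Sequence, Tuple

Lexicon = Dict[str, List[Tuple[str, float]]]   # word -> [(preterminal, prob)]
BinaryRule = Tuple[str, str, str, float]       # (A, B, C, prob) for A -> B C


def viterbi_cky(words: Sequence[str], lexicon: Lexicon,
                binary_rules: List[BinaryRule]) -> List[List[Dict[str, float]]]:
    """chart[i][j] maps each nonterminal to its best log-probability over words[i:j]."""
    n = len(words)
    chart: List[List[Dict[str, float]]] = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        cell = chart[i][i + 1]
        for a, p in lexicon.get(w, []):
            cell[a] = max(cell.get(a, -math.inf), math.log(p))
    for length in range(2, n + 1):
        # Cells of equal span length depend only on shorter spans, so this loop
        # over i is embarrassingly parallel (e.g. across CPU cores).
        for i in range(n - length + 1):
            j = i + length
            cell = chart[i][j]
            for k in range(i + 1, j):
                left, right = chart[i][k], chart[k][j]
                # For a dense grammar this rule loop is uniform work; in parallel
                # it needs a max-reduction per parent nonterminal `a`.
                for a, b, c, p in binary_rules:
                    if b in left and c in right:
                        score = math.log(p) + left[b] + right[c]
                        if score > cell.get(a, -math.inf):
                            cell[a] = score
    return chart
```

One natural split maps the span loop to CPU threads and the rule loop to GPU threads. The excerpt does not say exactly how the authors divide the work.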

Dec, 20

The Future in Mobile Multicore Computing

Mobile computers are an essential part of consumer technology, and we are fast approaching a future where all mobile computers have general-purpose GPUs (GPGPUs) and multicore CPUs in them. We refer to this development as Mobile Multicore Computing (MMC). In this paper, we discuss the importance of MMC, as well as three critical issues associated […]

Dec, 20

Floating-point Mixed-radix FFT Core Generation for FPGA and Comparison with GPU and CPU

Over the past decades, FPGA technologies have advanced enormously. The topic of floating-point accelerators on FPGAs has gained renewed interest due to increased device sizes and the emergence of fast hardware floating-point libraries. The popularity of the FFT makes it easier to justify spending considerable effort on detailed optimization. However, the ever […]

CUDA
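A mixed-radix FFT handles lengths that factor into different radices, e.g. 12 = 2 · 2 · 3. It splits n = r · m into r interleaved sub-transforms of length m and recombines them with twiddle factors W_n^{qk}. Here is a minimal recursive Python sketch with hypothetical names. It illustrates the decomposition only, not the paper's FPGA core generator. A direct DFT serves as the base case for prime lengths and as a reference.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Direct O(n^2) DFT: X[k] = sum_t x[t] * exp(-2*pi*i*t*k/n)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


def smallest_factor(n: int) -> int:
    """Smallest prime factor of n >= 2 (n itself when n is prime)."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f
        f += 1
    return n


def mixed_radix_fft(x: Sequence[complex]) -> List[complex]:
    """Recursive decimation-in-time Cooley-Tukey FFT for any length n."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    r = smallest_factor(n)
    if r == n:
        return dft(x)  # prime length: no further split, use the direct DFT
    m = n // r
    # Radix-r split: t = r*s + q, so X[k] = sum_q W_n^{q k} * Y_q[k mod m],
    # where Y_q is the length-m DFT of the decimated sequence x[q], x[q+r], ...
    subs = [mixed_radix_fft(x[q::r]) for q in range(r)]
    return [sum(cmath.exp(-2j * cmath.pi * q * k / n) * subs[q][k % m] for q in range(r))
            for k in range(n)]
```

For n = 12, the recursion splits by 2, then by 2 again, and ends in a length-3 direct DFT. Each level's recombination costs O(n · r), which is acceptable for the small radices this decomposition is meant for.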

Dec, 19

Fast Random Graph Generation

Today, several database applications call for the generation of random graphs. A fundamental, versatile random graph model adopted for that purpose is the Erdős-Rényi Γ_{v,p} model. This model can be used for directed, undirected, and multipartite graphs, with and without self-loops; it induces algorithms for both graph generation and sampling, and hence is useful not only […]

CUDA
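Flipping one biased coin per vertex pair costs O(v^2) regardless of p. A standard faster technique is the geometric skipping method of Batagelj and Brandes. It is named here as a comparison point; the excerpt does not say what the paper itself uses. The method jumps directly over the geometrically distributed run of non-edges between consecutive edges, so the cost is proportional to the number of edges produced. The sketch below covers the undirected case without self-loops, and the function name is hypothetical.

```python
import math
import random
from typing import List, Optional, Tuple


def gnp_edges_skip(n: int, p: float,
                   rng: Optional[random.Random] = None) -> List[Tuple[int, int]]:
    """Undirected G(n, p) without self-loops; returns edges (v, w) with 0 <= w < v < n.

    Instead of testing all n(n-1)/2 pairs, draw the gap to the next edge from a
    geometric distribution, so the expected running time is O(n + m) for m edges.
    """
    if rng is None:
        rng = random.Random()
    if p <= 0.0:
        return []
    if p >= 1.0:  # log(1 - p) is undefined; every pair is an edge
        return [(v, w) for v in range(1, n) for w in range(v)]
    log_q = math.log(1.0 - p)
    edges: List[Tuple[int, int]] = []
    v, w = 1, -1
    while v < n:
        r = rng.random()  # uniform in [0, 1)
        # Number of skipped non-edges ~ Geometric(p), sampled by inversion.
        w += 1 + int(math.log(1.0 - r) / log_q)
        # Carry an overflowing column index into the following rows of the lower triangle.
        while w >= v and v < n:
            w -= v
            v += 1
        if v < n:
            edges.append((v, w))
    return edges
```

The column index w only moves forward through the lower triangle, so no pair is emitted twice. Directed graphs or graphs with self-loops would walk a full or diagonal-inclusive matrix instead.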
