high performance computing on graphics processing units: hgpu.org

Posts

Oct, 25

Design and Implementation of GPU-Based Prim’s Algorithm

The minimum spanning tree is a classical problem in graph theory that plays a key role in a broad range of applications. This paper proposes a minimum spanning tree algorithm using Prim’s approach on Nvidia GPUs under the CUDA architecture. By using a newly developed GPU-based Min-Reduction data-parallel primitive in the key step of the algorithm, higher […]

CUDA
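
As a rough illustration of the idea in the abstract above, here is a minimal CPU-side sketch in Python of Prim’s algorithm with its key step (choosing the cheapest vertex outside the tree) written as a pairwise min-reduction. This is not the paper’s implementation: the names `min_reduce` and `prim_mst` are hypothetical, the reduction passes run sequentially rather than on a GPU, and a dense, connected, undirected graph is assumed.

```python
import math


def min_reduce(pairs):
    """Pairwise min-reduction over (key, vertex) pairs.

    Each pass roughly halves the list, mirroring how a data-parallel
    reduction combines elements in about log2(n) steps. Here the passes
    run sequentially; on a GPU each pass could run in parallel.
    """
    items = list(pairs)
    if not items:
        return (math.inf, -1)
    while len(items) > 1:
        half = (len(items) + 1) // 2
        # Combine element i with element i + half when that partner exists.
        items = [
            min(items[i], items[i + half]) if i + half < len(items) else items[i]
            for i in range(half)
        ]
    return items[0]


def prim_mst(weights):
    """Return (total_weight, edges) of a minimum spanning tree.

    `weights` is a symmetric matrix; math.inf marks a missing edge.
    Assumes the graph is connected.
    """
    n = len(weights)
    in_tree = [False] * n
    key = [math.inf] * n   # cheapest known edge weight into the tree
    parent = [-1] * n
    key[0] = 0
    total, edges = 0, []
    for _ in range(n):
        # Key step: min-reduction over the vertices not yet in the tree.
        w, u = min_reduce((key[v], v) for v in range(n) if not in_tree[v])
        in_tree[u] = True
        total += w
        if parent[u] != -1:
            edges.append((parent[u], u))
        # Relax keys of the remaining vertices (independent per vertex).
        for v in range(n):
            if not in_tree[v] and weights[u][v] < key[v]:
                key[v] = weights[u][v]
                parent[v] = u
    return total, edges
```

Both the reduction and the key-relaxation loop operate on each vertex independently, which is what makes this step a natural candidate for a data-parallel primitive.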

Oct, 25

Parallel Execution of AES-CTR Algorithm Using Extended Block Size

Data encryption and decryption are common operations in network-based application programs that require security. To keep pace with the input data rate in such applications, real-time processing of data encryption/decryption is essential. For example, in an environment where multimedia data is streamed, high-speed data encryption/decryption is crucial. In this paper, […]

CUDA
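
Counter (CTR) mode, named in the title above, lends itself to parallel execution because each keystream block depends only on the key, the nonce, and the block index. The Python sketch below illustrates that general property only; it is not the paper’s method and says nothing about its extended block size. The function names are hypothetical, and `keystream_block` uses SHA-256 as a stand-in for the AES block encryption, so the output is not real AES-CTR.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK = 16  # bytes per keystream block, matching the AES block size


def keystream_block(key, nonce, counter):
    """Stand-in for AES_encrypt(key, nonce || counter). NOT real AES."""
    data = key + nonce + counter.to_bytes(8, "big")
    return hashlib.sha256(data).digest()[:BLOCK]


def ctr_xor(key, nonce, data, workers=4):
    """Encrypt or decrypt `data` in CTR style (the operation is its own inverse)."""
    chunks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

    def process(indexed):
        # Each block needs only its own index, so blocks are independent.
        i, chunk = indexed
        ks = keystream_block(key, nonce, i)
        return bytes(a ^ b for a, b in zip(chunk, ks))

    # pool.map returns results in input order, so blocks rejoin correctly.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(process, enumerate(chunks)))
```

Applying `ctr_xor` twice with the same key and nonce returns the original bytes, and the result does not depend on the number of workers, since no block waits on another.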

Oct, 25

The Model of Computation of CUDA and its Formal Semantics

We formalize the model of computation of modern graphics cards based on the specification of Nvidia’s Compute Unified Device Architecture (CUDA). CUDA programs are executed by thousands of threads concurrently and have access to several different types of memory with unique access patterns and latencies. The underlying hardware uses a single instruction, multiple threads execution […]

CUDA

Oct, 25

BEAGLE: an Application Programming Interface and High-Performance Computing Library for Statistical Phylogenetics

Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood estimation. Yet, for large datasets and realistic or interesting models of evolution, these approaches remain computationally demanding. […]

CUDA

Oct, 25

An Evolutionary Optimization Strategy Using Graphics Processing Units to Efficiently Investigate Gene-Gene Interactions in Genetic Association Studies

The analysis of gene-gene interactions related to common complex human diseases is complicated by the increasing scale of genetic association analysis. Concurrent with the advances in genetic technology that led to these large data sets, improvements have been made in parallel computing with graphics processing units (GPUs). The data-intensive nature of genetic association analysis makes […]

CUDA

Oct, 25

Distributed GPU Password Cracking Research Project

This research project explores the possibilities of integrating GPU processing power with a network cluster in order to achieve better password-cracking performance. First, a literature study is performed on passwords in general, GPGPU computing, and distributed computing by means of middleware. With these building blocks, combined with current […]

OpenCL

Oct, 25

Extending Scala with General Purpose GPU Programming

In this report we document an attempt to make it easier to use powerful GPUs by extending the Scala compiler to automatically offload work to the GPU. We benchmark similar code and find that it provides a 2-3x speedup compared to the CPU alone. Finally, we discuss ways to improve the extension to offload more […]

OpenCL

Oct, 25

Development of a Flow Solver with Complex Kinetics on the Graphic Processing Units

This paper reports on the implementation of a numerical solver on graphics processing units (GPUs) to model reactive gas mixtures with detailed chemical kinetics. The solver incorporates high-order finite volume methods for solving the fluid dynamical equations coupled with stiff source terms. The chemical kinetics are solved implicitly via an operator-splitting method. We […]

CUDA

Oct, 24

Parallel Implementation of Devanagari Text Line and Word Segmentation Approach on GPU

Fast and accurate algorithms are necessary for Optical Character Recognition (OCR) systems to perform operations on document images such as pre-processing, segmentation, feature extraction, training and testing of classifiers, and post-processing. Text line and word segmentation are two important steps in any OCR system. Incorrect segmentation may reduce the accuracy of OCR systems. […]

CUDA

Oct, 24

Design and implementation of a high-performance stream-based computing platform on multigenerational GPUs

During this decade, demand for high-performance computation has been increasing steadily, for example in the field of humanities [1]. Scientists and investigators need high-speed, high-performance environments for their research, which must perform millions of floating-point operations per second [2]. One way to achieve this goal is […]

OpenCL

Oct, 24

Breaking DVB-CSA

CSA (Common Scrambling Algorithm) is used to encrypt digital audio and video streams in DVB (Digital Video Broadcasting). It is commonly used to limit access to this content to paying customers. In this paper, we present a practical attack (a time-memory trade-off) against CSA that can be used to recover the cipher’s key and decrypt […]

CUDA, OpenCL

Oct, 24

ASAMgpu V1.0-a moist fully compressible atmospheric model using graphics processing units (GPUs)

In this work, the three-dimensional compressible moist atmospheric model ASAMgpu is presented. The calculations are done using graphics processing units (GPUs). To ensure platform independence, OpenGL and GLSL are used, so the model runs on any hardware supporting fragment shaders. The MPICH2 library enables interprocess communication, allowing the usage of more than one […]

OpenGL

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

chemtrain-deploy: A parallel and scalable framework for machine learning potentials in million-atom MD simulations

microSYCL: SYCL micro-benchmarks repository

Exploring SYCL as a Portability Layer for High-Performance Computing on CPUs

Recent source codes

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools