high performance computing on graphics processing units: hgpu.org

Posts

Dec, 8

Graphics Processing Units for the Real-time Linear Elastostatic Simulation of Liver

Biomedical engineering solutions such as surgical simulators need High Performance Computing (HPC) to achieve real-time performance. Graphics Processing Units (GPUs) offer HPC capabilities at low cost and low power consumption. In this work, it is demonstrated that a liver discretized by about 2500 finite element nodes can be graphically simulated in real time, by making […]

OpenGL
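The abstract is cut off before the method, but "linear elastostatic" simulation comes down to assembling a stiffness matrix K and solving K u = f for nodal displacements u. The sketch below is a minimal CPU illustration under stated assumptions. It uses a 1D bar fixed at x = 0 with an axial tip load, in place of the paper's 3D liver mesh of about 2500 nodes. All names and parameter values are hypothetical.

```python
def gaussian_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def bar_displacements(n_elems, length, E, A, tip_load):
    """Hypothetical 1D linear elastostatic FE model: bar fixed at node 0."""
    n_nodes = n_elems + 1
    k = E * A / (length / n_elems)  # stiffness of one two-node element
    K = [[0.0] * n_nodes for _ in range(n_nodes)]
    for e in range(n_elems):  # scatter each 2x2 element matrix into K
        for i, j, s in ((e, e, k), (e, e + 1, -k), (e + 1, e, -k), (e + 1, e + 1, k)):
            K[i][j] += s
    f = [0.0] * n_nodes
    f[-1] = tip_load
    # Dirichlet condition u_0 = 0: drop the first row and column, then solve.
    u_free = gaussian_solve([row[1:] for row in K[1:]], f[1:])
    return [0.0] + u_free


u = bar_displacements(10, 2.0, 200e9, 1e-4, 1000.0)
# Analytical tip displacement F*L/(E*A) = 1000 * 2 / (200e9 * 1e-4) = 1e-4 m
```

A real-time simulator would precompute or factor K once and reuse it every frame, since K is constant in a linear model. That is one reason linear elastostatics is attractive for GPU acceleration.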

Dec, 8

Speeding up the evaluation phase of GP classification algorithms on GPUs

The efficiency of evolutionary algorithms has become an active research problem, since it is one of the major weaknesses of these algorithms. Specifically, when these algorithms are employed for classification, the computational time they require grows excessively as the problem complexity increases. This paper proposes an efficient, scalable and massively parallel evaluation model […]

CUDA
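Fitness evaluation dominates the run time of GP classification because every individual must be applied to every training instance. The minimal sketch below assumes individuals are Boolean rules stored in postfix notation and evaluated by a stack interpreter, with fitness measured as accuracy. The token format and the operator set are hypothetical and are not taken from the paper. Each (individual, instance) evaluation is independent, and that independence is what a GPU evaluation model can map onto threads before a per-individual reduction.

```python
import operator

# Hypothetical GP function set: binary operators on feature values / Booleans.
OPS = {
    ">": operator.gt,
    "<": operator.lt,
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
}


def eval_postfix(program, x):
    """Evaluate one individual (a postfix rule) on one instance x."""
    stack = []
    for tok in program:
        if isinstance(tok, tuple):  # terminal: ("x", feature_index) or ("c", constant)
            kind, val = tok
            stack.append(x[val] if kind == "x" else val)
        else:  # function node: pop two operands, push the result
            b = stack.pop()
            a = stack.pop()
            stack.append(OPS[tok](a, b))
    return bool(stack.pop())


def fitness(population, X, y):
    """Accuracy of each rule. The inner evaluations are independent (GPU threads);
    the sum per individual is the reduction step."""
    scores = []
    for program in population:
        hits = sum(eval_postfix(program, x) == bool(label) for x, label in zip(X, y))
        scores.append(hits / len(y))
    return scores


X = [[0.1], [0.6], [0.9], [0.3]]
y = [0, 1, 1, 0]
p1 = [("x", 0), ("c", 0.5), ">"]  # x0 > 0.5
p2 = [("x", 0), ("c", 0.2), ">", ("x", 0), ("c", 0.8), "<", "AND"]  # 0.2 < x0 < 0.8
scores = fitness([p1, p2], X, y)
```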

Dec, 8

An Algorithm for Detecting Cycles in Undirected Graphs using CUDA Technology

Counting cycles in a graph is an NP-complete problem. This work reduces the execution time needed to solve the problem compared with traditional serial, CPU-based approaches, and reduces the required hardware to a single commodity GPU. We developed an algorithm to approximately count the number of cycles in an undirected graph, by […]

CUDA
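The abstract does not describe how the approximation works, so the following is only a related baseline, not the authors' algorithm. A union-find pass counts the edges that close a cycle. That count equals the cycle rank E - V + C, the number of independent (fundamental) cycles. Every fundamental cycle is a distinct simple cycle, so the cycle rank is a lower bound on the total number of simple cycles.

```python
def cycle_rank(num_vertices, edges):
    """Number of independent cycles (E - V + C) in an undirected graph,
    computed with union-find: an edge whose endpoints are already connected
    closes a new fundamental cycle."""
    parent = list(range(num_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    closing_edges = 0
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            closing_edges += 1  # u and v already connected: edge closes a cycle
        else:
            parent[ru] = rv  # merge the two components
    return closing_edges


triangle = [(0, 1), (1, 2), (2, 0)]
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

For the complete graph K4, the cycle rank is 3, while the graph has 7 simple cycles (4 triangles and 3 four-cycles). That gap is why counting all cycles is the hard part of the problem.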

Dec, 8

Fast extraction of neuron morphologies from large-scale SBFSEM image stacks

Neuron morphology is frequently used to classify cell types in the mammalian cortex. Apart from the shape of the soma and the axonal projections, morphological classification is largely defined by the dendrites of a neuron and their subcellular compartments, referred to as dendritic spines. The dimensions of a neuron’s dendritic compartment, including its spines, are also […]

CUDA

Dec, 8

Design and Optimization of Image Processing Algorithms on Mobile GPU

The advent of GPUs with programmable shaders on mobile phones has motivated developers to use the GPU to offload computationally intensive tasks and relieve the burden on the embedded CPU. In this paper, we present a set of metrics to measure the characteristics of a mobile phone GPU, with a focus on image processing algorithms. These measures assist […]

Dec, 8

Research on CUDA-based Kriging Interpolation Algorithm

A three-dimensional geological model can efficiently describe many types of geological information and intuitively express a variety of topological relations among geological phenomena. Kriging interpolation is an important spatial interpolation method in three-dimensional geological modeling, but it is time-consuming because an augmented matrix must be built and a system of equations solved for every grid point. With the modeling […]

CUDA
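The per-grid-point cost described above comes from the ordinary kriging system. Its augmented matrix holds the variogram values between the n samples, plus a row and column of ones for the unbiasedness constraint. The system is solved for the weights λ and a Lagrange multiplier μ, and the estimate is Σ λᵢ zᵢ. The sketch below assumes an exponential variogram with hypothetical sill and range values. The paper may use a different model and solver. Every target point is independent, which makes the problem a natural fit for one GPU thread or block per grid point.

```python
import math


def variogram(h, sill=1.0, rng=10.0):
    """Exponential variogram model (hypothetical parameters)."""
    return sill * (1.0 - math.exp(-h / rng))


def gaussian_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def ordinary_kriging(points, values, target):
    """Estimate the value at `target` by solving the augmented kriging system."""
    n = len(points)
    A = [[0.0] * (n + 1) for _ in range(n + 1)]  # A[n][n] stays 0
    for i in range(n):
        for j in range(n):
            A[i][j] = variogram(math.dist(points[i], points[j]))
        A[i][n] = A[n][i] = 1.0  # unbiasedness constraint: weights sum to 1
    b = [variogram(math.dist(p, target)) for p in points] + [1.0]
    weights = gaussian_solve(A, b)[:n]  # last entry is the Lagrange multiplier
    return sum(w * z for w, z in zip(weights, values))


pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
vals = [1.0, 2.0, 3.0, 4.0]
```

Two properties make useful sanity checks. Kriging is an exact interpolator, so it reproduces a sample value at that sample's location. Because the weights sum to 1, it also reproduces a constant field.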

Dec, 7

Evolving Neural Networks on GPUs

Financial time series prediction attempts to model the behavior of financial markets using, among other things, tools such as technical, intermarket, and fundamental indicators. Accurate prediction, however, is difficult for a number of reasons: financial markets are influenced, often in a non-linear, sometimes time-lagged fashion, by factors including interest and exchange rates, the rate of economic […]

CUDA

Dec, 7

GPU Implementation of the Keccak Hash Function Family

Hash functions are one of the most important cryptographic primitives. Some of the currently employed hash functions like SHA-1 or MD5 are considered broken today. Therefore, in 2007 the US National Institute of Standards and Technology announced a competition for a new family of hash functions. Keccak is one of the five final candidates to […]

CUDA
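Keccak's core is the Keccak-f[1600] permutation: 24 rounds of the θ, ρ, π, χ and ι steps over a 5×5 array of 64-bit lanes, wrapped in a sponge construction. The sketch below is a plain, unoptimized Python reference for checking a GPU port. It implements the permutation and the standardized SHA3-256 sponge. One assumption should be flagged: the 2011 competition entry used the original Keccak padding (domain byte 0x01), whereas FIPS 202 SHA3-256 uses 0x06. The permutation is the same in both, and only the padding differs. Round constants and rotation offsets are generated from the specification's LFSR and (x, y) walk rather than hard-coded.

```python
import hashlib

MASK = (1 << 64) - 1


def _rc_bit(t):
    """Output bit of the Keccak round-constant LFSR (x^8 + x^6 + x^5 + x^4 + 1)."""
    r = 1
    for _ in range(t % 255):
        r <<= 1
        if r & 0x100:
            r ^= 0x171  # clear bit 8 and apply the feedback taps
    return r & 1


ROUND_CONSTANTS = [
    sum(_rc_bit(j + 7 * ir) << ((1 << j) - 1) for j in range(7)) for ir in range(24)
]


def _rotation_offsets():
    rot = [[0] * 5 for _ in range(5)]
    x, y = 1, 0
    for t in range(24):
        rot[x][y] = ((t + 1) * (t + 2) // 2) % 64
        x, y = y, (2 * x + 3 * y) % 5
    return rot


ROT = _rotation_offsets()


def _rol(v, n):
    return ((v << n) | (v >> (64 - n))) & MASK


def keccak_f1600(A):
    """Apply the 24-round permutation to a 5x5 lane array indexed A[x][y]."""
    for rc in ROUND_CONSTANTS:
        C = [A[x][0] ^ A[x][1] ^ A[x][2] ^ A[x][3] ^ A[x][4] for x in range(5)]  # theta
        D = [C[(x - 1) % 5] ^ _rol(C[(x + 1) % 5], 1) for x in range(5)]
        A = [[A[x][y] ^ D[x] for y in range(5)] for x in range(5)]
        B = [[0] * 5 for _ in range(5)]
        for x in range(5):  # rho (rotate lanes) and pi (permute positions)
            for y in range(5):
                B[y][(2 * x + 3 * y) % 5] = _rol(A[x][y], ROT[x][y])
        A = [[B[x][y] ^ (~B[(x + 1) % 5][y] & B[(x + 2) % 5][y])  # chi
              for y in range(5)] for x in range(5)]
        A[0][0] ^= rc  # iota
    return A


def sha3_256(msg: bytes) -> bytes:
    rate = 136  # bytes per block: (1600 - 2 * 256) / 8
    padded = bytearray(msg) + b"\x06"  # SHA-3 domain bits + first padding bit
    padded += bytes(-len(padded) % rate)
    padded[-1] |= 0x80  # final padding bit
    A = [[0] * 5 for _ in range(5)]
    for off in range(0, len(padded), rate):  # absorb
        for i in range(rate // 8):
            lane = int.from_bytes(padded[off + 8 * i: off + 8 * i + 8], "little")
            A[i % 5][i // 5] ^= lane
        A = keccak_f1600(A)
    return b"".join(A[i % 5][i // 5].to_bytes(8, "little") for i in range(4))  # squeeze


assert sha3_256(b"abc") == hashlib.sha3_256(b"abc").digest()
```

On a GPU, the natural parallelism is across independent messages or candidate inputs. A single sponge is sequential from block to block.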

Dec, 7

Parallelizing AES on multicores and GPUs

The AES block cipher is widely used and resource-intensive. An existing sequential open-source implementation of the algorithm was parallelized for multi-core microprocessors and GPUs, and performance results are presented.

CUDA

Dec, 7

An Efficient Parallel Motion Estimation Algorithm and X264 Parallelization in CUDA

H.264/AVC video encoders have been widely used for their high coding efficiency. Since their computational demand grows with frame resolution, which is constantly increasing, there has been great interest in accelerating H.264/AVC through parallel processing. Recently, graphics processing units (GPUs) have emerged as a viable target for accelerating general-purpose applications by exploiting fine-grain […]

CUDA
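The abstract does not say which search strategy the authors parallelize. As a common baseline for motion estimation, the sketch below shows full-search block matching with the sum of absolute differences (SAD). Every candidate displacement within the search radius is scored, and the cheapest one is kept. The frame layout, block size and radius are hypothetical. Because candidates (and blocks) are scored independently, this is the kind of fine-grained work that maps onto GPU threads.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between an n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    return sum(
        abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
        for j in range(n)
        for i in range(n)
    )


def full_search(cur, ref, bx, by, n, radius):
    """Return (best_sad, dx, dy) over all in-bounds displacements within radius."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if not (0 <= by + dy and by + dy + n <= h and 0 <= bx + dx and bx + dx + n <= w):
                continue  # candidate block would fall outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Reference frame with distinct pixel values; the current frame is the
# reference shifted so that the true motion vector is (dx, dy) = (1, 2).
ref = [[y * 8 + x for x in range(8)] for y in range(8)]
cur = [[ref[min(y + 2, 7)][min(x + 1, 7)] for x in range(8)] for y in range(8)]
best = full_search(cur, ref, bx=2, by=2, n=2, radius=2)
```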

Dec, 7

Sparse-Matrix-CG-Solver in CUDA

This paper describes the implementation of a parallelized conjugate gradient (CG) solver for systems of linear equations using CUDA C. Given a real, symmetric, positive definite coefficient matrix and a right-hand side, the parallelized CG solver finds a solution of the system by exploiting the massive compute power of today's GPUs. Comparing sequential CPU implementations […]

CUDA
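The conditions the abstract lists (real, symmetric, positive definite) are exactly the conditions under which conjugate gradient converges. The sketch below is a plain CPU reference for the iteration. It assumes the sparse matrix is stored in CSR (values, column indices, row pointers), a common choice for GPU solvers that the abstract does not confirm. The sparse matrix-vector product and the dot products are the kernels a CUDA port parallelizes.

```python
import math


def csr_matvec(values, col_idx, row_ptr, x):
    """y = A x for a CSR matrix; each row is independent (one GPU thread per row)."""
    y = []
    for row in range(len(row_ptr) - 1):
        s = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            s += values[k] * x[col_idx[k]]
        y.append(s)
    return y


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def conjugate_gradient(values, col_idx, row_ptr, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive definite A, starting from x = 0."""
    n = len(b)
    max_iter = max_iter or 10 * n
    x = [0.0] * n
    r = list(b)  # residual b - A x with x = 0
    p = list(r)  # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:
            break
        Ap = csr_matvec(values, col_idx, row_ptr, p)
        alpha = rs_old / dot(p, Ap)  # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]  # A-conjugate update
        rs_old = rs_new
    return x


# A = [[4, -1, 0], [-1, 4, -1], [0, -1, 4]] in CSR form; b = A @ [1, 2, 3]
values = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
col_idx = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]
x = conjugate_gradient(values, col_idx, row_ptr, [2.0, 4.0, 10.0])
```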

Dec, 7

Accelerating Braided B+ Tree Searches on a GPU with CUDA

Previous work has shown that using the GPU as a brute-force method for SELECT statements on a SQLite database table yields significant speedups. However, this requires that the entire table be selected and transformed from the B-Tree to row-column format. This paper investigates possible speedups by traversing B+ Trees in parallel on the GPU, […]

CUDA
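To make "traversing B+ Trees in parallel" concrete, the sketch below builds a static B+ tree bottom-up from sorted key/value pairs and answers a batch of point lookups. Each lookup descends from the root independently, so a GPU version could assign one query per thread. The fanout, the node layout and the helper names are hypothetical. This is not the paper's braided layout, which the abstract does not describe.

```python
from bisect import bisect_left, bisect_right


class Node:
    def __init__(self, keys, children=None, values=None):
        self.keys = keys  # separators (internal node) or stored keys (leaf)
        self.children = children  # None for leaves
        self.values = values  # payloads, leaves only


def build(pairs, fanout=4):
    """Build a static B+ tree from non-empty, key-sorted (key, value) pairs."""
    level = [
        Node([k for k, _ in pairs[i:i + fanout]], values=[v for _, v in pairs[i:i + fanout]])
        for i in range(0, len(pairs), fanout)
    ]
    mins = [node.keys[0] for node in level]  # smallest key under each node
    while len(level) > 1:
        parents, parent_mins = [], []
        for i in range(0, len(level), fanout):
            group, group_mins = level[i:i + fanout], mins[i:i + fanout]
            parents.append(Node(group_mins[1:], children=group))  # separators
            parent_mins.append(group_mins[0])
        level, mins = parents, parent_mins
    return level[0]


def search(root, key):
    node = root
    while node.children is not None:  # descend: keys equal to a separator go right
        node = node.children[bisect_right(node.keys, key)]
    i = bisect_left(node.keys, key)
    return node.values[i] if i < len(node.keys) and node.keys[i] == key else None


def batch_search(root, queries):
    # Queries are independent: in a GPU version each could be one thread.
    return [search(root, q) for q in queries]


root = build([(k, k * k) for k in range(0, 100, 3)], fanout=4)
results = batch_search(root, [0, 3, 99, 4, 50, 51])
```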
