high performance computing on graphics processing units: hgpu.org

Posts

Apr, 9

Data-parallel algorithms for large-scale real-time simulation of the cellular Potts model on graphics processing units

In the following paper we present techniques for data-parallel execution of the cellular Potts model (CPM) on graphics processing units (GPUs). We have developed data structures and algorithms optimized to use the hardware resources available on the GPU. To the best of our knowledge, this is the first attempt at using data-parallel techniques for simulating […]

CUDA

•

OpenGL
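The abstract does not spell out the energy function being simulated. As background, the standard cellular Potts Hamiltonian (Graner and Glazier) and its Metropolis acceptance rule are written below. The adhesion energies $J$, the volume-constraint strength $\lambda$, the target volume $V_t$ and the temperature $T$ are the textbook terms, not necessarily the exact ones used in the paper.

$$
H = \sum_{\langle i,j \rangle} J\big(\tau(\sigma_i), \tau(\sigma_j)\big)\,\big(1 - \delta_{\sigma_i \sigma_j}\big) + \lambda \sum_{\sigma} \big(v_\sigma - V_t\big)^2
$$

$$
P(\text{accept}) = \begin{cases} 1, & \Delta H \le 0 \\ e^{-\Delta H / T}, & \Delta H > 0 \end{cases}
$$

Here $\sigma_i$ is the cell ID at lattice site $i$, $\tau(\sigma)$ is the type of cell $\sigma$, $v_\sigma$ is its current volume, and the first sum runs over pairs of neighboring sites. A copy attempt overwrites a site with a neighbor's cell ID and is accepted with probability $P$, where $\Delta H$ is the resulting change in $H$.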

Apr, 9

Real-time parallel remote rendering for mobile devices using graphics processing units

Demand for 3D visualization on mobile devices is increasing as users have come to expect more realistic, immersive experiences. However, the limited networking and computing resources of mobile devices remain a challenge. One solution is a proxy-based framework that offloads rendering computation from mobile devices to more powerful servers. We present the […]

Apr, 9

A Parallel Gibbs Sampling Algorithm for Motif Finding on GPU

A motif is an overrepresented pattern in biological sequences, and motif finding is an important problem in bioinformatics. Because of the high computational complexity of motif finding, ever more computing capability is required as the amount of available biological data, such as gene transcription data, grows rapidly. Among the many motif-finding algorithms, Gibbs sampling is an effective method […]

CUDA
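For orientation, a minimal CPU sketch of the classic Gibbs site sampler for motif finding follows. It assumes DNA over `ACGT`, a fixed motif width `w`, one motif occurrence per sequence and a uniform 0.25 background; the function names are hypothetical and this is not the paper's GPU algorithm.

```python
import random

ALPHABET = "ACGT"

def profile_from(seqs, starts, w, skip, pseudo=1.0):
    """Position weight matrix built from every sequence except `skip`, with pseudocounts."""
    counts = [{b: pseudo for b in ALPHABET} for _ in range(w)]
    for k, (s, p) in enumerate(zip(seqs, starts)):
        if k == skip:
            continue
        for j in range(w):
            counts[j][s[p + j]] += 1
    total = (len(seqs) - 1) + pseudo * len(ALPHABET)
    return [{b: c[b] / total for b in ALPHABET} for c in counts]

def gibbs_motif(seqs, w, iters=500, seed=0):
    """Return one sampled motif start position per sequence."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - w + 1) for s in seqs]
    for _ in range(iters):
        i = rng.randrange(len(seqs))          # sequence whose motif site is resampled
        q = profile_from(seqs, starts, w, skip=i)
        s = seqs[i]
        # Motif-vs-background likelihood ratio for every candidate start.
        # Each candidate is scored independently of the others.
        weights = []
        for p in range(len(s) - w + 1):
            r = 1.0
            for j in range(w):
                r *= q[j][s[p + j]] / 0.25    # assumed uniform background
            weights.append(r)
        starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts
```

The scoring loop over candidate starts has no cross-iteration dependencies, so it is one natural place to apply data parallelism. Whether the paper parallelizes this loop, the sampling chains, or something else is not stated in the abstract.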

Apr, 9

The method of improving performance of the GPU-accelerated 2D FDTD simulator

In this paper, several methods for optimizing a parallel implementation of the 2D FDTD algorithm are presented. Practical problems that arise in real simulations are taken into account. The presented methods are supported by appropriate tests and practical examples.
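To make the baseline concrete, here is a minimal sketch of a 2D FDTD (Yee) update for the TMz polarization in normalized units, with the magnetic field scaled by the free-space impedance so both curl updates share the Courant number. The grid sizes, the PEC box boundary (edge `Ez` held at zero) and the additive point source at the centre are illustrative assumptions, not the configuration used in the paper.

```python
def fdtd_tmz(nx, ny, steps, src, courant=0.5):
    """Minimal 2D TMz Yee scheme in normalized units; returns the final Ez grid."""
    assert courant <= 2 ** -0.5, "exceeds the 2D Courant stability limit"
    ez = [[0.0] * ny for _ in range(nx)]
    hx = [[0.0] * (ny - 1) for _ in range(nx)]   # Hx lives at (i, j + 1/2)
    hy = [[0.0] * ny for _ in range(nx - 1)]     # Hy lives at (i + 1/2, j)
    ci, cj = nx // 2, ny // 2
    for t in range(steps):
        # Half step: update H from the curl of E.
        for i in range(nx):
            for j in range(ny - 1):
                hx[i][j] -= courant * (ez[i][j + 1] - ez[i][j])
        for i in range(nx - 1):
            for j in range(ny):
                hy[i][j] += courant * (ez[i + 1][j] - ez[i][j])
        # Full step: update interior E from the curl of H; edge Ez stays 0 (PEC).
        for i in range(1, nx - 1):
            for j in range(1, ny - 1):
                ez[i][j] += courant * ((hy[i][j] - hy[i - 1][j])
                                       - (hx[i][j] - hx[i][j - 1]))
        ez[ci][cj] += src(t)  # additive (soft) point source
    return ez
```

Within each half step every cell update reads only fields from the other half step, so all cells can be updated concurrently; this is what makes FDTD a good fit for one-thread-per-cell GPU kernels.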

Apr, 8

Parallel Dense Gauss-Seidel Algorithm on Many-Core Processors

The Gauss-Seidel method is very efficient for solving problems such as tightly coupled constraints with possible redundancies. However, the underlying algorithm is inherently sequential. Previous works have exploited sparsity in the system matrix to extract parallelism. In this paper, we propose to study several parallelization schemes for fully coupled systems, which existing methods cannot parallelize, […]

CUDA
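The sequential dependency the abstract refers to is visible in a plain dense Gauss-Seidel sweep: the update of `x[i]` reads `x[0..i-1]` values already overwritten in the same sweep. The sketch below is a textbook reference version, not any of the paper's parallel schemes; the function name is hypothetical.

```python
def gauss_seidel(a, b, iters=100, x0=None):
    """Dense Gauss-Seidel for a x = b (a is a list of rows)."""
    n = len(b)
    x = list(x0) if x0 is not None else [0.0] * n
    for _ in range(iters):
        for i in range(n):
            # x[j] for j < i already holds this sweep's new value: the loop over i
            # cannot simply run in parallel, unlike a Jacobi sweep.
            s = sum(a[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / a[i][i]
    return x
```

In a dense matrix every row couples to every other row, so the graph-coloring tricks that work for sparse systems find no independent rows. The length-`n` dot product inside each row update remains parallelizable, which is one possible (assumed, not confirmed by the abstract) source of parallelism.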

Apr, 8

Optimize or Wait? Using llc Fast-Prototyping Tool to Evaluate CUDA Optimizations

Over the last few years, we have witnessed the proliferation of GPU devices in HPC environments. Manufacturers produce new versions of their devices every few years, though, posing a new problem for scientists and engineers using their technology: is it worth the time and effort spent optimizing the codes for the current version? Or it is […]

CUDA

Apr, 8

Support Vector Machines on GPU with Sparse Matrix Format

Emerging general-purpose graphics processing units (GPUs) provide a multi-core platform for a wide range of applications, including machine learning algorithms. In this paper, we propose several techniques to accelerate Support Vector Machines (SVM) on GPUs. A sparse matrix format is introduced into parallel SVM to achieve better performance. Experimental results show that the speedup of 55x-133.8x over LIBSVM can […]
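The abstract does not name the sparse format used. As an illustration only, the sketch below assumes compressed sparse row (CSR) storage and shows how one row of an RBF kernel matrix can be computed from sparse dot products via $\lVert x - y \rVert^2 = x \cdot x + y \cdot y - 2\,x \cdot y$. In SMO-type solvers such as LIBSVM's, each iteration needs kernel values between selected training points and all others, so this is the kind of computation worth accelerating. The class and function names are hypothetical.

```python
import math

class CSR:
    """Row r's nonzeros are values[indptr[r]:indptr[r+1]] at columns indices[...]."""
    def __init__(self, dense_rows):
        self.indptr, self.indices, self.values = [0], [], []
        for row in dense_rows:
            for col, v in enumerate(row):
                if v != 0:
                    self.indices.append(col)
                    self.values.append(v)
            self.indptr.append(len(self.values))

    def row(self, r):
        lo, hi = self.indptr[r], self.indptr[r + 1]
        return dict(zip(self.indices[lo:hi], self.values[lo:hi]))

def sparse_dot(x, y):
    """Dot product of two {column: value} rows, iterating over the shorter one."""
    if len(x) > len(y):
        x, y = y, x
    return sum(v * y[c] for c, v in x.items() if c in y)

def rbf_kernel_row(m, i, gamma):
    """K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2) for every row j of m."""
    xi = m.row(i)
    nrm_i = sparse_dot(xi, xi)
    out = []
    for j in range(len(m.indptr) - 1):
        xj = m.row(j)
        d2 = nrm_i + sparse_dot(xj, xj) - 2.0 * sparse_dot(xi, xj)
        out.append(math.exp(-gamma * max(d2, 0.0)))  # clamp tiny negative rounding
    return out
```

Only nonzero entries are stored and touched, which is where a sparse format pays off on high-dimensional, mostly-zero SVM inputs.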

Apr, 8

High-Speed Implementations of Block Cipher ARIA Using Graphics Processing Units

The power of the graphics processing unit (GPU) has been increasing more rapidly than that of the CPU. It is not surprising that many software libraries have been developed that let us use the power of the GPU for general computations, especially parallel data processing. In this paper, we propose implementations of the standard block cipher ARIA of […]

CUDA

•

OpenGL

Apr, 8

Record Setting Software Implementation of DES Using CUDA

The increase in computational power of off-the-shelf hardware offers increasingly advantageous tradeoffs among efficiency, cost and availability, thus enhancing the feasibility of cryptanalytic attacks aiming to lower the security of widely used cryptosystems. In this paper we illustrate a GPU-based software implementation of the most efficient variant of the Data Encryption Standard (DES), […]

CUDA

Apr, 8

On accelerating iterative algorithms with CUDA: A case study on Conditional Random Fields training algorithm for biological sequence alignment

The accuracy of Conditional Random Fields (CRF) is achieved at the cost of a huge amount of computation to train the model. In this paper we designed a parallelized algorithm for Gradient Ascent-based CRF training methods for biological sequence alignment. Our contribution lies mainly in two aspects: 1) We flexibly parallelized the different iterative computation […]

CUDA

Apr, 8

A Comparative Study on ASIC, FPGAs, GPUs and General Purpose Processors in the O(N^2) Gravitational N-body Simulation

In this paper, we describe the implementation of gravitational force calculation for N-body simulations in the context of astrophysics. It describes high-performance implementations on general purpose processors, GPUs, and FPGAs, and compares them using a number of criteria including speed performance, power efficiency and cost of development. These results show that, for gravitational […]
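The O(N^2) in the title is the direct all-pairs force sum. A minimal reference sketch follows, assuming Plummer softening `eps` and units with `G = 1` by default; both are illustrative choices, not details taken from the paper.

```python
import math

def accelerations(pos, mass, G=1.0, eps=0.0):
    """Direct-summation gravity:
    a_i = G * sum_{j != i} m_j (r_j - r_i) / (|r_j - r_i|^2 + eps^2)^(3/2)."""
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):              # on a GPU, typically one thread per body i
        for j in range(n):          # every body interacts with every other: O(N^2)
            if i == j:
                continue
            dx = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = dx[0] ** 2 + dx[1] ** 2 + dx[2] ** 2 + eps ** 2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * mass[j] * dx[k] * inv_r3
    return acc
```

The kernel performs O(N^2) arithmetic on only O(N) data, which is why it suits hardware with high arithmetic throughput such as GPUs, FPGAs and ASICs.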

Apr, 8

Program Optimization of Array-Intensive SPEC2k Benchmarks on Multithreaded GPU Using CUDA and Brook+

The Graphics Processing Unit (GPU), with many lightweight data-parallel cores, can provide substantial parallel computing power to accelerate several general-purpose applications. Both AMD and NVIDIA provide their own high-performance GPUs and software platforms. As floating-point computing capacity continues to increase, the “memory wall” problem becomes more serious, especially for array-intensive applications. In […]

CUDA

