high performance computing on graphics processing units: hgpu.org

Posts

May, 12

Code Optimization and Performance Analysis of Oceanographic Software Package NEMO for GPGPU Systems

The paper presents our experience in code optimization and performance analysis of the NEMO software package on hybrid parallel computer systems with accelerators. NEMO Ocean is a software package for oceanology, simulating ocean gyres and sea-ice models. Code optimization and performance analysis are performed for the case study of the NEMO ORCA2_LIM configuration. All experiments are conducted […]

CUDA

May, 12

Accelerating the scoring module of mass spectrometry-based peptide identification using GPUs

BACKGROUND: Tandem mass spectrometry-based database searching is currently the main method for protein identification in shotgun proteomics. The explosive growth of protein and peptide databases, which is a result of genome translations, enzymatic digestions, and post-translational modifications (PTMs), is making computational efficiency in database searching a serious challenge. Profile analysis shows that most search engines […]

CUDA

May, 10

Dynamic Orchestration of Massively Data Parallel Execution

Graphics processing units (GPUs) are specialized hardware accelerators capable of rendering graphics much faster than conventional general-purpose processors. They are widely used in personal computers, tablets, mobile phones, and game consoles. Modern GPUs are not only efficient at manipulating computer graphics, but also are more effective than CPUs for algorithms where processing of large data […]

CUDA • OpenCL

May, 10

Fine-grain Task Aggregation and Coordination on GPUs

In general-purpose graphics processing unit (GPGPU) computing, data is processed by concurrent threads executing the same function. This model, dubbed single-instruction/multiple-thread (SIMT), requires programmers to coordinate the synchronous execution of similar operations across thousands of data elements. To alleviate this programmer burden, Gaster and Howes outlined the channel abstraction, which facilitates dynamically aggregating asynchronously produced […]

OpenCL

May, 10

A Study of the Parallelization of Hybrid SAT Solver using CUDA

A SAT solver is an algorithm for finding the solution of a given problem expressed in CNF (Conjunctive Normal Form). Recently, SAT solver studies have focused on cryptography. The purpose of this paper is to construct the framework of a parallel SAT solver that can be applied to cryptanalysis. First, we transform an […]

CUDA

May, 10

GAMUT: GPU accelerated microRNA analysis to uncover target genes through CUDA-miRanda

BACKGROUND: Non-coding sequences such as microRNAs have important roles in disease processes. Computational microRNA target identification (CMTI) is becoming increasingly important since traditional experimental methods for target identification pose many difficulties. These methods are time-consuming, costly, and often need guidance from computational methods to narrow down candidate genes anyway. However, most CMTI methods are computationally […]

CUDA

May, 10

3D Objects Tracking by GPGPU-Enhanced Particle Filter Algorithms

Object tracking methods have been widely used in fields such as video surveillance, motion monitoring, and robotics. The particle filter is one of the promising methods, but it is difficult to apply to real-time object tracking because of its high computational cost. In order to reduce the processing cost without sacrificing the tracking […]

CUDA

May, 9

Automatic Scheduling of Compute Kernels Across Heterogeneous Architectures

The world of high-performance computing has shifted from increasing single-core performance to extracting performance from heterogeneous multi- and many-core processors due to the power, memory and instruction-level parallelism walls. All trends point towards increased processor heterogeneity as a means for increasing application performance, from smartphones to servers. These various architectures are designed for different types […]

OpenCL

May, 9

Applying Source Level Auto-Vectorization to Aparapi Java

Ever since chip manufacturers hit the power wall preventing them from increasing processor clock speed, there has been an increased push towards parallelism for performance improvements. This parallelism comes in the form of both data parallel single instruction multiple data (SIMD) instructions, as well as parallel compute cores in both central processing units (CPUs) and […]

CUDA • OpenCL

May, 9

Acceleration of LSB Algorithm in GPU

This paper presents a method for accelerating the LSB (Least Significant Bit) algorithm on the GPU (Graphics Processing Unit) using a programming model called CUDA. CUDA is a state-of-the-art parallel computing architecture developed by nVIDIA. CUDA allows programmers to access the GPU directly by invoking a kernel. In Image Steganography, parallelization of computations to a […]

CUDA

May, 9

GPU Implementation of Parallel Support Vector Machine Algorithm with Applications to Detection Intruder

Network anomaly detection technology based on the support vector machine (SVM) can efficiently detect unknown attacks or variants of known attacks; however, it cannot be used to detect large-scale intrusion scenarios because of its computational time demands. The graphics processing unit (GPU) offers multithreading and powerful parallel processing capability. Based […]

CUDA

May, 9

Understanding Protein Dynamics with L1-Regularized Reversible Hidden Markov Models

We present a machine learning framework for modeling protein dynamics. Our approach uses L1-regularized, reversible hidden Markov models to understand large protein datasets generated via molecular dynamics simulations. Our model is motivated by three design principles: (1) the requirement of massive scalability; (2) the need to adhere to relevant physical law; and (3) the necessity […]

CUDA


* * *

Recent source codes

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management

MSCCL++: A GPU-driven communication stack for scalable AI applications

Benchmark compute shader of Unity against InteropUnityCUDA

Data-efficient LLM Fine-tuning for Code Generation

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Large Language Model Powered C-to-CUDA Code Translation: A Novel Auto-Parallelization Framework
