high performance computing on graphics processing units: hgpu.org

Posts

Mar, 12

Performance Traps in OpenCL for CPUs

With its design concept of cross-platform portability, OpenCL can be used not only on GPUs (for which it is quite popular), but also on CPUs. Whether porting GPU programs to CPUs, or simply writing new code for CPUs, using OpenCL brings up the performance issue, usually raised in one of two forms: "OpenCL is not […]

OpenCL

Mar, 12

Automatic efficient data layout for multithreaded stencil codes on CPUs and GPUs

Stencil based computation on structured grids is a kernel at the heart of a large number of scientific applications. The variety of stencil kernels used in practice makes this computation pattern difficult to assemble into a high performance computing library. With the multiplication of cores on a single chip, answering architectural alignment requirements became an […]

CUDA

Mar, 12

Morph Algorithms on GPUs

There is growing interest in using GPUs to accelerate graph algorithms such as breadth-first search, computing page-ranks, and finding shortest paths. However, these algorithms do not modify the graph structure, so their implementation is relatively easy compared to general graph algorithms like mesh generation and refinement, which morph the underlying graph in non-trivial ways by […]

CUDA

Mar, 9

Speeding Up Model Building for ECGA on CUDA Platform

Parallelization is a straightforward approach to enhance the efficiency for evolutionary computation due to its inherently parallel nature. Since NVIDIA released the compute unified device architecture (CUDA), graphic processing units have enabled lots of scalable parallel programs in a wide range of fields. However, parallelization of model building for EDAs is rarely studied. In this […]

CUDA

Mar, 9

Signal Processing and General Purpose Computing on GPU

Graphics processing units (GPUs) have been growing in popularity due to their impressive processing capabilities, and with general purpose programming languages such as NVIDIA’s CUDA interface, are becoming the platform of choice in the scientific computing community. Today the research community successfully uses GPU to solve a broad range of computationally demanding, complex problems. This […]

CUDA

Mar, 9

Detecting Computer Viruses using GPUs

Anti-virus software is the main defense mechanism against malware, which is becoming more common and advanced. A significant part of the virus scanning process is dedicated to scanning a given file against a set of virus signatures. As it is important that the overall scanning process be as fast as possible, efforts must be made […]

CUDA

Mar, 9

CU2rCU: A CUDA-to-rCUDA Converter

GPUs (Graphics Processor Units) are being increasingly embraced by the high performance computing and computational communities as an effective way of considerably reducing application execution time by accelerating significant parts of their codes. CUDA (Compute Unified Device Architecture) is a new technology developed by NVIDIA which leverages the parallel compute engine in GPUs. However, the […]

CUDA

Mar, 9

GPU accelerated maximum cardinality matching algorithms for bipartite graphs

We design, implement, and evaluate GPU-based algorithms for the maximum cardinality matching problem in bipartite graphs. Such algorithms have a variety of applications in computer science, scientific computing, bioinformatics, and other areas. To the best of our knowledge, ours is the first study which focuses on GPU implementation of the maximum cardinality matching algorithms. We […]

CUDA

Mar, 7

Solutions For Optimizing The Radix Sort Algorithmic Function Using The Compute Unified Device Architecture

In this paper, we have researched and developed solutions for optimizing the radix sort algorithmic function using the Compute Unified Device Architecture (CUDA). The radix sort is a common parallel primitive, an essential building block for many data processing algorithms, whose optimization improves the performance of a wide class of parallel algorithms useful in data […]

CUDA

Mar, 7

GPU based Eulerian Assembly of Genomes

Advances in sequencing technologies have revolutionized the field of genomics by providing cost effective and high throughput solutions. In this paper, we develop a parallel sequence assembler implemented on general purpose graphic processor units (GPUs). Our work was largely motivated by a growing need in the genomic community for sequence assemblers and increasing use of […]

CUDA

Mar, 7

GPU-Accelerated Standard and Multi-Population Cultural Algorithms

In this paper, we present three parallel cultural algorithms using CUDA-enabled GPUs. Firstly, we used the GPU to accelerate an expensive fitness function. Next, the parallel versions of both standard and multi-population CAs were presented. Experiments show that the standard CA with an expensive fitness function was made more than 600 times faster. On lightweight […]

CUDA

Mar, 7

Using Graphical Processing Units for Deterministic Single Machine Scheduling Problems

This paper gives an introduction to how graphical processing units can be used in non-graphical related problems or tasks. First a history of GPU is provided. The next part focuses on GPU programming. A brief description is given about the available hardware facilities and the available programming languages. As an initial result of the project […]

CUDA

Recent source codes

OpScanner

Atlas CLI: Machine Learning (ML) Lifecycle & Transparency Manager

transformers_tvm: Implementation of Encoder Decoder transformer on TVM

INT v.s. FP: A framework to compare low-bit integer and float-point formats

AutoDock-GPU: AutoDock for GPUs and other accelerators

NCCLX: collective communication framework

Tutoring LLM into a Better CUDA Optimizer

Adaptivity in AdaptiveCpp: Optimizing Performance by Leveraging Runtime Information During JIT-Compilation

Kernel Library for LLM Serving

Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs

Most viewed papers (last 30 days)