high performance computing on graphics processing units: hgpu.org

Posts

May, 13

Programming for scientific computing on peta-scale heterogeneous parallel systems

Peta-scale high-performance computing systems are increasingly built with heterogeneous CPU and GPU nodes to achieve higher power efficiency and computation throughput. While providing unprecedented capabilities to conduct computational experiments of historic significance, these systems are presently difficult to program. The users, who are domain experts rather than computer experts, prefer to use programming models closer […]

CUDA

May, 11

A Distributed CPU-GPU Framework for Pairwise Alignments on Large-Scale Sequence Datasets

Several problems in computational biology require the all-against-all pairwise comparisons of tens of thousands of individual biological sequences. Each such comparison can be performed with the well-known Needleman-Wunsch alignment algorithm. However, with the rapid growth of biological databases, performing all possible comparisons with this algorithm in serial becomes extremely time-consuming. The massive computational power of […]

CUDA

May, 11

Exploring Computer Vision and Image Processing Algorithms in Teaching Parallel Programming

Computer Vision (CV) is a rapidly growing field, intent on enabling computers to process, analyze, and understand the information of images to produce structured information and/or make decisions. In recent years, interest in computer vision has grown in part as a result of both cheaper and more capable cameras, but also largely because of affordable […]

CUDA • OpenCL

May, 11

Parallel implementation of the wideband DOA algorithm on single core, multicore, GPU and IBM cell BE processor

The Multiple Signal Classification (MUSIC) algorithm is a powerful technique for determining the Direction of Arrival (DOA) of signals impinging on an antenna array. The algorithm is serial-based, mathematically intensive, and requires substantial computing power to realize in real time. Recently, multi-core processors have become more prevalent and affordable. The challenge of adapting existing serial-based algorithms to […]

CUDA

May, 11

Blum Blum Shub on the GPU

CONTEXT. The cryptographically secure pseudo-random number generator Blum Blum Shub (BBS) is a simple algorithm with a strong security proof; however, it requires very large numbers to be secure, which makes it computationally heavy. The Graphics Processing Unit (GPU) is a common vector processor originally dedicated to computer-game graphics, but has since been adapted to […]

OpenCL

May, 11

The GPU-based High-performance Pattern-matching Algorithm for Intrusion Detection

The Graphics Processing Unit (GPU) has evolved from a dedicated rendering device into a general-purpose parallel processor, and it performs far better than the CPU in many fields of science. String matching is widely used, especially in information retrieval, intrusion detection, computational biology, etc. In this paper, we designed and implemented a GPU-based multi-string matching algorithm […]

CUDA

May, 11

A portable and high-performance matrix operations library for CPUs, GPUs and beyond

High-performance computing systems today include a variety of compute devices such as multi-core CPUs, GPUs and many-core accelerators. OpenCL allows programming different types of compute devices using a single API and kernel language. However, there is no standard matrix operations library in OpenCL for operations such as matrix multiplication that works well on a variety […]

OpenCL

May, 11

Real-Time Object Tracking by CUDA-accelerated Neural Network

An algorithm is proposed for tracking objects in real time. The algorithm is based on a neural network implemented on a GPU. The algorithm is investigated and its parameters are optimized. Tracking is accelerated 10 times and training 2 times relative to the sequential version of the algorithm. The maximum resolution of […]

CUDA

May, 11

A GPU-based Parallel Fireworks Algorithm for Optimization

Swarm intelligence algorithms have been widely used to solve difficult real-world problems in both academic and engineering domains. Thanks to their inherent parallelism, various parallelized swarm intelligence algorithms have been proposed to speed up the optimization process, especially on massively parallel GPU architectures. However, conventional swarm intelligence algorithms are usually not designed […]

CUDA

May, 11

An Implementation of the Discontinuous Galerkin Method on Graphics Processing Units

Computing highly accurate approximate solutions to partial differential equations (PDEs) requires both a robust numerical method and a powerful machine. We present a parallel implementation of the discontinuous Galerkin (DG) method on graphics processing units (GPUs). In addition to being flexible and highly accurate, DG methods accommodate parallel architectures well, as their discontinuous nature produces entirely […]

CUDA

May, 11

Auto-tuning a LOFAR radio astronomy pipeline in JavaCL

Modern radio telescopes, such as the Low Frequency Array (LOFAR) in the north of the Netherlands, process the signal from the sky in software rather than in expensive special-purpose hardware. This gives astronomers unprecedented flexibility to perform a wide variety of scientific experiments. However, designing the actual software that would give optimal […]

OpenCL

May, 9

Three-dimensional LBM simulations of buoyancy-driven flow using Graphics processing units

Three-dimensional simulations of buoyancy-driven flow of two immiscible liquids are performed using the lattice Boltzmann method (LBM) implemented on a graphics processing unit (GPU). The graphics processing unit is a new paradigm for computing fluid flows and has become more popular in recent years. It is powerful and convenient to use. LBM, which is an […]

CUDA
