high performance computing on graphics processing units: hgpu.org

Posts

Apr, 13

Parallel programming with CUDA

This report documents our master's thesis project on parallel programming with CUDA, the NVIDIA GPU architecture with support for general-purpose computing. The purpose of the thesis is to uncover the qualities of CUDA as a parallel computing platform, determining the possibilities and limitations of its ability to handle different types of algorithms. […]

CUDA

Apr, 13

Design of high-performance parallelized gene predictors in MATLAB

BACKGROUND: This paper proposes a method of implementing parallel gene prediction algorithms in MATLAB. The proposed designs are based on either Goertzel’s algorithm or on FFTs and have been implemented using varying amounts of parallelism on a central processing unit (CPU) and on a graphics processing unit (GPU). FINDINGS: Results show that an implementation using […]

CUDA

Apr, 12

Spatial Indexing of Large-Scale Geo-Referenced Point Data on GPGPUs Using Parallel Primitives

Modern positioning and locating technologies, e.g., GPS, have generated huge amounts of geo-referenced point data that are crucial to understanding environmental and socio-economic phenomena. Unfortunately, traditional disk-resident databases are inefficient at handling large-scale point data. In this study, we propose to utilize the massive data parallel processing power of General Purpose computing on Graphics Processing […]

CUDA

Apr, 12

Verifying GPU Kernels by Test Amplification

We present a novel technique for verifying properties of data parallel GPU programs via test amplification. The key insight behind our work is that we can use the technique of static information flow to amplify the result of a single test execution over the set of all inputs and interleavings that affect the property being […]

CUDA

Apr, 12

Programming issues for video analysis on Graphics Processing Units

Video processing is a part of signal processing where input and/or output signals are video streams. It covers a wide variety of applications that are generally very compute-intensive due to their algorithmic complexity. Moreover, many of these applications demand real-time performance. Fulfilling these requirements necessitates the use of hardware acceleration such as Graphics Processing […]

CUDA

Apr, 12

TDDFT in massively parallel computer architectures: the OCTOPUS project

OCTOPUS is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this article we present the ongoing efforts for the parallelisation of OCTOPUS. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has a great […]

OpenCL

Apr, 12

Bonsai: A GPU Tree-Code

We present a gravitational hierarchical N-body code that is designed to run efficiently on Graphics Processing Units (GPUs). All parts of the algorithm are executed on the GPU which eliminates the need for data transfer between the Central Processing Unit (CPU) and the GPU. Our tests indicate that the gravitational tree-code outperforms tuned CPU code […]

CUDA

Apr, 11

Exposing Fine-Grained Parallelism in Algebraic Multigrid Methods

Algebraic multigrid methods for large, sparse linear systems are a necessity in many computational simulations, yet parallel algorithms for such solvers are generally decomposed into coarse-grained tasks suitable for distributed computers with traditional processing cores. However, accelerating multigrid on massively parallel throughput-oriented processors, such as the GPU, demands algorithms with abundant fine-grained parallelism. In this […]

CUDA

Apr, 11

RGEM: A Responsive GPGPU Execution Model for Runtime Engines

General-purpose computing on graphics processing units, also known as GPGPU, is a burgeoning technique to enhance the computation of parallel programs. Applying this technique to real-time applications, however, requires additional support for timeliness of execution. In particular, the non-preemptive nature of GPGPU, associated with copying data to/from the device memory and launching code onto the […]

CUDA

Apr, 11

Modular Arithmetic for Solving Linear Equations on the GPU

Solving systems of linear algebraic equations is a frequent task in numerical mathematics. Difficulties often arise when the matrix is ill-conditioned. Solution stability cannot be ensured for large dense sets of linear equations, and rounding error during the numerical computation cannot be tolerated. Methods have been developed that minimize the influence […]

OpenCL

Apr, 11

Real-time Visualization of Streaming Text with Force-Based Dynamic System

Streamit lets users explore visualizations of text streams without prior knowledge of the data. It incorporates incoming documents from a continuous source into an existing visualization context with automatic grouping and separation based on document similarities. A powerful user interface allows in-depth data analysis.

CUDA

Apr, 11

The 2012 International Conference on Network Computing and Information Security and the 2012 International Conference on Multimedia and Signal Processing, NCIS’12 – CMSP’12

The 2012 International Conference on Network Computing and Information Security (NCIS’12) and the 2012 International Conference on Multimedia and Signal Processing (CMSP’12) will be jointly held in Shanghai, China, on December 7-9, 2012. NCIS’12 – CMSP’12 aims to provide a high-level international forum for scientists and researchers to present the state of the art of Network […]
