high performance computing on graphics processing units: hgpu.org

Posts

Jun, 10

Accelerating Multi-layer Perceptron based short term demand forecasting using Graphics Processing Units

Load forecasting plays a vital role in the operation and planning of power systems in a deregulated electricity market. A wide variety of load forecasting methods have been proposed. In this paper, we introduce Graphics Processing Unit (GPU)-based computing to accelerate short-term load forecasting with a multi-layer perceptron (MLP). […]

Jun, 10

The scoring sequences on profile Hidden Markov Models with delete states elimination by GPUs

A profile Hidden Markov Model (HMM) is well suited to representing profiles of multiple sequence alignments, and it has become the main method for multiple sequence alignment in bioinformatics. Scoring sequences against profile HMMs is compute-intensive, especially when there are many Markov models and many states in each model. A parallel algorithm […]

Jun, 10

Real-time rain simulation in cartoon style

An efficient method for simulating cartoon-style rain in a 3D environment is proposed. By taking advantage of the parallelism and programmability of GPUs (graphics processing units), real-time interaction can be achieved. Raindrop splashes are simulated using collision detection, a series of stylized textures, and rotations of point sprites. To simulate the wind-driven rain effect, the […]

Jun, 10

Real-time rendering of large-scale tree scene

High-quality, realistic visualization of vegetation and tree models is a long-standing goal for complex virtual natural scenes. Rendering a photo-realistic forest scene in real time is important for simulating growing trees. In this paper, we present a method for 3D tree modeling and a hybrid rendering algorithm for large-scale forest scenes […]

Jun, 10

Handwritten Digit Recognition with a Committee of Deep Neural Nets on GPUs

The competitive MNIST handwritten digit recognition benchmark has seen a long series of broken records since 1998. The most recent substantial improvement by others dates back 7 years (error rate 0.4%). Recently we were able to significantly improve this result, using graphics cards to greatly speed up training of simple but deep MLPs, which achieved […]

Jun, 9

Single molecule detection of tuberculosis nucleic acid using dark field Tethered Particle Motion

Current methods for tuberculosis nucleic acid detection require amplification and labeling before detection is possible. We propose here a method for direct detection using Tethered Particle Motion: gold nanoparticles are tethered to a glass substrate by single-stranded DNA molecules consisting of the complementary sequence to the target. Detection takes place by observing a change in […]

Jun, 9

cuGWAM: Genome-wide association multifactor dimensionality reduction using CUDA-enabled high-performance graphics processing unit

The multifactor dimensionality reduction (MDR) method has been widely applied to detect gene-gene interactions, which are well recognized as playing an important role in understanding complex traits such as disease susceptibility. However, because MDR performs an exhaustive analysis, current MDR software is difficult to extend to genome-wide association studies (GWAS) with […]

CUDA

Jun, 9

Low-Frequency MLFMA on Graphics Processors

A parallelization of the low-frequency multilevel fast multipole algorithm (MLFMA) for graphics processing units (GPUs) is presented. The implementation achieves speedups of 10 to 30 times over a serial CPU implementation of the algorithm. The error of the MLFMA on the GPU is controllable down to machine precision. Under the typical method-of-moments (MoM) error requirement […]

CUDA

Jun, 9

Adaptive Optimization for Petascale Heterogeneous CPU/GPU Computing

In this paper, we describe our experience developing an implementation of the Linpack benchmark for TianHe-1, a petascale CPU/GPU supercomputer and the largest GPU-accelerated system attempted to date. An adaptive optimization framework is presented that balances the workload distribution across the GPUs and CPUs with negligible runtime overhead, resulting in better performance than […]
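The abstract does not say how the CPU/GPU split is computed. As a minimal sketch, assuming the workload is a set of matrix columns assigned to each device in proportion to the throughput measured in the previous step (a generic feedback scheme, not necessarily the TianHe-1 framework), the idea can be written in Python as below. The names `split_columns`, `update_rate`, and `simulate`, and the simulated device rates, are hypothetical.

```python
from typing import List, Tuple


def split_columns(n_cols: int, gpu_rate: float, cpu_rate: float) -> Tuple[int, int]:
    """Split n_cols between GPU and CPU in proportion to their estimated rates."""
    gpu_share = gpu_rate / (gpu_rate + cpu_rate)
    n_gpu = round(n_cols * gpu_share)
    return n_gpu, n_cols - n_gpu


def update_rate(old_rate: float, n_done: int, elapsed: float) -> float:
    """Re-estimate a device's rate (columns per second) from the last step's timing."""
    if n_done == 0 or elapsed <= 0.0:
        return old_rate  # no measurement this step; keep the previous estimate
    return n_done / elapsed


def simulate(n_cols: int, true_gpu: float, true_cpu: float,
             steps: int) -> List[Tuple[int, int, float]]:
    """Run a few steps with simulated timings, starting from an even 1:1 guess.

    Hypothetical model: each device's elapsed time is its column count divided
    by its (unknown to the scheduler) true rate. Returns (n_gpu, n_cpu, step_time).
    """
    gpu_rate, cpu_rate = 1.0, 1.0
    history = []
    for _ in range(steps):
        n_gpu, n_cpu = split_columns(n_cols, gpu_rate, cpu_rate)
        t_gpu = n_gpu / true_gpu
        t_cpu = n_cpu / true_cpu
        # The step finishes when the slower device finishes.
        history.append((n_gpu, n_cpu, max(t_gpu, t_cpu)))
        gpu_rate = update_rate(gpu_rate, n_gpu, t_gpu)
        cpu_rate = update_rate(cpu_rate, n_cpu, t_cpu)
    return history


if __name__ == "__main__":
    for n_gpu, n_cpu, step_time in simulate(1000, true_gpu=400.0, true_cpu=100.0, steps=3):
        print(n_gpu, n_cpu, step_time)
```

With a true GPU:CPU speed ratio of 4:1 and an initial even split, the second step already assigns 800 of 1000 columns to the GPU and both devices finish at the same time, cutting the step time from 5.0 to 2.0 simulated seconds.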

Jun, 9

SAR focusing of P-band ice sounding data using back-projection

SAR processing can be applied to ice sounder data to improve along-track resolution and clutter suppression. This paper presents a time-domain back-projection technique for SAR focusing of ice sounder data. With this technique, variations in flight track and ice surface slope can be accurately accommodated at the expense of computation time. The back-projection algorithm can […]
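For readers unfamiliar with time-domain back-projection, the core loop can be sketched as follows: for every image pixel, compute its range to the antenna at each pulse position, look up the range-compressed echo at that range, remove the two-way propagation phase, and sum. This is a minimal 2D sketch, assuming nearest-neighbour range interpolation, straight-line propagation (no refraction at the ice surface), and a hypothetical single-point-target simulator; it is not the paper's implementation, and all function names and parameter values (including the 0.7 m wavelength) are illustrative.

```python
import cmath
import math
from typing import List, Sequence


def backproject(data: Sequence[Sequence[complex]], antenna_x: Sequence[float],
                r0: float, dr: float, wavelength: float,
                xs: Sequence[float], ys: Sequence[float]) -> List[List[complex]]:
    """Form an image by time-domain back-projection (nearest-neighbour range lookup).

    data[p][k] is the range-compressed echo of pulse p in range bin k, where bin k
    corresponds to one-way range r0 + k * dr. The antenna is at (antenna_x[p], 0).
    Returns image[iy][ix] for pixel (xs[ix], ys[iy]).
    """
    image = []
    for y in ys:
        row = []
        for x in xs:
            acc = 0j
            for p, xa in enumerate(antenna_x):
                r = math.hypot(x - xa, y)      # one-way range from antenna to pixel
                k = round((r - r0) / dr)       # nearest range bin
                if 0 <= k < len(data[p]):
                    # Undo the two-way propagation phase so echoes add coherently.
                    acc += data[p][k] * cmath.exp(1j * 4 * math.pi * r / wavelength)
            row.append(acc)
        image.append(row)
    return image


def simulate_point_target(antenna_x: Sequence[float], tx: float, ty: float,
                          r0: float, dr: float, n_bins: int,
                          wavelength: float) -> List[List[complex]]:
    """Hypothetical range-compressed data for one point target (single-bin echoes)."""
    data = []
    for xa in antenna_x:
        r = math.hypot(tx - xa, ty)
        echo = [0j] * n_bins
        k = round((r - r0) / dr)
        if 0 <= k < n_bins:
            echo[k] = cmath.exp(-1j * 4 * math.pi * r / wavelength)
        data.append(echo)
    return data


if __name__ == "__main__":
    ax = [float(i) for i in range(-20, 21)]          # 41 pulse positions
    d = simulate_point_target(ax, 0.0, 100.0, 50.0, 1.0, 200, 0.7)
    img = backproject(d, ax, 50.0, 1.0, 0.7, [-10.0, 0.0, 10.0], [90.0, 100.0, 110.0])
    for row in img:
        print([round(abs(v), 2) for v in row])
```

At the target pixel all pulses add in phase, so its magnitude equals the number of pulses; other pixels collect fewer, partially cancelling contributions. The pixels-times-pulses inner loop is what makes the method expensive, and because every pixel is independent it maps naturally onto one GPU thread per pixel.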

CUDA

Jun, 9

The Graphics Processor as a Mathematical Coprocessor in MATLAB

We present an interface to the graphics processing unit (GPU) from MATLAB, and four algorithms from numerical linear algebra available through this interface: matrix-matrix multiplication, Gauss-Jordan elimination, PLU factorization, and tridiagonal Gaussian elimination. In addition to being a high-level abstraction of the GPU, the interface offers background processing, enabling computations to be executed on […]
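Tridiagonal Gaussian elimination is commonly written as the Thomas algorithm: a forward sweep that eliminates the sub-diagonal, followed by back substitution, in O(n) operations. Below is a minimal CPU reference sketch in Python, assuming a diagonally dominant system because no pivoting is performed; it is not the paper's GPU kernel or its MATLAB interface, and `solve_tridiagonal` is a hypothetical name.

```python
from typing import List, Sequence


def solve_tridiagonal(a: Sequence[float], b: Sequence[float],
                      c: Sequence[float], d: Sequence[float]) -> List[float]:
    """Solve a tridiagonal system by Gaussian elimination without pivoting.

    a: sub-diagonal (a[0] unused), b: main diagonal,
    c: super-diagonal (c[-1] unused), d: right-hand side; all of length n.
    """
    n = len(b)
    cp = [0.0] * n  # super-diagonal after elimination
    dp = [0.0] * n  # right-hand side after elimination
    cp[0] = c[0] / b[0] if n > 1 else 0.0
    dp[0] = d[0] / b[0]
    # Forward sweep: eliminate a[i] using the already-normalized row i - 1.
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    # Back substitution on the resulting upper-bidiagonal system.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


if __name__ == "__main__":
    # 2x - y = 1,  -x + 2y - z = 0,  -y + 2z = 1  has solution x = y = z = 1.
    print(solve_tridiagonal([0.0, -1.0, -1.0], [2.0, 2.0, 2.0],
                            [-1.0, -1.0, 0.0], [1.0, 0.0, 1.0]))
```

The sweep is inherently sequential along the diagonal, which is why GPU versions typically parallelize across many independent systems or switch to reordering schemes such as cyclic reduction.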

Jun, 9

Shader-based visual simulation of ocean wave

GPU shaders increase flexibility and enable customization of vertex and fragment processing; they also give the programmer access to various special effects essential for developing realistic 3D virtual scenes. Compared to a CPU-based simulation of ocean water, the shader-based simulation in this paper reduces the complexity of the model […]
