high performance computing on graphics processing units: hgpu.org

Posts

Mar, 1

Chestnut: A GPU Programming Language for Non-Experts

Graphics processing units (GPUs) are powerful devices capable of rapid parallel computation. GPU programming, however, can be quite difficult, limiting its use to experienced programmers and keeping it out of reach of a large number of potential users. We present Chestnut, a domain-specific GPU parallel programming language for parallel multi-dimensional grid applications. Chestnut is designed […]

CUDA

Mar, 1

Black-Box Side-Channel Attacks Highlight the Importance of Countermeasures: An Analysis of the Xilinx Virtex-4 and Virtex-5 Bitstream Encryption Mechanism

This paper presents a side-channel analysis of the bitstream encryption mechanism provided by Xilinx Virtex FPGAs. This work covers our results from analyzing the Virtex-4 and Virtex-5 families, showing that the encryption mechanism can be completely broken with moderate effort. The presented results provide an overview of a practical real-world analysis and should help practitioners to […]

CUDA

Mar, 1

Parallel Loopy Belief Propagation in Conditional Random Fields

Structured real-world data can be represented with graphs whose structure encodes independence assumptions within the data. Due to statistical advantages over generative graphical models, Conditional Random Fields (CRFs) are used in a wide range of classification tasks on structured data sets. CRFs can be learned from both fully and partially supervised data, and […]

CUDA

Mar, 1

Benchmarking Next Generation Hardware Platforms: An Experimental Approach

Heterogeneous multi-cores (platforms comprised of both general-purpose and accelerator cores) are becoming increasingly common. Further, with processor designs in which there are many cores on a chip, a recent trend is to include functional and performance asymmetries to balance their power usage vs. performance requirements. Coupled with this trend in CPUs is the development of high […]

CUDA

Mar, 1

GPU acceleration of the particle filter: the Metropolis resampler

We consider deployment of the particle filter on modern massively parallel hardware architectures, such as Graphics Processing Units (GPUs), with a focus on the resampling stage. While standard multinomial and stratified resamplers require a sum of importance weights computed collectively between threads, a Metropolis resampler favourably requires only pair-wise ratios between weights, computed independently by […]
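The pairwise-ratio idea in this abstract can be illustrated with a minimal sketch. It shows a generic Metropolis resampler on the CPU and is not the paper's implementation: the function name `metropolis_resample`, the `num_steps` parameter, and the use of Python's `random` module are all assumptions made here for illustration.

```python
import random


def metropolis_resample(weights, num_steps, rng=None):
    """Return one ancestor index per particle (hypothetical helper).

    Each particle runs its own short Metropolis chain over particle
    indices. Only the ratio of two weights is ever needed, so no sum
    over all weights is computed.
    """
    rng = rng or random.Random()
    n = len(weights)
    ancestors = []
    for i in range(n):
        k = i  # each chain starts at its own particle
        for _ in range(num_steps):
            j = rng.randrange(n)  # propose another particle uniformly
            u = rng.random()      # uniform draw in [0, 1)
            # Accept when u <= w[j] / w[k]; written as a product so a
            # zero weight never causes a division by zero.
            if u * weights[k] <= weights[j]:
                k = j
        ancestors.append(k)
    return ancestors


if __name__ == "__main__":
    # Unnormalized weights are fine because only ratios are used.
    print(metropolis_resample([0.1, 0.4, 2.0, 0.5], num_steps=50))
```

Because every chain depends only on its own random draws and on reads of two weights at a time, the outer loop maps naturally onto one GPU thread per particle. The number of steps is a tuning choice: too few steps leaves the ancestors biased toward their starting indices.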

CUDA

Feb, 29

A Fast and Efficient Simulation Framework for Modeling Heat Transport

Metropolitan centers can be affected by an urban heat island effect. Radiative heat build-up from pavement and buildings increases temperatures in the metropolitan area above the average temperatures normally found in the surrounding environment. One way to help reduce the heat island effect is to add parks, trees, or green roofs to these urban spaces. […]

CUDA

Feb, 29

A Restructuring Algorithm for CUDA

Graphics processing units (GPUs) are gaining ground in high-performance computing. CUDA (an extension to C) is the most widely used parallel programming framework for general-purpose GPU computations. However, the task of writing an optimized CUDA program is complex even for experts. We present a method for restructuring loops into optimized CUDA kernels based on a […]

CUDA

Feb, 28

Parallel Computation for Discrete Orthogonal Moments of Images Using Graphic Processing Unit

A novel method is proposed for fast computation of discrete orthogonal moments of large-scale digital images using CUDA (Compute Unified Device Architecture) on a GPU (Graphics Processing Unit). After the original input image is loaded and mapped by a partition model, parallelism is implemented by dividing the computation across the GPU. Experimental results show that the proposed method outperforms the existing software […]

CUDA

Feb, 28

GPU Computing for Machine Learning Algorithms

Computing has rapidly established itself as essential and important to many branches of science, to the point where computational science is a commonly used term. Indeed, the application and importance of computing is set to grow dramatically across almost all the sciences. Computing has started to change how science is done, enabling new scientific advances […]

CUDA

Feb, 28

A GPU-based parallel algorithm for time series pattern mining

Mining time series patterns is an important research area, in which computing the LCSS (Longest Common Subsequence) between high-dimensional time series is one of the most important problems. Large-scale data must be handled in practical applications, so research on efficient retrieval methods is of growing practical importance. Based on the issues above, we […]

CUDA

Feb, 27

CUDA-enabled LBM Flow Simulation around Three Equilateral Cylinders using GPU Computing Processor

This study is concerned with the simulation of viscous flow past three equal-diameter circular cylinders in an equilateral-triangular arrangement. The hydrodynamic characteristics of the cylinders are modelled by a 2D lattice Boltzmann kernel, which is constructed using the Compute Unified Device Architecture (CUDA) interface developed by NVIDIA. Computations using the developed kernel are performed for nine spacing ratios […]

CUDA

Feb, 27

An Energy Consumption Model for GPU Computing at Instruction Level

With the development of hardware and software, GPUs have come into use in the general-purpose computation field. The high density of computing resources on chip brings high performance as well as high power consumption. As a result, the power consumption of GPUs has increasingly become one of the most important issues for the development of general computing with […]

Recent source codes

OpScanner

Atlas CLI: Machine Learning (ML) Lifecycle & Transparency Manager

transformers_tvm: Implementation of Encoder Decoder transformer on TVM

INT v.s. FP: A framework to compare low-bit integer and float-point formats

AutoDock-GPU: AutoDock for GPUs and other accelerators

NCCLX: collective communication framework

Tutoring LLM into a Better CUDA Optimizer

Adaptivity in AdaptiveCpp: Optimizing Performance by Leveraging Runtime Information During JIT-Compilation

Kernel Library for LLM Serving

Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs
