high performance computing on graphics processing units: hgpu.org

Posts

May, 4

Implementations of the FFT algorithm on GPU

The fast Fourier transform (FFT) plays an important role in digital signal processing (DSP) applications, and its implementation involves a large number of computations. Many DSP designers have been working on implementations of the FFT algorithms on different devices, such as the central processing unit (CPU), field-programmable gate array (FPGA), and graphical processing unit (GPU), […]

CUDA

May, 4

High-Order Schemes for the Shallow Water Equations on GPUs

In this thesis, well-balanced, central-upwind high-resolution methods of high order are developed for the two-dimensional shallow water equations on the graphics processing unit (GPU). The methods are based on a fifth-order Weighted Essentially Non-Oscillatory (WENO) reconstruction technique and a fourth-order Gaussian quadrature for the one-sided interface fluxes. Two schemes are implemented, one with bilinear interpolation […]

CUDA

May, 3

Algorithms for representation of 3D regions in radiotherapy planning software

This thesis reviews the fast marching method as a technique for computing the distance transform on GPU in the context of radiotherapy planning software. The method has some interesting characteristics that, given the right circumstances, allow the distance transform to be computed for fewer voxels than commonly used alternatives. This can result in beneficial […]

CUDA

May, 3

Optimizing Similarity Computations for Ontology Matching – Experiences from GOMMA

An efficient computation of ontology mappings requires optimized algorithms and significant computing resources, especially for large life science ontologies. We describe how we optimized n-gram matching for computing the similarity of concept names and synonyms in our match system GOMMA. Furthermore, we outline how to enable highly parallel string matching on Graphical Processing Units […]

CUDA

May, 3

Data-rich astronomy: mining synoptic sky surveys

In the last decade a new generation of telescopes and sensors has allowed the production of a very large amount of data, and astronomy has become a data-rich science; this transition is often labeled a "data revolution" or a "data tsunami". The first locution puts emphasis on the expectations of the astronomers while the second stresses, […]

CUDA

May, 3

Impact of Warp Formation on GPU Performance

As the computing power of GPUs increases dramatically, the GPU is widely used for general-purpose parallel applications as well as graphics applications. In particular, programmers using the GPU can easily create multiple threads with the help of APIs provided by GPU vendors. In the GPU architecture, threads are grouped into a warp to run on the SIMD pipeline, […]

May, 3

Feasibility Analysis of Low Cost Graphical Processing Units for Electromagnetic Field Simulations by Finite Difference Time Domain Method

Among several techniques available for solving Computational Electromagnetics (CEM) problems, the Finite Difference Time Domain (FDTD) method is one of the best suited approaches when a parallelized hardware platform is used. In this paper we investigate the feasibility of implementing the FDTD method using the NVIDIA GT 520, a low cost Graphical Processing Unit (GPU), […]

CUDA

May, 3

AMD Developer Summit 2013, APU13

AMD is excited to bring together technology influencers from all over the world to share their vision and strategy for an open-standard heterogeneous computing ecosystem. Last year’s AMD Developer Summit saw the formation of the Heterogeneous System Architecture (HSA) Foundation. In 2011, Microsoft announced C++ AMP, and AMD first revealed its Graphics Core Next Architecture. This […]

May, 2

Efficient implementation for QUAD stream cipher with GPUs

The QUAD stream cipher uses multivariate polynomial systems. It has provable security based on a computational hardness assumption. More specifically, the security of QUAD depends on the hardness of solving non-linear multivariate systems over a finite field, which is known to be an NP-complete problem. However, QUAD is slower than other stream ciphers, and an efficient implementation, […]

CUDA

May, 2

GPU accelerated Trotter-Suzuki solver for quantum spin dynamics

The resolution of dynamics in out-of-equilibrium quantum spin systems lies at the heart of fundamental questions in Quantum Information Processing, Statistical Mechanics and Nano-Technologies. Efficient computational simulations of interacting many-spin systems are extremely valuable tools for tackling such questions. Here, we use the Trotter-Suzuki (TS) algorithm, a well-known strategy that provides the evolution […]

CUDA

May, 2

A framework for data-access strategies in GPGPU programs

In recent years, graphics processing units (GPUs) have become more and more popular as high performance processing units. Due to the availability of hundreds of cores, code fragments speed up significantly when they are transformed from CPU functions to GPU kernels. The transformation process is non-trivial and therefore error-prone. Developing correct and efficient GPU accelerated […]

CUDA

May, 2

Adding GPU Computing to Computer Organization Courses

How can parallel computing topics be incorporated into core courses that are taken by the majority of undergraduate students? This paper reports our experiences adding GPU computing with CUDA into the core undergraduate computer organization course at two different colleges. We have found that even though programming in CUDA is not necessarily easy, programmer control […]

CUDA
