high performance computing on graphics processing units: hgpu.org

Posts

May, 3

Algorithms for representation of 3D regions in radiotherapy planning software

This thesis reviews the fast marching method as a technique for computing the distance transform on the GPU in the context of radiotherapy planning software. The method has some interesting characteristics that, given the right circumstances, allow the distance transform to be computed for fewer voxels than commonly used alternatives. This can result in beneficial […]

CUDA

May, 3

Optimizing Similarity Computations for Ontology Matching – Experiences from GOMMA

Efficient computation of ontology mappings requires optimized algorithms and significant computing resources, especially for large life science ontologies. We describe how we optimized n-gram matching for computing the similarity of concept names and synonyms in our match system GOMMA. Furthermore, we outline how to enable highly parallel string matching on Graphical Processing Units […]

CUDA

May, 3

Data-rich astronomy: mining synoptic sky surveys

In the last decade a new generation of telescopes and sensors has allowed the production of a very large amount of data, and astronomy has become a data-rich science; this transition is often labeled the "data revolution" or the "data tsunami". The first locution puts emphasis on the expectations of the astronomers while the second stresses, […]

CUDA

May, 3

Impact of Warp Formation on GPU Performance

As the computing power of GPUs increases dramatically, the GPU is widely used for general-purpose parallel applications as well as graphics applications. In particular, programmers using the GPU can easily create multiple threads with the help of APIs provided by GPU vendors. In the GPU architecture, threads are grouped into a warp to run on the SIMD pipeline, […]

May, 3

Feasibility Analysis of Low Cost Graphical Processing Units for Electromagnetic Field Simulations by Finite Difference Time Domain Method

Among several techniques available for solving Computational Electromagnetics (CEM) problems, the Finite Difference Time Domain (FDTD) method is one of the best suited approaches when a parallelized hardware platform is used. In this paper we investigate the feasibility of implementing the FDTD method using the NVIDIA GT 520, a low cost Graphical Processing Unit (GPU), […]

CUDA

May, 3

AMD Developer Summit 2013, APU13

AMD is excited to bring together technology influencers from all over the world to share their vision and strategy for an open-standard heterogeneous computing ecosystem. Last year’s AMD Developer Summit saw the formation of the Heterogeneous System Architecture (HSA) Foundation. In 2011, Microsoft announced C++ AMP, and AMD first revealed its Graphics Core Next Architecture. This […]

May, 2

Efficient implementation for QUAD stream cipher with GPUs

The QUAD stream cipher uses multivariate polynomial systems. It has provable security based on a computational hardness assumption. More specifically, the security of QUAD depends on the hardness of solving non-linear multivariate systems over a finite field, which is known to be an NP-complete problem. However, QUAD is slower than other stream ciphers, and an efficient implementation, […]

CUDA

May, 2

GPU accelerated Trotter-Suzuki solver for quantum spin dynamics

The resolution of dynamics in out-of-equilibrium quantum spin systems lies at the heart of fundamental questions among Quantum Information Processing, Statistical Mechanics and Nano-Technologies. Efficient computational simulations of interacting many-spin systems are extremely valuable tools for tackling such questions. Here, we use the Trotter-Suzuki (TS) algorithm, a well-known strategy that provides the evolution […]

CUDA

May, 2

A framework for data-access strategies in GPGPU programs

In recent years, graphics processing units (GPUs) have become more and more popular as high performance processing units. Due to the availability of hundreds of cores, code fragments speed up significantly when they are transformed from CPU functions to GPU kernels. The transformation process is non-trivial and therefore error prone. Developing correct and efficient GPU accelerated […]

CUDA

May, 2

Adding GPU Computing to Computer Organization Courses

How can parallel computing topics be incorporated into core courses that are taken by the majority of undergraduate students? This paper reports our experiences adding GPU computing with CUDA into the core undergraduate computer organization course at two different colleges. We have found that even though programming in CUDA is not necessarily easy, programmer control […]

CUDA

May, 2

OpenMP performance analysis for many-core platforms with non-uniform memory access

One of the first steps in embedded-system design flow is to choose the most efficient implementation of the embedded software application. However, this is difficult to do at the earliest design stages because particular details of the final manycore HW platform are usually unknown and many possible mappings of the software tasks/threads have to be […]

OpenCL

May, 1

GPU Acceleration of the Variational Monte Carlo Method for Many Body Physics

High-performance computing is one of the major areas making inroads into the future for large-scale simulation. Applications such as 3D nuclear test, Molecular Dynamics, and Quantum Monte Carlo simulations are now developed on supercomputers using the latest computing technologies. As per the TOP500 supercomputers rating, most of today’s supercomputers are now heterogeneous: with massively parallel […]

CUDA
