Posts
Apr, 25
Comparison of Different Parallel Implementations of the 2+1-Dimensional KPZ Model and the 3-Dimensional KMC Model
We show that efficient simulations of Kardar-Parisi-Zhang (KPZ) interface growth in 2+1 dimensions and of 3-dimensional Kinetic Monte Carlo (KMC) simulations of thermally activated diffusion can be realized on both GPUs and modern CPUs. In this article we present results of different implementations on GPUs using CUDA and OpenCL and also on CPUs using […]
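The excerpt does not include implementation details; as a rough illustration, a checkerboard-parallelized stochastic growth update on a 2D height lattice is one common way to run KPZ-class lattice models on GPUs. The CUDA sketch below uses that pattern; the lattice size, deposition probability and growth rule are illustrative assumptions, not the authors' specific model or code.

```cuda
// Minimal sketch: checkerboard-parallel stochastic growth on an L x L height
// lattice. Sites of one sublattice are updated in parallel, so no two threads
// ever touch neighbouring sites in the same pass. Lattice size, deposition
// probability and the growth rule are illustrative assumptions.
#include <cstdio>
#include <cuda_runtime.h>
#include <curand_kernel.h>

#define L 256            // lattice size (assumption)
#define P_DEPOSIT 0.98f  // deposition probability (assumption)

__global__ void init_rng(curandState *st, unsigned long long seed) {
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id < L * L) curand_init(seed, id, 0, &st[id]);
}

__global__ void growth_step(int *h, curandState *st, int parity) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= L || y >= L || ((x + y) & 1) != parity) return;

    int id = y * L + x;
    int hl = h[y * L + (x + L - 1) % L];          // periodic neighbours
    int hr = h[y * L + (x + 1) % L];
    int hd = h[((y + L - 1) % L) * L + x];
    int hu = h[((y + 1) % L) * L + x];
    int hc = h[id];

    // Deposit at local minima with probability P_DEPOSIT -- a simple
    // restricted solid-on-solid style rule used here purely for illustration.
    if (hc <= hl && hc <= hr && hc <= hd && hc <= hu &&
        curand_uniform(&st[id]) < P_DEPOSIT)
        h[id] = hc + 1;
}

int main() {
    int *h;
    curandState *st;
    cudaMalloc(&h, L * L * sizeof(int));
    cudaMemset(h, 0, L * L * sizeof(int));
    cudaMalloc(&st, L * L * sizeof(curandState));
    init_rng<<<(L * L + 255) / 256, 256>>>(st, 1234ULL);

    dim3 block(16, 16), grid(L / 16, L / 16);
    for (int t = 0; t < 1000; ++t) {                 // 1000 sweeps (assumption)
        growth_step<<<grid, block>>>(h, st, 0);      // even sublattice
        growth_step<<<grid, block>>>(h, st, 1);      // odd sublattice
    }
    cudaDeviceSynchronize();
    printf("growth finished\n");
    cudaFree(h);
    cudaFree(st);
    return 0;
}
```

The two-pass (even/odd) update is the usual way to avoid write conflicts between neighbouring lattice sites on a massively parallel device.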
Apr, 25
Paraiso: An Automated Tuning Framework for Explicit Solvers of Partial Differential Equations
We propose Paraiso, a domain-specific language embedded in the functional programming language Haskell, for automated tuning of explicit solvers of partial differential equations (PDEs) on GPUs as well as multicore CPUs. In Paraiso, one can describe PDE-solving algorithms succinctly using tensor equation notation. Hydrodynamic properties, interpolation methods and other building blocks are described in […]
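Paraiso generates tuned GPU and CPU code from high-level tensor equations written in Haskell; as a point of reference, the kind of explicit PDE update such a framework ultimately has to produce looks roughly like the hand-written CUDA sketch below, a forward-Euler step of the 2D heat equation. The grid size, diffusion coefficient and time step are arbitrary illustrative values.

```cuda
// Minimal sketch of an explicit PDE update of the kind such a framework
// targets: one forward-Euler step of the 2D heat equation
// u_t = alpha * (u_xx + u_yy) on a regular grid.
#include <cstdio>
#include <utility>
#include <vector>
#include <cuda_runtime.h>

#define N 512
#define ALPHA 0.1f
#define DT 0.1f
#define DX 1.0f

__global__ void heat_step(const float *u, float *u_new) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i <= 0 || j <= 0 || i >= N - 1 || j >= N - 1) return;  // fixed boundary
    int id = j * N + i;
    float lap = (u[id - 1] + u[id + 1] + u[id - N] + u[id + N] - 4.0f * u[id])
                / (DX * DX);
    u_new[id] = u[id] + DT * ALPHA * lap;  // explicit forward-Euler update
}

int main() {
    std::vector<float> host(N * N, 0.0f);
    host[(N / 2) * N + N / 2] = 100.0f;    // a single hot spot

    float *u, *u_new;
    cudaMalloc(&u, N * N * sizeof(float));
    cudaMalloc(&u_new, N * N * sizeof(float));
    cudaMemcpy(u, host.data(), N * N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(u_new, host.data(), N * N * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(16, 16), grid((N + 15) / 16, (N + 15) / 16);
    for (int t = 0; t < 100; ++t) {
        heat_step<<<grid, block>>>(u, u_new);
        std::swap(u, u_new);               // ping-pong the two buffers
    }
    cudaMemcpy(host.data(), u, N * N * sizeof(float), cudaMemcpyDeviceToHost);
    printf("centre value after 100 steps: %f\n", host[(N / 2) * N + N / 2]);
    cudaFree(u);
    cudaFree(u_new);
    return 0;
}
```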
Apr, 24
Real-time video breakup detection for multiple HD video streams on a single GPU
An important task in film and video preservation is the quality assessment of the content to be archived or reused out of the archive. Done manually, this is a straining and time-consuming process, so it is highly desirable to automate it as far as possible. In this paper, we show how […]
Apr, 23
Performance Degradation Analysis of GPU Kernels
Hardware accelerators (currently Graphical Processing Units, or GPUs) are an important component in many existing high-performance computing solutions [5]. Their growth in variety and usage is expected to skyrocket [1] for many reasons. First, GPUs offer impressive energy efficiency [3]. Second, when properly programmed, they yield impressive speedups by allowing programmers to model their […]
Apr, 23
Tree Structured Analysis on GPU Power Study
Graphics Processing Units (GPUs) have emerged as a promising platform for parallel computation. With a large number of processor cores and abundant memory bandwidth, GPUs deliver substantial computational power. While providing this high performance, a GPU draws considerable power and requires an adequate power supply and cooling system. It is essential to institute an efficient mechanism […]
Apr, 23
High-Efficient Parallel CAVLC Encoders on Heterogeneous Multicore Architectures
This article presents two highly efficient parallel realizations of context-based adaptive variable length coding (CAVLC) on heterogeneous multicore processors. By optimizing the architecture of the CAVLC encoder, three kinds of dependences are eliminated or weakened: the context-based data dependence, the memory-access dependence and the control dependence. The CAVLC pipeline is divided into […]
Apr, 23
Parallel Surface Reconstruction for Particle-Based Fluids
This paper presents a novel method that improves the efficiency of high-quality surface reconstructions for particle-based fluids using Marching Cubes. By constructing the scalar field only in a narrow band around the surface, the computational complexity and the memory consumption scale with the fluid surface instead of the volume. Furthermore, a parallel implementation of the […]
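As a rough sketch of the narrow-band idea, the CUDA kernel below scatters each particle's kernel-weighted contribution onto only the grid cells within its support radius, so untouched cells never enter the reconstruction; the falloff function, grid resolution and radius are illustrative assumptions rather than the paper's exact formulation.

```cuda
// Minimal sketch of building the scalar field only near particles: each
// particle scatters a kernel-weighted contribution to the grid cells inside
// its support radius (atomicAdd handles overlapping writes). Cells that never
// receive a contribution stay untouched, so the cost scales with the surface
// region rather than the full volume. Constants are illustrative assumptions.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define GRID 64            // grid resolution per axis (assumption)
#define CELL 1.0f          // cell size
#define RADIUS 2.5f        // particle support radius (assumption)

struct Particle { float x, y, z; };

__global__ void splat(const Particle *p, int n, float *field) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    Particle q = p[i];
    int r = (int)ceilf(RADIUS / CELL);
    int cx = (int)(q.x / CELL), cy = (int)(q.y / CELL), cz = (int)(q.z / CELL);
    for (int dz = -r; dz <= r; ++dz)
      for (int dy = -r; dy <= r; ++dy)
        for (int dx = -r; dx <= r; ++dx) {
            int gx = cx + dx, gy = cy + dy, gz = cz + dz;
            if (gx < 0 || gy < 0 || gz < 0 ||
                gx >= GRID || gy >= GRID || gz >= GRID) continue;
            float px = gx * CELL - q.x, py = gy * CELL - q.y, pz = gz * CELL - q.z;
            float d2 = px * px + py * py + pz * pz;
            if (d2 > RADIUS * RADIUS) continue;
            // Simple smooth falloff; real SPH reconstructions use a proper
            // smoothing kernel here.
            float w = 1.0f - d2 / (RADIUS * RADIUS);
            atomicAdd(&field[(gz * GRID + gy) * GRID + gx], w * w);
        }
}

int main() {
    std::vector<Particle> host = { {20.f, 20.f, 20.f}, {21.f, 20.f, 20.f} };
    Particle *p; float *field;
    cudaMalloc(&p, host.size() * sizeof(Particle));
    cudaMalloc(&field, GRID * GRID * GRID * sizeof(float));
    cudaMemset(field, 0, GRID * GRID * GRID * sizeof(float));
    cudaMemcpy(p, host.data(), host.size() * sizeof(Particle),
               cudaMemcpyHostToDevice);

    splat<<<((int)host.size() + 127) / 128, 128>>>(p, (int)host.size(), field);
    cudaDeviceSynchronize();
    // The non-zero cells of `field` form the narrow band that a Marching
    // Cubes pass would then triangulate.
    printf("splatting done\n");
    cudaFree(p); cudaFree(field);
    return 0;
}
```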
Apr, 23
Memory Bandwidth Efficient Two-Dimensional Fast Fourier Transform Algorithm and Implementation for Large Problem Sizes
Prevailing VLSI trends point to a growing gap between the scaling of on-chip processing throughput and off-chip memory bandwidth. An efficient use of memory bandwidth must become a first-class design consideration in order to fully utilize the processing capability of highly concurrent processing platforms like FPGAs. In this paper, we present key aspects of this […]
Apr, 21
Computing Performance Benchmarks among CPU, GPU, and FPGA
In recent years, the world of high-performance computing has been developing rapidly. The goal of this project was to conduct computing performance benchmarks on three major computing platforms: CPUs, GPUs, and FPGAs. A total of 66 benchmarks were evaluated. GPUs outperformed the other platforms in terms of execution time. CPUs outperformed in overall execution […]
Apr, 21
Fast Universal Background Model (UBM) Training on GPUs using Compute Unified Device Architecture (CUDA)
Universal Background Modeling (UBM) is an alternative-hypothesis modeling approach that is used extensively in Speaker Verification (SV) systems. Training the background models from large speech data requires a significant amount of memory and computational load. In this paper, a parallel implementation of a speaker verification system based on Gaussian Mixture Modeling – Universal Background Modeling (GMM […]
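The bulk of the GMM-UBM training cost lies in scoring every feature frame against every mixture component. The CUDA sketch below shows that per-frame evaluation for diagonal covariances, with the feature dimension and component count chosen arbitrarily; it illustrates where the data parallelism lies and is not the paper's implementation.

```cuda
// Minimal sketch of the dominant cost in GMM-UBM training: evaluating every
// feature frame against every diagonal-covariance mixture component. One
// thread scores one frame; DIM and MIX are illustrative assumptions.
#include <cstdio>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

#define DIM 39      // MFCC feature dimension (assumption)
#define MIX 64      // number of mixture components (assumption)
#define TWO_PI_F 6.2831853f

__global__ void frame_loglik(const float *frames, int n_frames,
                             const float *mean, const float *var,
                             const float *logw, float *loglik) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= n_frames) return;
    const float *x = &frames[t * DIM];

    float comp[MIX];
    float best = -1e30f;
    for (int m = 0; m < MIX; ++m) {
        float s = logw[m];  // log mixture weight
        for (int d = 0; d < DIM; ++d) {
            float diff = x[d] - mean[m * DIM + d];
            s -= 0.5f * (diff * diff / var[m * DIM + d]
                         + logf(TWO_PI_F * var[m * DIM + d]));
        }
        comp[m] = s;
        best = fmaxf(best, s);
    }
    // log-sum-exp over components gives the frame log-likelihood.
    float acc = 0.0f;
    for (int m = 0; m < MIX; ++m) acc += expf(comp[m] - best);
    loglik[t] = best + logf(acc);
}

int main() {
    const int n_frames = 1000;
    std::vector<float> f(n_frames * DIM, 0.1f), mu(MIX * DIM, 0.0f),
                       v(MIX * DIM, 1.0f), lw(MIX, logf(1.0f / MIX));
    float *df, *dmu, *dv, *dlw, *dll;
    cudaMalloc(&df, f.size() * sizeof(float));
    cudaMalloc(&dmu, mu.size() * sizeof(float));
    cudaMalloc(&dv, v.size() * sizeof(float));
    cudaMalloc(&dlw, lw.size() * sizeof(float));
    cudaMalloc(&dll, n_frames * sizeof(float));
    cudaMemcpy(df, f.data(), f.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dmu, mu.data(), mu.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dv, v.data(), v.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dlw, lw.data(), lw.size() * sizeof(float), cudaMemcpyHostToDevice);

    frame_loglik<<<(n_frames + 127) / 128, 128>>>(df, n_frames, dmu, dv, dlw, dll);
    cudaDeviceSynchronize();
    printf("scored %d frames\n", n_frames);
    cudaFree(df); cudaFree(dmu); cudaFree(dv); cudaFree(dlw); cudaFree(dll);
    return 0;
}
```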
Apr, 21
An On-Demand Fast Parallel Pseudo Random Number Generator with Applications
The good programmability of manycore architectures and accelerators, such as GPUs, has allowed them to be deployed for vital computational work. The ability to use randomness in computation is known to help in several situations. For such computations to be made possible on a general-purpose computer, a source of randomness, or in […]
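Without going into the authors' specific generator, a minimal counter-based sketch of parallel pseudo random number generation on a GPU is shown below: each thread hashes (seed, thread id, counter) to produce its own stream, so no generator state needs to be shared between threads. The mixing function and output layout are illustrative assumptions.

```cuda
// Minimal sketch of counter-based parallel random number generation: each
// thread derives independent values by hashing (seed, thread id, counter).
// The mixer and layout are illustrative, not the paper's generator.
#include <cstdio>
#include <cuda_runtime.h>

__device__ unsigned int mix32(unsigned int x) {
    // An integer finalizer-style mixer; any good avalanche function works here.
    x ^= x >> 16;  x *= 0x7feb352dU;
    x ^= x >> 15;  x *= 0x846ca68bU;
    x ^= x >> 16;
    return x;
}

__global__ void fill_uniform(float *out, int per_thread, unsigned int seed) {
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    for (int k = 0; k < per_thread; ++k) {
        unsigned int r = mix32(seed ^ mix32(tid * 0x9e3779b9U + (unsigned)k));
        out[tid * per_thread + k] = r * (1.0f / 4294967296.0f);  // map to [0,1)
    }
}

int main() {
    const int threads = 1024, per_thread = 16;
    float *d_out;
    cudaMalloc(&d_out, threads * per_thread * sizeof(float));
    fill_uniform<<<threads / 256, 256>>>(d_out, per_thread, 12345u);

    float sample[4];
    cudaMemcpy(sample, d_out, sizeof(sample), cudaMemcpyDeviceToHost);
    printf("%f %f %f %f\n", sample[0], sample[1], sample[2], sample[3]);
    cudaFree(d_out);
    return 0;
}
```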
Apr, 21
Image Convolution Processing: a GPU versus FPGA Comparison
Convolution is one of the most important operators used in image processing. With the constant need to increase performance in high-end applications, and the rise in popularity of parallel architectures such as GPUs and those implemented in FPGAs, comes the necessity to compare these architectures in order to determine which of them performs […]
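As a baseline for what is being compared, a minimal CUDA sketch of 2D image convolution is shown below: each thread computes one output pixel from its KxK neighbourhood. The image size, filter size and clamp-to-edge border handling are illustrative choices, not those of the paper.

```cuda
// Minimal sketch of 2D image convolution: one thread per output pixel,
// with the filter held in constant memory and a clamp-to-edge border.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define W 1024
#define H 768
#define K 3           // 3x3 filter (assumption)

__constant__ float d_filter[K * K];

__global__ void convolve(const float *in, float *out) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= W || y >= H) return;

    float acc = 0.0f;
    for (int j = 0; j < K; ++j)
        for (int i = 0; i < K; ++i) {
            int sx = min(max(x + i - K / 2, 0), W - 1);   // clamp to edge
            int sy = min(max(y + j - K / 2, 0), H - 1);
            acc += in[sy * W + sx] * d_filter[j * K + i];
        }
    out[y * W + x] = acc;
}

int main() {
    std::vector<float> img(W * H, 1.0f);
    float blur[K * K];
    for (int i = 0; i < K * K; ++i) blur[i] = 1.0f / (K * K);  // box blur

    float *d_in, *d_out;
    cudaMalloc(&d_in, W * H * sizeof(float));
    cudaMalloc(&d_out, W * H * sizeof(float));
    cudaMemcpy(d_in, img.data(), W * H * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(d_filter, blur, sizeof(blur));

    dim3 block(16, 16), grid((W + 15) / 16, (H + 15) / 16);
    convolve<<<grid, block>>>(d_in, d_out);
    cudaMemcpy(img.data(), d_out, W * H * sizeof(float), cudaMemcpyDeviceToHost);
    printf("out[0] = %f\n", img[0]);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

The same per-pixel formulation maps naturally to an FPGA pipeline, which is what makes convolution a convenient workload for cross-platform comparisons like this one.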