high performance computing on graphics processing units: hgpu.org

Posts

Nov, 13

Utilizing massive parallelism in decoding of modern error-correcting codes for accelerating communication systems simulations

In this paper, a novel approximate algorithm for massively parallel decoding of trellis-based error-correcting codes (ECC) is presented. The potential effect of using such an optimized decoder on the acceleration of simulations of modern communication systems implementing the most recent communication standards, such as LTE-A (Long Term Evolution – Advanced), is evaluated quantitatively by presenting an […]

CUDA

Nov, 13

GPU Enhancement of the Trigger to Extend Physics Reach at the Large Hadron Collider

At the Large Hadron Collider (LHC), the trigger systems for the detectors must be able to process a very large amount of data in a very limited amount of time, so that the nominal collision rate of 40 MHz can be reduced to a data rate that can be stored and processed in a reasonable […]

CUDA

Nov, 13

Lattice Simulations using OpenACC compilers

OpenACC compilers allow one to use Graphics Processing Units without having to write explicit CUDA code. Programs can be modified incrementally using OpenMP-like directives, which cause the compiler to generate CUDA kernels to be run on the GPUs. In this article we look at the performance gain in lattice simulations with dynamical fermions using […]

CUDA

Nov, 13

Indexing million of packets per second using GPUs

Network traffic recorders are devices that record massive volumes of network traffic for security applications, like retrospective forensic investigations. When deployed over very high-speed networks, traffic recorders must process and store millions of packets per second. To enable interactive explorations of such large traffic archives, packet indexing mechanisms are required. Indexing packets at wire rates […]

Nov, 12

Sailfish: a flexible multi-GPU implementation of the lattice Boltzmann method

We present Sailfish, an open source fluid simulation package implementing the lattice Boltzmann method (LBM) on modern Graphics Processing Units (GPUs) using CUDA/OpenCL. We take a novel approach to GPU code implementation and use run-time code generation techniques and a high level programming language (Python) to achieve state of the art performance, while allowing easy […]

CUDA • OpenCL

Nov, 12

High speed cipher cracking: the case of Keeloq on CUDA

Graphics Processing Units (GPUs) are increasingly popular in the field of high-performance computing for their ability to provide computational power for massively parallel problems at a reduced cost. However, the programming model exposed by the GPGPU software development tools is often insufficient to achieve full performance, and a major rethinking of algorithmic choices is needed. […]

CUDA

Nov, 12

A Hybrid GPU/CPU FFT Library for Large FFT Problems

Graphics Processing Units (GPUs) have proved to be a promising platform for accelerating large Fast Fourier Transform (FFT) computations. However, current GPU-based FFT implementations use only the GPU to compute and employ the CPU as a mere memory-transfer controller, so the computational power of today’s high-performance CPUs is wasted. In this project, a hybrid optimization framework […]

CUDA

Nov, 12

Performance Evaluation of R with Intel Xeon Phi Coprocessor

Over the years, R has been adopted as a major data analysis and mining tool in many domain fields. As Big Data overwhelms those fields, the computational needs and workload of existing R solutions increase significantly. With recent hardware and software developments, it is possible to enable massive parallelism with existing R solutions with little […]

CUDA

Nov, 12

GPU-Based Sparse Voxel Octree Raytracing for Rendering of Procedurally Generated Terrain

Within the field of Computer Graphics, there have been two competing approaches to doing rendering, namely rasterisation and raytracing. Rasterisation became, and has been, the dominant of the two methods for realtime rendering for a long period of time. With recent developments in graphics hardware, however, raytracing is starting to gain popularity once again. At […]

CUDA

Nov, 11

Accelerating calculations of RNA secondary structure partition functions using GPUs

BACKGROUND: RNA performs many diverse functions in the cell in addition to its role as a messenger of genetic information. These functions depend on its ability to fold to a unique three-dimensional structure determined by the sequence. The conformation of RNA is in part determined by its secondary structure, or the particular set of contacts […]

CUDA

Nov, 11

Preliminary Experiments with XKaapi on Intel Xeon Phi Coprocessor

This paper presents preliminary performance comparisons of parallel applications developed natively for the Intel Xeon Phi accelerator using three different parallel programming environments and their associated runtime systems. We compare Intel OpenMP, Intel CilkPlus and XKaapi together on the same benchmark suite and we provide comparisons between an Intel Xeon Phi coprocessor and a Sandy […]

Nov, 11

Explorations of the Viability of ARM and Xeon Phi for Physics Processing

We report on our investigations into the viability of the ARM processor and the Intel Xeon Phi co-processor for scientific computing. We describe our experience porting software to these processors and running benchmarks using real physics applications to explore the potential of these processors for production physics processing.

