high performance computing on graphics processing units: hgpu.org

Posts

Feb, 14

High-Performance Spatial Query Processing on Big Taxi Trip Data using GPGPUs

City-wide GPS recorded taxi trip data contains rich information for traffic and travel analysis to facilitate transportation planning and urban studies. However, traditional data management techniques are largely incapable of processing big taxi trip data at the scale of hundreds of millions. In this study, we aim at utilizing the General Purpose computing on Graphics […]

CUDA

Feb, 14

Porting FEASTFLOW to the Intel Xeon Phi: Lessons Learned

In this paper we report our experiences in porting the FEASTFLOW software infrastructure to the Intel Xeon Phi coprocessor. Our efforts involved both the evaluation of programming models including OpenCL, POSIX threads and OpenMP and typical optimization strategies like parallelization and vectorization. Since the straightforward porting process of the already existing OpenCL version of the […]

OpenCL

Feb, 14

Multi-Kepler GPU vs. Multi-Intel MIC for spin systems simulations

We present and compare the performance of two many-core architectures, the Nvidia Kepler and the Intel MIC, in both single-system and cluster configurations for the simulation of spin systems. As a benchmark we consider the time required to update a single spin of the 3D Heisenberg spin glass model by using the […]

CUDA

Feb, 12

Multi-tier Dynamic Vectorization for Translating GPU Optimizations into CPU Performance

Developing high performance GPU code is labor intensive. Ideally, developers could recoup high GPU development costs by generating high-performance programs for CPUs and other architectures from the same source code. However, current OpenCL compilers for non-GPUs do not fully exploit optimizations in well-tuned GPU codes. To address this problem, we develop an OpenCL implementation that […]

OpenCL

Feb, 12

Increasing precision of uniform pseudorandom number generators

A general method to produce uniformly distributed pseudorandom numbers with extended precision by combining two pseudorandom numbers with lower precision is proposed. In particular, this method can be used for pseudorandom number generation with extended precision on graphics processing units (GPU), where the performance of single and double precision operations can vary significantly.
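As a rough illustration of the idea of combining two lower-precision draws, the sketch below builds one double-precision value from two 32-bit uniform integers. It follows the widely used 53-bit construction (27 bits from one word, 26 from the other) and is not necessarily the method proposed in the paper; the function names `combine_uniform32` and `sample` are hypothetical.

```python
import random


def combine_uniform32(hi: int, lo: int) -> float:
    """Combine two 32-bit uniform integers into one double in [0, 1).

    Assumption: this is the common 27 + 26 = 53 bit construction,
    shown only as an example of the general idea, not the paper's method.
    """
    a = (hi & 0xFFFFFFFF) >> 5   # keep the top 27 bits of the first word
    b = (lo & 0xFFFFFFFF) >> 6   # keep the top 26 bits of the second word
    # (a * 2^26 + b) is a 53-bit integer; dividing by 2^53 maps it to [0, 1)
    return (a * 67108864 + b) / 9007199254740992.0


def sample(rng: random.Random) -> float:
    """Draw two independent 32-bit words and combine them."""
    return combine_uniform32(rng.getrandbits(32), rng.getrandbits(32))


if __name__ == "__main__":
    rng = random.Random(12345)  # fixed seed so runs are repeatable
    print([sample(rng) for _ in range(3)])
```

A single 32-bit word divided by 2^32 can take only 2^32 distinct values, whereas the combined value fills all 53 bits of a double's significand.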

Feb, 12

Designing Bit-Reproducible Portable High-Performance Applications

Bit-reproducibility has many advantages in the context of high-performance computing. Besides making debugging and testing simpler and more accurate, it allows applications to be deployed on heterogeneous systems while keeping the computations consistent. In this work we analyze the basic operations performed by scientific applications and identify the […]

CUDA

Feb, 12

GROMACS on Hybrid CPU-GPU and CPU-MIC Clusters: Preliminary Porting Experiences, Results and Next Steps

This report introduces a hybrid implementation of the GROMACS application, and provides instructions on building and executing it on PRACE prototype platforms with Graphics Processing Unit (GPU) and Many Integrated Core (MIC) accelerator technologies. GROMACS currently employs message-passing MPI parallelism, multi-threading using OpenMP and contains kernels for non-bonded interactions that are accelerated using the CUDA programming language. […]

CUDA

OpenCL

Feb, 12

Transparent use of Java objects on the GPU in the JaMP/OpenMP framework

Many computationally intensive applications benefit from parallel execution, whether on multiple CPU cores, through data-parallel GPGPU processing, or across several machines as in clusters. However, changing a program to run in parallel requires considerable effort and is therefore a time-consuming step during development. During the implementation, it is necessary to consider many steps […]

CUDA

Feb, 12

Minerals detection for hyperspectral images using adapted linear unmixing: LinMin

Mineral detection over large volumes of spectra is the challenge addressed by current hyperspectral imaging spectrometers in planetary science. Instruments such as OMEGA (Mars Express), CRISM (Mars Reconnaissance Orbiter), M³ (Chandrayaan-1), VIRTIS (Rosetta) and many more have been producing very large datasets for a decade. We propose here a fast supervised detection algorithm called LinMin, in […]

CUDA

Feb, 12

Yang-Mills lattice on CUDA

Yang-Mills fields play an important role in the non-Abelian gauge field theory that describes the properties of the quark-gluon plasma. The real-time evolution of the classical fields is given by the equations of motion, which are derived from the Hamiltonian containing the SU(2) gauge field tensor term. The dynamics of […]

CUDA

Feb, 11

A scheduling and runtime framework for a cluster of heterogeneous machines with multiple accelerators

We present a system that enables simple and intuitive programming of CPU+GPU clusters. This system relieves the programmer of the burden of load balancing, detailed data communication, task mapping, scheduling, etc. Our programming model is based on a bulk synchronous distributed shared memory model, which is suitable for heterogeneous multi-GPU clusters, especially so for compute-intensive […]

CUDA

Feb, 11

Confidentiality Issues on a GPU in a Virtualized Environment

General-Purpose computing on Graphics Processing Units (GPGPU) combined with cloud computing is already a commercial success. However, there is little literature that investigates its security implications. Our objective is to highlight possible information leakage due to GPUs in virtualized and cloud computing environments. We provide insight into the different GPU virtualization techniques, along with their […]

CUDA

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container
