Posts
Nov, 13
Executing Dynamic Data Rate Actor Networks on OpenCL Platforms
Heterogeneous computing platforms consisting of general purpose processors (GPPs) and graphics processing units (GPUs) have become commonplace in personal mobile devices and embedded systems. For years, programming these platforms was tedious, and simultaneous use of all available GPP and GPU resources required low-level programming to ensure efficient synchronization and data transfer between processors. […]
Nov, 13
Fractal Art Generation using GPUs
Fractal image generation algorithms exhibit extreme parallelizability. By using general-purpose graphics processing unit (GPU) programming to implement escape-time algorithms for Julia sets of functions, parallel methods generate visually attractive fractal images much faster than traditional methods. Vastly improved speeds are achieved with this method of computation, allowing real-time generation and display of images. A comparison […]
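The excerpt does not show the paper's implementation, so the following is only a minimal escape-time sketch for the quadratic Julia map z -> z^2 + c, written as a CUDA kernel with one thread per pixel. The kernel name, image size, iteration limit and choice of c are illustrative assumptions, not the authors' code.

#include <cuda_runtime.h>

// Escape-time iteration for the quadratic Julia map z -> z^2 + c.
// Each thread handles one pixel; the iteration count is written out
// and can later be mapped to a colour palette on the host.
__global__ void juliaEscapeTime(int *iters, int width, int height,
                                float cRe, float cIm, int maxIter)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    // Map the pixel to the complex plane, roughly [-2, 2] x [-2, 2].
    float zRe = 4.0f * x / width  - 2.0f;
    float zIm = 4.0f * y / height - 2.0f;

    int n = 0;
    while (n < maxIter && zRe * zRe + zIm * zIm < 4.0f) {
        float tmp = zRe * zRe - zIm * zIm + cRe;
        zIm = 2.0f * zRe * zIm + cIm;
        zRe = tmp;
        ++n;
    }
    iters[y * width + x] = n;
}

int main()
{
    const int W = 1024, H = 1024, maxIter = 256;
    int *dIters;
    cudaMalloc((void **)&dIters, W * H * sizeof(int));

    dim3 block(16, 16);
    dim3 grid((W + block.x - 1) / block.x, (H + block.y - 1) / block.y);
    // c = -0.8 + 0.156i is a commonly used, visually rich choice of constant.
    juliaEscapeTime<<<grid, block>>>(dIters, W, H, -0.8f, 0.156f, maxIter);
    cudaDeviceSynchronize();

    cudaFree(dIters);
    return 0;
}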
Nov, 10
Shuffle Reduction Based Sparse Matrix-Vector Multiplication on Kepler GPU
The GPU is well suited to accelerating compute-intensive applications in order to achieve higher throughput in High Performance Computing (HPC). Sparse Matrix-Vector Multiplication (SpMV) is a core algorithm of HPC, so the throughput of SpMV on the GPU may affect the throughput of the whole HPC platform. In this paper, we focus on the latency of the reduction routine […]
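The abstract refers to reducing the latency of the reduction routine; a common way to do this on Kepler-class GPUs is to replace shared-memory reductions with register shuffles. Below is a minimal CSR "vector" SpMV sketch, one warp per row, combining partial sums with __shfl_down_sync (the current form of Kepler's __shfl_down). This is a generic illustration of the shuffle-reduction idea under those assumptions, not the authors' kernel.

#include <cuda_runtime.h>

// CSR "vector" SpMV: one warp per row, partial sums combined with warp
// shuffles instead of shared memory.
__global__ void spmvCsrVector(int nRows, const int *rowPtr, const int *colIdx,
                              const float *vals, const float *x, float *y)
{
    const int warpLanes = 32;
    int warpId = (blockIdx.x * blockDim.x + threadIdx.x) / warpLanes;
    int lane   = threadIdx.x & (warpLanes - 1);
    if (warpId >= nRows) return;  // uniform per warp, so the full mask below is safe

    // Each lane accumulates a strided slice of the row's nonzeros.
    float sum = 0.0f;
    for (int j = rowPtr[warpId] + lane; j < rowPtr[warpId + 1]; j += warpLanes)
        sum += vals[j] * x[colIdx[j]];

    // Warp-level tree reduction via register shuffles; no shared memory needed.
    for (int offset = warpLanes / 2; offset > 0; offset >>= 1)
        sum += __shfl_down_sync(0xffffffff, sum, offset);

    if (lane == 0)
        y[warpId] = sum;
}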
Nov, 10
PipeCNN: An OpenCL-Based FPGA Accelerator for Large-Scale Convolution Neuron Networks
Convolutional neural networks (CNNs) have been widely employed in many applications such as image classification, video analysis and speech recognition. Because CNN computations are compute-intensive, they are mainly accelerated by GPUs, which have high power dissipation. Recently, studies have explored FPGAs as CNN accelerators because of their reconfigurability and energy-efficiency advantage over GPUs, especially when […]
Nov, 10
Using multiple GPUs to accelerate string searching for digital forensic analysis
String searching within a large corpus of data is an important component of digital forensic (DF) analysis techniques such as file carving. The continuing increase in capacity of consumer storage devices requires corresponding improvements to the performance of string searching techniques. As string searching is a trivially-parallelisable problem, GPGPU approaches are a natural fit – […]
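As a hedged illustration of why string searching maps naturally to GPGPU, the sketch below assigns one CUDA thread per candidate offset and matches a short signature held in constant memory. The pattern-length limit, kernel name and atomic match counter are assumptions made for the example; the paper's multi-GPU pipeline is not reproduced here.

#include <cuda_runtime.h>

// Hypothetical maximum signature length; a real file-carving search would
// stream the corpus in chunks and split it across multiple GPUs.
#define MAX_PATTERN 64
__constant__ char dPattern[MAX_PATTERN];   // filled via cudaMemcpyToSymbol on the host

// Each thread checks whether the pattern occurs at one offset of the corpus
// and atomically counts the matches it finds.
__global__ void naiveSearch(const char *corpus, long corpusLen,
                            int patternLen, unsigned int *matchCount)
{
    long i = (long)blockIdx.x * blockDim.x + threadIdx.x;
    if (i + patternLen > corpusLen) return;

    for (int k = 0; k < patternLen; ++k)
        if (corpus[i + k] != dPattern[k]) return;

    atomicAdd(matchCount, 1u);
}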
Nov, 10
Optimization and parallelization of B-spline based orbital evaluations in QMC on multi/many-core shared memory processors
B-spline based orbital representations are widely used in Quantum Monte Carlo (QMC) simulations of solids, historically taking as much as 50% of the total run time. Random accesses to a large four-dimensional array make it challenging to efficiently utilize caches and wide vector units of modern CPUs. We present node-level optimizations of B-spline evaluations on […]
Nov, 10
Memory layout in GPU implementation of lattice Boltzmann method for sparse 3D geometries
We describe a high-performance implementation of the lattice Boltzmann method (LBM) for sparse 3D geometries on graphics processors (GPUs). The main contribution of this work is a data layout that allows us to minimise the number of redundant memory transactions during the propagation step of LBM. We show that by using a uniform mesh of small […]
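The paper's tile-based sparse layout is not reproduced in this excerpt; the sketch below only illustrates the structure-of-arrays indexing that such layouts aim to preserve, so that consecutive threads (consecutive fluid-node indices) issue coalesced memory transactions. The D3Q19 choice, names and placeholder update are illustrative assumptions.

#include <cuda_runtime.h>

// D3Q19 lattice: 19 distribution functions per fluid node.
#define Q 19

// Structure-of-arrays indexing: all nodes' values for one lattice direction
// are stored contiguously, so consecutive threads issue coalesced loads and
// stores. nNodes counts only the fluid nodes of the sparse geometry,
// not the full bounding box.
__device__ __forceinline__ long soaIndex(int dir, long node, long nNodes)
{
    return (long)dir * nNodes + node;
}

// Illustrative sweep over the sparse node list.
__global__ void touchDistributions(float *f, long nNodes)
{
    long node = (long)blockIdx.x * blockDim.x + threadIdx.x;
    if (node >= nNodes) return;

    for (int dir = 0; dir < Q; ++dir) {
        long idx = soaIndex(dir, node, nNodes);
        f[idx] *= 1.0f;  // placeholder for the actual collision/propagation update
    }
}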
Nov, 8
Balancing locality and concurrency: solving sparse triangular systems on GPUs
Many numerical optimisation problems rely on fast algorithms for solving sparse triangular systems of linear equations (STLs). To accelerate the solution of such equations, two types of approaches have been used: on GPUs, concurrency has been prioritised to the disadvantage of data locality, while on multi-core CPUs, data locality has been prioritised to the disadvantage […]
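One widely used way to expose concurrency in a sparse triangular solve on GPUs is level-set scheduling: rows are grouped into levels such that every row in a level depends only on rows from earlier levels, and each level is solved by one kernel launch. The sketch below illustrates that generic approach, not necessarily the locality/concurrency balance proposed in the paper; the names and the diagonal-last CSR convention are assumptions.

#include <cuda_runtime.h>

// Level-scheduled forward substitution for a lower-triangular CSR matrix.
// levelRows lists the row indices of one level; all of their off-diagonal
// entries reference rows from earlier levels, whose x values are final.
__global__ void solveLevel(const int *rowPtr, const int *colIdx,
                           const float *vals, const float *b, float *x,
                           const int *levelRows, int levelSize)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= levelSize) return;

    int row = levelRows[t];
    float sum = b[row];

    int j = rowPtr[row];
    for (; j < rowPtr[row + 1] - 1; ++j)
        sum -= vals[j] * x[colIdx[j]];

    // Last entry of the row is assumed to hold the diagonal.
    x[row] = sum / vals[j];
}

// Host side (sketch): one launch per level, in dependency order.
// for (int l = 0; l < nLevels; ++l) {
//     int size = levelPtr[l + 1] - levelPtr[l];
//     solveLevel<<<(size + 255) / 256, 256>>>(rowPtr, colIdx, vals, b, x,
//                                             levelRows + levelPtr[l], size);
// }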
Nov, 8
Tamp: A Library for Compact Deep Neural Networks with Structured Matrices
We introduce Tamp, an open source C++ library for reducing the space and time costs of deep neural network models. In particular, Tamp implements several recent works that use structured matrices to replace the unstructured matrices that are often bottlenecks in neural networks. Tamp is also designed to serve as a unified development platform with several […]
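As a rough illustration of how a structured matrix reduces cost, the sketch below uses a circulant matrix, which is defined by a single length-n vector, so an n x n layer stores n parameters instead of n^2. The direct O(n^2) kernel is for clarity only (an FFT-based product would be used in practice) and is not part of Tamp's API.

#include <cuda_runtime.h>

// A circulant matrix is fully determined by its first column c:
// element (i, j) equals c[(i - j) mod n]. One thread computes one
// output element of y = C * x.
__global__ void circulantMatVec(const float *c, const float *x, float *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float sum = 0.0f;
    for (int j = 0; j < n; ++j) {
        int k = i - j;
        if (k < 0) k += n;          // (i - j) mod n
        sum += c[k] * x[j];
    }
    y[i] = sum;
}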
Nov, 8
Performance Portability of the Aeras Atmosphere Model to Next Generation Architectures using Kokkos
The subject of this report is the performance portability of the Aeras global atmosphere dynamical core (implemented within the Albany multi-physics code) to new and emerging architecture machines using the Kokkos library and programming model. We describe the process of refactoring the finite element assembly process for the 3D hydrostatic model in Aeras and highlight […]
Nov, 8
Accelerate Deep Learning Inference with MCTS in the game of Go on the Intel Xeon Phi
The performance of deep learning inference is a serious issue when it is combined with speed-sensitive Monte Carlo Tree Search (MCTS). Traditional hybrid CPU and graphics processing unit solutions are limited by frequent, heavy data transfers. This paper proposes a method that performs Deep Convolutional Neural Network prediction and MCTS execution simultaneously on the Intel Xeon Phi. This […]
Nov, 8
Vispark: GPU-Accelerated Distributed Visual Computing Using Spark
With the growing need for big-data processing in diverse application domains, MapReduce (e.g., Hadoop) has become one of the standard computing paradigms for large-scale computing on a cluster system. Despite its popularity, the current MapReduce framework suffers from inflexibility and inefficiency inherent in its programming model and system architecture. In order to address these problems, […]