high performance computing on graphics processing units: hgpu.org

Posts

Sep, 17

GPU Accelerated Parallel Iris Segmentation

A biometric system provides automatic identification of an individual based on a unique feature or characteristic possessed by the person. Iris recognition systems are the most definitive biometric systems, since complex random iris patterns are unique to each individual and do not change with time. Iris recognition is divided into three steps, namely, Iris […]

CUDA

Sep, 17

High Performance Client-Side Web Programming with SPOC and Js_of_ocaml

We present WebSpoc, an OCaml GPGPU library targeting web applications that is built upon SPOC and js_of_ocaml. SPOC is an OCaml GPGPU library focusing on abstracting memory transfers, handling GPGPU computations and offering easy portability. Js_of_ocaml is the OCaml byte-code to JavaScript compiler. Thus, WebSpoc provides high performance computations from the web browser while benefiting […]

OpenCL

Sep, 17

GPU Implementation of a Deep Learning Network for Financial Prediction

Machine learning, a branch of artificial intelligence, concerns the construction and study of systems that can learn from data. Neural networks are a well-known branch of machine learning and have been used extensively by researchers for prediction; the prediction accuracy depends on fine-tuning for the particular financial data. In this paper […]

OpenCL

Sep, 17

Using hybrid GPU/CPU kernel splitting to accelerate spherical convolutions

We present a general method for accelerating by more than an order of magnitude the convolution of pixelated functions on the sphere with a radially-symmetric kernel. Our method splits the kernel into a compact real-space component and a compact spherical harmonic space component. These components can then be convolved in parallel using an inexpensive commodity […]

CUDA

Sep, 16

The GPUVerify Method: a Tutorial Overview

I present a tutorial overview demonstrating the key technique used by GPUVerify, a static verification tool for graphics processing unit (GPU) kernels. The technique is a method for translating a massively parallel GPU kernel into a sequential program such that correctness of the sequential program implies data race-freedom of the parallel kernel.

CUDA • OpenCL

Sep, 16

Distance Threshold Similarity Searches on Spatiotemporal Trajectories using GPGPU

The processing of moving object trajectories arises in many application domains. We focus on a trajectory similarity search, the distance threshold search, which finds all trajectories within a given distance of a query trajectory over a time interval. A multithreaded CPU implementation that makes use of an in-memory R-tree index can achieve high parallel efficiency. […]

OpenCL

Sep, 16

High-accuracy Optimization by Parallel Iterative Discrete Approximation and GPU Cluster Computing

High-accuracy optimization is the key component of time-sensitive applications in computer science such as machine learning, and we developed single-GPU Iterative Discrete Approximation Monte Carlo Optimization (IDA-MCS) and multi-GPU IDA-MCS in our previous research. However, because of the memory capacity constraints of GPUs in a workstation, single-GPU IDA-MCS and multi-GPU IDA-MCS may be in low […]

CUDA

Sep, 16

Performance analysis of a 240 thread tournament level MCTS Go program on the Intel Xeon Phi

In 2013 Intel introduced the Xeon Phi, a new parallel co-processor board. The Xeon Phi is a cache-coherent many-core shared memory architecture claiming CPU-like versatility, programmability, high performance, and power efficiency. The first published micro-benchmark studies indicate that many of Intel’s claims appear to be true. The current paper is the first study on the […]

Sep, 16

Machine learning for ultrafast X-ray diffraction patterns on large-scale GPU clusters

The classical method of determining the atomic structure of complex molecules by analyzing diffraction patterns is currently undergoing drastic developments. Modern techniques for producing extremely bright and coherent X-ray lasers allow a beam of streaming particles to be intercepted and hit by an ultrashort high energy X-ray beam. Through machine learning methods the data thus […]

CUDA

Sep, 15

Exploratory Data Analysis of Software Repositories via GPU Processing

Analyzing software repositories with thousands of artifacts is data intensive, which makes interactive exploration analysis of such data infeasible. We introduce a novel approach, Dominoes, that can support automated exploration of relationships amongst project elements, where users have the flexibility to explore on the fly the numerous types of project relationships. Dominoes organizes data extracted […]

CUDA

Sep, 15

Interactive Wave Simulations

Simulation of ocean waves can be categorized into two major groups. The first is based on physical models, whereas the other generates ocean waves based on either geometrical shapes or oceanographic spectra. Even though the latter group of methods requires less computational effort, the modelled waves are less realistic. Currently MARIN (Maritime […]

CUDA

Sep, 15

Scalable Multi-GPU Simulation of Long-Range Molecular Dynamics

Molecular dynamics simulations allow us to study the behavior of complex biomolecular systems by modeling the pairwise interaction forces between all atoms. Molecular systems are subject to slowly decaying electrostatic potentials, which turn molecular dynamics into an n-body problem. In this paper, we present a parallel and scalable solution to compute long-range molecular forces, based […]

CUDA

Recent source codes

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?