Posts

Sep, 21

Parallelization of Hierarchical Text Clustering on Multi-core CUDA Architecture

Text clustering is the problem of dividing text documents into groups, such that documents in the same group are similar to one another and different from documents in other groups. Because texts tend to form hierarchies, text clustering is best performed using a hierarchical clustering method. An important aspect when clustering large […]
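
The excerpt does not show the parallel kernel itself, so here is a minimal, hypothetical CUDA sketch of the step such methods usually offload first: computing the pairwise document-similarity matrix (cosine similarity over dense TF-IDF vectors). The sizes and dummy weights are illustrative, and the agglomerative merge loop that would consume this matrix is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One thread per (i, j) document pair: cosine similarity of TF-IDF vectors.
// d_docs is a dense N x D matrix (row-major); d_sim is the N x N output.
__global__ void cosineSimilarity(const float* d_docs, float* d_sim, int N, int D) {
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= N || j >= N) return;
    float dot = 0.f, ni = 0.f, nj = 0.f;
    for (int t = 0; t < D; ++t) {
        float a = d_docs[i * D + t], b = d_docs[j * D + t];
        dot += a * b; ni += a * a; nj += b * b;
    }
    d_sim[i * N + j] = dot / (sqrtf(ni) * sqrtf(nj) + 1e-12f);
}

int main() {
    const int N = 256, D = 1024;                               // illustrative sizes
    float *docs, *sim;
    cudaMallocManaged(&docs, N * D * sizeof(float));
    cudaMallocManaged(&sim,  N * N * sizeof(float));
    for (int k = 0; k < N * D; ++k) docs[k] = (k % 7) * 0.1f;  // dummy TF-IDF weights
    dim3 block(16, 16), grid((N + 15) / 16, (N + 15) / 16);
    cosineSimilarity<<<grid, block>>>(docs, sim, N, D);
    cudaDeviceSynchronize();
    printf("sim(0,1) = %f\n", sim[1]);
    cudaFree(docs); cudaFree(sim);
    return 0;
}
```
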
Sep, 21

Fast and Efficient Automatic Memory Management for GPUs using Compiler-Assisted Runtime Coherence Scheme

Exploiting the performance potential of GPUs requires managing the data transfers to and from them efficiently, which is an error-prone and tedious task. In this paper, we develop a software coherence mechanism to fully automate all data transfers between the CPU and GPU without any assistance from the programmer. Our mechanism uses compiler analysis to […]
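
The paper's mechanism is compiler-driven, so nothing below is its actual API; purely as an illustration of the runtime idea, this hypothetical `CoherentBuffer` sketch tracks host/device dirty flags and issues `cudaMemcpy` lazily, which is the kind of bookkeeping a coherence scheme automates.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical runtime-side sketch: a buffer that remembers which copy is stale
// and copies only when the other side actually touches the data.
struct CoherentBuffer {
    float* host; float* dev; size_t n;
    bool hostDirty = false, devDirty = false;

    CoherentBuffer(size_t count) : n(count) {
        host = new float[n];
        cudaMalloc(&dev, n * sizeof(float));
    }
    ~CoherentBuffer() { delete[] host; cudaFree(dev); }

    float* forHostWrite()      { syncToHost(); hostDirty = true; return host; }
    const float* forHostRead() { syncToHost(); return host; }
    float* forDevice()         { syncToDevice(); devDirty = true; return dev; }

    void syncToDevice() {
        if (hostDirty) { cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice); hostDirty = false; }
    }
    void syncToHost() {
        if (devDirty) { cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost); devDirty = false; }
    }
};

__global__ void scale(float* x, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    CoherentBuffer buf(1024);
    float* h = buf.forHostWrite();
    for (int i = 0; i < 1024; ++i) h[i] = 1.0f;       // CPU writes: host copy dirty
    scale<<<4, 256>>>(buf.forDevice(), 1024, 2.0f);    // triggers the H2D copy
    cudaDeviceSynchronize();
    printf("%f\n", buf.forHostRead()[0]);              // triggers the D2H copy, prints 2.0
    return 0;
}
```
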
Sep, 21

Autotuning Wavefront Abstractions for Heterogeneous Architectures

We present our autotuned heterogeneous parallel programming abstraction for the wavefront pattern. An exhaustive search of the tuning space indicates that correctly setting the tuning factors yields an average 37x speedup over a sequential baseline. Our best automated, machine-learning-based heuristic obtains 92% of this ideal speedup, averaged across our full range of wavefront examples.
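
As a point of reference for the wavefront pattern itself (not the authors' abstraction or tuning machinery), here is a minimal CUDA sketch that sweeps a 2D grid one anti-diagonal per kernel launch, using an illustrative min-plus recurrence; the factors the paper tunes (tile shape, launch geometry, device selection) are fixed to arbitrary values here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each cell depends on its north and west neighbours, so all cells on one
// anti-diagonal are independent and can be computed in parallel.
// Illustrative recurrence: M[i][j] = min(M[i-1][j], M[i][j-1]) + cost(i, j).
__global__ void wavefrontStep(float* M, int N, int diag) {
    int i = blockIdx.x * blockDim.x + threadIdx.x + 1;   // skip boundary row 0
    int j = diag - i;
    if (i >= N || j < 1 || j >= N) return;
    float cost = (float)((i * 31 + j * 17) % 5);          // dummy local cost
    M[i * N + j] = fminf(M[(i - 1) * N + j], M[i * N + j - 1]) + cost;
}

int main() {
    const int N = 512;
    float* M;
    cudaMallocManaged(&M, N * N * sizeof(float));
    for (int k = 0; k < N * N; ++k) M[k] = 0.f;           // boundary row/column stay 0
    // Sweep the anti-diagonals in order; one launch per diagonal.
    for (int diag = 2; diag <= 2 * (N - 1); ++diag)
        wavefrontStep<<<(N + 255) / 256, 256>>>(M, N, diag);
    cudaDeviceSynchronize();
    printf("M[N-1][N-1] = %f\n", M[(N - 1) * N + (N - 1)]);
    cudaFree(M);
    return 0;
}
```
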
Sep, 20

Charged particles constrained to a curved surface

We study the motion of charged particles constrained to arbitrary two-dimensional curved surfaces but interacting in three-dimensional space via the Coulomb potential. To speed up the interaction calculations, we use the parallel compute capability of the Compute Unified Device Architecture (CUDA) of today's graphics boards. The particles and the curved surfaces are shown using the Open […]
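
The excerpt only names CUDA as the accelerator for the interaction calculations; a minimal all-pairs Coulomb force kernel along those lines might look like the sketch below (one thread per particle, brute-force O(N^2), softened distances). The surface-constraint projection and the rendering from the paper are omitted, and the positions and charges are dummy values.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Brute-force O(N^2) Coulomb interaction: one thread accumulates the total
// 3D force acting on its particle. Surface-constraint handling is omitted.
__global__ void coulombForces(const float3* pos, const float* q, float3* force, int N, float k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= N) return;
    float3 pi = pos[i], f = make_float3(0.f, 0.f, 0.f);
    for (int j = 0; j < N; ++j) {
        if (j == i) continue;
        float3 d = make_float3(pi.x - pos[j].x, pi.y - pos[j].y, pi.z - pos[j].z);
        float r2 = d.x * d.x + d.y * d.y + d.z * d.z + 1e-9f;   // softened distance
        float s = k * q[i] * q[j] * rsqrtf(r2) / r2;            // k*qi*qj / r^3
        f.x += s * d.x; f.y += s * d.y; f.z += s * d.z;
    }
    force[i] = f;
}

int main() {
    const int N = 1024;
    float3 *pos, *force; float* q;
    cudaMallocManaged(&pos, N * sizeof(float3));
    cudaMallocManaged(&force, N * sizeof(float3));
    cudaMallocManaged(&q, N * sizeof(float));
    for (int i = 0; i < N; ++i) {                   // dummy positions on a curved shell
        pos[i] = make_float3(sinf(0.1f * i), cosf(0.1f * i), sinf(0.05f * i));
        q[i] = 1.0f;
    }
    coulombForces<<<(N + 255) / 256, 256>>>(pos, q, force, N, 1.0f);
    cudaDeviceSynchronize();
    printf("F[0] = (%f, %f, %f)\n", force[0].x, force[0].y, force[0].z);
    return 0;
}
```
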
Sep, 20

Evolutionary Clustering on CUDA

Unsupervised clustering of large data sets is a complicated task. Due to its complexity, various meta-heuristic machine learning algorithms have been used to automate the clustering process. Genetic and evolutionary algorithms have been successfully deployed to find clusters in data sets. GPU computing is a recent programming paradigm introducing high-performance parallel computing […]
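
The excerpt does not say how the GPU is used; one common arrangement, sketched hypothetically below, is to evaluate the fitness of every individual in the population in parallel, where each individual encodes a candidate set of K centroids. The data, genome layout, and sizes are illustrative, and the selection/crossover/mutation loop is left to the host.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define K 4     // clusters per individual
#define DIM 2   // feature dimension

// One thread per individual: fitness = total squared distance of every data
// point to its nearest candidate centroid (lower is better).
__global__ void evalFitness(const float* data, int nPoints,
                            const float* population, int popSize, float* fitness) {
    int ind = blockIdx.x * blockDim.x + threadIdx.x;
    if (ind >= popSize) return;
    const float* cent = population + ind * K * DIM;
    float total = 0.f;
    for (int p = 0; p < nPoints; ++p) {
        float best = 1e30f;
        for (int c = 0; c < K; ++c) {
            float d2 = 0.f;
            for (int t = 0; t < DIM; ++t) {
                float diff = data[p * DIM + t] - cent[c * DIM + t];
                d2 += diff * diff;
            }
            best = fminf(best, d2);
        }
        total += best;
    }
    fitness[ind] = total;
}

int main() {
    const int nPoints = 4096, popSize = 256;
    float *data, *population, *fitness;
    cudaMallocManaged(&data, nPoints * DIM * sizeof(float));
    cudaMallocManaged(&population, popSize * K * DIM * sizeof(float));
    cudaMallocManaged(&fitness, popSize * sizeof(float));
    for (int i = 0; i < nPoints * DIM; ++i) data[i] = (float)(i % 17);           // dummy data
    for (int i = 0; i < popSize * K * DIM; ++i) population[i] = (float)(i % 13); // dummy genomes
    evalFitness<<<(popSize + 127) / 128, 128>>>(data, nPoints, population, popSize, fitness);
    cudaDeviceSynchronize();
    printf("fitness[0] = %f\n", fitness[0]);   // selection/crossover/mutation would follow on the host
    return 0;
}
```
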
Sep, 20

Binaural Simulations Using Audio Rate FDTD Schemes and CUDA

Three-dimensional finite difference time domain schemes can be used as an approach to spatial audio simulation. By embedding a model of the human head in a 3D computational space, such simulations can emulate binaural sound localisation. This approach normally relies on using high sample rates to give finely detailed models, and is computationally intensive. […]
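
As background for the scheme rather than the paper's implementation, here is a minimal CUDA sketch of the standard second-order 3D FDTD (leapfrog) update, one thread per interior cell; the embedded head geometry, boundary treatment, and binaural receiver handling are omitted, and the grid size and impulse source are illustrative.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One thread per interior grid cell: leapfrog update of the 3D wave equation
//   u_next = 2*u - u_prev + lambda^2 * (sum of 6 neighbours - 6*u)
// with lambda = c*dt/h, requiring lambda^2 <= 1/3 for stability.
__global__ void fdtdStep(const float* u, const float* uPrev, float* uNext,
                         int nx, int ny, int nz, float lambda2) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;
    if (x < 1 || y < 1 || z < 1 || x >= nx - 1 || y >= ny - 1 || z >= nz - 1) return;
    int i = (z * ny + y) * nx + x;
    float nbrs = u[i - 1] + u[i + 1] + u[i - nx] + u[i + nx]
               + u[i - nx * ny] + u[i + nx * ny];
    uNext[i] = 2.f * u[i] - uPrev[i] + lambda2 * (nbrs - 6.f * u[i]);
}

int main() {
    const int nx = 64, ny = 64, nz = 64, steps = 100;
    const float lambda2 = 1.f / 3.f;                 // at the Courant limit of the scheme
    size_t bytes = (size_t)nx * ny * nz * sizeof(float);
    float *u, *uPrev, *uNext;
    cudaMallocManaged(&u, bytes); cudaMallocManaged(&uPrev, bytes); cudaMallocManaged(&uNext, bytes);
    cudaMemset(u, 0, bytes); cudaMemset(uPrev, 0, bytes); cudaMemset(uNext, 0, bytes);
    u[(32 * ny + 32) * nx + 32] = 1.f;               // impulse source in the middle of the grid
    dim3 block(8, 8, 8), grid(nx / 8, ny / 8, nz / 8);
    for (int t = 0; t < steps; ++t) {
        fdtdStep<<<grid, block>>>(u, uPrev, uNext, nx, ny, nz, lambda2);
        float* tmp = uPrev; uPrev = u; u = uNext; uNext = tmp;   // rotate the time levels
    }
    cudaDeviceSynchronize();
    printf("pressure at a 'receiver' cell: %e\n", u[(32 * ny + 32) * nx + 40]);
    return 0;
}
```
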
Sep, 20

Forecasting high frequency financial time series using parallel FFN with CUDA and ZeroMQ

Feed-forward neural networks (FFNs) are powerful data-modelling tools that have been used in many fields of science. In financial applications specifically, the number of factors affecting the market leads to models with a large number of input features and hidden and output neurons. In financial problems, the response time is crucial and […]
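
The GPU-friendly core of an FFN is the per-layer weighted sum plus activation; a minimal, hypothetical CUDA sketch of a two-layer forward pass (one thread per output neuron, sigmoid activation) is shown below. The topology and weights are dummy values, and the ZeroMQ distribution layer from the paper is not shown.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One thread per neuron of the next layer: weighted sum of the previous
// layer's activations followed by a sigmoid. W is row-major (nOut x nIn).
__global__ void denseForward(const float* W, const float* b, const float* in,
                             float* out, int nIn, int nOut) {
    int o = blockIdx.x * blockDim.x + threadIdx.x;
    if (o >= nOut) return;
    float s = b[o];
    for (int i = 0; i < nIn; ++i) s += W[o * nIn + i] * in[i];
    out[o] = 1.f / (1.f + expf(-s));              // sigmoid activation
}

int main() {
    const int nIn = 128, nHidden = 64, nOut = 1;  // illustrative topology
    float *W1, *b1, *W2, *b2, *x, *h, *y;
    cudaMallocManaged(&W1, nHidden * nIn * sizeof(float));
    cudaMallocManaged(&b1, nHidden * sizeof(float));
    cudaMallocManaged(&W2, nOut * nHidden * sizeof(float));
    cudaMallocManaged(&b2, nOut * sizeof(float));
    cudaMallocManaged(&x, nIn * sizeof(float));
    cudaMallocManaged(&h, nHidden * sizeof(float));
    cudaMallocManaged(&y, nOut * sizeof(float));
    for (int i = 0; i < nHidden * nIn; ++i) W1[i] = 0.01f;   // dummy "trained" weights
    for (int i = 0; i < nOut * nHidden; ++i) W2[i] = 0.01f;
    for (int i = 0; i < nHidden; ++i) b1[i] = 0.f;
    for (int i = 0; i < nOut; ++i) b2[i] = 0.f;
    for (int i = 0; i < nIn; ++i) x[i] = 0.5f;               // e.g. a window of lagged returns
    denseForward<<<(nHidden + 127) / 128, 128>>>(W1, b1, x, h, nIn, nHidden);
    denseForward<<<1, 32>>>(W2, b2, h, y, nHidden, nOut);
    cudaDeviceSynchronize();
    printf("forecast = %f\n", y[0]);
    return 0;
}
```
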
Sep, 20

GPU-Acceleration of Linear Algebra using OpenCL

In this report we’ve created a linear algebra API using OpenCL, for use with MATLAB. We’ve demonstrated that the individual linear algebra components can be faster on the GPU than on the CPU. We found that the API is heavily memory-bound, but still faster than MATLAB in our test case. The API components […]
Sep, 19

Direct GPU/FPGA Communication Via PCI Express

Parallel processing has hit mainstream computing in the form of CPUs, GPUs and FPGAs. While explorations proceed with all three platforms individually and with the CPU-GPU pair, little exploration has been performed on the GPU-FPGA pairing. This is due in part to the cumbersome nature of communication between the two. This paper presents a […]
Sep, 19

Simulating spiking neural networks on GPU

Modern graphics cards contain hundreds of cores that can be programmed for intensive calculations. They are beginning to be used for spiking neural network simulations. The goal is to make parallel simulation of spiking neural networks available to a large audience, without requiring a cluster. We review the ongoing efforts towards this goal, […]
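
As a minimal illustration of why these simulations map well to GPUs (not any particular simulator's code), the sketch below updates a population of leaky integrate-and-fire neurons with one thread per neuron; synaptic propagation and spike recording are omitted, and the parameters and input currents are illustrative.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One thread per neuron: leaky integrate-and-fire update for one time step.
//   dv/dt = (v_rest - v + I) / tau ; spike and reset when v crosses threshold.
__global__ void lifStep(float* v, const float* I, int* spiked, int n,
                        float dt, float tau, float vRest, float vThresh, float vReset) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float vi = v[i] + dt * ((vRest - v[i]) + I[i]) / tau;
    if (vi >= vThresh) { spiked[i] = 1; vi = vReset; } else { spiked[i] = 0; }
    v[i] = vi;
}

int main() {
    const int n = 100000, steps = 1000;
    const float dt = 0.1f, tau = 20.f, vRest = -70.f, vThresh = -54.f, vReset = -70.f;
    float *v, *I; int* spiked;
    cudaMallocManaged(&v, n * sizeof(float));
    cudaMallocManaged(&I, n * sizeof(float));
    cudaMallocManaged(&spiked, n * sizeof(int));
    for (int i = 0; i < n; ++i) { v[i] = vRest; I[i] = 20.f + (i % 10); }  // constant input drive
    for (int t = 0; t < steps; ++t)
        lifStep<<<(n + 255) / 256, 256>>>(v, I, spiked, n, dt, tau, vRest, vThresh, vReset);
    cudaDeviceSynchronize();
    printf("v[0] = %f, spiked[0] = %d\n", v[0], spiked[0]);
    return 0;
}
```
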
Sep, 19

Parallelization of a Block-Matching Algorithm

In this work we present a parallelization technique, together with its GPU implementation, for the full-search block-matching algorithm. This problem consists of finding the block that best matches a given reference template, in terms of some photometric measure, within a predefined search area. Block matching is a fundamental processing step for many signal-processing applications. Its […]
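
A minimal CUDA sketch of the full-search idea (not the authors' implementation): one thread per candidate displacement computes the sum of absolute differences (SAD) against the reference block, and the host then picks the minimum. Frame contents, block size, and search range are illustrative.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define B 16        // block (template) size
#define S 32        // search range: displacements in [-S, S]

// One thread per candidate displacement (dx, dy): SAD between the reference
// block and the block at (rx+dx, ry+dy) in the current frame.
__global__ void fullSearchSAD(const unsigned char* frame, const unsigned char* ref,
                              int width, int rx, int ry, float* sad) {
    int dx = (int)(blockIdx.x * blockDim.x + threadIdx.x) - S;
    int dy = (int)(blockIdx.y * blockDim.y + threadIdx.y) - S;
    if (dx > S || dy > S) return;
    float sum = 0.f;
    for (int y = 0; y < B; ++y)
        for (int x = 0; x < B; ++x)
            sum += fabsf((float)frame[(ry + dy + y) * width + (rx + dx + x)]
                       - (float)ref[y * B + x]);
    sad[(dy + S) * (2 * S + 1) + (dx + S)] = sum;
}

int main() {
    const int width = 640, height = 480, rx = 300, ry = 200;
    const int nCand = (2 * S + 1) * (2 * S + 1);
    unsigned char *frame, *ref; float* sad;
    cudaMallocManaged(&frame, width * height);
    cudaMallocManaged(&ref, B * B);
    cudaMallocManaged(&sad, nCand * sizeof(float));
    for (int i = 0; i < width * height; ++i) frame[i] = (unsigned char)(i % 251);  // dummy frame
    for (int y = 0; y < B; ++y)                       // reference block cut out of the frame itself
        for (int x = 0; x < B; ++x) ref[y * B + x] = frame[(ry + 5 + y) * width + (rx + 3 + x)];
    dim3 block(16, 16), grid((2 * S + 1 + 15) / 16, (2 * S + 1 + 15) / 16);
    fullSearchSAD<<<grid, block>>>(frame, ref, width, rx, ry, sad);
    cudaDeviceSynchronize();
    int best = 0;
    for (int i = 1; i < nCand; ++i) if (sad[i] < sad[best]) best = i;
    printf("best displacement: dx=%d dy=%d (SAD=%f)\n",
           best % (2 * S + 1) - S, best / (2 * S + 1) - S, sad[best]);
    return 0;
}
```
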
Sep, 19

Beauty And The Beast: Exploiting GPUs In Haskell

In this paper we compare a Haskell system that exploits a GPU back end using Obsidian against a number of other GPU/parallel processing systems. Our examples demonstrate two major results. Firstly, they show that the Haskell system allows the application programmer to exploit GPUs in a manner that eases the development of parallel code by […]

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors