high performance computing on graphics processing units: hgpu.org

Posts

Jul, 1

Generating Efficient Data Movement Code for Heterogeneous Architectures with Distributed-Memory

Programming for parallel architectures that do not have a shared address space is extremely difficult due to the need for explicit communication between memories of different compute devices. Heterogeneous systems with CPUs and multiple GPUs, and distributed-memory clusters, are examples of such systems. Past works that try to automate data movement for distributed-memory […]

Jul, 1

Exploiting multi-level parallelism in streaming applications for heterogeneous platforms with GPUs

Heterogeneous computing platforms support the traditional types of parallelism, such as instruction-level, data, task, and pipeline parallelism, and provide the opportunity to exploit a combination of different types of parallelism at different platform levels. The architectural diversity of platform components makes tapping into the platform potential a challenging programming task. This thesis makes an […]

CUDA

Jul, 1

Towards Performance-Portable, Scalable, and Convenient Linear Algebra

The rise of multi- and many-core architectures also gave birth to a plethora of new parallel programming models. Among these, the open industry standard OpenCL addresses this heterogeneity of programming environments by providing a unified programming framework. The price to pay, however, is that OpenCL requires additional low-level boilerplate code, when compared to vendor-specific solutions, […]

OpenCL

Jun, 30

Cropped Quad-Tree Based Solid Object Colouring with CUDA

In this study, surfaces of solid objects are coloured with the Cropped Quad-Tree method using GPU computing optimization. There are numerous methods used in solid object colouring. When studies carried out in different fields are taken into consideration, the quad-tree method is seen to hold a prominent position in terms of speed and performance. Cropped […]

CUDA • OpenGL

Jun, 30

Accelerating SELECT WHERE and SELECT JOIN Queries on a GPU

This paper presents implementations of a few selected SQL operations using the CUDA programming framework on the GPU platform. Nowadays, the GPU's parallel architecture gives a high speed-up on certain problems. Therefore, the number of non-graphical problems that can be run and sped up on the GPU keeps increasing. Especially, there has been a lot of […]

CUDA

Jun, 30

HadoopCL: MapReduce on Distributed Heterogeneous Platforms Through Seamless Integration of Hadoop and OpenCL

As the scale of high performance computing systems grows, three main challenges arise: the programmability, reliability, and energy efficiency of those systems. Accomplishing all three without sacrificing performance requires a rethinking of legacy distributed programming models and homogeneous clusters. In this work, we integrate Hadoop MapReduce with OpenCL to enable the use of heterogeneous processors […]

OpenCL

Jun, 30

Intel Xeon Phi Coprocessor High-Performance Programming

This book is useful even before you ever touch a system with an Intel Xeon Phi coprocessor. To ensure that your applications run at maximum efficiency, the authors emphasize key techniques for programming any modern parallel computing system whether based on Intel Xeon processors, Intel Xeon Phi coprocessors, or other high performance microprocessors. Applying these […]

Jun, 30

Best Practice Guide – Intel Xeon Phi

This best practice guide provides information about Intel’s MIC architecture and programming models for the Intel Xeon Phi coprocessor in order to enable programmers to achieve good performance of their applications. The guide covers a wide range of topics from the description of the hardware of the Intel Xeon Phi coprocessor through information about the […]

Jun, 29

A model of dynamic compilation for heterogeneous compute platforms

Trends in computer engineering place renewed emphasis on increasing parallelism and heterogeneity. The rise of parallelism adds an additional dimension to the challenge of portability, as different processors support different notions of parallelism, whether vector parallelism executing in a few threads on multicore CPUs or large-scale thread hierarchies on GPUs. Thus, software experiences obstacles to […]

CUDA • OpenCL

Jun, 29

Adaptation of algorithms for underwater sonar data processing to GPU-based systems

In this master's thesis, algorithms for acoustic simulations in underwater environments are ported for GPU processing. The GPU parallel computing platforms used are CUDA, OpenCL and SkePU. The purpose of this master's thesis is to adapt the algorithms and evaluate the performance of the ported versions on two modern NVIDIA GPUs, Tesla K20 and Quadro K5000. Several optimizations, described […]

CUDA • OpenCL

Jun, 29

Hinomiyagura Infrastructure Competiton TDP: Platform of rescue simulation using GPGPU

We propose a new platform that consists of a new traffic simulator and a scenario generator. The traffic simulation system uses GPGPU, which makes it possible to run rescue and evacuation simulations with more agents and at higher speed than the present system. It can also simulate agents' motions in a three-dimensional map. Our proposal provides a platform to widen the […]

OpenCL

Jun, 29

Betatron tune measurement with the LHC damper using a GPU

This thesis studies a possible future implementation of a betatron tune measurement in the Large Hadron Collider (LHC) at the European Organization for Nuclear Research (CERN) using a General Purpose Graphics Processing Unit (GPGPU) to analyse data acquired with the LHC transverse damper (ADT). The present hardware and future possible implementations using ADT acquisitions and […]

CUDA
