high performance computing on graphics processing units: hgpu.org

Posts

Oct, 25

Modern Gyrokinetic Particle-In-Cell Simulation of Fusion Plasmas on Top Supercomputers

The Gyrokinetic Toroidal Code at Princeton (GTC-P) is a highly scalable and portable particle-in-cell (PIC) code. It solves the 5D Vlasov-Poisson equation featuring efficient utilization of modern parallel computer architectures at the petascale and beyond. Motivated by the goal of developing a modern code capable of dealing with the physics challenge of increasing problem size […]

CUDA

Oct, 25

Join Algorithms on GPUs: A Revisit After Seven Years

Implementing database operations on parallel platforms has gained a lot of momentum in the past decade. A number of studies have shown the potential of using GPUs to speed up database operations. In this paper, we present empirical evaluations of a state-of-the-art work published in SIGMOD’08 on GPU-based join processing. In particular, such work provides […]

CUDA

Oct, 22

Sequential Code Parallelization for Multi-core Embedded Systems: A Survey of Models, Algorithms and Tools

In recent years the industry experienced a shift in the design and manufacture of processors. Multi-core processors on a single chip started replacing the commonly used single-core processors. This design trend reached the development of System-on-Chip, widely used in embedded systems, and turned them into powerful Multiprocessor System-on-Chip. These multi-core systems have presented not only […]

Oct, 22

A linguistic approach to concurrent, distributed, and adaptive programming across heterogeneous platforms

Two major trends in computing hardware during the last decade have been an increase in the number of processing cores found in individual computer hardware platforms and a ubiquity of distributed, heterogeneous systems. Together, these changes can improve not only the performance of a range of applications, but the types of applications that can be […]

OpenCL

Oct, 22

Stadium Hashing: Scalable and Flexible Hashing on GPUs

Hashing is one of the most fundamental operations that provides a means for a program to obtain fast access to large amounts of data. Despite the emergence of GPUs as many-threaded general purpose processors, high performance parallel data hashing solutions for GPUs are yet to receive adequate attention. Existing hashing solutions for GPUs not only […]

CUDA

Oct, 22

BLASX: A High Performance Level-3 BLAS Library for Heterogeneous Multi-GPU Computing

Basic Linear Algebra Subprograms (BLAS) are a set of low-level linear algebra kernels widely adopted by applications in deep learning and scientific computing. The massive and economical computing power brought forth by emerging GPU architectures drives interest in implementing compute-intensive level 3 BLAS on multi-GPU systems. In this paper, we […]

CUDA

Oct, 22

Neurokernel: An Open Source Platform for Emulating the Fruit Fly Brain

We have developed an open software platform called Neurokernel for collaborative development of comprehensive models of the brain of the fruit fly Drosophila melanogaster and their execution and testing on multiple Graphics Processing Units (GPUs). Neurokernel provides a programming model that capitalizes upon the structural organization of the fly brain into a fixed number of […]

CUDA

Oct, 18

MetaFork: A Compilation Framework for Concurrency Models Targeting Hardware Accelerators and Its Application to the Generation of Parametric CUDA Kernels

In this paper, we present the accelerator model of MetaFork together with the software framework that allows automatic generation of CUDA code from annotated MetaFork programs. One of the key features of this CUDA code generator is that it supports the generation of CUDA kernel code where program parameters (like number of threads per block) […]

CUDA

Oct, 18

Self-Adapting Parallel Framework for Long-Term Object Tracking

Object tracking is a crucial field in computer vision that has many uses in human-computer interaction, security and surveillance, video communication and compression, augmented reality, traffic control, etc. Many implementations have been introduced in practice, and yet recent methods emphasize tracking objects adaptively by learning the object’s perspectives and rediscovering it when it becomes untraceable, […]

OpenCL

Oct, 18

Implementation of a Power Efficient Synthetic Aperture Radar Back Projection Algorithm on FPGAs Using OpenCL

In this thesis, an implementation of a Synthetic Aperture Radar (SAR) back projection algorithm onto a Field-Programmable Gate Array (FPGA) device using Open Computing Language (OpenCL) is developed. SAR back projection is a method to form a high-resolution terrain image from radar data. SAR is used in many applications such as Geographic Information Systems (GIS), […]

OpenCL

Oct, 18

A Network Intrusion Detection System Framework based on Hadoop and GPGPU

In the IT industry, business data grows exponentially, which raises the need to strengthen security by implementing an effective NIDS (Network Intrusion Detection System). A quick response to detected intrusions is an essential feature of any NIDS, but the huge amount of data obtained from organizations impacts the performance of the NIDS. The […]

CUDA

Oct, 18

Performance analysis and optimization of a CFD application

This thesis documents the analysis and optimization of a high-order finite difference computational fluid dynamics (CFD) application (PlasComCM). Performance bottlenecks were identified using performance tools and hardware counters. The performance analysis of PlasComCM showed that the quantity of memory accesses and the lack of vectorization inhibited optimal serial performance on an x86-based CPU. Optimizing techniques […]

Recent source codes

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

Most viewed papers (last 30 days)