high performance computing on graphics processing units: hgpu.org

Posts

Oct, 22

Neurokernel: An Open Source Platform for Emulating the Fruit Fly Brain

We have developed an open software platform called Neurokernel for collaborative development of comprehensive models of the brain of the fruit fly Drosophila melanogaster and their execution and testing on multiple Graphics Processing Units (GPUs). Neurokernel provides a programming model that capitalizes upon the structural organization of the fly brain into a fixed number of […]

CUDA

Oct, 18

Self-Adapting Parallel Framework for Long-Term Object Tracking

Object tracking is a crucial field in computer vision that has many uses in human-computer interaction, security and surveillance, video communication and compression, augmented reality, traffic control, etc. Many implementations have been introduced in practice, yet recent methods emphasize tracking objects adaptively by learning the object’s perspectives and rediscovering it when it becomes untraceable, […]

OpenCL

Oct, 18

Implementation of a Power Efficient Synthetic Aperture Radar Back Projection Algorithm on FPGAs Using OpenCL

In this thesis, an implementation of a Synthetic Aperture Radar (SAR) back projection algorithm onto a Field-Programmable Gate Array (FPGA) device using Open Computing Language (OpenCL) is developed. SAR back projection is a method to form a high-resolution terrain image from radar data. SAR is used in many applications such as Geographic Information Systems (GIS), […]

OpenCL

Oct, 18

A Network Intrusion Detection System Framework based on Hadoop and GPGPU

In the IT industry, business data grows exponentially, which raises the need to strengthen security by implementing an effective NIDS (Network Intrusion Detection System). A quick response to detected intrusions is an essential feature of any NIDS, but the huge amount of data obtained from organizations impacts NIDS performance. The […]

CUDA

Oct, 18

Performance analysis and optimization of a CFD application

This thesis documents the analysis and optimization of a high-order finite difference computational fluid dynamics (CFD) application (PlasComCM). Performance bottlenecks were identified using performance tools and hardware counters. The performance analysis of PlasComCM showed that the quantity of memory accesses and the lack of vectorization inhibited optimal serial performance on an x86-based CPU. Optimizing techniques […]

Oct, 18

MetaFork: A Compilation Framework for Concurrency Models Targeting Hardware Accelerators and Its Application to the Generation of Parametric CUDA Kernels

In this paper, we present the accelerator model of MetaFork together with the software framework that allows automatic generation of CUDA code from annotated MetaFork programs. One of the key features of this CUDA code generator is that it supports the generation of CUDA kernel code where program parameters (like number of threads per block) […]

CUDA

Oct, 16

Density-based parallel skin lesion border detection with webCL

BACKGROUND: Dermoscopy is a highly effective and noninvasive imaging technique used in diagnosis of melanoma and other pigmented skin lesions. Many aspects of the lesion under consideration are defined in relation to the lesion border. This makes border detection one of the most important steps in dermoscopic image analysis. In current practice, dermatologists often delineate […]

OpenCL

Oct, 16

Comparison of Thread Execution Methods for GPU-oriented OpenCL Programs on Multicore Processors

With the broad deployment of multicore processors, there are increasing demands to port OpenCL programs written for GPUs onto the multicore processors. However, OpenCL programs written for GPUs cannot run efficiently on multicore processors since GPU-oriented OpenCL programs generally consist of a huge number of threads. This paper presents experimental comparisons of three thread execution […]

OpenCL

Oct, 16

Sapporo2: A versatile direct N-body library

Astrophysical direct $N$-body methods were among the first production algorithms to be implemented using NVIDIA’s CUDA architecture. Now, almost seven years later, the GPU is the most used accelerator device in astronomy for simulating stellar systems. In this paper we present the implementation of the Sapporo2 $N$-body library, which allows researchers to use […]

CUDA • OpenCL

Oct, 16

Multi-dimensional Functional Principal Component Analysis

Functional principal component analysis is one of the most commonly employed approaches in functional/longitudinal data analysis, and we extend it to conduct $d$-dimensional functional/longitudinal data analysis. The computational issues emerging in the extension are fully addressed with our proposed solutions. The local linear smoothing technique is employed to perform estimation because of its capabilities of performing […]

Oct, 16

A progressive mesh method for physical simulations using lattice Boltzmann method on single-node multi-gpu architectures

In this paper, a new progressive mesh algorithm is introduced to perform fast physical simulations using a lattice Boltzmann method (LBM) on a single-node multi-GPU architecture. This algorithm automatically meshes the simulation domain according to the propagation of fluids. This method can also be useful in order […]

CUDA

Oct, 13

Accelerating Applications with Pattern-specific Optimizations on Accelerators and Coprocessors

Because of the bottleneck in the increase of clock frequency, multi-cores emerged as a way of improving the overall performance of CPUs. In the past decade, many-cores have begun to play an increasingly important role in scientific computing. The highly cost-effective nature of many-cores makes them extremely suitable for data-intensive computations. Specifically, many-cores are […]

CUDA

* * *

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems
