Posts
Feb, 2
Locality-aware parallel block-sparse matrix-matrix multiplication using the Chunks and Tasks programming model
We present a library for parallel block-sparse matrix-matrix multiplication on distributed memory clusters. The library is based on the Chunks and Tasks programming model [Parallel Comput. 40, 328 (2014)]. Using this model as matrix library developers, we do not have to deal explicitly with the distribution of work and data or with communication between computational nodes […]
Feb, 2
Montblanc: GPU accelerated Radio Interferometer Measurement Equations in support of Bayesian Inference for Radio Observations
We present Montblanc, a GPU implementation of the Radio interferometer measurement equation (RIME) in support of the Bayesian inference for radio observations (BIRO) technique. BIRO uses Bayesian inference to select sky models that best match the visibilities observed by a radio interferometer. To accomplish this, BIRO evaluates the RIME multiple times, varying sky model parameters […]
Feb, 1
Optimized Data Transfers Based on the OpenCL Event Management Mechanism
In standard OpenCL programming, hosts such as CPUs are supposed to control their compute devices such as GPUs. Since compute devices are dedicated to kernel computation, only hosts can execute several kinds of data transfers such as inter-node communication and file access. These data transfers require one host to simultaneously play two or more roles […]
Feb, 1
In-Memory Data Analytics on Coupled CPU-GPU Architectures
In the big data era, in-memory data analytics is an effective means of achieving high-performance data processing and realizing the value of data in a timely manner. Effort in this direction has been devoted to various aspects, including in-memory algorithmic designs and system optimizations. In this paper, we propose to develop the next-generation in-memory […]
Feb, 1
Mascar: Speeding up GPU Warps by Reducing Memory Pitstops
With the prevalence of GPUs as throughput engines for data parallel workloads, the landscape of GPU computing is changing significantly. Non-graphics workloads with high memory intensity and irregular access patterns are frequently targeted for acceleration on GPUs. While GPUs provide large numbers of compute resources, the resources needed for memory intensive workloads are more scarce. […]
Feb, 1
Productive and Efficient Computational Science Through Domain-specific Abstractions
In an ideal world, scientific applications are computationally efficient, maintainable, and composable, and allow scientists to work very productively. We argue that these goals are achievable for a specific application field by choosing suitable domain-specific abstractions that encapsulate domain knowledge with a high degree of expressiveness. This thesis demonstrates the design and composition of domain-specific […]
Feb, 1
Performance Analysis and Optimization of a Distributed Processing Framework for Data Mining Accelerated with Graphics Processing Units
In this age, a huge amount of data is generated every day by human interactions with services. Discovering patterns in these data is very important for making business decisions. Due to the size of this data, processing it requires very intensive computation. Thus, many frameworks have been developed using Central Processing Units (CPU) […]
Jan, 30
On Vectorization of Deep Convolutional Neural Networks for Vision Tasks
We have recently witnessed many ground-breaking results in machine learning and computer vision, generated by using deep convolutional neural networks (CNN). While the success mainly stems from the large volume of training data and the deep network architectures, the vector processing hardware (e.g. GPU) undisputedly plays a vital role in modern CNN implementations to support […]
Jan, 30
OpenCL Implementation of LiDAR Data Processing
When designing a safety system, the faster the response time, the quicker the system can react to hazards. As commercial interest in autonomous and assisted vehicles grows, safety is the number one concern. If the system cannot react as fast as or faster than an average human, then the public will deem it […]
Jan, 30
Different Optimization Strategies and Performance Evaluation of Reduction on Multicore CUDA Architecture
The objective of this paper is to evaluate different optimization strategies on multicore GPU architecture. Here, for performance evaluation, we have used the parallel reduction algorithm. GPU on-chip shared memory is much faster than local and global memory: shared memory latency is roughly 100x lower than that of non-cached global memory (provided there are no bank […]
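The tree-based reduction the abstract refers to halves the number of active threads at each step, with each "thread" summing a pair of shared-memory elements. A minimal Python sketch of that halving pattern (an illustration of the common CUDA strategy, not the authors' code; the sequential-addressing stride is what avoids shared-memory bank conflicts):

```python
def tree_reduce(data):
    """Pairwise (tree) reduction over a power-of-two-sized array.

    Mirrors the CUDA shared-memory pattern: for stride = n/2, n/4, ..., 1,
    each active index i performs vals[i] += vals[i + stride], so the number
    of active "threads" halves every step and the result ends up in vals[0].
    """
    vals = list(data)
    n = len(vals)
    assert n and (n & (n - 1)) == 0, "sketch assumes a power-of-two size"
    stride = n // 2
    while stride > 0:
        for i in range(stride):          # one "thread" per active index
            vals[i] += vals[i + stride]  # sequential addressing, contiguous pairs
        stride //= 2
    return vals[0]

print(tree_reduce(range(8)))  # → 28, i.e. sum(0..7)
```

Sequential addressing (contiguous `i` and `i + stride`) is preferred over interleaved addressing in the CUDA version precisely because consecutive threads then touch consecutive shared-memory banks, avoiding the bank conflicts the abstract warns about.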
Jan, 30
Accelerate micromagnetic simulations with GPU programming in MATLAB
A finite-difference micromagnetic simulation code written in MATLAB is presented with Graphics Processing Unit (GPU) acceleration. The high performance of the GPU is demonstrated compared to a typical Central Processing Unit (CPU) based code. The GPU-to-CPU speed-up is shown to be greater than 30 for problems with larger sizes on […]
Jan, 30
Design Space Exploration of OpenCL Applications on Heterogeneous Parallel Platforms
Parallel programming is a skill software engineers can no longer do without, since multi- and many-core architectures have been widely adopted for general-purpose computing platforms. In 2006 Intel introduced the first multi-core processor on the consumer market and, at the same time, NVIDIA unveiled CUDA, a programming paradigm to exploit Graphics Processing Units (GPUs) […]