high performance computing on graphics processing units: hgpu.org

Posts

May, 19

Solving the Coalition Structure Generation Problem on a GPU

We develop the first parallel algorithm for Coalition Structure Generation (CSG), which is central to many multi-agent systems applications. Our approach involves distributing the key steps of a dynamic programming approach to CSG across computational nodes on a Graphics Processing Unit (GPU) such that each of the thousands of threads of computation can be used […]

CUDA
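For readers unfamiliar with the dynamic programming approach the CSG post refers to, here is a minimal sequential sketch of the classic recurrence: the best value for a set of agents is either the value of that set as a single coalition or the best sum over a split into two parts. This is not the paper's GPU algorithm. The function name, the bitmask encoding, and the `value` table are assumptions made for illustration only.

```python
def best_coalition_structure_value(n, value):
    """Return the optimal total value over all partitions of n agents.

    value[mask] is the (hypothetical) worth of the coalition whose members
    are the set bits of mask; value must have length 2**n.
    """
    full = (1 << n) - 1
    best = list(value)  # best[mask] starts as "keep mask as one coalition"
    # Every proper submask is numerically smaller than its mask, so
    # iterating masks in increasing order sees finished subproblems only.
    for mask in range(1, full + 1):
        sub = (mask - 1) & mask  # largest proper submask
        while sub > 0:
            other = mask ^ sub
            if sub < other:
                break  # remaining splits mirror ones already visited
            candidate = best[sub] + best[other]
            if candidate > best[mask]:
                best[mask] = candidate
            sub = (sub - 1) & mask  # next smaller submask
    return best[full]


# Three agents: the pair {0, 1} is worth more together than apart.
values = [0, 1, 1, 3, 1, 2, 2, 3]
print(best_coalition_structure_value(3, values))
```

The loop over `mask` values of equal size has no dependencies between iterations, which is the kind of independent work that could in principle be spread across GPU threads.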

May, 19

In-Place Recursive Approach for All-Pairs Shortest Paths Problem Using OpenCL

The all-pairs shortest paths (APSP) problem finds the shortest path distances between all pairs of vertices, and is one of the most fundamental graph problems. In this paper, a parallel recursive partitioning approach to the APSP problem using Open Computing Language (OpenCL) for directed and dense graphs with no negative cycles, based on the R-Kleene algorithm, is presented, which […]

OpenCL
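To make the APSP problem statement concrete, the sketch below uses the plain Floyd-Warshall algorithm on a dense adjacency matrix. It is a sequential baseline, not the recursive R-Kleene partitioning or the OpenCL code the post describes. The function name and the sample graph are assumptions for illustration.

```python
import math


def floyd_warshall(dist):
    """Return all-pairs shortest path distances for a dense directed graph.

    dist[i][j] is the edge weight from i to j, math.inf if there is no
    edge, and 0 on the diagonal. Assumes no negative cycles.
    """
    n = len(dist)
    d = [row[:] for row in dist]  # work on a copy, leave the input intact
    for k in range(n):            # allow vertex k as an intermediate stop
        for i in range(n):
            dik = d[i][k]
            if dik == math.inf:
                continue          # no path i -> k, nothing to relax
            for j in range(n):
                through_k = dik + d[k][j]
                if through_k < d[i][j]:
                    d[i][j] = through_k
    return d


INF = math.inf
graph = [
    [0, 4, INF],
    [INF, 0, 1],
    [2, INF, 0],
]
for row in floyd_warshall(graph):
    print(row)
```

The triple loop costs O(n^3) operations on an n-vertex graph, which is why dense APSP is a common target for parallel and blocked formulations.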

May, 17

Secrets from the GPU

Acceleration of cryptographic applications on massively parallel computing platforms, such as Graphics Processing Units (GPUs), becomes a real challenge as their decreasing cost and mass production make practical implementations attractive. We propose a layered trusted architecture integrating random bits generation and parallelized RSA cryptographic computations on such platforms. The GPU-resident, three-tier, MR architecture consists of […]

OpenCL

May, 17

Fluid Motion Modelling Using Vortex Particle Method on GPU

In this paper we present the vortex-in-cell method aimed at graphics processing units. An inviscid fluid model is examined in a domain with periodic boundary conditions. Simulation results for leap-frogging vortex rings are presented, with a sample visualization of a vortex ring collision. Finally, the performance advantage of the GPU solver over the CPU solver is presented.

CUDA

May, 17

GPU-based Numerical Integration in the Partition of Unity Method

In this thesis, we present a CUDA implementation of two sub-steps of the Parallel Multilevel Partition of Unity Method (PMPUM). The PMPUM is a method for the approximation of Partial Differential Equations (PDEs) whose main computational effort is caused by the integration of the weak formulation. Therefore, an efficient CUDA implementation of the required steps could speed […]

CUDA

May, 17

Performance Evaluation of CPU-GPU communication Depending on the Characteristic of Co-Located Workloads

Today, there are many studies on complex computation and big data processing that use the high-performance computing capability of GPUs. The Tesla K20X recently announced by NVIDIA provides 3.95 TFLOPS of single-precision floating-point performance [1]. The performance of the K20X is 10 times higher than that of Intel’s high-end CPUs. Due to the high performance computability of GPU, […]

CUDA

May, 17

Making the case of GPUs in courses on computational physics

Most relatively modern desktop or even laptop computers contain a graphics card useful for more than showing colors on a screen. In this paper, we make a case for why you should learn enough about GPU (graphics processing unit) computing to use it as an accelerator for, or even a replacement for, your CPU code. We include an […]

CUDA

May, 17

The 2013 International Workshop on Embedded Multicore Systems, ICPP-EMS 2013

ICPP-EMS 2013 is organized in conjunction with ICPP 2013, the 42nd International Conference on Parallel Processing. The 2013 International Workshop on Embedded Multicore Systems (ICPP-EMS 2013) will bring researchers and experts together to present and discuss the latest developments and technical solutions concerning various aspects of embedded multicore computing. ICPP-EMS 2013 seeks original unpublished papers […]

May, 17

3rd International Workshop on Embedded Multi-core Computing and Applications, EMCA 2013

Held in conjunction with the 15th IEEE International Conference on High Performance Computing and Communications (HPCC 2013). The goal of this workshop is to provide a forum for researchers and practitioners to discuss and share their research and development experiences and outputs on massively parallel GPU platforms, multi-core systems, optimization techniques, parallel algorithm design, applications, […]

May, 17

IS&T/SPIE Electronic Imaging 2014

Mobile Computational Photography 2014, part of the program track on Mobile Imaging. This conference is intended to bring together world-class researchers and practitioners who develop and deploy imaging technologies to enable novel solutions for mobile photography. Submissions are accepted on theory, application, and experience. The scope of the conference includes: Cameras optical designs for ultra […]

May, 17

22nd Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2014

Special Session on GPU Computing. The Special Session on GPU Computing and Hybrid Computing aims at providing a forum for scientific researchers and engineers on hot topics related to GPU computing and hybrid computing, with special emphasis on applications, performance analysis, programming models and mechanisms for mapping codes. Topics of interest include, but are not […]

May, 16

Point Spread Function Estimation of Solar Surface Images with a Cooperative Particle Swarm Optimization on GPUs

We present a method for estimating the point spread function (PSF) of solar surface images acquired from ground telescopes and degraded by the atmosphere. The estimation is done by retrieving the wavefront phase using a set of short exposures, the speckle reconstruction of the observed object and a PSF model parametrized by Zernike polynomials. Estimates of […]

OpenCL
