high performance computing on graphics processing units: hgpu.org

Posts

May, 19

GPU Programming for Physics Applications

The development of increasingly powerful and low cost massively parallel processors, known as GPUs, has created new opportunities for high speed and high precision computational work in physics. GPUs are extremely well suited to solving computationally intense problems at speeds much greater than traditional processors. They are now found in most personal computers, with research […]

CUDA

May, 19

Solving the Coalition Structure Generation Problem on a GPU

We develop the first parallel algorithm for Coalition Structure Generation (CSG), which is central to many multi-agent systems applications. Our approach involves distributing the key steps of a dynamic programming approach to CSG across computational nodes on a Graphics Processing Unit (GPU) such that each of the thousands of threads of computation can be used […]

CUDA
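For readers unfamiliar with the problem in the entry above: coalition structure generation asks for the partition of a set of agents into coalitions that maximizes total value. The following is a minimal sequential sketch of the standard dynamic programming recurrence in Python, assuming coalitions are encoded as bitmasks. It is not the paper's GPU algorithm; the function name and the sample value table are hypothetical.

```python
def best_structure_value(n_agents, value):
    """Return the best total value over all partitions of the agents.

    value[mask] is the (hypothetical) worth of the coalition whose members
    are the set bits of mask; value[0] is unused.
    """
    full = (1 << n_agents) - 1
    best = [0] * (full + 1)
    for coalition in range(1, full + 1):
        # Option 1: keep the coalition whole.
        best[coalition] = value[coalition]
        # Option 2: split it into two disjoint non-empty parts.
        sub = (coalition - 1) & coalition
        while sub > 0:
            rest = coalition ^ sub
            if sub > rest:  # visit each unordered split only once
                candidate = best[sub] + best[rest]
                if candidate > best[coalition]:
                    best[coalition] = candidate
            sub = (sub - 1) & coalition
    return best[full]


# Three agents a, b, c mapped to bits 0, 1, 2 (illustrative values only).
values = [0, 1, 1, 3, 1, 2, 2, 3]
result = best_structure_value(3, values)
```

With these sample values the best structure is {a, b} plus {c}. The evaluations for coalitions of the same size do not depend on one another, which is the kind of independence a parallel version could exploit.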

May, 19

In-Place Recursive Approach for All-Pairs Shortest Paths Problem Using OpenCL

The all-pairs shortest paths (APSP) problem finds the shortest path distances between all pairs of vertices, and is one of the most fundamental graph problems. In this paper, a parallel recursive partitioning approach to the APSP problem using Open Computing Language (OpenCL) for directed and dense graphs with no negative cycles, based on the R-Kleene algorithm, is presented, which […]

OpenCL
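As a point of reference for the APSP entry above, the problem can be illustrated with the classic Floyd-Warshall recurrence. This is a minimal CPU sketch in Python, assuming a dense adjacency matrix with no negative cycles. It is not the paper's recursive R-Kleene approach or its OpenCL implementation; the function name and the sample graph are hypothetical.

```python
import math


def floyd_warshall(weights):
    """Return the matrix of shortest path distances for a dense directed graph.

    weights[i][j] is the edge weight from i to j, math.inf if there is no
    edge, and 0 on the diagonal.
    """
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input is not modified
    for k in range(n):          # allow vertex k as an intermediate stop
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist


INF = math.inf
graph = [
    [0,   3,   INF, 7],
    [8,   0,   2,   INF],
    [5,   INF, 0,   1],
    [2,   INF, INF, 0],
]
result = floyd_warshall(graph)
```

The triple loop performs O(n^3) work; recursive and blocked formulations reorganize this same computation into sub-matrix operations that map more naturally onto parallel hardware.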

May, 17

Secrets from the GPU

Acceleration of cryptographic applications on massively parallel computing platforms, such as Graphics Processing Units (GPUs), becomes a real challenge as their decreasing cost and mass production make practical implementations attractive. We propose a layered trusted architecture integrating random bits generation and parallelized RSA cryptographic computations on such platforms. The GPU-resident, three-tier, MR architecture consists of […]

OpenCL

May, 17

Fluid Motion Modelling Using Vortex Particle Method on GPU

In this paper we present the vortex-in-cell method aimed at graphics processing units. An inviscid fluid model is examined in a domain with periodic boundary conditions. Results of the leap-frogging vortex rings simulation are presented, with a sample visualization of a vortex ring collision. Finally, the performance advantage of the GPU solver over the CPU solver is presented.

CUDA

May, 17

GPU-based Numerical Integration in the Partition of Unity Method

In this thesis, we present a CUDA-implementation of two sub-steps of the Parallel Multilevel Partition of Unity Method (PMPUM). The PMPUM is a method for the approximation of Partial Differential Equations (PDEs) whose main computational effort is caused by the integration of the weak formulation. Therefore, an efficient CUDA-implementation of the required steps could speed […]

CUDA

May, 17

Performance Evaluation of CPU-GPU communication Depending on the Characteristic of Co-Located Workloads

Today, there are many studies on complex computation and big data processing that use the high-performance computing capability of GPUs. The Tesla K20X, recently announced by NVIDIA, provides 3.95 TFLOPS of single-precision floating-point performance [1]. The performance of the K20X is 10 times higher than that of Intel’s high-end CPUs. Due to the high performance computability of GPU, […]

CUDA

May, 17

Making the case of GPUs in courses on computational physics

Most relatively modern desktop or even laptop computers contain a graphics card useful for more than showing colors on a screen. In this paper, we make a case for why you should learn enough about GPU (graphics processing unit) computing to use it as an accelerator for, or even a replacement for, your CPU code. We include an […]

CUDA

May, 17

The 2013 International Workshop on Embedded Multicore Systems, ICPP-EMS 2013

ICPP-EMS 2013 is organized in conjunction with ICPP 2013, the 42nd International Conference on Parallel Processing. The 2013 International Workshop on Embedded Multicore Systems (ICPP-EMS 2013) will bring researchers and experts together to present and discuss the latest developments and technical solutions concerning various aspects of embedded multicore computing. ICPP-EMS 2013 seeks original unpublished papers […]

May, 17

3rd International Workshop on Embedded Multi-core Computing and Applications, EMCA 2013

In conjunction with the 15th IEEE International Conference on High Performance Computing and Communications (HPCC 2013). The goal of this workshop is to provide a forum for researchers and practitioners to discuss and share their research and development experiences and outputs on the massively parallel GPU platforms, multi-core system, optimization techniques, parallel algorithm design, applications, […]

May, 17

IS&T/SPIE Electronic Imaging 2014

Mobile Computational Photography 2014, part of the program track on Mobile Imaging. This conference is intended to bring together world-class researchers and practitioners who develop and deploy imaging technologies to enable novel solutions for mobile photography. Submissions are accepted on theory, application, and experience. The scope of the conference includes: Cameras optical designs for ultra […]

May, 17

22nd Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2014

Special Session on GPU Computing. The Special Session on GPU Computing and Hybrid Computing aims to provide a forum for scientific researchers and engineers on hot topics related to GPU computing and hybrid computing, with special emphasis on applications, performance analysis, programming models and mechanisms for mapping codes. Topics of interest include, but are not […]

Recent source codes

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

Most viewed papers (last 30 days)