high performance computing on graphics processing units: hgpu.org

Posts

Jul, 3

Historic Learning Approach for Auto-tuning OpenACC Accelerated Scientific Applications

Optimizing the performance of scientific applications usually requires in-depth knowledge of the hardware and software. A performance tuning mechanism is proposed that automatically tunes OpenACC parameters to adapt to the execution environment on a given system. A methodology based on historic learning is proposed to prune the parameter search space for more efficient auto-tuning […]

CUDA

Jul, 3

Reducing the Code Degree Of Parallelism to Increase GPUs Reliability

A higher Degree of Parallelism decreases the code execution time. However, managing the increased number of parallel processes puts a higher strain on scheduling and affects the utilization of caches, registers, and other resources. All these variations in parallelism management may have the side effect of increasing the GPU neutron sensitivity. The results of an extensive […]

CUDA

Jul, 3

Toward Auto-tuned Krylov Basis Computations with minimized Communication on Clusters of Accelerators

Krylov Subspace Methods (KSMs) are widely used for solving large-scale linear systems and eigenproblems. However, computing the Krylov subspace basis for KSMs suffers from intensive blocking scalar product computation and communication, especially on large clusters with accelerators such as GPUs. In this paper, a hypergraph-based communication optimization is applied to Arnoldi […]

CUDA

Jul, 1

Mixed-precision Orthogonalization Scheme and Adaptive Step Size for CA-GMRES on GPUs

We propose a mixed-precision orthogonalization scheme that takes the input matrix in a standard 32- or 64-bit floating-point precision, but accumulates its intermediate results in doubled precision. For a 64-bit input matrix, we use software emulation for the higher-precision arithmetic. Compared with the standard orthogonalization scheme, we require about 8.5× more computation but a much […]

Jul, 1

Energy Efficiency Benefits of Reducing the Voltage Guardband on the Kepler GPU Architecture

Energy efficiency of GPU architectures has emerged as an important design criterion for both NVIDIA and AMD. In this paper, we explore the benefits of scaling a general-purpose GPU (GPGPU) core’s supply voltage to the near limits of execution failure. We find that as much as 21% of NVIDIA GTX 680’s core supply voltage guardband […]

CUDA

Jul, 1

Accelerated Computation of Minimum Enclosing Balls by GPU Parallelization and Distance Filtering

Minimum enclosing balls are used extensively to speed up multidimensional data processing in, e.g., machine learning, spatial databases, and computer graphics. We present a case study of several acceleration techniques that are applicable in enclosing ball algorithms based on repeated farthest-point queries. Parallel GPU solutions using CUDA are developed for both low- and high-dimensional cases. […]

CUDA

Jul, 1

Parallelizing the cellular potts model on GPU and multi-core CPU: An OpenCL cross-platform study

In this paper, we present the analysis and development of a cross-platform OpenCL parallelization of the Cellular Potts Model (CPM). In general, the evolution of the CPM is time-consuming. Using a data-parallel programming model such as CUDA can accelerate the process, but it is highly dependent on the hardware type and manufacturer. Recently, OpenCL has attracted […]

OpenCL

Jul, 1

High-Level Programming Framework for Executing Streaming Applications on Heterogeneous OpenCL Platforms

As the computer industry approaches the limits of processor speed and transistor size, it has to come up with complex new architectures and make more efficient use of the available processing power. For application developers this can be a difficult task, because they have to be aware of low-level hardware properties and there […]

OpenCL

Jul, 1

4th International Conference on Information Computer Application, ICICA 2015

Submission Deadline: 2014-10-05. Publication: The ICICA 2015 conference proceedings will be published in the International Journal of Computer and Communication Engineering (ISSN: 2010-3743, www.ijcce.org), which will be indexed by Google Scholar, Engineering & Technology Digital Library, ProQuest, and Crossref. Call for Papers: Algorithms, Automated Software Engineering, Bioinformatics and Scientific Computing, Compilers and Interpreters, Computer Animation, Artificial […]

Jul, 1

3rd International Conference on System Modeling and Optimization, ICSMO 2015

Submission Deadline: 2014-09-20. Publication: The ICSMO 2015 conference proceedings will be published in the International Journal of Modeling and Optimization (ISSN: 2010-3697, www.ijmo.org), included in the Engineering & Technology Digital Library, and indexed by ProQuest, Google Scholar, and Crossref. Call for Papers: Agent Based Simulation, Analytical and Stochastic Modelling Techniques and […]

Jul, 1

6th International Conference on Computer Modeling and Simulation, ICCMS 2015

Submission Deadline: 2014-09-30. Publication: As usual, all accepted papers for ICCMS 2015 will be published in the International Journal of Computer Theory and Engineering (ISSN: 1793-8201, www.ijcte.org) and indexed by Electronic Journals Library, EBSCO, Engineering & Technology Digital Library, Google Scholar, INSPEC, Ulrich’s Periodicals Directory, Crossref, ProQuest, WorldCat, and EI (INSPEC, IET). Call […]

Jul, 1

Kd-tree Based N-Body Simulations with Volume-Mass Heuristic on the GPU

N-body simulations are an important class of numerical simulations used to study a wide range of physical phenomena, for which researchers demand fast and accurate implementations. Due to the computational complexity, simple brute-force methods for solving the long-distance interaction between bodies can only be used for small-scale simulations. Smarter approaches utilize neighbor lists, tree […]

OpenCL
