high performance computing on graphics processing units: hgpu.org

Posts

Mar, 6

Code Optimization and Scaling of the Astrophysics Software Gadget on Intel Xeon Phi

The whitepaper reports our investigation into the porting, optimization and subsequent performance of the astrophysics software package GADGET on the Intel Xeon Phi. The GADGET code is intended for cosmological N-body/SPH simulations that address a wide range of astrophysical problems. The test cases in the project were simulations of galaxy systems. A performance analysis of […]

Mar, 6

Performance Tradeoff Spectrum of Integer and Floating Point Applications Kernels on Various GPUs

Floating-point precision and performance, together with the ratio of floating-point units to integer processing elements on a graphics processing unit accelerator, continue to present complex tradeoffs for optimising core utilisation on modern devices. We investigate various hybrid CPU and GPU combinations using a range of different GPU models occupying different points in this […]

CUDA

Mar, 6

Performance Analysis for GPU-based Ray-triangle Algorithms

Several algorithms have been proposed in recent years for the ray-triangle intersection test. In this paper we collect the most prominent solutions and describe how to parallelize them on modern programmable graphics processing units (GPUs) using NVIDIA CUDA. This paper also provides a comprehensive performance analysis based on several optional features […]

CUDA

Mar, 6

Efficient and Scalable Parallel Zonal Statistics on Large-Scale Species Occurrence Data on GPUs

Analyzing how species are distributed on Earth has been one of the fundamental questions at the intersection of the environmental sciences, geosciences and biological sciences. With worldwide data contributions, more than 375 million species occurrence records for nearly 1.5 million species have been deposited in the Global Biodiversity Information Facility (GBIF) data portal. The sheer […]

CUDA

Mar, 4

2014 3rd International Conference on Knowledge and Education Technology, ICKET 2014

2014-05-01 All ICKET 2014 papers will be published in the International Journal of Information and Education Technology (ISSN: 2010-3689), and all papers will be indexed by Engineering & Technology Digital Library, Google Scholar, Crossref and ProQuest. Topics: Information Technology and Applications; Augmented and Virtual Reality; Computer Human Interaction; Cyber Security; Data Structure and Algorithm; Distributed and Parallel Computing […]

Mar, 4

QuickProbs – A Fast Multiple Sequence Alignment Algorithm Designed for Graphics Processors

Multiple sequence alignment is a crucial task in a number of biological analyses, such as secondary structure prediction, domain searching and phylogeny. MSAProbs is currently the most accurate alignment algorithm, but its effectiveness is obtained at the expense of computational time. In the paper we present QuickProbs, a variant of MSAProbs customised for graphics processors. We […]

OpenCL

Mar, 4

On-Demand Source Code Generation & Scheduling Optimised Parallel Applications on Heterogeneous Platforms

Scheduling application tasks across heterogeneous clusters is a growing problem, particularly when new, upgraded components are added to a parallel computing system that may originally have been homogeneous. We describe how automatic and just-in-time source code generation techniques can be used to make the best parallel decomposition for whatever resource is available in a heterogeneous […]

CUDA

Mar, 4

Computational Experiments in Markov Chain Monte Carlo

In this thesis, I investigate computational questions in Markov chain Monte Carlo (MCMC). I study one new MCMC method, the stretch move ensemble sampler [3], and examine the performance of this algorithm in terms of acceptance rates, autocorrelation time and compute performance. The thesis describes a parallel implementation of the algorithm […]

OpenCL

Mar, 4

Increasing programmability of an embedded domain specific language for GPGPU kernels using static analysis

GPGPU (general-purpose computing on graphics processing units) programming is one interesting way to increase performance; unfortunately, it is not easy, because extensive knowledge of the GPU’s architecture is required to write programs that are faster than CPU programs. Obsidian is an embedded domain specific language for writing GPGPU kernels, which tries to make […]

CUDA

Mar, 4

Performance Optimization of Clustering On GPU

In today’s digital world, data sets are growing exponentially. Statistical analysis using clustering in scientific and engineering applications becomes very challenging for such large data sets. The size of the data set and the performance of clustering on it are the two major factors that demand optimization. Parallelization is a well-known approach to optimizing performance. It has been observed from […]

CUDA

Mar, 3

2014 6th International Conference on Future Networks, ICFN 2014

All accepted papers for ICFN 2014 will be published in the International Journal of Future Computer and Communication (ISSN: 2010-3751, DOI: 10.7763/IJFCC), which will be indexed by DOAJ, Electronic Journals Library, Engineering & Technology Digital Library, Crossref and Google Scholar. 2014-04-10. Topics: Computer Networking; Complex Network; Mobile and Wireless Technologies (UWB, MIMO, WiMAX, etc.) and Networks; Radio […]

Mar, 3

2014 3rd International Conference on Engineering Mathematics and Physics, ICEMP 2014

All accepted papers for ICEMP 2014 will be published in the International Journal of Applied Physics and Mathematics (ISSN: 2010-362X, DOI: 10.7763/IJAPM), which will be indexed by DOAJ, Electronic Journals Library, Engineering & Technology Digital Library, Nanowerk Database, Crossref, Google Scholar and ProQuest. 2014-04-10. Topics: Advanced Numerical Algorithms; Algorithmic Approaches to Computational Kernels and Applications; Application of Soft […]

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

chemtrain-deploy: A parallel and scalable framework for machine learning potentials in million-atom MD simulations

microSYCL: SYCL micro-benchmarks repository

Exploring SYCL as a Portability Layer for High-Performance Computing on CPUs



Recent source codes

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

