high performance computing on graphics processing units: hgpu.org

Posts

Nov, 4

4th International Conference on Nano and Materials Science (ICNMS), 2016

Dear Scholars and Researchers, warmest greetings from the conference committee of the 2016 4th International Conference on Nano and Materials Science (ICNMS 2016). We are very pleased to tell you that ICNMS 2016 will be held in New York, USA, during January 7-9, 2016. Publication: All papers, both invited and contributed, will be reviewed […]

Nov, 3

Software Defined Radio over CUDA

Software Defined Radio (SDR) is a wireless communication system in which the components of transmitters and receivers (filters, mixers, modulators) are mostly implemented in software. Thanks to this approach, it is possible to implement a single universal radio transceiver capable of multi-mode and multi-standard wireless communications. These capabilities are very useful for researchers and radio amateurs, who […]

CUDA

Nov, 3

On the programmability of multi-GPU computing systems

Multi-GPU systems are widely used in High Performance Computing environments to accelerate scientific computations. This trend is expected to continue as integrated GPUs will be introduced to processors used in multi-socket servers and servers will pack a higher number of GPUs per node. GPUs are currently connected to the system through the PCI Express interconnect, […]

CUDA

Nov, 3

Exploring Optimisations for the Local Assembly phase of Finite Element Methods on GPUs

Finite Element Methods (FEM) are ubiquitous in science and engineering where they are used in fields as diverse as structural analysis, ocean modeling and bioengineering. FEM allow us to find approximate solutions to a system of partial differential equations over an unstructured mesh. The first phase of solving a FEM problem, local assembly, involves computing […]

CUDA • OpenCL

Nov, 3

A Framework for Transparent Execution of Massively-Parallel Applications on CUDA and OpenCL

We present a novel framework for simultaneous development for different massively parallel platforms. Currently, our framework supports CUDA and OpenCL, but it can be easily adapted to other programming languages. The main idea is to provide an easy-to-use abstraction layer that encapsulates calls to one's own parallel device code as well as to library functions. […]

CUDA • OpenCL

Nov, 3

Structural Agnostic SpMV: Adapting CSR-Adaptive for Irregular Matrices

Sparse matrix vector multiplication (SpMV) is an important linear algebra primitive. Recent research has focused on improving the performance of SpMV on GPUs when using compressed sparse row (CSR), the most frequently used matrix storage format on CPUs. Efficient CSR-based SpMV obviates the need for other GPU-specific storage formats, thereby saving runtime and storage overheads. […]

OpenCL
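For readers unfamiliar with the CSR format mentioned in the abstract above, the following is a minimal sequential sketch of CSR-based SpMV. It is not the CSR-Adaptive algorithm from the paper; the function name `csr_spmv` and the array names `row_ptr`, `col_idx`, and `values` are illustrative assumptions.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form (hypothetical helper).

    row_ptr[i] .. row_ptr[i + 1] delimits the nonzeros of row i,
    col_idx[k] is the column of the k-th nonzero, values[k] is its value.
    """
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    for row in range(num_rows):
        acc = 0.0
        # Walk only the stored nonzeros of this row.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Example matrix (assumed for illustration):
# [[1, 0, 2],
#  [0, 3, 0],
#  [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 1.0, 1.0]

y = csr_spmv(row_ptr, col_idx, values, x)
print(y)
```

Each row performs a different amount of work depending on its nonzero count, which is one reason irregular matrices are hard to balance across GPU threads.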

Oct, 31

Energy-Efficient Execution of Data-Parallel Applications on Heterogeneous Mobile Platforms

State-of-the-art mobile systems-on-chip (SoCs) include heterogeneity in various forms for accelerated and energy-efficient execution of a diverse range of applications. Modern SoCs now include programmable cores such as CPUs and GPUs with very different functionality. The SoCs also integrate performance-heterogeneous cores with different power-performance characteristics but the same instruction-set architecture, such as ARM big.LITTLE. […]

OpenCL

Oct, 31

Estimation of numerical reproducibility on CPU and GPU

Differences in simulation results may be observed from one architecture to another or even inside the same architecture. Such reproducibility failures are often due to different rounding errors generated by different orders in the sequence of arithmetic operations. Reproducibility problems are particularly noticeable on new computing architectures such as multicore processors or GPUs (Graphics Processing […]

CUDA • OpenCL
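The reproducibility abstract above attributes differing results to rounding errors that depend on the order of arithmetic operations. A minimal sketch of that effect, using IEEE 754 double precision as implemented by Python floats, is given here; the helper name `ordered_sum` and the sample values are assumptions chosen for illustration, not taken from the paper.

```python
import math


def ordered_sum(values):
    """Add the values strictly left to right, as a sequential loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same four numbers, summed in two different orders.
order_a = [1e16, 1.0, -1e16, 1.0]   # the first 1.0 is absorbed by 1e16
order_b = [1e16, -1e16, 1.0, 1.0]   # the large terms cancel first

sum_a = ordered_sum(order_a)
sum_b = ordered_sum(order_b)
exact = math.fsum(order_a)          # correctly rounded reference sum

print(sum_a, sum_b, exact)
```

A parallel reduction on a GPU may combine terms in yet another order, so its result can differ from both sequential sums even though every individual operation is correctly rounded.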

Oct, 31

Parallelization of Encryption and Hashing Algorithm Using GPU

With the development of GPGPU (general-purpose computing on graphics processing units), more and more computing problems are solved by exploiting the parallelism of the GPU (Graphics Processing Unit). CUDA (Compute Unified Device Architecture) is a framework which makes GPGPU more accessible and easier to learn for the general population of programmers. This is […]

CUDA

Oct, 31

Investigation of General-Purpose Computing on Graphics Processing Units and its Application to the Finite Element Analysis of Electromagnetic Problems

In this dissertation, the hardware and API architectures of GPUs are investigated, and the corresponding acceleration techniques are applied to the traditional frequency-domain finite element method (FEM), the element-level time-domain methods, and the nonlinear discontinuous Galerkin method. First, the assembly and the solution phases of the FEM are parallelized and mapped onto the granular […]

CUDA

Oct, 31

A general tridiagonal solver for coprocessors: Adapting g-Spike for the Intel Xeon Phi

Manycores like the Intel Xeon Phi and graphics processing units like the NVIDIA Tesla series are prime examples of systems for accelerating applications that run on current CPU multicores. It is therefore of interest to build fast, reliable linear system solvers targeting these architectures. Moreover, it is of interest to conduct cross comparisons between algorithmic […]

Oct, 31

Asynchronous Parallel Computing Algorithm implemented in 1D Heat Equation with CUDA

In this note, we present the stability and performance analysis of an asynchronous parallel computing algorithm applied to the 1D heat equation with CUDA. The primary objective of this note is to disseminate the asynchronous parallel computing algorithm by providing CUDA code for fast and easy implementation. We show that the simulations carried out on […]

CUDA

