high performance computing on graphics processing units: hgpu.org

Posts

Aug, 18

High Level High Performance Computing for Multitask Learning of Time-varying Models

We propose an approach suitable for learning multiple time-varying models jointly and discuss an application in data-driven weather forecasting. The methodology relies on spectral regularization and encodes the typical multi-task learning assumption that models lie near a common low-dimensional subspace. The arising optimization problem amounts to estimating a matrix from noisy linear measurements within […]

CUDA

Aug, 15

Improving Communication Performance and Scalability of Native Applications on Intel Xeon Phi Coprocessor Clusters

Intel Xeon Phi coprocessor-based clusters offer high compute and memory performance for parallel workloads and also support direct network access. Many real-world applications are significantly impacted by network characteristics, and to maximize the performance of such applications on these clusters it is particularly important to effectively saturate network bandwidth and/or hide communication latency. We […]

Aug, 15

Design and Evaluation of Scalable Concurrent Queues for Many-Core Architectures

As core counts increase and as heterogeneity becomes more common in parallel computing, we face the prospect of programming hundreds or even thousands of concurrent threads in a single shared-memory system. At these scales, even highly efficient concurrent algorithms and data structures can become bottlenecks unless they are designed from the ground up with throughput as […]

OpenCL

Aug, 15

Energy Evaluation for Applications with Different Thread Affinities on the Intel Xeon Phi

The Intel Xeon Phi coprocessor offers high parallelism on energy-efficient hardware to minimize energy consumption while maintaining performance. Dynamic frequency and voltage scaling is not accessible on the Intel Xeon Phi. Hence, saving energy relies mainly on tuning application performance. One general optimization technique is thread affinity, which is an important factor in multi-core architectures. […]

Aug, 15

GPU Accelerated Computation and Real-time Rendering of Cellular Automata Model for Spatial Simulation

Because Cellular Automata (CA) is a dynamic system with inherent parallelism, many studies focus on mapping CA to parallel systems, such as clusters, supercomputers and networks of computers, in order to obtain high performance computing capability. But these systems are too expensive and difficult to use on the occasions […]

CUDA • OpenGL
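The "inherent parallelism" mentioned in the abstract above can be illustrated with a minimal Python sketch. It assumes Conway's Game of Life rule on a wrap-around (toroidal) grid, which is a stand-in chosen for illustration; the abstract does not say which CA model or boundary condition the paper uses, and `life_step` is a hypothetical name. The point is that every cell's next state depends only on the previous generation, so all cells could be updated independently, for example one GPU thread per cell.

```python
def life_step(grid):
    """Advance a 2D cellular automaton by one generation (Game of Life rule)."""
    rows, cols = len(grid), len(grid[0])
    new_grid = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count the eight neighbours, wrapping around the edges.
            # Only the OLD grid is read, so every (r, c) is independent.
            live = sum(
                grid[(r + dr) % rows][(c + dc) % cols]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            if grid[r][c] == 1:
                new_grid[r][c] = 1 if live in (2, 3) else 0
            else:
                new_grid[r][c] = 1 if live == 3 else 0
    return new_grid


# A "blinker": three live cells in a column that flip to a row and back.
blinker = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
after_one = life_step(blinker)
after_two = life_step(after_one)
```

Because the new grid is written separately from the old one, the two nested loops carry no dependency between iterations, which is what makes this kind of update a natural fit for data-parallel hardware.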

Aug, 15

Effect of GPU Communication-Hiding for SpMV Using OpenACC

In the finite element method simulation we often deal with large sparse matrices. Sparse matrix-vector multiplication (SpMV) is of high importance for iterative solvers. During the solver stage, most of the time is in fact spent in the SpMV routine. The SpMV routine is highly memory-bound; the processor spends much time waiting for the needed […]
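As a rough illustration of why SpMV tends to be memory-bound, here is a minimal Python sketch of the kernel over CSR (compressed sparse row) storage. CSR is an assumption made for this example; the abstract does not state which sparse format the paper uses, and the names `csr_spmv`, `values`, `col_idx` and `row_ptr` are hypothetical.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR format."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Nonzeros of this row are values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # x is read through col_idx: an indirect, irregular access.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Small made-up matrix:
# [[4, 0, 1],
#  [0, 3, 0],
#  [2, 0, 5]]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]
y = csr_spmv(values, col_idx, row_ptr, x)
```

Each nonzero contributes a single multiply-add but requires loading a matrix value, a column index, and an indirectly addressed entry of `x`, so the ratio of arithmetic to memory traffic is low.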

Aug, 13

Numerical Computations with GPUs

This book brings together research on numerical methods adapted for Graphics Processing Units (GPUs). It explains recent efforts to adapt classic numerical methods, including solution of linear equations and FFT, for massively parallel GPU architectures. This volume consolidates recent research and adaptations, covering widely used methods that are at the core of many scientific and […]

CUDA • OpenCL

Aug, 13

Graphics Processing Unit Bloom Filters: Classical and Probabilistic

Graphics Processing Units (GPUs) have been used to enhance the speed and efficiency of both data structures and algorithms alike. A common data structure used in Computer Science is the Bloom Filter, which is used in many types of applications including databases and security logging. The Bloom Filter is a lossy data structure that uses […]

CUDA
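For readers unfamiliar with the data structure named in the abstract above, the following is a minimal sketch of a textbook Bloom filter in plain Python. It is not the paper's GPU implementation; the class name, the SHA-256 based hashing scheme, and the sizes are assumptions chosen only for illustration.

```python
import hashlib


class BloomFilter:
    """Textbook Bloom filter: no false negatives, possible false positives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(num_bits=1024, num_hashes=3)
for word in ["cuda", "opencl", "openacc"]:
    bf.add(word)
```

Every inserted item is always reported as possibly present. An item that was never inserted is usually reported as absent, but it can collide with bits set by other items, which is the sense in which the structure is lossy.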

Aug, 13

Non-Local Total Generalized Variation for Optical Flow Estimation

In this paper we introduce a novel higher-order regularization term. The proposed regularizer is a non-local extension of the popular second-order Total Generalized Variation, which favors piecewise affine solutions and allows soft-segmentation cues to be incorporated into the regularization term. These properties make this regularizer especially appealing for optical flow estimation, where it offers accurately localized […]

CUDA

Aug, 13

GPU-SPARC: Accelerating Parallelism in Multi-GPU Real-Time Systems

GPGPU (General-Purpose computation on Graphics Processing Units) offers an effective computing platform to accelerate a wide class of data-parallel computing. Multi-GPU systems appear as an attractive platform to speed up data-parallel GPGPU computation. This paper aims to explore the feasibility of relaxing the task-level restriction of single GPU use in multi-GPU real-time systems. We develop […]

OpenCL

Aug, 13

Improved GPU Co-processor Sorting Algorithm with Barrier Synchronization

Because sorting is among the most frequent operations in computing, many sorting algorithms have been proposed to date for CPUs and GPUs. GPUs generally have limited memory, so large data sets cannot fit in GPU global memory, which gives rise to external sorting techniques. These GPU-based external sorting algorithms are […]

CUDA

Aug, 13

The 2nd International Conference on Advances in Electronics Engineering, ICAEE 2015

Submission Deadline: 2014-11-10. Publication: All accepted papers of ICAEE 2015 will be published by International Journal of Information and Electronics Engineering (IJIEE), which will be indexed by Google Scholar, Electronic Journals Library, Engineering & Technology Digital Library, Crossref and ProQuest, DOAJ, Ei (INSPEC, IET). Topics: Electronics and Communications Engineering; QoS Provisioning and Architectures; Telecommunication Services and Applications […]

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

chemtrain-deploy: A parallel and scalable framework for machine learning potentials in million-atom MD simulations

microSYCL: SYCL micro-benchmarks repository

Exploring SYCL as a Portability Layer for High-Performance Computing on CPUs


Recent source codes

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository
