high performance computing on graphics processing units: hgpu.org

Posts

Oct, 13

Power Control for GPU Clusters in processing large-scale streams

Many emerging online data analysis applications require large-scale stream data processing. GPU clusters are becoming a significant parallel computing scheme for handling large-scale stream data tasks. However, power optimization is a challenging issue. In this paper, we present a novel power consumption control model to shift power budget among nodes in the cluster based on […]

CUDA

Oct, 13

Contributions to parallel stochastic simulation: Application of good software engineering practices to the distribution of pseudorandom streams in hybrid Monte-Carlo simulations

The race for computing power intensifies every day in the simulation community. A few years ago, scientists started to harness the computing power of Graphics Processing Units (GPUs) to parallelize their simulations. As with any parallel architecture, not only does the simulation model implementation have to be ported to the new parallel platform, but all […]

CUDA • OpenCL

Oct, 13

Simulating Active Membrane Systems Using GPUs

Software development for cellular computing is growing, yielding new applications. In this paper, we describe a simulator for the class of recognizer P systems with active membranes, which exploits the massively parallel nature of P systems computations by using GPUs (Graphics Processing Units). The newest generation of GPUs provides a massively parallel framework to […]

CUDA

Oct, 13

Characterizing the Challenges and Evaluating the Efficacy of a CUDA-to-OpenCL Translator

The proliferation of heterogeneous computing systems has led to increased interest in parallel architectures and their associated programming models. One of the most promising models for heterogeneous computing is the accelerator model, and one of the most cost-effective, high-performance accelerators currently available is the general-purpose graphics processing unit (GPU). Two similar programming environments have been […]

CUDA • OpenCL

Oct, 12

High performance sequence mining using pairwise statistical significance

With the deluge of sequence data resulting from next-generation sequencing, there comes a need to leverage large-scale biological sequence data. Therefore, the role of high-performance computational methods for mining interesting information solely from these sequence data becomes increasingly important. Almost everything in bioinformatics counts on the inter-relationship between sequences, […]

CUDA

Oct, 12

Regional Heritability Advanced Complex Trait Analysis for GPU and Traditional Parallel Architectures

MOTIVATION: Quantification of the contribution of genetic variation to phenotypic variation for complex traits becomes increasingly computationally demanding with increasing numbers of SNPs and individuals. To meet the challenges of making large-scale studies feasible, we present the REACTA software. Adapted from ACTA (and, in turn, GCTA), it is tailored to exploit the parallelism present […]

CUDA

Oct, 12

Coupling a Generalized DEM and an SPH Models Under a Heterogeneous Massively Parallel Framework

The interaction of flows and solid objects is a recurring problem in several engineering disciplines. The objective of this work is to present a fully coupled model, based on the fundamental conservation laws of hydrodynamics, namely the continuity and Navier-Stokes equations, and the equation of conservation of momentum of solid bodies. The coupled numerical solution, […]

CUDA

Oct, 12

Automatic run-time mapping of polyhedral computations to heterogeneous devices with memory-size restrictions

Tools that aim to automatically map parallel computations to heterogeneous and hierarchical systems try to divide the whole computation into parts with computational loads adjusted to the capabilities of the target devices. Some parts are executed in node cores, while others are executed in accelerator devices. Each part requires one or more data-structure pieces that […]

CUDA

Oct, 12

Dandelion: a Compiler and Runtime for Heterogeneous Systems

Computer systems increasingly rely on heterogeneity to achieve greater performance, scalability and energy efficiency. Because heterogeneous systems typically comprise multiple execution contexts with different programming abstractions and runtimes, programming them remains extremely challenging. Dandelion is a system designed to address this programmability challenge for data-parallel applications. Dandelion provides a unified programming model for heterogeneous systems […]

CUDA

Oct, 10

A Parallel Intermediate Representation for Embedded Languages

This thesis presents a parallel intermediate representation for embedded languages called PIRE, and its incorporation into the Feldspar language. The original Feldspar backend translates the parallel loops of Feldspar to ordinary for loops, meaning that they are not actually parallel in the generated code. We create an alternate backend for the Feldspar project, where the […]

OpenCL

Oct, 10

CUDA-Accelerated ODETLAP: A Parallel Lossy Compression Implementation

We present an implementation of Overdetermined Laplacian Partial Differential Equations (ODETLAP) that uses CUDA directly. This lossy compression technique approximates a solution to an overdetermined system of equations in order to reconstruct gridded, correlated data. ODETLAP can be used to compress a dataset or to reconstruct missing data. Parallelism in CUDA provides speed performance improvements […]

CUDA

Oct, 10

GALAMOST: GPU-accelerated large-scale molecular simulation toolkit

A new molecular simulation toolkit composed of recently developed force fields and specified models is presented to study the self-assembly, phase transition, and other properties of polymeric systems at the mesoscopic scale by utilizing the computational power of GPUs. In addition, the hierarchical self-assembly of soft anisotropic particles and the problems related to polymerization can […]

CUDA

* * *

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container
