high performance computing on graphics processing units: hgpu.org

Posts

Dec, 19

Fast Random Graph Generation

Today, several database applications call for the generation of random graphs. A fundamental, versatile random graph model adopted for that purpose is the Erdős-Rényi Γ_{v,p} model. This model can be used for directed, undirected, and multipartite graphs, with and without self-loops; it induces algorithms for both graph generation and sampling, hence is useful not only […]

CUDA
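
To make the model in the abstract above concrete, here is a minimal baseline sketch in Python. It assumes the usual reading of the Erdős-Rényi model, in which every candidate edge is included independently with probability p. The function name `gnp_edges` and its parameters are hypothetical, and this is the naive quadratic approach, not the faster algorithms the paper proposes.

```python
import random


def gnp_edges(num_vertices, p, directed=True, self_loops=False, seed=None):
    """Naive Erdős-Rényi sampler: keep each candidate edge with probability p.

    Hypothetical helper for illustration only; it visits every vertex pair,
    so its cost is quadratic in the number of vertices.
    """
    rng = random.Random(seed)
    edges = []
    for u in range(num_vertices):
        # For undirected graphs, visit each unordered pair once (v >= u).
        start = 0 if directed else u
        for v in range(start, num_vertices):
            if u == v and not self_loops:
                continue
            # rng.random() is in [0, 1), so p = 1.0 keeps every candidate edge.
            if rng.random() < p:
                edges.append((u, v))
    return edges


if __name__ == "__main__":
    sample = gnp_edges(6, 0.3, directed=False, seed=42)
    print(len(sample), "edges:", sample)
```

Because every pair is tested, the cost does not depend on how many edges are actually kept, which is the inefficiency that faster generators are designed to avoid.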

Dec, 19

GPU-Accelerated Preconditioned Iterative Linear Solvers

This work is an overview of our preliminary experience in developing high-performance iterative linear solvers accelerated by GPU co-processors. Our goal is to illustrate the advantages and difficulties encountered when deploying GPU technology to perform sparse linear algebra computations. Techniques for speeding up sparse matrix-vector product (SpMV) kernels and finding suitable preconditioning methods are discussed. […]

CUDA
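
The SpMV kernel mentioned in the abstract above can be illustrated with a short sequential sketch. It assumes the compressed sparse row (CSR) storage format, which the abstract does not name; the function `spmv_csr` and its argument names are hypothetical. On a GPU the outer loop over rows is the part that would typically be distributed across threads.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    row_ptr[i]:row_ptr[i + 1] delimits the nonzeros of row i inside
    col_idx (column indices) and values (the nonzero entries).
    """
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    for row in range(num_rows):  # each row is independent of the others
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


if __name__ == "__main__":
    # A = [[4, 0, 1],
    #      [0, 3, 0],
    #      [2, 0, 5]]
    row_ptr = [0, 2, 3, 5]
    col_idx = [0, 2, 1, 0, 2]
    values = [4.0, 1.0, 3.0, 2.0, 5.0]
    print(spmv_csr(row_ptr, col_idx, values, [1.0, 2.0, 3.0]))
```

The indirect read `x[col_idx[k]]` is what makes the kernel memory-bound and irregular, which is why storage formats and access patterns dominate SpMV tuning.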

Dec, 19

3D Recursive Gaussian IIR on GPU and FPGAs: A Case Study for Accelerating Bandwidth-Bounded Applications

GPU devices typically have a higher off-chip bandwidth than FPGA-based systems. Thus, GPUs should typically perform better for bandwidth-bounded massively parallel applications. In this paper we present our implementations of a 3D recursive Gaussian IIR on multicore CPU, many-core GPU and multi-FPGA platforms. Our baseline implementation on the CPU features the smallest arithmetic computation (2 […]

CUDA

Dec, 19

An Efficient Simulation Environment for Modeling Large-Scale Cortical Processing

We have developed a spiking neural network simulator, which is both easy to use and computationally efficient, for the generation of large-scale computational neuroscience models. The simulator implements current- or conductance-based Izhikevich neuron networks with spike-timing-dependent plasticity and short-term plasticity. It uses a standard network construction interface. The simulator allows for execution on […]

CUDA

Dec, 19

Optimization of mapped functions sequences using fusions on GPU

When implementing a function mapping on a contemporary GPU, several contradictory performance factors have to be balanced. A decomposition-fusion scheme was previously devised to guide such an implementation, and this work elaborates on it further. To ease this process, an automatic source-to-source compiler is presented, while the main subject of this thesis is the core […]

CUDA

Dec, 19

GPU-based Implementation of the Variational Path Integral Method

Every physical system consists of particles such as electrons. To analyze the behavior of these systems, the behavior of these particles must be predicted. The ground state energy is the most important information about a molecule and can be calculated by solving the Schrödinger equation. But as the number of atoms increases, the […]

CUDA

Dec, 19

Towards Automatic C Programs Optimization and Parallelization using the PIPS-PoCC Integration

This paper explains how the PIPS source-to-source compilation framework integrates the Polyhedral Compiler Collection (PoCC) as one of PIPS's many program transformations. The integration between PIPS and PoCC automatically extracts the static control parts of the source code, which can be optimized independently by PoCC, and then transparently reintegrates them into the user source code. […]

CUDA

Dec, 19

Experiences Developing the OpenUH Compiler and Runtime Infrastructure

The OpenUH compiler is a branch of the open source Open64 compiler suite for C, C++, Fortran 95/2003, with support for a variety of targets including x86_64, IA-64, and IA-32. For the past several years, we have used OpenUH to conduct research in parallel programming models and their implementation, static and dynamic analysis of parallel […]

Dec, 19

Acceleration of grammatical evolution using graphics processing units: computational intelligence on consumer games and graphics hardware

Several papers show that symbolic regression is suitable for data analysis and prediction in financial markets. Grammatical Evolution (GE), a grammar-based form of Genetic Programming (GP), has been successfully applied in solving various tasks including symbolic regression. However, often the computational effort to calculate the fitness of a solution in GP can limit the area […]

CUDA

Dec, 19

Parallel programming with inductive synthesis

We show that program synthesis can generate GPU algorithms as well as their optimized implementations. Using the scan kernel as a case study, we describe our evolving synthesis techniques. Relying on our synthesizer, we can parallelize a serial problem by transforming it into a scan operation, synthesize a SIMD scan algorithm, and optimize it to […]

CUDA
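
As background for the scan kernel used as the case study in the abstract above, here is a minimal sequential simulation of one well-known data-parallel scan, the Hillis-Steele inclusive scan. This is a swapped-in textbook algorithm, not the algorithm the paper's synthesizer produces; the function name `hillis_steele_scan` is hypothetical.

```python
import operator


def hillis_steele_scan(values, op=operator.add):
    """Inclusive scan that mimics lockstep SIMD execution.

    In each step, every lane i >= offset combines its own value with the
    value held offset lanes to its left, and the offset then doubles.
    The snapshot copy models all lanes reading before any lane writes.
    """
    data = list(values)
    offset = 1
    while offset < len(data):
        previous = data[:]  # values as they were before this step
        for i in range(offset, len(data)):  # conceptually parallel lanes
            data[i] = op(previous[i - offset], previous[i])
        offset *= 2
    return data


if __name__ == "__main__":
    print(hillis_steele_scan([3, 1, 7, 0, 4, 1, 6, 3]))
    print(hillis_steele_scan([1, 3, 2, 5, 4], op=max))
```

Any associative operator can replace addition, which is what allows some serial recurrences to be recast as scans. This variant finishes in a logarithmic number of steps but performs more total work than a serial loop.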

Dec, 18

A Translation Framework for Executing the Sequential Binary Code on CPU/GPU Based Architectures

The use of DBT (dynamic binary translation) to execute a source ISA's binary code on target platforms has been hampered by high overhead for many years. The GPU, as a many-core processor, has tremendous computational power. Employing the GPU as a coprocessor to execute the hot spots of binary code in parallel holds great promise of […]

CUDA

Dec, 18

Automatic code generation and tuning for stencil kernels on modern shared memory architectures

In this paper, we present Patus, a code generation and auto-tuning framework for stencil computations targeted at multi- and manycore processors, such as multicore CPUs and graphics processing units. Patus, which stands for "Parallel Autotuned Stencils," generates a compute kernel from a specification of the stencil operation and a strategy which describes the parallelization and […]

CUDA
