high performance computing on graphics processing units: hgpu.org

Posts

May, 17

Investigating the Impact of Data Parallelism and GPU Technology on Computer Gaming

According to current design trends, multithreaded multicore processors will become ubiquitous across devices. In computer gaming, chip-makers are adding more cores to meet next-generation performance requirements. A game engine has many ‘tasks’, and data parallelism is an important technique for executing these tasks concurrently. However, effective implementation of multithreaded computer […]

CUDA

May, 17

Fine-Grained Parallel Incomplete LU Factorization

This paper presents a new fine-grained parallel algorithm for computing an incomplete LU factorization. All nonzeros in the incomplete factors can be computed in parallel and asynchronously, using one or more sweeps that iteratively improve the accuracy of the factorization. Unlike existing parallel algorithms, the new algorithm does not depend on reordering the matrix. Numerical […]
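The sweep idea in the abstract above can be illustrated with a minimal sketch. It assumes the usual fixed-point formulation of a zero-fill incomplete LU factorization, where each nonzero of L and U is recomputed from the previous sweep's values, so that all updates within a sweep are independent. The function name `parallel_ilu0_sweeps`, the dense NumPy storage, and the choice of initial guess are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def parallel_ilu0_sweeps(A, num_sweeps=5):
    """Approximate a zero-fill ILU of A with Jacobi-style sweeps (sketch).

    Assumptions (not from the source): A is a square array with a nonzero
    diagonal, the sparsity pattern of L + U equals that of A, and L has a
    unit diagonal.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    # Positions of the nonzeros of A; only these entries are ever updated.
    pattern = [(i, j) for i in range(n) for j in range(n) if A[i, j] != 0.0]

    # Initial guess: strictly lower part of A (unit diagonal) and upper part of A.
    L = np.tril(A, k=-1) + np.eye(n)
    U = np.triu(A)

    for _ in range(num_sweeps):
        # Every update reads only the previous sweep's values, so the loop
        # over the pattern could run in parallel and in any order.
        L_old, U_old = L.copy(), U.copy()
        for i, j in pattern:
            m = min(i, j)
            s = A[i, j] - L_old[i, :m] @ U_old[:m, j]
            if i > j:
                L[i, j] = s / U_old[j, j]   # strictly lower entry of L
            else:
                U[i, j] = s                 # upper (or diagonal) entry of U
    return L, U
```

For a tridiagonal matrix no fill-in occurs, so enough sweeps recover the exact LU factors; for general sparse matrices each sweep only improves an approximation to the incomplete factors. The sequential inner loop here stands in for what would be one thread per nonzero on a GPU.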

May, 17

Hierarchical Transparent Programming for Heterogeneous Computing

Parallel computing, together with the development of parallel programs, is a way to reduce program execution time. For many years, sequential optimization was designed without regard to parallel tasks. Now that multi-core devices have arrived, code parallelization has become more important. Parallel computing is closely related to both the hardware and software points of view, […]

CUDA • OpenCL

May, 17

Heterogeneity-aware Fault Tolerance using a Self-Organizing Runtime System

Due to the diversity and implicit redundancy in terms of processing units and compute kernels, off-the-shelf heterogeneous systems offer the opportunity to detect and tolerate faults during task execution in hardware as well as in software. To automatically leverage this diversity, we introduce an extension of an online-learning runtime system that combines the benefits of […]

CUDA

May, 17

Using NVIDIA GPUs for Real-time Data Processing in a Holographic Radar System, webinar

In this webinar, Peter Wurmsdobler, Lead Software Architect, Aveillant, will give a short introduction to Aveillant’s Holographic Radar systems, the principles of holographic radar as opposed to scanning radar, and their computational requirements. Peter will go on to explore the technical challenges faced in the implementation of the mathematical algorithms needed, how […]

May, 16

The Next Steps for Folding@home, webinar

Folding@home is a large-scale volunteer distributed computing project, started on October 1, 2000. For over a decade, new types of hardware (such as GPUs, multi-core CPUs, and PS3) and algorithms have been pioneered in order to make significant advances in our ability to simulate diseases at the molecular scale. Join Professor Vijay Pande from Stanford […]

May, 16

An Introduction to CUDA Programming, webinar

Join Chris Mason, Product Manager, Acceleware, for an informative introduction to CUDA programming. The webinar will begin with a brief overview of CUDA and data-parallelism before focusing on the GPU programming model. Chris will explore the fundamentals of GPU kernels, host and device responsibilities, CUDA syntax and thread hierarchy. A programming demonstration of a simple […]

May, 16

C++ on GPUs Using OpenACC and the PGI Accelerator Compilers, webinar

The fastest supercomputers and clusters use a 64-bit host processor with one or more accelerators per node, most commonly GPUs. These compute accelerators exploit a high degree of parallelism to maximize performance and power efficiency. There are several challenges to effective and productive use of accelerators, the most important of which are managing data movement […]

May, 16

Using GPUs to Accelerate Orthorectification, Atmospheric Correction, and Transformations for Big Data, webinar

Significant improvements in speeds for imagery orthorectification, atmospheric correction, and image transformations like Independent Components Analysis (ICA) have been achieved using GPU-based implementations. Additional optimizations, when factored in with GPU processing capabilities, can provide 50x – 100x reduction in the time required to process large imagery. Exelis Visual Information Solutions (VIS) has implemented a CUDA-based […]

May, 16

Scaling Coupled Climate Models to Exascale: OpenACC-enabled ECEarth3 Earth System Model

Climate change due to increasing anthropogenic greenhouse gases and land surface change is currently one of the most relevant environmental concerns. It threatens ecosystems and human societies. However, its impact on the economy and our living standards depends largely on our ability to anticipate its effects and take appropriate action. Earth System Models (ESMs), such […]

CUDA

May, 16

Porting NAHUJ to CUDA

This white paper reports on an enabling effort that involves porting a legacy 2D fluid dynamics Fortran code to NVIDIA GPUs. Given the complexity of both the code and the underlying (custom) numerical method, the natural choice was to use NVIDIA CUDA C to achieve the best possible performance. We achieved over 4.5x speed-up on a single K20 […]

CUDA

May, 16

Enabling CP2K Application for Exascale Computing with Accelerators using OpenACC and OpenCL

CP2K is an application for atomistic and molecular simulation and, with its excellent scalability, is particularly important with regard to use on future exascale systems. The code is well parallelized using MPI and hybrid MPI/OpenMP, typically scaling well to ~1 core per atom in the system. The research on CP2K done within PRACE-1IP stated that […]

CUDA • OpenCL

* * *

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems
