high performance computing on graphics processing units: hgpu.org

Posts

Feb, 18

Optimization of HEP codes on GPUs

Graphics processing units (GPUs) have evolved into high-performance co-processors that can be easily programmed with common high-level languages such as C, Fortran and C++. Today’s GPUs greatly outpace CPUs in arithmetic performance and memory bandwidth, making them the ideal co-processor to accelerate a variety of data-parallel applications. Here, we shall describe the application […]

CUDA

Feb, 18

Power-aware Performance of Mixed Precision Linear Solvers for FPGAs and GPGPUs

Power has emerged as a significant constraint on high-performance systems. We propose modeling power-based performance (performance/watt) and clock-based performance for GPGPUs and FPGAs. Based on the modeling, we perform a case study with mixed precision linear solvers for a Xilinx XC5VLX330T FPGA and an NVIDIA Tesla C1060 GPU. In the case study, the FPGA shows power- and […]

Feb, 18

Accelerating Double Precision Floating-point Hessenberg Reduction on FPGA and Multicore Architectures

Double precision floating-point performance is critical for hardware acceleration technologies to be adopted by domain scientists. In this work we use the Hessenberg reduction to demonstrate the potential of FPGAs and GPUs for obtaining satisfactory double precision floating-point performance. Currently, a Xeon (Nehalem) 2.26 GHz CPU can outperform a Xilinx Virtex4LX200 by a factor of 3.6. However, given […]

CUDA

Feb, 18

GPU Acceleration of Near-Minimal Logic Minimization

In this paper, we describe a GPU-accelerated implementation of a logic minimization heuristic based on the near-minimal approach. This algorithm has three key kernel computations, and in the current version of our implementation, we adapted one of these kernels for GPU execution. In this paper we report our results gained from using NVIDIA’s CUDA development […]

CUDA

Feb, 18

Fully accelerating quantum Monte Carlo simulations of real materials on GPU clusters

Continuum quantum Monte Carlo (QMC) has proved to be an invaluable tool for predicting the properties of matter from fundamental principles. By solving the many-body Schrödinger equation through a stochastic projection, it achieves greater accuracy than mean-field methods and better scalability than quantum chemical methods, enabling scientific discovery across a broad spectrum of disciplines. The […]

CUDA

Feb, 18

Sparse systems solving on GPUs with GMRES

Scientific applications very often rely on solving one or more linear systems. When matrices are sparse, iterative methods are preferred to direct ones. Nevertheless, the value of nonzero elements and their distribution (i.e., the sketch of the matrix) greatly influence the efficiency of those methods (in terms of computation time, number of iterations, result precision) […]

CUDA

Feb, 18

Accelerating Power Flow studies on Graphics Processing Unit

This paper presents the design of a Power Flow algorithm that has enhanced performance on the Graphics Processing Unit (GPU) using Compute Unified Device Architecture (CUDA). This work investigates the performance of optimized CPU versions of Newton-Raphson (Polar form) and Gauss-Jacobi power flow algorithms, highlights the approach used to reduce the computation time by performing these […]

CUDA

Feb, 18

Performance Comparison of Cholesky Decomposition on GPUs and FPGAs

Cholesky decomposition has been widely utilized for symmetric positive-definite matrix factorization in solving least squares problems. Various parallel accelerators including GPUs and FPGAs have been explored to improve performance. In this paper, Cholesky decomposition is implemented on both FPGAs and GPUs by designing a dedicated architecture for FPGAs and exploiting massively parallel computation for GPUs. […]

OpenCL
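
As a rough illustration of how Cholesky decomposition is used for least squares problems, the sketch below factors the normal-equations matrix and solves by forward and back substitution. It is a plain sequential Python version, not the FPGA or GPU implementation from the paper; the function names `cholesky`, `solve_spd` and `least_squares` are hypothetical, and the normal-equations route is an assumption chosen for brevity.

```python
import math

def cholesky(a):
    """Return lower-triangular L with L * L^T == a (a must be symmetric positive definite)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                l[i][j] = math.sqrt(d)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l

def solve_spd(a, b):
    """Solve a x = b using the Cholesky factor of a."""
    l = cholesky(a)
    n = len(b)
    # Forward substitution: L y = b
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    # Back substitution: L^T x = y
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x

def least_squares(m, rhs):
    """Minimize ||m x - rhs|| by solving the normal equations (m^T m) x = m^T rhs."""
    cols = len(m[0])
    ata = [[sum(row[i] * row[j] for row in m) for j in range(cols)] for i in range(cols)]
    atb = [sum(row[i] * r for row, r in zip(m, rhs)) for i in range(cols)]
    return solve_spd(ata, atb)

# Fit y = c0 + c1 * x through the points (0, 1), (1, 3), (2, 5).
coeffs = least_squares([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]], [1.0, 3.0, 5.0])
# coeffs is approximately [1.0, 2.0]
```

Forming the normal equations squares the condition number of the problem, so this sketch suits small, well-conditioned examples only.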

Feb, 17

OpenCL Evaluation for Numerical Linear Algebra Library Development

With the help of CUDA [7], [6], many applications improved their performance by using GPUs. In our project called Matrix Algebra on GPU and Multicore Architectures (MAGMA) [10], we mainly focus on dense linear algebra routines similar to those from LAPACK [1]. Other than CUDA, there exist other frameworks that allow platform-independent programming for […]

OpenCL

Feb, 17

Evaluating one-sided programming models for GPU cluster computations

The Global Array toolkit (GA) [1] is a powerful framework for implementing algorithms with irregular communication patterns, such as those of quantum chemistry. On the other hand, accelerators such as GPUs have shown great potential for important kernels in quantum chemistry, for example, atomic integral generation [2] and dense linear algebra in correlated methods [3]. […]

CUDA

Feb, 17

GPU Accelerated Particle System for Triangulated Surface Meshes

Shape analysis based on images and implicit surfaces has been an active area of research for the past several years. Particle systems have emerged as a viable solution to represent shapes for statistical analysis. One of the most widely used representations of shapes in computer graphics and visualization is the triangular mesh. It is desirable […]

CUDA

Feb, 17

Medium-Grained Functions Mapping using Modern GPUs

Map is a higher-order function that applies a given function to a list or lists of elements, producing a list of results. The mapped function is applied to each element of the list independently, and thus can be performed for all elements in parallel, making the GPU an interesting platform on which to implement it. Although […]

CUDA
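
The map operation described in the abstract above can be sketched in a few lines. This is a minimal illustration only: it uses CPU threads from Python's standard library as a stand-in for GPU threads, and the helper name `parallel_map` is hypothetical rather than anything taken from the paper.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_map(fn, *lists, workers=4):
    """Apply fn element-wise to one or more lists, returning the list of results.

    Each call to fn depends only on its own elements, so the calls may run
    concurrently; Executor.map still returns results in input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, *lists))

# One list: square every element.
squares = parallel_map(lambda x: x * x, [1, 2, 3, 4])  # [1, 4, 9, 16]

# Two lists: combine elements pairwise.
sums = parallel_map(lambda a, b: a + b, [1, 2, 3], [10, 20, 30])  # [11, 22, 33]
```

Because no element's result depends on another's, the same structure carries over to a GPU kernel in which each thread handles one index.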

Recent source codes

CudaForge: An Agent Framework with Hardware Feedback for CUDA Kernel Optimization

LC Framework

pplx-garden: Perplexity open source garden for inference technology

Atlas CLI: Machine Learning (ML) Lifecycle & Transparency Manager

transformers_tvm: Implementation of Encoder Decoder transformer on TVM

OpScanner

INT v.s. FP: A framework to compare low-bit integer and float-point formats

AutoDock-GPU: AutoDock for GPUs and other accelerators

NCCLX: collective communication framework

Tutoring LLM into a Better CUDA Optimizer

Most viewed papers (last 30 days)