high performance computing on graphics processing units: hgpu.org

Posts

Jan, 11

Constraint-based LN-curves

We consider the design of parametric curves from geometric constraints such as distance from lines or points and tangency to lines or circles. We solve the Hermite problem with such additional geometric constraints. We use a family of curves with linearly varying normals, LN curves, over the parameter interval [0, u]. The nonlinear equations that […]

Jan, 11

Optimization of tele-immersion codes

As computational power increases, tele-immersive applications are an emerging trend. These applications make extensive demands on computational resources through their heavy use of real-time 3D reconstruction algorithms. Since computer vision developers do not necessarily have parallel programming expertise, it is important to give them the tools and capabilities to naturally express computer vision algorithms, yet […]

CUDA

Jan, 11

Transform Coding for Hardware-accelerated Volume Rendering

Hardware-accelerated volume rendering using the GPU is now the standard approach for real-time volume rendering, although limited graphics memory can present a problem when rendering large volume data sets. Volumetric compression in which the decompression is coupled to rendering has been shown to be an effective solution to this problem; however, most existing techniques were […]

Jan, 11

GPU-Based Nonlinear Ray Tracing

In this paper, we present a mapping of nonlinear ray tracing to the GPU which avoids any data transfer back to main memory. The rendering process consists of the following parts: ray setup according to the camera parameters, ray integration, ray-object intersection, and local illumination. Bent rays are approximated by polygonal lines that are represented […]

Jan, 11

Automated image alignment for 2D gel electrophoresis in a high-throughput proteomics pipeline

MOTIVATION: The quest for high-throughput proteomics has revealed a number of challenges in recent years. Whilst substantial improvements in automated protein separation with liquid chromatography and mass spectrometry (LC/MS), aka ‘shotgun’ proteomics, have been achieved, large-scale open initiatives such as the Human Proteome Organization (HUPO) Brain Proteome Project have shown that maximal proteome coverage is […]

Jan, 11

Exploring new architectures in accelerating CFD for Air Force applications

Computational Fluid Dynamics (CFD) is an active field of research where the development of faster and more accurate methods is linked to the continuous demand for ever higher computational power. And indeed, for at least two decades, high-performance computing (HPC) programmers have taken for granted that each successive generation of microprocessors would, either immediately or […]

Jan, 11

An events based algorithm for distributing concurrent tasks on multi-core architectures

In this paper, a programming model is presented which enables scalable parallel performance on multi-core shared memory architectures. The model has been developed for application to a wide range of numerical simulation problems. Such problems involve time stepping or iteration algorithms where synchronization of multiple threads of execution is required. It is shown that traditional […]

Jan, 11

Diagnosis, Tuning, and Redesign for Multicore Performance: A Case Study of the Fast Multipole Method

Given a program and a multisocket, multicore system, what is the process by which one understands and improves its performance and scalability? We describe an approach in the context of improving within-node scalability of the fast multipole method (FMM). Our process consists of a systematic sequence of modeling, analysis, and tuning steps, beginning with simple […]

Jan, 11

Efficient gradient-domain compositing using quadtrees

We describe a hierarchical approach to improving the efficiency of gradient-domain compositing, a technique that constructs seamless composites by combining the gradients of images into a vector field that is then integrated to form a composite. While gradient-domain compositing is powerful and widely used, it suffers from poor scalability. Computing an n pixel composite […]

Jan, 11

SCELib3.0: The new revision of SCELib, the parallel computational library of molecular properties in the Single Center Approach

SCELib is a computer program which implements the Single Center Expansion (SCE) method to describe molecular electronic densities and the interaction potentials between a charged projectile (electron or positron) and a target molecular system. The first version (CPC Catalog identifier ADMG_v1_0) was submitted to the CPC Program Library in 2000, and version 2.0 (ADMG_v2_0) was […]

CUDA

Jan, 11

iNFAnt: NFA pattern matching on GPGPU devices

This paper presents iNFAnt, a parallel engine for regular expression pattern matching. In contrast with traditional approaches, iNFAnt adopts non-deterministic automata, allowing the compilation of very large and complex rule sets that are otherwise hard to treat. iNFAnt is explicitly designed and developed to run on graphical processing units that provide large amounts of concurrent […]

CUDA
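
As a rough illustration of why non-deterministic automata suit data-parallel hardware, the following minimal Python sketch simulates an NFA by tracking the set of active states. The transition layout (triples grouped by input symbol), the function names, and the toy automaton for `a*ab` are assumptions made for this example and are not taken from the iNFAnt abstract. The loop runs serially here; it does not use a GPU.

```python
from collections import defaultdict


def build_symbol_table(transitions):
    """Group (src, symbol, dst) triples by symbol for fast lookup."""
    table = defaultdict(list)
    for src, symbol, dst in transitions:
        table[symbol].append((src, dst))
    return table


def nfa_match(transitions, start, accepting, text):
    """Return True if the NFA accepts the whole of `text`."""
    table = build_symbol_table(transitions)
    active = {start}
    for ch in text:
        next_active = set()
        # Each (src, dst) pair is checked independently of the others,
        # which is the property a parallel device could exploit.
        for src, dst in table.get(ch, []):
            if src in active:
                next_active.add(dst)
        active = next_active
        if not active:
            return False  # no live states remain, so the input is rejected
    return bool(active & accepting)


# Hypothetical NFA for the regular expression a*ab.
# State 0 has two outgoing 'a' transitions, so it is non-deterministic.
TRANSITIONS = [
    (0, "a", 0),
    (0, "a", 1),
    (1, "b", 2),
]

if __name__ == "__main__":
    for sample in ["ab", "aaab", "b", "aba"]:
        print(sample, nfa_match(TRANSITIONS, 0, {2}, sample))
```

Because several states can be active at once, no determinization step is needed, which avoids the state blow-up that large rule sets can cause in a DFA.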

Jan, 11

Accelerating linpack with CUDA on heterogenous clusters

This paper describes the use of CUDA to accelerate the Linpack benchmark on heterogenous clusters, where both CPUs and GPUs are used in synergy with minor or no modifications to the original source code. A host library intercepts the calls to DGEMM and DTRSM and executes them simultaneously on both GPUs and CPU cores. An […]

CUDA
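
The idea of running one matrix product on two kinds of processor at once can be sketched by splitting the columns of the right-hand matrix into two blocks and multiplying each block separately. The Python sketch below is only an illustration under stated assumptions: `hybrid_matmul`, `gpu_fraction`, and the column-wise split are hypothetical names and choices for this example, both blocks run on CPU threads rather than on a GPU, and nothing here reproduces the host library described in the abstract.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul(a, b):
    """Plain product of two non-empty row-major lists of lists."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def split_columns(b, n_first):
    """Split b into its first n_first columns and the remaining columns."""
    left = [row[:n_first] for row in b]
    right = [row[n_first:] for row in b]
    return left, right


def hybrid_matmul(a, b, gpu_fraction=0.75):
    """Compute a @ b as two independent column blocks run concurrently.

    gpu_fraction is a hypothetical tuning knob: the share of columns
    that would go to the faster device in a real heterogeneous setup.
    """
    cols = len(b[0])
    n_first = max(0, min(cols, round(cols * gpu_fraction)))
    b_first, b_second = split_columns(b, n_first)
    with ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(matmul, a, b_first)    # stand-in for the GPU share
        second = pool.submit(matmul, a, b_second)  # stand-in for the CPU share
        c_first, c_second = first.result(), second.result()
    # Stitch the two column blocks back together row by row.
    return [r1 + r2 for r1, r2 in zip(c_first, c_second)]


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6, 7], [8, 9, 10]]
    print(hybrid_matmul(a, b))
```

The two blocks share no output elements, so they need no synchronization until the final concatenation. In a real system the split ratio would presumably be chosen so that both devices finish at about the same time.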
