high performance computing on graphics processing units: hgpu.org

Posts

Jan, 13

Fast Greeks: Case of Credit Valuation Adjustments

(Counterparty) Credit Valuation Adjustment (CVA) has become a prevailing way of pricing default risk on over-the-counter (OTC) contracts. Due to the large size of portfolios included in the CVA calculation and its computational complexity, large computing grids are needed for the evaluation. The main purpose of this thesis is to investigate an even more computationally […]

CUDA

Jan, 12

Enhancing Data Parallelism for Ant Colony Optimisation on GPUs

Graphics Processing Units (GPUs) have evolved into highly parallel and fully programmable architectures over the past five years, and the advent of CUDA has facilitated their use in many real-world applications. In this paper, we deal with a GPU implementation of Ant Colony Optimisation (ACO), a population-based optimisation method which comprises two major stages: Tour […]

CUDA

Jan, 12

GKLEE: Concolic Verification and Test Generation for GPUs

Programs written for GPUs often contain correctness errors such as races and deadlocks, or may compute the wrong result. Existing debugging tools often miss these errors because of their limited input-space and execution-space exploration. Existing tools based on conservative static analysis or conservative modeling of SIMD concurrency generate false alarms, resulting in wasted bug-hunting effort. They also […]

CUDA

Jan, 12

Incomplete-LU and Cholesky Preconditioned Iterative Methods Using CUSPARSE and CUBLAS

In this white paper we show how to use the CUSPARSE and CUBLAS libraries to achieve a 2x speedup over CPU in the incomplete-LU and Cholesky preconditioned iterative methods. We focus on the Bi-Conjugate Gradient Stabilized and Conjugate Gradient iterative methods, which can be used to solve large sparse nonsymmetric and symmetric positive definite linear […]

CUDA

Jan, 11

Optimization of power consumption in the iterative solution of sparse linear systems on graphics processors

In this paper, we analyze the power consumption of different GPU-accelerated iterative solver implementations enhanced with energy-saving techniques. Specifically, while conducting kernel calls on the graphics accelerator, we manually set the host system to a power-efficient idle-wait status so as to leverage dynamic voltage and frequency control. While the usage of iterative refinement combined with […]

CUDA

Jan, 11

A parallel Genetic Programming algorithm for classification

In this paper a Grammar Guided Genetic Programming based method for the learning of rule-based classification systems is proposed. The method learns disjunctive normal form rules generated by means of a context-free grammar. The individual constitutes a rule based decision list that represents the full classifier. To overcome the problem of computational time of this […]

CUDA

Jan, 11

Analysis and optimization of power consumption in the iterative solution of sparse linear systems on multi-core and many-core platforms

Energy efficiency is a major concern in modern high-performance computing. Still, few studies provide deep insight into the power consumption of scientific applications. Especially for algorithms running on hybrid platforms equipped with hardware accelerators, like graphics processors, a detailed energy analysis is essential to identify the most costly parts, and to evaluate possible improvement strategies. […]

CUDA

Jan, 11

Exponential integrators on graphic processing units

From the standpoint of a computer engineer there are (at least) two ways to improve the execution time of an algorithm. First, one might build sequential processing units with increased speed (this is most common in CPUs, although those have also incorporated parallel processing paradigms), while the second alternative is to build a massive number […]

CUDA

Jan, 11

Efficient Ray Tracing of Dynamic Scenes on the GPU

The topic of this thesis is ray tracing dynamic scenes efficiently while harnessing the massive computational power of today’s graphics cards. It is motivated by the ever increasing interest in ray tracing and global illumination for creating effects in movies, but also the increased usage of 2D and 3D ray tracing in modern […]

CUDA

Jan, 11

Implementing Genetic Algorithms to CUDA Environment Using Data Parallelization

Parallel problem-solving methods using graphics processing units (GPUs) have attracted much research interest in recent years. Parallel computation can be applied to genetic algorithms (GAs) in the evaluation of individuals in a population. This paper describes yet another method of implementing GAs in the CUDA environment where CUDA is […]

CUDA

Jan, 11

Parallel LDPC decoding using CUDA and OpenMP

Digital mobile communication technologies, such as next generation mobile communication and mobile TV, are rapidly advancing. Hardware designs to provide baseband processing of new protocol standards are being actively attempted; however, because of concurrently emerging multiple standards and diverse needs on device functions, hardware-only implementation may have reached a limit. To overcome this challenge, digital communication […]

CUDA

Jan, 11

Efficient Model-based 3D Tracking of Hand Articulations using Kinect

We present a novel solution to the problem of recovering and tracking the 3D position, orientation and full articulation of a human hand from markerless visual observations obtained by a Kinect sensor. We treat this as an optimization problem, seeking the hand model parameters that minimize the discrepancy between the appearance and 3D structure […]
