high performance computing on graphics processing units: hgpu.org

Posts

Nov, 5

Power Flow Analysis on CUDA-based GPU

This major qualifying project investigates the algorithm and the performance of using the CUDA-based Graphics Processing Unit for power flow analysis. The accomplished work includes the design, implementation and testing of the power flow solver. Comprehensive analysis shows that the parallel algorithm executes several times faster than the sequential algorithm.

CUDA

Nov, 5

Real-time Flame Rendering with GPU and CUDA

This paper proposes a method of flame simulation based on a Lagrange process and chemical composition. The method is grid-free, so the problems associated with grids are overcome. The turbulent movement of the flame is described by the Lagrange process, and chemical composition is added to the flame simulation, which increases the authenticity of the flame. For real-time applications, this […]

CUDA • OpenGL

Nov, 4

Accelerating a TV based JPEG decompression algorithm with Cuda

In previous works, we have developed a mathematical model for artifact-free decompression of JPEG images. There, the problem of finding an artifact-free decompression for a given JPEG compressed image is related to a convex minimization problem. We use a primal-dual algorithm to solve this problem, for which we have developed a Matlab and C++ […]

CUDA

Nov, 4

A CUDA Implementation of Independent Component Analysis in the Time-Frequency Domain

For the blind separation of convolutive mixtures, huge processing power is required. In this paper we propose a massively parallel implementation of Independent Component Analysis in the time-frequency domain using the processing power of current graphics adapters within the CUDA framework. The often used approach for solving the separation task is the […]

CUDA

Nov, 4

Parallelization of maximum likelihood fits with OpenMP and CUDA

Data analyses based on maximum likelihood fits are commonly used in the high energy physics community for fitting statistical models to data samples. This technique requires the numerical minimization of the negative log-likelihood function. MINUIT is the most common package used for this purpose in the high energy physics community. The main algorithm in this […]

CUDA

Nov, 4

Combined acoustic and optical trapping

Combining several methods for contact free micro-manipulation of small particles such as cells or micro-organisms provides the advantages of each method in a single setup. Optical tweezers, which employ focused laser beams, offer very precise and selective handling of single particles. On the other hand, acoustic trapping with wavelengths of about 1 mm allows the […]

Nov, 4

PEPSC: A Power-Efficient Processor for Scientific Computing

The rapid advancements in the computational capabilities of the graphics processing unit (GPU) as well as the deployment of general programming models for these devices have made the vision of a desktop supercomputer a reality. It is now possible to assemble a system that provides several TFLOPs of performance on scientific applications for the cost […]

Nov, 4

Gyrofluid Modeling of Turbulent, Kinetic Physics

Gyrofluid models to describe plasma turbulence combine the advantages of fluid models, such as lower dimensionality and well-developed intuition, with those of gyrokinetics models, such as finite Larmor radius (FLR) effects. This allows gyrofluid models to be more tractable computationally while still capturing much of the physics related to the FLR of the particles. We […]

CUDA

Nov, 4

Semi-Global Matching-Motivation, Developments and Applications

Since its original publication, the Semi-Global Matching (SGM) technique has been re-implemented by many researchers and companies. The method offers a very good trade off between runtime and accuracy, especially at object borders and fine structures. It is also robust against radiometric differences and not sensitive to the choice of parameters. Therefore, it is well […]

OpenCL • OpenGL

Nov, 4

Inter-cluster communication on clustered SIMD architectures

This work envisions that in the near future, GPU-like architectures will find their way to embedded systems. Accompanied by a small RISC control core, they will not merely be a hardware accelerator, but the heart of the system itself. Taking a state-of-the-art GPU, a baseline architecture is constructed with the embedded context in mind. Next, […]

Nov, 4

VASP on a GPU: application to exact-exchange calculations of the stability of elemental boron

General purpose graphics processing units (GPUs) offer high processing speeds for certain classes of highly parallelizable computations, such as matrix operations and Fourier transforms, that lie at the heart of first-principles electronic structure calculations. Inclusion of exact-exchange increases the cost of density functional theory by orders of magnitude, motivating the use of GPUs. Porting the […]

CUDA

Nov, 4

Computing Optimal Cycle Mean in Parallel on CUDA

Computation of optimal cycle mean in a directed weighted graph has many applications in program analysis, performance verification in particular. In this paper we propose a data-parallel algorithmic solution to the problem and show how the computation of optimal cycle mean can be efficiently accelerated by means of CUDA technology. We show how the problem […]

CUDA
