high performance computing on graphics processing units: hgpu.org

Posts

Dec, 2

Massively Parallelized Monte Carlo Simulation and its Applications in Finance

In this paper, we propose, develop and implement a tool that increases the computational speed of exotic derivatives pricing at a fraction of the cost of traditional methods. Our paper focuses on investigating the computing efficiencies of GPU systems. We utilize the GPU’s natural parallelization capabilities to price financial instruments. We outline our implementation, solutions […]

Dec, 2

An error correction solver for linear systems: Evaluation of mixed precision implementations

This paper proposes an error correction method for solving linear systems of equations and evaluates an implementation using mixed precision techniques. While different technologies are available, graphics processing units (GPUs) have been established as particularly powerful coprocessors in recent years. For this reason, our error correction approach is focused on a CUDA implementation […]

CUDA

Dec, 2

Auto-optimization of a Feature Selection Algorithm

Advanced visualization algorithms are typically computationally expensive but highly data parallel, which makes them attractive candidates for GPU architectures. However, porting algorithms to a GPU remains a challenging process. The Mint programming model addresses this issue with its simple, high-level interface. It targets the users who seek real-time performance without investing in […]

CUDA

Dec, 2

Evaluation of Fermi Features for Data Mining Algorithms

A recent development in High Performance Computing is the availability of NVIDIA’s Fermi or the 20-series GPUs. These offer features such as inbuilt atomic double precision support and increased shared memory. This thesis focuses on optimizing and evaluating the new features offered by the Fermi series GPUs for data mining algorithms involving reductions. Using three […]

CUDA

Dec, 2

Implementation of the FDTD Method Based on Lorentz-Drude Dispersive Model on GPU for Plasmonics Applications

We present a three-dimensional finite difference time domain (FDTD) method on a graphics processing unit (GPU) for plasmonics applications. For the simulation of plasmonics devices, the Lorentz-Drude (LD) dispersive model is incorporated into Maxwell's equations, while the auxiliary differential equation (ADE) technique is applied to the LD model. Our numerical experiments based on typical domain sizes […]

CUDA

Dec, 2

Spotting Radio Transients with the help of GPUs

Exploration of the time-domain radio sky has huge potential for advancing our knowledge of the dynamic universe. Past surveys have discovered large numbers of pulsars, rotating radio transients and other transient radio phenomena; however, they have typically relied upon off-line processing to cope with the high data and processing rate. This paradigm rules out the […]

CUDA

Dec, 1

A programming language interface to describe transformations and code generation

This paper presents a programming language interface, a complete scripting language, to describe composable compiler transformations. These transformation programs can be written, shared and reused by non-expert application and library developers. From a compiler writer’s perspective, a scripting language interface permits rapid prototyping of compiler algorithms that can mix levels and compose different sequences of […]

CUDA

Dec, 1

GPU Acceleration of Solving Parabolic Partial Differential Equations Using Difference Equations

Parabolic partial differential equations are often used to model systems involving heat transfer, acoustics, and electrostatics. The need for more complex models with increasing precision drives greater computational demands from processors. Since solving these types of equations is inherently parallel, GPU computing offers an attractive solution for drastically decreasing time to completion, power usage, and […]

CUDA

Dec, 1

Scalable Data Clustering using GPU Clusters

The computational demands of multivariate clustering grow rapidly, and therefore processing large data sets, like those found in flow cytometry data, is very time consuming on a single CPU. Fortunately these techniques lend themselves naturally to large scale parallel processing. To address the computational demands, graphics processing units, specifically NVIDIA’s CUDA framework and Tesla architecture, […]

CUDA

Dec, 1

GPU Accelerated Numerical Solutions to Chaotic PDEs

In this study, chaotic partial differential equations (PDEs) were numerically solved using a parallel algorithm on graphics processing units (GPUs). This new method will aid in our search for simple examples of chaotic PDEs. Computation time on the GPU was compared with that of implementations in other languages such as Matlab and PowerBASIC. The GPU algorithm was optimized using […]

Dec, 1

Iterative optimization methods for efficient image restoration on multicore architectures

This paper explores effective algorithms for the solution of numerical nonlinear optimization problems in image restoration. The technology of modern acquisition techniques and devices most often returns data of increasing size, so we focus on the Scaled Gradient Projection algorithm, which is well suited to large-scale applications. We present its parallel implementations on different hardware, […]

CUDA

Dec, 1

Evaluation iterative solver for pCDR on GPU accelerator

In the past few years, graphics processing units (GPUs) have become a trend in high performance computing (HPC). The Top500 list of Nov. 2010 showed three GPU-accelerated supercomputers in the top 10. The role of the GPU accelerator has become more and more important for scientific computing and computational fluid dynamics (CFD) to […]

CUDA

Recent source codes

OpScanner

Atlas CLI: Machine Learning (ML) Lifecycle & Transparency Manager

transformers_tvm: Implementation of Encoder Decoder transformer on TVM

INT v.s. FP: A framework to compare low-bit integer and float-point formats

AutoDock-GPU: AutoDock for GPUs and other accelerators

NCCLX: collective communication framework

Tutoring LLM into a Better CUDA Optimizer

Adaptivity in AdaptiveCpp: Optimizing Performance by Leveraging Runtime Information During JIT-Compilation

Kernel Library for LLM Serving

Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs

Most viewed papers (last 30 days)