high performance computing on graphics processing units: hgpu.org

Posts

Apr, 16

A High-Performance Multi-user Service System for Financial Analytics Based on Web Service and GPU Computation

In finance, securities, such as stocks, funds, warrants and bonds, are actively traded in financial markets. Abundance of market data and accurate pricing of a security can help the practitioners arbitrage or hedge their position. It can also help researchers and traders design better trading strategies. In this work, we develop a pricing and data/information […]

Apr, 16

Accurate Measurements and Precise Modeling of Power Dissipation of CUDA Kernels toward Power Optimized High Performance CPU-GPU Computing

Power dissipation is one of the most imminent limitation factors influencing the development of High Performance Computing (HPC). Toward power-efficient HPC on CPU-GPU hybrid platform, we are investigating software methodologies to achieve optimized power utilization by algorithm design and programming technique. In this paper we discuss power measurements of GPU, propose a method of automatic […]

CUDA

Apr, 16

Accelerating Particle Swarm Algorithm with GPGPU

This paper focuses on solving large size optimization problems using GPGPU. Evolutionary Algorithms for solving these optimization problems suffer from the curse of dimensionality, which implies that their performance deteriorates quickly as the dimensionality of the search space increases. This difficulty makes performance studies for very high dimensional problems very challenging. Furthermore, these […]

Apr, 15

N-body Simulation for Astronomical Collisional Systems with a New SIMD Instruction Set Extension to the x86 Architecture, Advanced Vector Extensions

We present a high-performance N-body code for astronomical collisional systems accelerated with the aid of a new SIMD instruction set extension of the x86 architecture: Advanced Vector eXtensions (AVX), an enhanced version of the Streaming SIMD Extensions (SSE). With one processor core of Intel Core i7-2600 processor (8MB cache and 3.40 GHz) based on Sandy […]

Apr, 15

Parallel implementation of a Quantization algorithm for pricing American style options on GPGPU

The Quantization Tree algorithm has proven to be quite an efficient tool for the evaluation of financial derivatives with non-vanilla exercise rights as American-, Bermudan- or Swing options. Nevertheless, it relies heavily on a fast computation of the transition probabilities in the underlying Quantization Tree. Since this estimation is typically done by Monte-Carlo simulations, it is […]

CUDA

Apr, 15

Emerging technology about GPGPU

With the rapid development of the graphics processing unit (GPU), its programmability and highly parallel processing create an opportunity to conduct general purpose computation on the GPU, conventionally called GPGPU (general purpose computation on GPU). A brief survey, in particular on the rationale of how the GPU architecture leads to […]

Apr, 15

GPU-accelerated 3D Bayesian image reconstruction from Compton scattered data

This paper describes the development of fast Bayesian reconstruction methods for Compton cameras using commodity graphics hardware. For fast iterative reconstruction, not only is it important to increase the convergence rate, but also it is equally important to accelerate the computation of time-consuming and repeated operations, such as projection and backprojection. Since the size of […]

Apr, 15

A distributed multi-GPU system for high speed electron microscopic tomographic reconstruction

Full resolution electron microscopic tomographic (EMT) reconstruction of large-scale tilt series requires significant computing power. The desire to perform multiple cycles of iterative reconstruction and realignment dramatically increases the pressing need to improve reconstruction performance. This has motivated us to develop a distributed multi-GPU (graphics processing unit) system to provide the required computing power for […]

CUDA

Apr, 15

MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters

Data parallel architectures, such as General Purpose Graphics Units (GPGPUs), have seen a tremendous rise in their application for High End Computing. However, data movement in and out of GPGPUs remains the biggest hurdle to overall performance and programmer productivity. Applications executing on a cluster with GPUs have to manage data movement using CUDA in […]

CUDA

Apr, 15

GPU-Based Implementations of the Noniterative Regularized-CCSD(T) Corrections: Applications to Strongly Correlated Systems

The details of the graphical processing unit (GPU) implementation of the most computationally intensive (T)-part of the recently introduced regularized CCSD(T) (Reg-CCSD(T)) method [Kowalski, K.; Valiev, M. J. Chem. Phys. 2009, 131, 234107] for calculating electronic energies of strongly correlated systems are discussed. Parallel tests performed for several molecular systems show […]

Apr, 15

Implementation of Jacobi iterative method on graphics processor unit

CUDA is a new computing architecture introduced by NVIDIA Corporation, aiming at general purpose computation on GPU. The architecture has strong compute power in compute-intensive and data-intensive applications, so in recent years, how to apply the framework to scientific computing has become a hot research topic. The iterative method for solving systems of […]

CUDA

Apr, 15

Parallel On-Chip Power Distribution Network Analysis on Multi-Core-Multi-GPU Platforms

The challenging task of analyzing on-chip power (ground) distribution networks with multimillion node complexity and beyond is key to today’s large chip designs. For the first time, we show how to exploit recent massively parallel single-instruction multiple-thread (SIMT)-based graphics processing unit (GPU) platforms to tackle large-scale power grid analysis with promising performance. Several key enablers […]

Recent source codes

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

Most viewed papers (last 30 days)