high performance computing on graphics processing units: hgpu.org

Posts

Apr, 26

Parallelized Local Volatility Estimation Using GP-GPU Hardware Acceleration

We introduce an inverse problem for the local volatility model in option pricing. We solve the problem using the Levenberg-Marquardt algorithm and use the notion of the Frechet derivative when calculating the Jacobian matrix. We analyze the existence of the Frechet derivative and its numerical computation. To reduce the computational time of the inverse problem, […]

Apr, 26

A GPU Based 3D Object Retrieval Approach Using Spatial Shape Information

In this paper, we present a novel 3D model alignment method based on analyzing the voxels of 3D meshes, together with a visual-similarity-based 3D model matching and retrieval method using active tabu search. First, each 3D model is voxelized and a voxel-based PCA transformation is applied; it is then represented by six depth images which are […]

Apr, 26

Solving Parabolic Problems Using Multithread and GPU

Multi-core platforms have entered the territory of high performance computing (HPC). Moreover, the NVIDIA GT200 has 240 cores and runs thousands of threads simultaneously. The role of the graphics processing unit (GPU) accelerator has become more and more important for scientific computing and computational fluid dynamics (CFD) to obtain results quickly and efficiently. In this […]

CUDA

Apr, 26

Study on acceleration technique for two-dimensional FDTD algorithm based on GPU

The parallel finite difference time domain (FDTD) algorithm is an important method to enhance the speed of multiple-data FDTD operations. The improvement of graphics processing unit (GPU) performance, especially the emergence of Compute Unified Device Architecture (CUDA), offers the parallel FDTD method an efficient and simple solution. First of all, this paper explains parallel […]

CUDA

Apr, 26

Making Human Connectome Faster: GPU Acceleration of Brain Network Analysis

The research on complex Brain Networks plays a vital role in understanding the connectivity patterns of the human brain and disease-related alterations. Recent studies have suggested a noninvasive way to model and analyze human brain networks by using multi-modal imaging and graph theoretical approaches. Both the construction and analysis of the Brain Networks require tremendous […]

Apr, 26

Performance Analysis of a New Real-Time Elastographic Time Constant Estimator

New elastographic techniques such as poroelastography and viscoelasticity imaging aim at imaging the temporal mechanical behavior of tissues. These techniques usually involve the use of curve fitting methods being applied to noisy data to estimate new elastographic parameters. As of today, however, current elastographic implementations of poroelastography and viscoelasticity imaging methods are in general too […]

Apr, 26

Model-T: Rethinking the OS for terabit speeds

This paper presents Model-T, an OS network stack designed to scale to terabit rates through pipelined execution of micro-operations. Model-T parallelizes execution on multicore chips and enforces lockstep processing to maximize the shared L2 data cache (d-cache) hit rate. Executing all operations without hitting main memory more than once (if at all) is the key design […]

Apr, 26

Research on ATI-CAL for accelerating FBP reconstruction

Accelerating CT reconstruction algorithms with general-purpose GPUs has attracted plenty of attention in recent years. Many researchers have studied techniques for implementing CT reconstruction algorithms on different GPUs and in different code development environments to explore their capability and acceleration performance. This work is to investigate the performance of stream computing of filtered […]

Apr, 25

GPU accelerated fast FEM deformation simulation

In this paper we present a general FEM (finite element method) solution that enables fast dynamic deformation simulation on the newly available GPU (graphics processing unit) hardware with compute unified device architecture (CUDA) from NVIDIA. CUDA-enabled GPUs harness the power of 128 processors which allow data parallel computations. Compared to the previous GPGPU, it is […]

CUDA

Apr, 25

A GPU implementation for two MIMO-OFDM detectors

Two real-valued signal models based on selective spanning with fast enumeration (SSFE) and layered orthogonal lattice detector (LORD) algorithms are implemented on an NVIDIA graphics processing unit (GPU). A 2×2 multiple-input multiple-output (MIMO) antenna system with 16-quadrature amplitude modulation (16-QAM) is assumed. The chosen level update vector for SSFE is based on computer simulation results […]

Apr, 25

Parallel 3D Finite Difference Time Domain Simulations on Graphics Processors with Cuda

The parallel Finite Difference Time Domain (FDTD) method has been explored over the past few years because of the expensive computation its applications require. General Purpose Graphics Processing Units (GPGPU), especially the Compute Unified Device Architecture (CUDA) model, offer an efficient and simple solution. This paper analyzes the parallel FDTD method and the CUDA architecture, presents […]

CUDA

Apr, 25

MultiGPU computing using MPI or OpenMP

GPU computing follows the trend of GPGPU, driven by innovations in both hardware and programming languages made available to non-graphics programmers. Since some problems require a long time to solve or data quantities that do not fit on a single GPU, the logical continuation was to make use of multiple GPUs. In order […]

* * *

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container

Most viewed papers (last 30 days)