high performance computing on graphics processing units: hgpu.org

Posts

Oct, 9

Parallel and efficient Boolean on polygonal solids

We present a novel framework which can efficiently evaluate approximate Boolean set operations for B-rep models by highly parallel algorithms. This is achieved by taking axis-aligned surfels of Layered Depth Images (LDI) as a bridge and performing Boolean operations on the structured points. Compared with prior surfel-based approaches, this work offers several improvements. Firstly, […]

CUDA

Oct, 9

Molecular dynamics simulation of UO2 nanocrystals melting

In this article we study melting of uranium dioxide (UO2) nanocrystals (NC) isolated in vacuum (i.e. non-periodic boundary conditions) using molecular dynamics (MD) in the approximation of pair potentials and rigid ions. We calculate the size dependence of the temperature and heat of melting, the density jump for crystals of cubic shape and volumes up […]

CUDA

Oct, 9

Acceleration of computation speed for elastic wave simulation using a Graphic Processing Unit

Numerical simulation in exploration geophysics provides important insights into subsurface wave propagation phenomena. Although elastic wave simulations take longer to compute than acoustic simulations, an elastic simulator can construct more realistic wavefields including shear components. Therefore, it is suitable for exploration of the responses of elastic bodies. To overcome the long duration of the calculations, […]

CUDA

Oct, 8

Analysis of 3-dimensional electromagnetic fields in dispersive media using CUDA

This research presents the implementation of the Finite-Difference Time-Domain (FDTD) method for the solution of 3-dimensional electromagnetic problems in dispersive media using Graphics Processor Units (GPUs). By using the newly introduced CUDA technology, we illustrate the efficacy of GPUs in accelerating the FDTD computations by achieving appreciable speedup factors with great ease and at no […]

CUDA

Oct, 8

Performance improvements for iterative electron tomography reconstruction using graphics processing units (GPUs)

Iterative reconstruction algorithms are becoming increasingly important in electron tomography of biological samples. These algorithms, however, impose major computational demands. Parallelization must be employed to maintain acceptable running times. Graphics Processing Units (GPUs) have been demonstrated to be highly cost-effective for carrying out these computations with a high degree of parallelism. In a recent paper […]

Oct, 8

Programming framework for clusters with heterogeneous accelerators

We describe a programming framework for high performance clusters with various hardware accelerators. In this framework, users can utilize the available heterogeneous resources productively and efficiently. The distributed application is highly modularized to support dynamic system configuration with changing types and number of the accelerators. Multiple layers of communication interface are introduced to reduce the […]

Oct, 8

Astrophysical particle simulations with large custom GPU clusters on three continents

We present direct astrophysical N-body simulations with up to six million bodies using our parallel MPI-CUDA code on large GPU clusters in Beijing, Berkeley, and Heidelberg, with different kinds of GPU hardware. The clusters are linked in the cooperation of ICCS (International Center for Computational Science). We reach about one third of the peak performance […]

Oct, 8

Efficient reconfigurable design for pricing Asian options

Arithmetic Asian options are financial derivatives which have the feature of path-dependency: they depend on the entire price path of the underlying asset, rather than just the instantaneous price. This path-dependency makes them difficult to price, as only computationally intensive Monte-Carlo methods can provide accurate prices. This paper proposes an FPGA-accelerated Asian option pricing solution, […]

CUDA

Oct, 7

Multifrontal computations on GPUs and their multi-core hosts

The use of GPUs to accelerate the factoring of large sparse symmetric matrices shows the potential of yielding important benefits to a large group of widely used applications. This paper examines how a multifrontal sparse solver performs when exploiting both the GPU and its multi-core host. It demonstrates that the GPU can dramatically accelerate the […]

CUDA

Oct, 7

Non-recursive beam search on GPU for formal concept analysis

We document a parallel non-recursive beam search GPGPU FCA CbO-like algorithm written in NVIDIA CUDA C and test it on software module dependency graphs. Despite removing repeated calculations and optimising data structures and kernels, we do not yet see major speedups. Instead GeForce 295 GTX and Tesla C2050 report 141072 concepts (maximal rectangles, […]

CUDA

Oct, 7

Investigation on the Use of GPGPU for Fast Sparse Matrix Factorization

Solving network equations is a task frequently encountered by power system researchers. As system sizes grow, the time consumed by the network solution is becoming a dominant factor in the overall time cost. One distinct and important feature of the network admittance matrix is that it is highly sparse, which needs to be addressed by specialized computation […]

CUDA

Oct, 7

GPGPU-assisted prediction of ion binding sites in proteins

Prediction of binding sites for different types of ions in the context of protein 3D structure is a complex challenge for biophysical computational methods. One possible approach involves using empirical, also called knowledge-based, potentials. In the current study, we present a new GPGPU program complex, PIONCA (Protein-ION CAlculator) for efficient generation of empirical potentials for protein-ion […]

CUDA

