high performance computing on graphics processing units: hgpu.org

Posts

Sep, 5

Research on double negative materials by using FDTD method based on GPUs

In recent years, the finite difference time domain (FDTD) method has been widely used in the simulation of metamaterials. Because the FDTD method is well suited to parallel computing, we apply it on Fermi-architecture Graphics Processing Units (GPUs) to perform electromagnetic simulations of double negative materials in this paper. Finally, both […]

CUDA

Sep, 5

HISQ inverter on Intel Xeon Phi and NVIDIA GPUs

The runtime of a Lattice QCD simulation is dominated by a small kernel, which calculates the product of a vector by a sparse matrix known as the "Dslash" operator. Therefore, this kernel is frequently optimized for various HPC architectures. In this contribution we compare the performance of the Intel Xeon Phi to current Kepler-based NVIDIA […]

Sep, 4

Analysis of KECCAK Tree Hashing on GPU Architectures

In an effort to provide security and data integrity, hashing algorithms have been designed to consume an input of any length to produce a fixed-length output. KECCAK was selected by NIST to become the next Secure Hash Algorithm (SHA-3) after nearly five years of competition. In addition to providing a sequential operating mode, there […]

CUDA

Sep, 4

Acceleration of stereo-matching on multi-core CPU and GPU

This paper presents an accelerated version of a dense stereo-correspondence algorithm for two different parallelism-enabled architectures, multi-core CPU and GPU. The algorithm is part of the vision system developed for a binocular robot-head in the context of the CloPeMa research project. This research project focuses on the conception of a new clothes folding […]

CUDA

Sep, 4

A uniform approach for programming distributed heterogeneous computing systems

Large-scale compute clusters of heterogeneous nodes equipped with multi-core CPUs and GPUs are getting increasingly popular in the scientific community. However, such systems require a combination of different programming paradigms making application development very challenging. In this article we introduce libWater, a library-based extension of the OpenCL programming model that simplifies the development of heterogeneous […]

OpenCL

Sep, 4

Solving 3D incompressible Navier-Stokes equations on hybrid CPU/GPU systems

This paper describes a hybrid multicore/GPU solver for the incompressible Navier-Stokes equations with constant coefficients, discretized by the finite difference method. By applying the prediction-projection method, the Navier-Stokes equations are transformed into a combination of Helmholtz-like and Poisson equations for which we describe efficient solvers. As an extension of our previous paper [1], this paper […]

CUDA

Sep, 4

CUDA method for the FDTD simulation by GPU

Computational hardware has advanced over several decades, especially graphics processors, which now handle not only graphics workloads but also scientific computation. These processors are well suited to parallel computation and offer an alternative to expensive high-end devices. Many research groups have implemented parallel computations using MPI across multiple CPUs to solve […]

CUDA

Sep, 3

New efficient integral algorithms for quantum chemistry

The contents of this thesis are centered on the development of new efficient algorithms for molecular integral evaluation in quantum chemistry, as well as new design and implementation strategies for such algorithms aimed at maximizing their performance and the utilization of modern hardware. This thesis introduces the K4+MIRROR algorithm for 2-electron repulsion integrals, a new […]

CUDA

Sep, 3

GPU-accelerated Database Systems: Survey and Open Challenges

The vast amount of processing power and memory bandwidth provided by modern graphics cards makes them an interesting platform for data-intensive applications. Unsurprisingly, the database research community identified GPUs as effective co-processors for data processing several years ago. In recent years, there have been many approaches to make use of GPUs at different levels of […]

CUDA • OpenCL

Sep, 3

Detection of retransmissions in 10G Ethernet using GPUs

Traffic analysis is an essential part of capacity planning, quality of service assurance and reinforcement of security in current telecommunication networks. As network speed increases, so does traffic volume, and the analysis of large traffic traces is computationally intensive. This document presents flow extraction software that obtains TCP flow records at […]

CUDA

Sep, 3

Searching for a counterexample of Kurepa’s Conjecture

Kurepa’s conjecture states that there is no odd prime p which divides !p = 0! + 1! + … + (p-1)!. We search for a counterexample of this conjecture for all p < 10^10. We introduce new optimization techniques and perform the computation using graphics processing units (GPUs). Additionally, we consider the generalized Kurepa’s left factorial given as !^k n = (0!)^k + (1!)^k + … + ((n-1)!)^k and show that for all integers […]

OpenCL
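
The left factorial in the abstract above can be evaluated modulo p with a running product, which is the basic check behind any counterexample search. The following is a minimal CPU sketch in Python, not the paper's GPU implementation or its optimization techniques; the function names (left_factorial_mod, is_prime, kurepa_counterexamples) are hypothetical and chosen only for illustration.

```python
def left_factorial_mod(p: int) -> int:
    """Return !p = 0! + 1! + ... + (p-1)! reduced modulo p."""
    total = 0
    fact = 1  # running value of i! mod p, starting at 0! = 1
    for i in range(p):
        if i > 0:
            fact = (fact * i) % p
        total = (total + fact) % p
    return total


def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small illustrative limits."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def kurepa_counterexamples(limit: int) -> list[int]:
    """Odd primes p < limit with p dividing !p (the conjecture says there are none)."""
    return [p for p in range(3, limit, 2)
            if is_prime(p) and left_factorial_mod(p) == 0]


if __name__ == "__main__":
    print(kurepa_counterexamples(1000))
```

Each prime costs O(p) multiplications in this naive form, so a search up to the 10^10 bound mentioned in the abstract is far out of reach for this sketch and depends on the kind of optimization and GPU parallelism the paper describes.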

Sep, 3

Performance Portability Study of Linear Algebra Kernels in OpenCL

The performance portability of OpenCL kernel implementations for common memory bandwidth limited linear algebra operations across different hardware generations of the same vendor as well as across vendors is studied. Certain combinations of kernel implementations and work sizes are found to exhibit good performance across compute kernels, hardware generations, and, to a lesser degree, vendors. […]

OpenCL


Recent source codes

OpScanner

Atlas CLI: Machine Learning (ML) Lifecycle & Transparency Manager

transformers_tvm: Implementation of Encoder Decoder transformer on TVM

INT v.s. FP: A framework to compare low-bit integer and float-point formats

AutoDock-GPU: AutoDock for GPUs and other accelerators

NCCLX: collective communication framework

Tutoring LLM into a Better CUDA Optimizer

Adaptivity in AdaptiveCpp: Optimizing Performance by Leveraging Runtime Information During JIT-Compilation

Kernel Library for LLM Serving

Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs

Most viewed papers (last 30 days)