high performance computing on graphics processing units: hgpu.org

Posts

Aug, 16

Efficient GPU implementation of parameter estimation of a statistical model for online advertisement optimization

The optimization problem of estimating parameters using a maximum a posteriori (MAP) [3] approach on a non-linear statistical model with a large data set can be solved using an L-BFGS [10] algorithm. When dealing with an ever-changing reality, the evaluation needs to be fast to capture the immediacy of the observations. This thesis will present […]

CUDA

Aug, 15

vCUDA Framework Development for GPU Virtualization

vCUDA is a middleware that allows an application to use a CUDA-compatible graphics processing unit (GPU) installed in a remote computer as if it were installed in the computer where the application is being executed. vCUDA is designed following the client-server distributed architecture. On one side, the client employs a library of wrappers to the […]

CUDA

Aug, 15

Real root isolation for univariate polynomials on GPUs and multicores

I participate in the development of the cumodp library. My objective is to develop code for the exact calculation of the real roots of univariate polynomials. Stating this problem is very easy. However, as one dives into the details, one realizes that there are many challenges in reaching highly efficient algorithmic and […]

CUDA

Aug, 15

GPU-Assisted Cryptography of Log-Structured Indices

General-purpose programming of Graphics Processing Units (GPUs) is a relatively new technological advancement. GPUs contain vast amounts of computational power with their many-core architectures. Within many computer systems, the power of these GPUs often goes unused outside the realm of graphics. Many of today’s common computational tasks are well suited for the single […]

CUDA

Aug, 15

A New Cooperative Evolutionary Multi-Swarm Optimizer Algorithm Based on CUDA Parallel Architecture Applied to Solve Engineering Optimization Problems

This paper presents a new Cooperative Evolutionary Multi-Swarm Optimization Algorithm (CEMSO-GPU) based on the CUDA parallel architecture and applied to engineering problems. The approach focuses on the use of a master/slave swarm concept with a data-sharing mechanism, and on a parallelization method based on the paradigm of General Purpose Computing on Graphics […]

CUDA

Aug, 15

Optimizing the Computation of Eigenvalues Using Graphics Processing Units

In this paper, we first briefly describe some mathematical aspects regarding the computation of eigenvalues, followed by an original approach: a bisection algorithm useful in computing eigenvalues for a tridiagonal symmetric matrix of arbitrary size, using the computing capabilities of the latest graphics processing units that incorporate the Compute Unified Device Architecture. The novel approach […]

CUDA

Aug, 14

Performance of FORTRAN and C GPU Extensions for a Benchmark Suite of Fourier Pseudospectral Algorithms

A comparison of PGI OpenACC, FORTRAN CUDA, and Nvidia CUDA pseudospectral methods on a single GPU and GCC FORTRAN on single and multiple CPU cores is reported. The GPU implementations use CuFFT and the CPU implementations use FFTW. Porting pre-existing FORTRAN codes to utilize GPUs is efficient and easy to implement with OpenACC and […]

CUDA

Aug, 14

Accelerating cellular automata simulations using AVX and CUDA

We investigated various methods of parallelization of the Frisch-Hasslacher-Pomeau (FHP) cellular automata algorithm for modeling fluid flow. These methods include SSE, AVX, and POSIX Threads for central processing units (CPUs) and CUDA for graphics processing units (GPUs). We present implementation details of the FHP algorithm based on AVX/SSE and CUDA technologies. We found that (a) […]

CUDA

Aug, 14

Dynamic Warp Resizing in High-Performance SIMT

Modern GPUs synchronize threads grouped in a warp at every instruction. This improves SIMD efficiency and makes it possible to share fetch and decode resources. The number of threads included in each warp (or warp size) affects divergence, synchronization overhead, and the efficiency of memory access coalescing. Small warps reduce the performance penalty associated with […]

Aug, 14

A Second-Order Distributed Trotter-Suzuki Solver with a Hybrid Kernel

The Trotter-Suzuki approximation leads to an efficient algorithm for solving the time-dependent Schrödinger equation. Using existing highly optimized CPU and GPU kernels, we developed a distributed version of the algorithm that runs efficiently on a cluster. Our implementation also improves single-node performance, and is able to use multiple GPUs within a node. The scaling […]

CUDA

Aug, 14

A GPU implementation of the Simulated Annealing Heuristic for the Quadratic Assignment Problem

The quadratic assignment problem (QAP) is one of the most difficult combinatorial optimization problems. An effective heuristic for obtaining approximate solutions to the QAP is simulated annealing (SA). Here we describe an SA implementation for the QAP which runs on a graphics processing unit (GPU). GPUs are composed of low cost commodity graphics chips which […]

CUDA

Aug, 13

Orthorectification by Using GPGPU Method

Thanks to the nature of graphics processing, newly released products offer highly parallel processing units with high memory bandwidth and computational power exceeding one teraflops. Modern GPUs are not only powerful graphics engines but also highly parallel programmable processors with very fast computing capabilities and high memory bandwidth […]

CUDA

