high performance computing on graphics processing units: hgpu.org

Posts

Aug, 18

Permutation Index and GPU to Solve efficiently Many Queries

Similarity search is a fundamental operation for applications that deal with multimedia data. For a query in a multimedia database, it is meaningless to look for elements exactly equal to the query object. Instead, we need to measure the similarity (or dissimilarity) between the query object and each object of the database. The […]

CUDA

Aug, 18

Encrypting video streams using OpenCL code on-demand

The amount of multimedia information transmitted through the web is very high and increasing. Generally, this kind of data is not correctly protected, since users do not appreciate the information that images and videos may contain. In this work, we present an architecture for safely managing multimedia transmission channels. The idea is to encrypt and encode […]

OpenCL

Aug, 18

Fast and Flexible: Parallel Packet Processing with GPUs and Click

We introduce Snap, a framework for packet processing that outperforms traditional software routers by exploiting the parallelism available on modern GPUs. While obtaining high performance, it remains extremely flexible, with packet-processing tasks implemented as simple modular elements that are composed to build fully functional routers and switches. Snap is based on the Click modular router, […]

CUDA

Aug, 17

Solving 3D viscous incompressible Navier-Stokes equations using CUDA

A CUDA implementation of the 3D viscous incompressible Navier-Stokes equations is proposed, using the BFECC (Back and Forth Error Compensation and Correction) scheme as the advection operator. The Poisson problem for pressure is solved with a CG (Conjugate Gradient) method, preconditioning the system with FFTs (Fast Fourier Transforms). Study cases such as Lid-Driven Cavity and Flow Past […]

CUDA

Aug, 17

Performance Analysis of a Symmetric Cryptography Algorithm on GPU and GPU Cluster

This article presents a performance analysis of the symmetric encryption algorithm AES (Advanced Encryption Standard) on a machine with one GPU and on a cluster of GPUs, for cases in which the memory required by the algorithm is more than that of a GPU. Two implementations were carried out, based on the C language, that use the […]

CUDA

Aug, 17

Formal specification and verification of OpenCL Kernel optimization

Computing general problems using the graphics processing unit (GPU) of a device is an emerging field. The parallel structure of the GPU allows for massive concurrency when executing a program. Therefore, by executing (a part of) the code on the GPU, a previously unused resource can be used to achieve a speed-up of an application. […]

OpenCL

Aug, 17

Acceleration of Feynman loop integrals in high-energy physics on many core GPUs

Current and future colliders in high-energy physics require theorists to carry out large-scale computations for a precise comparison between experimental results and theoretical ones. In a perturbative approach, several methods to evaluate the Feynman loop integrals that appear in the theoretical calculation of cross-sections are well established at the one-loop level; however, more […]

OpenCL

Aug, 17

Studying the core-cusp problem in cold dark matter halos using N-body simulations on GPU clusters

The discrepancy in the mass-density profile of dark matter halos between simulations and observations, the core-cusp problem, is a long-standing open question in the standard paradigm of cold dark matter cosmology. Here, we study the dynamical response of dark matter halos to oscillations of the galactic potential which are induced by a cycle of gas […]

CUDA

Aug, 16

Accelerating Random Forests on CPUs and GPUs for Object-Class Image Segmentation

Random forests are a machine learning method that has recently become popular in the computer vision community to solve image segmentation and object detection tasks. Existing random forest implementations are either general purpose and not efficiently applicable for image segmentation or focus only on the speed of prediction. The implementation for the Microsoft Kinect gaming […]

CUDA

Aug, 16

GPU-Accelerated Scalable Solver for Banded Linear Systems

Solving a banded linear system efficiently is important to many scientific and engineering applications. Current solvers achieve good scalability only on the linear systems that can be partitioned into independent subsystems. In this paper, we present a GPU based, scalable Bi-Conjugate Gradient Stabilized solver that can be used to solve a wide range of banded […]

CUDA

Aug, 16

Lossless LZW Data Compression Algorithm on CUDA

Data compression is an important area of information and communication technologies; it seeks to reduce the number of bits used to store or transmit information. It uses memory space efficiently and allows data to be transmitted within a limited bandwidth. Most compression is achieved by removing data redundancy while preserving information content. Data […]

CUDA

Aug, 16

Towards Path Tracing in Games

We investigate GPU path tracing performance in the context of real-time rendering for games. We propose a reformulation of Russian roulette, as well as an efficient implementation of the path regeneration algorithm by Novak et al. [Novak et al. 2010]. We show that a combination of these algorithms provides high performance for a variety of […]

CUDA
