high performance computing on graphics processing units: hgpu.org

Posts

Aug, 5

An Ultra-Fast, Optimized and Massively-Parallelized Curvelet Transform Algorithm on GP-GPUs

The Curvelet transform is one of the most powerful time-frequency representations of an image. However, since it is not a fast algorithm, it cannot be employed in most real-time and/or large-scale applications. This paper proposes a novel algorithm to speed up the Curvelet transform by both optimizing it for repetitive Curvelet usage and […]

CUDA

Aug, 5

Attack Signature Matching using Graphics Processors in High-Performance Intrusion Detection Systems

Network Intrusion Detection Systems (NIDS), which must perform a time-consuming evaluation of every packet received from the network, face a throughput challenge as a result of the increasing speed of network communications and the high volume of Internet threats. In an NIDS, the most important and time-consuming processes are pattern matching and deep inspection of […]

CUDA

Aug, 5

Simulating a Family of Tissue P Systems Solving SAT on the GPU

In order to provide efficient software tools to deal with large membrane systems, high-throughput simulators are required. Parallel computing platforms are good candidates, since they are capable of partially implementing the inherently parallel nature of the model. In this regard, GPUs (Graphics Processing Units) are now considered highly parallel processors, and they are being […]

CUDA

Aug, 5

Ray Tracing in the Cloud using MapReduce

We present the Hadoop Online Ray Tracer (HORT), a scalable ray tracing framework for general, pay-as-you-go cloud computing services. Using MapReduce, HORT partitions the computational workload and scene data differently than other distributed memory ray tracing frameworks. We show that this unique partitioning significantly bounds the data replication costs and inter-process communication. Consequently, HORT is […]

CUDA

Aug, 3

AQUAgpusph, a free 3D SPH solver accelerated with OpenCL

This paper describes AQUAgpusph, a new free SPH software package licensed under GPLv3 and accelerated using OpenCL. Its main differences with respect to other GPU-based SPH implementations are discussed, focusing first on the fact that it is accelerated with OpenCL, second on the wide range of solid boundary condition enforcing methods have […]

OpenCL

Aug, 3

Strategies for Optimization of Parallel Programs

Multi-core processors are present in most forms of computing, from a pocket-size smartphone to supercomputers. Consequently, parallel and concurrent programming has reemerged as a pressing concern for everyone interested in exploring all the potential computational power in these machines. Writing parallel, and especially concurrent, programs is not a trivial task, as it requires a different […]

CUDA • OpenCL

Aug, 2

Real-Time Electroholography Using a Multi-GPU Environmental PC

We report real-time electroholography using a compact system composed of a multi-GPU environmental PC with four Kepler-architecture GPUs. Our system can calculate a 1,920×1,024-pixel CGH from a 3D object composed of 10,240 points in 40.3 ms.

CUDA • OpenGL
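The figures quoted in the electroholography abstract above can be turned into a frame rate and a rough workload estimate. The short Python sketch below is only back-of-the-envelope arithmetic: it assumes, as is typical of a brute-force point-source CGH calculation but is not stated in the abstract, that every object point contributes to every hologram pixel. All variable names are illustrative.

```python
# Figures quoted in the abstract
width, height = 1920, 1024   # CGH resolution in pixels
points = 10_240              # object points in the 3D scene
frame_time_s = 40.3e-3       # time to compute one hologram, in seconds

# Derived quantities
pixels = width * height            # pixels per hologram
fps = 1.0 / frame_time_s           # holograms per second

# ASSUMPTION (not from the abstract): every point contributes to every pixel
pairs = pixels * points            # point-pixel contributions per frame
pairs_per_s = pairs / frame_time_s # contributions evaluated per second

print(f"pixels per frame: {pixels:,}")
print(f"frame rate:       {fps:.1f} fps")
print(f"point-pixel pairs per frame:  {pairs:,}")
print(f"point-pixel pairs per second: {pairs_per_s:.2e}")
```

Under that assumption, 40.3 ms per hologram corresponds to just under 25 frames per second, with on the order of 5×10^11 point-pixel contributions evaluated per second across the four GPUs.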

Aug, 2

DRiVE: An Example of Distributed Rendering in Virtual Environments

Most Virtual Reality (VR) applications use rendering methods which implement local illumination models, simulating only direct interaction of light with 3D objects. They do not take into account the energy exchange between the objects themselves, making the resulting images look non-optimal. The main reason for this is the simulation of global illumination having a high […]

CUDA

Aug, 2

Large-Scale Sound Field Rendering in Rectangular Room with Specular Reflection

Sound field rendering is a technique for computing the sound field from three-dimensional numerical models constructed in a computer; it is the same concept as graphics rendering in computer graphics. In this paper, a GPU (Graphics Processing Unit) cluster system is applied to sound field rendering for a large […]

CUDA

Aug, 2

Comparing CUDA, OpenCL and OpenGL Implementations of the Cardiac Monodomain Equations

Computer simulations of cardiac electrophysiology are a helpful tool in the study of bioelectric activity of the heart. The cardiac monodomain model comprises a nonlinear system of partial differential equations and its numerical solution represents a very intensive computational task due to the required fine spatial and temporal resolution. Recent studies have shown that the […]

CUDA • OpenCL • OpenGL

Aug, 1

NOVA: A Functional Language for Data Parallelism

Functional languages provide a solid foundation on which complex optimization passes can be designed to exploit available parallelism in the underlying system. Their mathematical foundations enable high-level optimizations that would be impossible in traditional imperative languages. This makes them uniquely suited for generation of efficient target code for parallel systems, such as multiple Central Processing […]

CUDA

Aug, 1

Accelerating BIRCH for Clustering Large Scale Streaming Data Using CUDA Dynamic Parallelism

In this big data era, the capability of mining and analyzing large scale datasets is imperative. As data are becoming more abundant than ever before, data driven methods are playing a critical role in areas such as decision support and business intelligence. In this paper, we demonstrate how state-of-the-art GPUs and the Dynamic Parallelism feature […]

CUDA

* * *

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container
