high performance computing on graphics processing units: hgpu.org

Posts

Aug, 28

The Arcane development framework

In this paper, we introduce the Arcane software development framework for 2D and 3D numerical simulation codes. First, we describe the Arcane core, the mesh management and the parallelism strategy. Then, we focus on the concepts introduced to speed up the development of numerical codes: numerical modules, variables, entry points and services. We explain the […]

Aug, 28

Exposing non-standard architectures to embedded software using compile-time virtualisation

The architectures of embedded systems are often application-specific, containing multiple heterogeneous cores, non-uniform memory, on-chip networks and custom hardware elements (e.g. DSP cores). Standard programming languages do not use many of these features natively because they assume a traditional single processor and a single logical address space abstraction that hides these architectural details. This […]

Aug, 28

The impact of diverse memory architectures on multicore consumer software: an industrial perspective from the video games domain

Memory architectures need to adapt in order for performance and scalability to be achieved in software for multicore systems. In this paper, we discuss the impact of techniques for scalable memory architectures, especially the use of multiple, non-cache-coherent memory spaces, on the implementation and performance of consumer software. Primarily, we report extensive real-world experience in […]

Aug, 28

A new multi-core pipelined architecture for executing sequential programs for parallel geospatial computing

Parallel programming on multi-core processors has become the industry’s biggest software challenge. This paper proposes a novel parallel architecture for executing sequential programs using multi-core pipelining based on program slicing by a new memory/cache dynamic management technology. The new architecture is very suitable for processing large geospatial data in parallel without parallel programming. This paper […]

Aug, 28

Automating GPU computing in MATLAB

MATLAB is a popular software platform for scientific and engineering software writers. It offers a high level of abstraction for fundamental mathematical operations and extensive highly optimized domain-specific libraries for several scientific and engineering disciplines. With the recent availability of GPU libraries for MATLAB, it has become possible to easily exploit GPGPUs as coprocessors. However, […]

Aug, 28

CUDA accelerated iris template matching on Graphics Processing Units (GPUs)

In this paper we develop a parallelized iris template matching implementation on inexpensive Graphics Processing Units (GPUs) with Nvidia’s CUDA programming model to achieve matching rates of 44 million iris template comparisons per second without rotation invariance. With tolerance to head tilt, we achieve 4.2 million matches per second and compare our implementation to state […]

CUDA

Aug, 28

Real-time task reconfiguration support applied to an UAV-based surveillance system

Modern surveillance systems, such as those based on the use of unmanned aerial vehicles, require powerful high-performance platforms to deal with many different algorithms that make use of massive calculations. At the same time, low-cost and high-performance specific hardware (e.g., GPU, PPU) is on the rise and CPUs have turned to multiple cores, characterizing together an interesting […]

CUDA

Aug, 28

Collaborative execution environment for heterogeneous parallel systems

Nowadays, commodity computers are complex heterogeneous systems that provide a huge amount of computational power. However, to take advantage of this power we have to orchestrate the use of processing units with different characteristics. Such distributed memory systems make use of relatively slow interconnection networks, such as system buses. Therefore, most of the time we […]

CUDA

Aug, 28

Real-time photo style transfer

This paper presents a novel approach for real-time photo style transfer. The automatic image manipulation technique is performed in the oRGB color space, which is a new color model based on the psychologically opponent color theory. We transfer color from an appropriate source image to the target image using a simple statistical analysis. In addition, […]

CUDA

Aug, 28

Global Illumination for Advanced Computer Graphics

Real-time 3D graphics is present today on various devices, from high-end PCs powered by highly complex GPUs to simpler handheld consoles or mobile phones. All these solutions are based on an aging technique called immediate mode rasterization, very efficient for rendering simple scenes, but unable to capture essential visual features such as soft shadows, […]

Aug, 27

Switching to High Gear: Opportunities for Grand-Scale Real-Time Parallel Simulations

The recent emergence of dramatically large computational power, spanning desktops with multi-core processors and multiple graphics cards to supercomputers with 10^5 processor cores, has suddenly resulted in simulation-based solutions trailing behind in the ability to fully tap the new computational capacity. Here, we motivate the need for switching the parallel simulation research to a higher […]

CUDA

Aug, 27

directCell: hybrid systems with tightly coupled accelerators

The Cell Broadband Engine (Cell/B.E.) processor is a hybrid IBM PowerPC processor. In blade servers and PCI Express card systems, it has been used primarily in a server context, with Linux as the operating system. Because neither Linux as an operating system nor a PowerPC processor-based architecture is the preferred choice for all applications, some […]

Recent source codes

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management

MSCCL++: A GPU-driven communication stack for scalable AI applications

Benchmark compute shader of Unity against InteropUnityCUDA

Most viewed papers (last 30 days)