high performance computing on graphics processing units: hgpu.org

Posts

Aug, 31

Partial wave analysis at BES III harnessing the power of GPUs

Partial wave analysis is a core tool in hadron spectroscopy. With the high statistics data available at facilities such as the Beijing Spectrometer III, this procedure becomes computationally very expensive. We have successfully implemented a framework for performing partial wave analysis on graphics processors. We discuss the implementation, the parallel computing frameworks employed and the […]

OpenCL

Aug, 31

Partial Wave Analysis using Graphics Cards

Partial wave analysis is a key technique in hadron spectroscopy. The use of unbinned likelihood fits on large statistics data samples and ever more complex physics models makes this analysis technique computationally very expensive. Parallel computing techniques, in particular the use of graphics processing units, are a powerful means to speed up analyses; in the […]

OpenCL

Aug, 31

Volume exploration using ellipsoidal Gaussian transfer functions

This paper presents an interactive transfer function design tool based on ellipsoidal Gaussian transfer functions (ETFs). Our approach explores volumetric features in the statistical space by modeling the space using the Gaussian mixture model (GMM) with a small number of Gaussians to maximize the likelihood of feature separation. Instant visual feedback is possible by mapping […]

Aug, 31

FPGA based Speeded Up Robust Features

We present an implementation of the Speeded Up Robust Features (SURF) on a Field Programmable Gate Array (FPGA). The SURF algorithm extracts salient points from an image and computes descriptors of their surroundings that are invariant to scale, rotation and illumination changes. The interest point detection and feature descriptor extraction algorithm is often used as the […]

Aug, 30

Invited paper: Accelerating neuromorphic vision on FPGAs

Reconfigurable hardware such as FPGAs is being increasingly employed for application acceleration due to its high degree of parallelism, flexibility and power efficiency – factors which are key in the rapidly evolving field of embedded real-time vision. While recent advances in technology have increased the capacity of FPGAs, lack of standard models for developing custom […]

Aug, 30

An FPGA-specific algorithm for direct generation of multi-variate Gaussian random numbers

The multi-variate Gaussian distribution is used to model random processes with distinct pair-wise correlations, such as stock prices that tend to rise and fall together. Multi-variate Gaussian vectors with length n are usually produced by first generating a vector of n independent Gaussian samples, then multiplying with a correlation inducing matrix requiring O(n^2) multiplications. This […]

CUDA
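
The conventional approach mentioned in the abstract above (draw n independent Gaussian samples, then multiply by a correlation-inducing matrix) can be sketched roughly as follows. This is only an illustration of that baseline, not the FPGA-specific algorithm the paper proposes. It assumes the correlation-inducing matrix is the Cholesky factor of a covariance matrix, which is one common choice; the function names `cholesky` and `sample_correlated` are hypothetical.

```python
import math
import random


def cholesky(cov):
    """Return lower-triangular L such that L * L^T equals cov.

    Assumes cov is a symmetric positive-definite matrix given as a list of lists.
    """
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def sample_correlated(L, rng):
    """Draw one zero-mean correlated Gaussian vector x = L z, with z ~ N(0, I)."""
    n = len(L)
    # Step 1: n independent standard Gaussian samples.
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Step 2: matrix-vector product; this is the O(n^2) multiplication cost
    # (n(n+1)/2 multiplications when L is lower triangular).
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]


if __name__ == "__main__":
    cov = [[4.0, 2.0], [2.0, 3.0]]  # illustrative covariance, not from the paper
    L = cholesky(cov)
    rng = random.Random(0)
    print(sample_correlated(L, rng))
```

The factorisation is computed once per covariance matrix, so the per-sample cost is dominated by the matrix-vector product in `sample_correlated`.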

Aug, 30

Real-Time Plane-Sweeping Stereo with Multiple Sweeping Directions

Recent research has focused on systems for obtaining automatic 3D reconstructions of urban environments from video acquired at street level. These systems record enormous amounts of video; therefore a key component is a stereo matcher which can process this data at speeds comparable to the recording frame rate. Furthermore, urban environments are unique in that […]

Aug, 30

Interactive rendering of large unstructured grids using dynamic level-of-detail

We describe a new dynamic level-of-detail (LOD) technique that allows real-time rendering of large tetrahedral meshes. Unlike approaches that require hierarchies of tetrahedra, our approach uses a subset of the faces that compose the mesh. No connectivity is used for these faces so our technique eliminates the need for topological information and hierarchical data structures. […]

OpenGL

Aug, 28

The Arcane development framework

In this paper, we introduce the Arcane software development framework for 2D and 3D numerical simulation codes. First, we describe the Arcane core, the mesh management and the parallelism strategy. Then, we focus on the concepts introduced to speed up the development of numerical codes: numerical modules, variables, entry points and services. We explain the […]

Aug, 28

Exposing non-standard architectures to embedded software using compile-time virtualisation

The architectures of embedded systems are often application-specific, containing multiple heterogeneous cores, non-uniform memory, on-chip networks and custom hardware elements (e.g. DSP cores). Standard programming languages do not use many of these features natively because they assume a traditional single processor and a single logical address space abstraction that hides these architectural details. This […]

Aug, 28

The impact of diverse memory architectures on multicore consumer software: an industrial perspective from the video games domain

Memory architectures need to adapt in order for performance and scalability to be achieved in software for multicore systems. In this paper, we discuss the impact of techniques for scalable memory architectures, especially the use of multiple, non-cache-coherent memory spaces, on the implementation and performance of consumer software. Primarily, we report extensive real-world experience in […]

Aug, 28

A new multi-core pipelined architecture for executing sequential programs for parallel geospatial computing

Parallel programming on multi-core processors has become the industry’s biggest software challenge. This paper proposes a novel parallel architecture for executing sequential programs using multi-core pipelining based on program slicing by a new memory/cache dynamic management technology. The new architecture is very suitable for processing large geospatial data in parallel without parallel programming. This paper […]

* * *

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container
