high performance computing on graphics processing units: hgpu.org

Posts

Jul, 29

Harnessing the GPU for Real-Time Haptic Tissue Simulation

Virtual surgery simulators are emerging as a training method for medical specialists and are expected to provide a virtual environment that is realistic and responsive enough to be able to physically simulate a wide variety of medical scenarios. Haptic interaction with the environment requires an underlying physical model that is dynamic, deformable, and computable in […]

CUDA

Jul, 29

Point Based Approximate Color Bleeding With Cuda

Simulating light is a very computationally expensive proposition. There are a wide variety of global illumination algorithms that are implemented and used by major motion picture companies to render interesting and believable scenes. Every algorithm strives to find a balance between speed and accuracy. The Point Based Approximate Color Bleeding algorithm is one of the […]

CUDA

Jul, 27

Graph-Based Substructure Pattern Mining Using CUDA Dynamic Parallelism

CUDA is an advanced massively parallel computing platform that can provide high performance computing power at a much more affordable cost. In this paper, we present a parallel graph-based substructure pattern mining algorithm using CUDA Dynamic Parallelism. The key contribution is a parallel solution to traversing the DFS (Depth First Search) code tree. Furthermore, we implement […]

CUDA

Jul, 27

Systematic Performance Optimization of Cone-Beam Back-Projection on the Kepler Architecture

Filtered back-projection algorithms are widely used for the reconstruction of volumetric data from cone-beam projections in interventional C-arm computed tomography. Furthermore, general-purpose GPUs have become a popular tool for accelerating the reconstruction during time-critical clinical procedures. In this work, we focus on the systematic performance optimization of cone-beam back-projection on the latest architecture of CUDA-enabled […]

CUDA

Jul, 27

Implementation of 2-D Discrete Cosine Transform Algorithm on GPU

The Discrete Cosine Transform (DCT) is a technique for separating a signal into its frequency components. When the DCT is applied to an image, the result consists of a DC value followed by a range of coefficients running from low to high frequency. The DCT is very useful in image compression. When high frequency values are […]

CUDA
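The frequency separation described in the excerpt above can be illustrated with a small CPU sketch. This is not the paper's GPU implementation: it is a plain Python, orthonormal DCT-II applied separably to rows and then columns, and the function names `dct_1d` and `dct_2d` are hypothetical, chosen only for this illustration.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a sequence of numbers (illustrative helper)."""
    n = len(x)
    out = []
    for k in range(n):
        # Project the signal onto the k-th cosine basis function.
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        # Orthonormal scaling: the k = 0 (DC) term uses a different factor.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    # Transpose back so the result is indexed [vertical freq][horizontal freq].
    return [list(r) for r in zip(*cols)]


# A flat 4x4 block has no variation, so all energy lands in the DC term.
flat = [[10.0] * 4 for _ in range(4)]
coeffs = dct_2d(flat)
```

For the flat block, `coeffs[0][0]` is 40 (up to rounding) and every other coefficient is numerically zero, which is the property image compression schemes exploit when they discard or coarsely quantize high frequency terms.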

Jul, 27

Fast Image Processing with Embedded Microprocessors

This thesis is intended as a starter guide to the basics of image processing techniques and their common use cases, while also taking advantage of the Graphics Processing Unit available in today’s embedded multimedia microprocessors found in netbooks, smartphones and tablets.

OpenGL

Jul, 27

GPU Parallel Algorithms for Reporting Movement Behaviour Patterns in Spatiotemporal Databases

Mobility is a key element of many processes and activities, and the understanding of movement is important in many areas of science and technology. With the recent advances in technologies for mobile devices, like GPS and mobile phones, we are able to generate data sets of people, animals, vehicles and other moving objects, normally available […]

CUDA

Jul, 25

A unified sparse matrix data format for modern processors with wide SIMD units

Sparse matrix-vector multiplication (spMVM) is the most time-consuming kernel in many numerical algorithms and has been studied extensively on all modern processor and accelerator architectures. However, the optimal sparse matrix data storage format is highly hardware-specific, which could become an obstacle when using heterogeneous systems. Also, it is as yet unclear how the wide single […]

CUDA
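To make the spMVM kernel mentioned in the excerpt above concrete, here is a minimal sketch using the widely known compressed sparse row (CSR) layout. CSR is an assumption made purely for illustration; the excerpt does not name the unified format the paper proposes, and the helper names `csr_from_dense` and `spmv_csr` are hypothetical.

```python
def csr_from_dense(dense):
    """Build CSR arrays (values, column indices, row pointers) from a dense matrix."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        # row_ptr[r + 1] marks where row r ends in the values array.
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x, touching only the stored nonzeros of A."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for p in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[p] * x[col_idx[p]]
        y.append(acc)
    return y


dense = [[4, 0, 0],
         [0, 0, 2],
         [1, 3, 0]]
values, col_idx, row_ptr = csr_from_dense(dense)
y = spmv_csr(values, col_idx, row_ptr, [1, 2, 3])
```

The inner loop reads `x` through `col_idx`, an indirect access pattern that behaves differently on CPUs and GPUs, which is one reason the best storage format tends to depend on the hardware.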

Jul, 25

Parallel birth and death process for cell nuclei extraction in histopathology images

Cell nuclei extraction from histopathology images is necessary for breast cancer grading, and has become one of the major problems in the domain of automatic image analysis. Stochastic marked point processes combined with birth and death processes are promising tools for such extraction, but they are extremely compute-intensive, especially on large images such as […]

CUDA

Jul, 25

Characterization of Lossy SIW Resonators Based on Multilayer Perceptron Neural Networks on Graphics Processing Unit

In recent years, artificial neural networks (ANNs) have been intensively employed to build smart models of microwave devices. In this paper, a characterization of lossy SIW resonators by means of Multilayer Perceptron Neural Networks (MLPNNs) on a Graphics Processing Unit (GPU) is presented. Once properly selected and trained, an MLPNN can evaluate the lossy SIW resonator’s […]

CUDA

Jul, 25

Octree Light Propagation Volumes

This paper presents a new method for representing Light Propagation Volumes using an octree data structure, and for allowing light from regular point light sources to be injected into them. The resulting technique uses full octrees with the help of a separate data structure for representing the octree structure. The octree structure enables light propagation […]

CUDA • OpenGL

Jul, 25

From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation

Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient manner for complex applications, without […]

CUDA
