high performance computing on graphics processing units: hgpu.org

Posts

Dec, 29

Fast Retinal Vessel Analysis

Due to the increasing availability of so-called "non-mydriatic" cameras, digital imaging has become a very important part of the ophthalmologist’s work. This has created large databases of retinal images. It would be desirable to have a fast image processing tool that makes it possible to analyse such databases in a short time, and to process the […]

CUDA

Dec, 29

Hybrid Ray Tracing and Path Tracing of Bezier Surfaces Using A Mixed Hierarchy

In this paper, we present a scheme for interactive ray tracing of bicubic Bezier patches using Newton iteration. We use a mixed hierarchy representation as the acceleration structure. It has a bounding volume hierarchy above the patches and a fixed-depth subpatch tree below it. This helps reduce the number of ray-patch intersections that needs […]

CUDA

Dec, 29

GPU-Based Tracking Algorithms for the ATLAS High-Level Trigger

Results on the performance and viability of data-parallel algorithms on Graphics Processing Units (GPUs) in the ATLAS Level 2 trigger system are presented. We describe the existing trigger data preparation and track reconstruction algorithms, motivation for their optimization, GPU-parallelized versions of these algorithms, and a "client-server" solution for hybrid CPU/GPU event processing used for integration […]

CUDA

Dec, 29

Radionuclides migration modelling using artificial neural networks and parallel computing

This paper presents the results of applying artificial neural networks (ANNs) to modelling radionuclide transport with surface runoff. An ANN with supervised training based on the backpropagation algorithm was used to predict radionuclide transport in the soil and on its surface. Applying ANNs to modelling substance migration is worthwhile, because it […]

CUDA

Dec, 29

GPU-accelerated MRF segmentation algorithm for SAR images

Markov Random Field (MRF) approaches have been widely studied for Synthetic Aperture Radar (SAR) image segmentation, but they have a large computational cost and hence are not widely used in practice. Fortunately, parallel algorithms have been documented to enjoy significant speedups when ported to run on graphics processing units (GPUs) instead of a standard […]

CUDA

Dec, 25

Task-based Conjugate-Gradient for multi-GPUs platforms

Whereas most of today's parallel High Performance Computing (HPC) software is written as highly tuned code taking care of low-level details, the advent of the manycore era forces the community to consider modular programming paradigms and delegate part of the work to third-party software. The latter approach has been shown to be very productive […]

CUDA

Dec, 25

Skeleton-based edge bundling for graph visualization

In this paper, we present a novel approach for constructing bundled layouts of general graphs. As layout cues for bundles, we use medial axes, or skeletons, of edges which are similar in terms of position information. We combine edge clustering, distance fields, and 2D skeletonization to construct progressively bundled layouts for general graphs by iteratively […]

OpenGL

Dec, 25

Regularity versus Load-Balancing on GPU for treefix computations

The use of GPUs has enabled us to achieve substantial acceleration in highly regular data-parallel applications. The trend is now to look at irregular applications, which require advanced load-balancing techniques. However, it is well known that the use of regular computation is preferable and more suitable when working with these architectures. An […]

OpenCL

Dec, 25

Algorithmic Skeleton Framework for the Orchestration of GPU Computations

The Graphics Processing Unit (GPU) is gaining popularity as a co-processor to the Central Processing Unit (CPU), due to its ability to surpass the latter’s performance in certain application fields. Nonetheless, harnessing the GPU’s capabilities is a non-trivial exercise that requires good knowledge of parallel programming. Thus, providing ways to extract such computational power has […]

OpenCL

Dec, 25

Bioinformatics Sequence Comparisons on Manycore Processors

Searching for similarities between sequences is a fundamental operation in bioinformatics, providing insight into biological functions as well as tools for high-throughput data. There is a need for algorithms able to efficiently process billions of sequences. To look for approximate similarities, a common heuristic is to consider short words that appear exactly in both sequences, […]

OpenCL

Dec, 23

Password Cracking in the Cloud

Cloud computing is a great resource for applications that require computing capacity for a short time but do not warrant long-term investment in fixed capital. As a result, it can be used for a lot of attacks such as cracking passwords, keys or other forms of brute-force attacks that are computationally expensive but […]

CUDA

Dec, 23

Employing GPU Accelerators for Efficient Enforcement of Data Integrity in Outsourced Data

Cloud computing provides on-demand web-based software, middleware, and computing resources. It is a service-oriented model, and one of its services is Data as a Service (DaaS), also known as the Outsourced Database (ODB) model. Although DaaS solves the problem of storing terabytes of data, the security of the data is a major concern for all the […]

CUDA

Recent source codes

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

Most viewed papers (last 30 days)