high performance computing on graphics processing units: hgpu.org

Posts

Jul, 24

Parallel Implementation of Texture Based Image Retrieval on The GPU

Most image processing algorithms are inherently parallel, so multithreaded processors are well suited to such applications. In huge image databases, image processing takes a very long time to run on a single-core processor because the algorithms execute in a single thread. Graphics Processing Units (GPUs) are more common in most image processing applications due to multithreaded execution […]

CUDA

Jul, 24

Implementation of Filtering Beamforming Algorithms for Sonar Devices Using GPU

Beamforming is a signal processing technique used in sensor arrays to direct signal transmission or reception. A beamformer combines the input signals in the array to achieve constructive interference at particular angles (beams) and destructive interference at other angles. According to the following facts: 1) Beamforming can be computationally intensive, so real-time sonar beamforming algorithms in sonar […]

CUDA

Jul, 24

CDFC: Collision Detection Based on Fuzzy Clustering for Deformable Objects on GPU’s

We present a novel Collision Detection Based on Fuzzy Clustering for Deformable Objects on GPU’s (CDFC) technique to perform collision queries between rigid and/or deformable models. Our method can handle arbitrary deformations and even discontinuous ones. With our approach, we subdivide the scene into connected but totally independent parts by fuzzy clustering, and therefore, the […]

CUDA

Jul, 22

Multi-core CUDA Architecture for Parallelization of Hierarchical Text Clustering

Text clustering is the problem of dividing text documents into groups such that documents in the same group are similar to one another and different from documents in other groups. Because texts generally tend to form hierarchies, text clustering is best performed with a hierarchical clustering method. An important aspect while clustering large […]

CUDA

Jul, 22

OpenCL simulations of two-fluid compressible flows with a random choice method

In this paper, we propose a new, very simple numerical method for solving liquid-gas compressible flows. Such flows are difficult to simulate because classic conservative finite volume schemes generate pressure oscillations at the liquid-gas interface. We extend to several dimensions the random choice scheme that we constructed in [2]. The extension is performed through […]

OpenCL

Jul, 22

Performance Evaluation of the Ocean-Land-Atmosphere Model Using Graphics Processing Units

The Ocean-Land-Atmosphere Model (OLAM) is an atmospheric model whose simulations cover the entire surface of the Earth. OLAM demands a great amount of processing in a simulation because of the large number of data structures used to represent the atmosphere. For this reason, we investigate in this paper how to increase performance using GPUs to compute the […]

CUDA

Jul, 22

An overview of techniques for predicting the performance of GPU accelerated applications

The ability to predict the performance of applications in large-scale parallel systems is essential. One of the main incentives for this is the high cost of executing non-production tasks on these systems. An entity may also want to predict the performance in a system that does not yet exist. One popular alternative for increasing a […]

CUDA

Jul, 22

Automatic Generation of FFT Libraries for GPU Platforms

Compilers introduce a set of optimizations to speed up source code. However, due to the variety of computation platforms, algorithm complexity and problem sizes, general-purpose compilers can fail to improve performance. The burden on library developers to write optimized libraries increases significantly, since user code relies on them for performance. This argument strengthens the […]

CUDA

Jul, 21

Experimental Evaluation of Thread Distribution Effects on Multiple Output Errors in GPUs

Graphics Processing Units are highly prone to corruption by neutrons. Experimental results show that in the majority of cases a typical application like matrix multiplication is affected by multiple output errors. In this paper we evaluate how different thread distributions affect the occurrence of multiple output errors. The reported results and the performed architecture […]

CUDA

Jul, 21

Detecting parametric objects in large scenes by Monte Carlo sampling

Point processes constitute a natural extension of Markov Random Fields (MRF), designed to handle parametric objects. They have shown efficiency and competitiveness in tackling object extraction problems in vision. Simulating these stochastic models is, however, a difficult task. The performance of existing samplers is limited in terms of computation time and convergence stability, especially […]

CUDA

Jul, 21

The Astrophysical Multipurpose Software Environment

We present the open source Astrophysical Multi-purpose Software Environment (AMUSE, www.amusecode.org), a component library for performing astrophysical simulations involving different physical domains and scales. It couples existing codes within a Python framework based on a communication layer using MPI. The interfaces are standardized for each domain and their implementation based on MPI guarantees that the […]

CUDA

Jul, 21

Parallel and Concurrent Programming in Haskell: Techniques for Multicore and Multithreaded Programming

This book covers the breadth of Haskell’s diverse selection of programming APIs for concurrent and parallel programming. It is split into two parts. The first part, on parallel programming, covers the techniques for using multiple processors to speed up CPU-intensive computations, including methods for using parallelism in both idiomatic Haskell and numerical array-based algorithms, and […]

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container
