high performance computing on graphics processing units: hgpu.org

Posts

Sep, 25

Performance Evaluation of Edge Detection Techniques on GPU Using OpenCL

GPUs (graphics processing units) enhance computing performance thanks to their hundreds of cores running in parallel. CUDA (Compute Unified Device Architecture) and OpenCL (Open Computing Language) are programming models for GPUs. The advantage of these two programming models is that developers don’t have to understand any […]

OpenCL

Sep, 25

MASCOT: Fast and Highly Scalable SVM Cross-validation using GPUs and SSDs

Cross-validation is a commonly used method for evaluating the effectiveness of Support Vector Machines (SVMs). However, existing SVM cross-validation algorithms are not scalable to large datasets because they have to (i) hold the whole dataset in memory and/or (ii) perform a very large number of kernel value computations. In this paper, we propose a scheme […]

CUDA

Sep, 25

Scalability Analysis of Parallel Algorithms on GPU Clusters

Scalability is an important concept in the domain of parallel computing. Since Graphics Processing Unit (GPU) clusters are and will be widely utilized in high performance computing platforms, we investigate the factors influencing the scalability for combinations of parallel algorithms (PA) and GPU clusters (GC). We present a scalability model for the PA-GC combination and then validate […]

CUDA

Sep, 24

Calculation of Force Field Grids for Molecular Docking Using Graphics Processing Unit

The vast majority of problems faced by bioinformatics are very complex and time consuming. They require the use of modern high-performance computational systems and the development of algorithms for such systems. Heterogeneous computing systems that include graphics processing units (GPUs) occupy a separate niche. Such systems can significantly accelerate the solution of some tasks. The […]

CUDA

Sep, 23

Advanced Optimizations of An Implicit Navier-Stokes Solver on GPGPU

General-purpose computing on graphics processing units (GPGPU) is a massively parallel, fine-grained computation platform, which is particularly attractive for CFD tasks due to its potential for one or two orders of magnitude of performance improvement with relatively low capital investment. Many successful attempts have been reported in recent years (see, for example [1, 2, 3, 4, […]

CUDA

Sep, 23

Explicit Integration with GPU Acceleration for Large Kinetic Networks

We demonstrate the first implementation of recently-developed fast explicit kinetic integration algorithms on modern graphics processing unit (GPU) accelerators. Taking as a generic test case a Type Ia supernova explosion with an extremely stiff thermonuclear network having 150 isotopic species and 1604 reactions coupled to hydrodynamics using operator splitting, we demonstrate the capability to solve […]

CUDA

Sep, 23

Computational Gravitational Dynamics with Modern Numerical Accelerators

We review the recent optimizations of gravitational N-body kernels for running them on graphics processing units (GPUs), on single hosts and massively parallel platforms. For each of the two main N-body techniques, direct summation and tree-codes, we discuss the optimization strategy, which is different for each algorithm. Because both the accuracy as well as the […]

CUDA

Sep, 23

An optimized GPU implementation of a 2D free surface simulation model on unstructured meshes

This work concerns the implementation of a finite volume method to solve the 2D Shallow Water Equations on Graphics Processing Units (GPUs). The strategy is fully oriented to work efficiently with unstructured meshes, which are widely used in many fields of engineering. Due to the design of the GPU cards, structured meshes are […]

CUDA

Sep, 20

High performance histogramming on massively parallel processors

Histogramming is a technique by which input datasets are mined to extract features and patterns. Histograms have a wide range of uses in computer vision, machine learning, database processing, and quality control for manufacturing, and many applications benefit from advance knowledge about the distribution of data. Computing a histogram is, essentially, the antithesis of parallel processing. Without […]

CUDA

Sep, 20

Parallel Hierarchical Clustering on the GPU

Clustering is a basic task in exploratory data analysis. It is used to partition elements of a set into disjoint groups, so-called clusters, such that elements within a group are similar to each other, but dissimilar to elements of other groups. Several clustering algorithms exist, which can be applied depending on the type of dataset […]

OpenCL

Sep, 20

Interactive Ray Tracing with Data Locality Optimizations

Ray tracing denotes a class of rendering algorithms that are well known for their flexibility and their capability of generating highly realistic images of three-dimensional models. However, due to the heavy computational requirements, it has traditionally been used for offline rendering. Improving the performance of ray tracing has been an active area of research and […]

OpenCL

Sep, 20

Parallel Primitive Optimization for GPU and Multicore

This thesis focuses on the use of automatic code generation to combine different classes of optimizations and find the best combination for parallel reduction in OpenCL on various devices. It also introduces the optimizations used. Finally, the results of the combinations are evaluated and discussed.

OpenCL

* * *

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

SYCL Container

Most viewed papers (last 30 days)