high performance computing on graphics processing units: hgpu.org

Posts

Feb, 13

Using Graphical Processing Units in Scheduling Problems

Scheduling problems exist everywhere in the so-called "real world": in manufacturing, transportation and logistics. The main goal of these problems is to find an optimal sequence of tasks that fulfils predefined objectives. There are efficient methods for solving complex scheduling problems in science and industry, which methods […]
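The abstract does not say which scheduling model the paper targets. As a hypothetical illustration, the sketch below assumes a permutation flow shop and shows the objective function (makespan) for one task sequence. Because each candidate sequence can be evaluated independently, this kind of evaluation is a natural candidate for GPU parallelism. The names `makespan` and `best_sequence` are illustrative, not taken from the paper.

```python
from itertools import permutations

def makespan(sequence, proc_times):
    """Completion time of the last job on the last machine.

    proc_times[j][m] is the processing time of job j on machine m;
    every job visits the machines in the same order (permutation flow shop).
    """
    num_machines = len(proc_times[0])
    # finish[m] = time at which machine m becomes free
    finish = [0] * num_machines
    for job in sequence:
        for m in range(num_machines):
            # A job starts on machine m once the machine is free and the job
            # has left machine m - 1 (finish[m - 1] was just updated for this job).
            start = max(finish[m], finish[m - 1] if m > 0 else 0)
            finish[m] = start + proc_times[job][m]
    return finish[-1]

def best_sequence(proc_times):
    """Exhaustive search over all sequences; only feasible for a handful of jobs."""
    jobs = range(len(proc_times))
    return min(permutations(jobs), key=lambda seq: makespan(seq, proc_times))
```

For example, with three jobs on two machines and `proc_times = [[3, 2], [1, 4], [2, 1]]`, the order `(0, 1, 2)` finishes at time 10, while the best order found by `best_sequence` finishes at time 8.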

CUDA

Feb, 13

Work Stealing Inside GPUs

Graphics Processing Units have become a valuable support for High Performance Computing (HPC) applications. However, despite the many improvements to General Purpose GPUs, there is still a need for a generic programming model that adapts to the many forms of parallelism an application can express. The CUDA programming model is widely used on the […]
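As a rough illustration of the work-stealing policy itself, here is a sequential, round-based simulation in Python. It is not the paper's GPU implementation. Each worker owns a double-ended queue, runs its own tasks from the tail, and, when idle, steals from the head of another worker's queue. Choosing the most loaded worker as the victim is an assumption made for determinism; many schedulers pick a random victim instead. The function name is hypothetical.

```python
from collections import deque

def simulate_work_stealing(num_tasks, num_workers):
    """Simulate a round-based work-stealing scheduler.

    All tasks start on worker 0's deque (a deliberately unbalanced start).
    Returns (rounds, tasks_executed_per_worker).
    """
    deques = [deque() for _ in range(num_workers)]
    deques[0].extend(range(num_tasks))
    executed = [0] * num_workers
    rounds = 0
    while any(deques):
        rounds += 1
        for w in range(num_workers):
            if deques[w]:
                # Owner works on its own deque from the tail (LIFO).
                deques[w].pop()
                executed[w] += 1
            else:
                # Idle worker: pick the most loaded victim (assumption, see above).
                victim = max(range(num_workers), key=lambda v: len(deques[v]))
                # Steal from the head (FIFO end), but never empty the victim's deque.
                if len(deques[victim]) > 1:
                    deques[w].append(deques[victim].popleft())
    return rounds, executed
```

With 100 tasks all starting on one of four workers, stealing spreads the work so that every worker executes some tasks and the run takes far fewer than 100 rounds. With a single worker it takes exactly 100.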

CUDA

Feb, 13

LAMMPScuda – a new GPU accelerated Molecular Dynamics Simulations Package and its Application to Ion-Conducting Glasses

Today, computer simulations form an integral part of many research and development efforts. The scope of what can be modeled has increased dramatically, as computing performance improved over the last two decades. But with serial-execution performance of CPUs leveling off, future performance increases for computational physics, material design, and biology must come from higher parallelization. […]

CUDA

Feb, 13

Analytic Anti-Aliasing of Linear Functions on Polytopes

This paper presents an analytic formulation for anti-aliased sampling of 2D polygons and 3D polyhedra. Our framework allows the exact evaluation of the convolution integral with a linear function defined on the polytopes. The filter is a spherically symmetric polynomial of any order, supporting approximations to refined variants such as the Mitchell-Netravali filter family. This […]
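For reference, the sketch below gives the standard one-dimensional Mitchell-Netravali cubic that the abstract names as an approximation target. It is parameterized by `B` and `C`, with `B = C = 1/3` as the commonly recommended setting. It is not the paper's spherically symmetric polynomial filter or its analytic convolution over polytopes. It only shows the piecewise cubic itself.

```python
def mitchell_netravali(x, B=1/3, C=1/3):
    """1D Mitchell-Netravali cubic filter k(x), supported on |x| < 2."""
    x = abs(x)
    if x < 1:
        return ((12 - 9 * B - 6 * C) * x**3
                + (-18 + 12 * B + 6 * C) * x**2
                + (6 - 2 * B)) / 6
    if x < 2:
        return ((-B - 6 * C) * x**3
                + (6 * B + 30 * C) * x**2
                + (-12 * B - 48 * C) * x
                + (8 * B + 24 * C)) / 6
    return 0.0
```

For every choice of `B` and `C`, the integer-shifted copies of this filter sum to 1, so a constant signal is reproduced exactly. With `B = C = 1/3`, the peak value is `k(0) = 8/9`.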

CUDA

Feb, 12

Recursive MIS Computation for Streaming BDPT on the GPU

Bidirectional Path Tracing (BDPT) is a robust unbiased rendering algorithm that samples paths by connecting eye and light paths. By optimally combining different sampling strategies using Multiple Importance Sampling (MIS), BDPT efficiently renders scenes with complex light effects. However, BDPT does not map well onto a streaming architecture such as the GPU; stochastic path lengths […]
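In BDPT, the same path can be produced by several strategies, which differ in how many vertices come from the eye subpath and how many from the light subpath. MIS weights each strategy's contribution. The sketch below shows the standard power heuristic (`beta = 2`), which reduces to the balance heuristic when `beta = 1`. It assumes the densities of the path under all strategies are already known. The paper's recursive MIS computation is not reproduced here.

```python
def power_heuristic(pdfs, i, beta=2.0):
    """MIS weight for sampling strategy i.

    pdfs[k] is the probability density with which strategy k would have
    generated the same path. beta=1 gives the balance heuristic.
    """
    denom = sum(p ** beta for p in pdfs)
    if denom == 0.0:
        return 0.0
    return pdfs[i] ** beta / denom
```

The weights over all strategies for one path sum to 1, which keeps the combined estimator unbiased.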

CUDA

Feb, 12

Level Sets and Voronoi based Feature Extraction from any Imagery

Polygon features are of interest in many GEOProcessing applications like shoreline mapping, boundary delineation, change detection, etc. This paper presents a unique new GPU-based methodology to automate feature extraction combining level sets, or mean shift based segmentation together with Voronoi skeletonization, that guarantees the extracted features to be topologically correct. The features thus extracted as […]

CUDA

Feb, 12

FPGA accelerated 3D reconstruction using compressive sensing

The radiation dose associated with computerized tomography (CT) is significant. Optimization-based iterative reconstruction approaches, e.g., compressive sensing, provide ways to reduce radiation exposure without sacrificing image quality. However, the computational requirement of such algorithms is much higher than that of the conventional Filtered Back Projection (FBP) reconstruction algorithm. This paper describes an FPGA implementation of […]
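To illustrate why such reconstructions cost more than a single FBP pass, the sketch below shows a generic L1-regularized least-squares solver: the iterative shrinkage-thresholding algorithm (ISTA). This solver is substituted here for illustration only. The abstract does not say which algorithm the paper implements on the FPGA. Practical CT compressive sensing usually involves a projection operator and often total-variation regularization rather than the plain dense matrix `A` assumed here. Every iteration needs one forward product and one transposed product, which is where the computational cost lies.

```python
def matvec(A, x):
    """y = A x for a dense matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def matvec_transpose(A, r):
    """y = A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]

def soft_threshold(v, tau):
    """Proximal operator of tau * ||v||_1: shrink each entry toward zero by tau."""
    return [(abs(x) - tau) * (1.0 if x > 0 else -1.0) if abs(x) > tau else 0.0
            for x in v]

def ista(A, b, lam, step, iters=200):
    """Approximately minimize 0.5*||A x - b||^2 + lam*||x||_1.

    step should not exceed 1 / ||A||_2^2 for guaranteed convergence.
    """
    x = [0.0] * len(A[0])
    for _ in range(iters):
        residual = [ax - bi for ax, bi in zip(matvec(A, x), b)]  # forward projection
        grad = matvec_transpose(A, residual)                      # back projection
        x = soft_threshold([xi - step * g for xi, g in zip(x, grad)], step * lam)
    return x
```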

CUDA

Feb, 12

Fast Polynomial Approximation Acceleration on the GPU

This article explores parallelizing the computation of polynomial approximations with large data inputs on the GPU using the NVIDIA CUDA architecture. The parallel GPU implementation is compared to a single-threaded CPU implementation. Despite the enormous computing power of today’s graphics cards, there is still a problem with the speed of data transfer to […]
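The abstract does not specify the fitting method. The sketch below assumes a least-squares fit via the normal equations. This split is one plausible way to divide the work. The part that grows with the input size is a single pass over the samples that accumulates power sums. That pass is independent per sample, so it maps naturally to a parallel reduction on a GPU. What remains is a tiny (degree+1)-by-(degree+1) solve. Normal equations become ill-conditioned at high degree, so this is only an illustration.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coeffs where coeffs[k] multiplies x**k.
    """
    n = degree + 1
    # Data-parallel part: one pass over the samples accumulating power sums
    # S[k] = sum(x**k) for k <= 2*degree and T[k] = sum(x**k * y) for k <= degree.
    S = [0.0] * (2 * degree + 1)
    T = [0.0] * n
    for x, y in zip(xs, ys):
        p = 1.0
        for k in range(2 * degree + 1):
            S[k] += p
            if k < n:
                T[k] += p * y
            p *= x
    # Small serial part: solve A c = T with A[i][j] = S[i + j]
    # by Gaussian elimination with partial pivoting.
    A = [[S[i + j] for j in range(n)] + [T[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = A[i][n] - sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = s / A[i][i]
    return coeffs
```

In a GPU version, only the samples need to cross to the device, and only the `3 * degree + 2` sums need to come back, which bears directly on the transfer cost the abstract mentions.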

CUDA

Feb, 12

Face Detection CUDA Accelerating

Face detection is useful and important in many different disciplines. Because face detection will be used in our future work, we wanted to determine whether it is advantageous to use CUDA for detecting faces. First, we implemented the Viola-Jones algorithm as a basic single-threaded CPU version. Then the […]
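The Viola-Jones detector relies on an integral image, which lets it compute the sum of any rectangle with four lookups. Its Haar-like features are differences of such rectangle sums. The sketch below shows the standard construction in plain Python. It is not the authors' CUDA code. A GPU version would typically evaluate many detection windows in parallel, each reading the same integral image. The function names are illustrative.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (one-pixel zero border)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left corner (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def two_rect_feature(ii, x, y, w, h):
    """Horizontal two-rectangle Haar-like feature: left half minus right half (w even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

For the 3x3 image `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]`, `rect_sum` over the whole image gives 45, and over the bottom-right 2x2 block it gives 28.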

CUDA

Feb, 11

SPIRE, a Sequential to Parallel Intermediate Representation Extension

SPIRE is a new, generic, parallel extension for the intermediate representations used in compilation frameworks of sequential languages; it is designed to leverage their existing infrastructure to address both control- and data-parallel languages. Since the efficiency and power of the transformations and optimizations performed by compilers are closely related to the presence of a […]

OpenCL

Feb, 11

Task Parallelism and Synchronization: An Overview of Explicit Parallel Programming Languages

Programming parallel machines as effectively as sequential ones would ideally require a language that provides high-level programming constructs in order to avoid the programming errors frequent when expressing parallelism. Since task parallelism is often considered more error-prone than data parallelism, we survey six popular and efficient parallel programming languages that tackle this difficult issue: Cilk, […]

OpenCL

Feb, 11

High-throughput protein crystallization on the World Community Grid and the GPU

We have developed CPU and GPU versions of an automated image analysis and classification system for protein crystallization trial images from the Hauptman Woodward Institute’s High-Throughput Screening lab. The analysis step computes 12,375 numerical features per image. Using these features, we have trained a classifier that distinguishes 11 different crystallization outcomes, recognizing 80% of all […]

OpenCL

