high performance computing on graphics processing units: hgpu.org

Posts

Mar, 15

Prius: A Runtime for Hybrid Computing

Prius is a framework for seamless execution of OpenCL programs across integrated, heterogeneous systems. Applications interfacing with Prius need not be aware of the characteristics of the hardware; instead the framework will automatically map kernel executions to suitable processors at run-time. The modular nature of the framework allows easy evaluation of new mapping strategies.

OpenCL
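The abstract describes a runtime that maps each kernel execution to a processor at run time and treats the mapping strategy as a replaceable module. Here is a minimal sketch of that separation, in Python for brevity. It assumes a strategy is simply a function from (kernel name, work size, available devices) to a device. `Device`, `Runtime`, `size_threshold_strategy`, the 10,000-item threshold and the `gflops` figures are all hypothetical and are not Prius's actual interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Device:
    name: str
    kind: str       # "cpu" or "gpu" (hypothetical classification)
    gflops: float   # assumed peak throughput, used only by the example heuristic

# A mapping strategy picks a device for one kernel launch.
MappingStrategy = Callable[[str, int, List[Device]], Device]

def size_threshold_strategy(kernel: str, work_items: int, devices: List[Device]) -> Device:
    """Hypothetical heuristic: small launches stay on a CPU, large ones go to the fastest device."""
    if work_items < 10_000:
        return next(d for d in devices if d.kind == "cpu")
    return max(devices, key=lambda d: d.gflops)

class Runtime:
    """The application only enqueues kernels; it never names a device itself."""

    def __init__(self, devices: List[Device], strategy: MappingStrategy):
        self.devices = devices
        self.strategy = strategy          # swappable module
        self.log: Dict[str, str] = {}     # kernel name -> device chosen for its last launch

    def enqueue(self, kernel: str, work_items: int) -> str:
        device = self.strategy(kernel, work_items, self.devices)
        self.log[kernel] = device.name
        return device.name
```

Swapping in a different policy only means passing another function to `Runtime`, which mirrors the claim that new mapping strategies are easy to evaluate.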

Mar, 15

Input-Aware Auto-Tuning for Directive-based GPU Programming

The difficulties posed by GPGPU programming and the need to increase productivity have guided research towards directive-based high-level programs for accelerators. This effort has led to the definition of the OpenACC industry standard, which significantly simplifies writing code for graphics engines while leaving the programmer the opportunity to tune the application for the target hardware and […]
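As the title suggests, input-aware auto-tuning selects tuning parameters according to the input instead of fixing one configuration per program. Here is a minimal sketch under stated assumptions. Input sizes are grouped by their bit length, each group is tuned once by measuring every candidate configuration, and later runs look up the stored winner. The candidate values (read them as, say, OpenACC vector lengths), `toy_cost` and the helper names are hypothetical. A real tuner would time compiled kernel variants rather than evaluate a formula.

```python
import math
from typing import Callable, Dict, Sequence

def size_class(n: int) -> int:
    """Group input sizes by their number of binary digits (1, 2-3, 4-7, 8-15, ...)."""
    return n.bit_length()

def tune(sizes: Sequence[int], candidates: Sequence[int],
         measure: Callable[[int, int], float]) -> Dict[int, int]:
    """For each representative input size, keep the candidate with the lowest measured cost."""
    table: Dict[int, int] = {}
    for n in sizes:
        table[size_class(n)] = min(candidates, key=lambda c: measure(n, c))
    return table

def select(table: Dict[int, int], n: int, default: int) -> int:
    """Run-time lookup; size classes that were never tuned fall back to a default."""
    return table.get(size_class(n), default)

def toy_cost(n: int, c: int) -> float:
    """Hypothetical cost model: number of chunks plus an overhead that grows with chunk size c."""
    return math.ceil(n / c) + 0.01 * c
```

With this toy model, small inputs favour small configurations and large inputs favour large ones. That is the kind of input dependence a single static setting cannot capture.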

Mar, 14

Simulation of a flowing snow avalanche using molecular dynamics

This paper presents an approach for modelling and simulating a flowing snow avalanche, formed of dry and liquefied snow that slides down a slope, using molecular dynamics and the discrete element method. A particle system is used as the base method for the simulation, and marching cubes with real-time shaders are employed […]

CUDA
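The abstract couples molecular dynamics with the discrete element method (DEM). Particles interact through short-range contact forces and are advanced with an explicit time step. Here is a minimal 2D sketch of one such step. It assumes equal-sized particles, a linear spring-dashpot normal contact, gravity along -y and semi-implicit Euler integration. The parameter values and names are hypothetical. The O(N²) pair loop stands in for the spatial grid that a CUDA implementation would normally use to find neighbours.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class Particle:
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0

def step(ps: List[Particle], dt: float, radius: float = 0.5, k: float = 1000.0,
         c: float = 5.0, mass: float = 1.0, g: float = -9.81) -> None:
    """Advance all particles by one time step dt (in place)."""
    fx = [0.0] * len(ps)
    fy = [mass * g] * len(ps)                  # gravity acts on every particle
    for i in range(len(ps)):
        for j in range(i + 1, len(ps)):
            dx, dy = ps[j].x - ps[i].x, ps[j].y - ps[i].y
            dist = math.hypot(dx, dy)
            overlap = 2 * radius - dist
            if overlap <= 0 or dist == 0:      # not in contact (or degenerate)
                continue
            nx, ny = dx / dist, dy / dist      # unit normal pointing from i to j
            # Relative normal velocity, positive when the pair is separating.
            vn = (ps[j].vx - ps[i].vx) * nx + (ps[j].vy - ps[i].vy) * ny
            # Spring pushes the pair apart, dashpot damps; clamp so contacts never attract.
            f = max(0.0, k * overlap - c * vn)
            fx[i] -= f * nx
            fy[i] -= f * ny
            fx[j] += f * nx
            fy[j] += f * ny
    for p, ax, ay in zip(ps, fx, fy):
        # Semi-implicit Euler: update velocity first, then position with the new velocity.
        p.vx += ax / mass * dt
        p.vy += ay / mass * dt
        p.x += p.vx * dt
        p.y += p.vy * dt
```

Because equal and opposite forces are applied to each contact pair, total momentum is conserved apart from gravity.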

Mar, 14

Selection of Task Implementations in the Nanos++ Runtime

New heterogeneous systems and hardware accelerators can give high performance computers higher levels of computational power. However, this does not come for free: the more heterogeneous the system, the more complex the programming task becomes in terms of resource utilization. OmpSs is a task-based programming model and framework focused on the automatic […]

CUDA
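OmpSs allows one task to have several implementations (for example, an SMP version and a CUDA version), and the Nanos++ runtime decides which one to run. The abstract does not state the selection policy. The following is therefore only a hedged sketch of one plausible approach: run each version once, then favour the one with the lowest mean measured execution time. `VersionScheduler` and its methods are hypothetical and are not the Nanos++ API.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

class VersionScheduler:
    """Hypothetical per-task scheduler choosing among alternative implementations."""

    def __init__(self, versions: Dict[str, Callable[[], object]]):
        self.versions = versions
        self.times: Dict[str, List[float]] = defaultdict(list)

    def record(self, name: str, seconds: float) -> None:
        self.times[name].append(seconds)

    def choose(self) -> str:
        # Exploration: every implementation is run at least once.
        for name in self.versions:
            if not self.times[name]:
                return name
        # Exploitation: afterwards, pick the lowest mean execution time.
        return min(self.versions,
                   key=lambda n: sum(self.times[n]) / len(self.times[n]))

    def run(self) -> Tuple[str, object]:
        name = self.choose()
        start = time.perf_counter()
        result = self.versions[name]()
        self.record(name, time.perf_counter() - start)
        return name, result
```

A production runtime would also have to account for data-transfer costs and device availability, which this sketch ignores.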

Mar, 14

Automated and interactive approaches for optimal surface finding based segmentation of medical image data

Optimal surface finding (OSF), a graph-based optimization approach to image segmentation, represents a powerful framework for medical image segmentation and analysis. In many applications, a pre-segmentation is required to enable OSF graph construction. Also, the cost function design is critical for the success of OSF. In this thesis, two issues in the context of OSF […]

CUDA
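In OSF, the segmented surface is the one with minimum total cost subject to smoothness constraints between neighbouring columns. The general 3D problem is usually solved as a minimum closed set (a minimum s-t cut) in a constructed graph. Consider the special case of a single 2D image whose boundary crosses each column exactly once. Under that assumption, the same objective can be minimised by dynamic programming, as sketched below. `cost[x][y]` (a precomputed per-pixel cost) and `delta` (the maximum row change between adjacent columns) are illustrative names. This is not the thesis's graph construction.

```python
from typing import List

def optimal_boundary(cost: List[List[float]], delta: int) -> List[int]:
    """Return row y(x) for every column x, minimising sum(cost[x][y(x)])
    subject to |y(x+1) - y(x)| <= delta."""
    ncols, nrows = len(cost), len(cost[0])
    acc = [row[:] for row in cost]            # acc[x][y]: best cost of a boundary ending at (x, y)
    back = [[0] * nrows for _ in range(ncols)]
    for x in range(1, ncols):
        for y in range(nrows):
            lo, hi = max(0, y - delta), min(nrows - 1, y + delta)
            # Best predecessor row among those allowed by the smoothness constraint.
            prev = min(range(lo, hi + 1), key=lambda p: acc[x - 1][p])
            acc[x][y] = cost[x][y] + acc[x - 1][prev]
            back[x][y] = prev
    # Trace the optimal boundary back from the cheapest end point.
    y = min(range(nrows), key=lambda r: acc[ncols - 1][r])
    path = [y]
    for x in range(ncols - 1, 0, -1):
        y = back[x][y]
        path.append(y)
    return path[::-1]
```

Tightening `delta` trades fidelity to the cost image for a smoother boundary, which is the role the smoothness constraint plays in OSF.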

Mar, 14

Parallel Particle Swarm Optimization for Image Segmentation

One of the problems with Particle Swarm Optimization (PSO) is that the method is time-consuming, especially for problems that need many particles to be represented. This paper compares the speed of PSO running in parallel mode with that of the ordinary, non-parallel version. The testing applies […]

CUDA
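Within one iteration of PSO, each particle's velocity and position update depends only on its own state and on the global best from the previous step. This is why the method parallelizes well. Here is a minimal sequential sketch of global-best PSO. It assumes commonly used coefficients (inertia `w = 0.7`, `c1 = c2 = 1.5`), position clamping to the search bounds and a sphere function as the test objective. The paper's image-segmentation objective is not reproduced. A parallel version would distribute the inner per-particle loop, for example one GPU thread per particle.

```python
import random
from typing import Callable, List, Tuple

def pso(f: Callable[[List[float]], float], dim: int, n_particles: int = 30,
        iters: int = 200, bounds: Tuple[float, float] = (-5.0, 5.0),
        w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
        seed: int = 0) -> Tuple[List[float], float]:
    """Minimise f over a box using synchronous global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        # Independent per-particle updates: the part a parallel version distributes.
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
        # Synchronization point: reduce to the global best after all particles moved.
        g = min(range(n_particles), key=lambda i: pbest_val[i])
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g][:], pbest_val[g]
    return gbest, gbest_val
```

The global-best reduction at the end of each iteration is the only step that needs information from all particles.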

Mar, 14

CPU and/or GPU: Revisiting the GPU Vs. CPU Myth

Parallel computing using accelerators has gained widespread research attention in the past few years. In particular, using GPUs for general purpose computing has brought forth several success stories with respect to time taken, cost, power, and other metrics. However, accelerator-based computing has significantly relegated the role of CPUs in computation. As CPUs evolve […]

Mar, 12

GPU implementation of a deep learning network for image recognition tasks

Image recognition and classification are among the primary challenges of the machine learning community. Recent advances in learning systems, coupled with hardware developments, have enabled general object recognition systems to be learned on home computers with graphics processing units. Presented is a Deep Belief Network engineered using NVIDIA’s CUDA programming language for general object […]

CUDA
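Deep Belief Networks are commonly trained layer by layer, with each layer treated as a Restricted Boltzmann Machine (RBM) updated by contrastive divergence. Here is a minimal pure-Python sketch of one-step contrastive divergence (CD-1) for a binary RBM. It assumes this standard training scheme, since the abstract does not say which variant the authors used. The learning rate, layer sizes and class layout are hypothetical. The nested sums in `hidden_probs` and `visible_probs` are matrix-vector products, which is the work a CUDA implementation would typically batch onto the GPU.

```python
import math
import random
from typing import List

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

class RBM:
    """Binary-binary Restricted Boltzmann Machine (one layer of a DBN)."""

    def __init__(self, n_visible: int, n_hidden: int, seed: int = 0):
        self.rng = random.Random(seed)
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v: List[float]) -> List[float]:
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_probs(self, h: List[float]) -> List[float]:
        return [sigmoid(self.b[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b))]

    def cd1(self, v0: List[float], lr: float = 0.1) -> None:
        """One contrastive-divergence step on a single training vector."""
        h0 = self.hidden_probs(v0)                                   # positive phase
        h_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_probs(h_sample)                            # reconstruction
        h1 = self.hidden_probs(v1)                                   # negative phase
        for i in range(len(self.b)):
            for j in range(len(self.c)):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(len(self.c)):
            self.c[j] += lr * (h0[j] - h1[j])

    def reconstruction_error(self, v: List[float]) -> float:
        """Deterministic (mean-field) squared reconstruction error, useful for monitoring."""
        v1 = self.visible_probs(self.hidden_probs(v))
        return sum((a - b) ** 2 for a, b in zip(v, v1))
```

In a DBN, the hidden probabilities of a trained RBM would then serve as the input data for the next RBM in the stack.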

Mar, 12

Parallelization Research of Circle Detection Based on Hough Transform

Circle detection with the Hough transform suffers from long computation times. In this paper, two parallel methods are presented, based on Threading Building Blocks (TBB) and CUDA; by utilizing multi-core CPUs and GPUs, they parallelize the most time-consuming part of circle detection. Experimental results show that the circle detection algorithms […]

CUDA
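The Hough transform finds circles by letting every edge pixel vote for all centres that could have produced it. Accumulating those votes is the time-consuming part that the paper parallelizes. Here is a minimal sketch for a single, known radius `r`. It assumes edges are given as integer pixel coordinates and that the angle is sampled in `steps` increments. The names are hypothetical. In a TBB or CUDA version, the outer loop over edge points is the natural unit of parallel work. Per-thread accumulators would be merged at the end, or a shared accumulator would be updated with atomic increments.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

def hough_circle_center(edges: Iterable[Tuple[int, int]], r: int,
                        steps: int = 360) -> Tuple[Tuple[int, int], int]:
    """Return the best-supported centre for circles of radius r and its vote count."""
    acc: Counter = Counter()
    for x, y in edges:
        voted = set()
        for k in range(steps):
            t = 2 * math.pi * k / steps
            # Candidate centre lying at distance r from this edge point.
            voted.add((round(x - r * math.cos(t)), round(y - r * math.sin(t))))
        acc.update(voted)  # at most one vote per (edge point, centre) pair
    center, votes = acc.most_common(1)[0]
    return center, votes
```

Detecting circles of unknown size would add the radius as a third accumulator dimension, which multiplies the voting work and makes parallelization even more worthwhile.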

Mar, 12

Just-in-time Acceleration of JavaScript

JavaScript has seen tremendous growth in popularity driven by increasingly interactive web sites and sophisticated web interfaces. However, the performance of JavaScript continues to be a hurdle in using it for tasks that are computationally intensive, such as gaming, simulations, and visualization. JavaScript has also been slow to exploit the available parallelism on modern computers. […]

Mar, 12

Comprehensive Analysis of High-Performance Computing Methods for Filtered Back-Projection

This paper provides an extensive runtime, accuracy, and noise analysis of Computed Tomography (CT) reconstruction algorithms using various High-Performance Computing (HPC) frameworks, such as "conventional" multi-core, multi-threaded CPUs, Compute Unified Device Architecture (CUDA), and DirectX or OpenGL graphics pipeline programming. The proposed algorithms exploit various built-in hardwired features of GPUs such as rasterization and […]

CUDA
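Filtered back-projection reconstructs a slice in two stages. Each parallel-beam projection is first convolved with a ramp filter, and every filtered projection is then smeared back across the image. Here is a minimal pure-Python sketch. It assumes parallel-beam geometry with angles evenly spaced over [0, π), unit detector spacing, the spatial-domain Ram-Lak kernel and linear interpolation. The function names and the sinogram layout are hypothetical, and the paper's own implementations are not reproduced here. The pixel loops in `fbp` are independent per pixel. That is the kind of independence CUDA kernels or graphics-pipeline (rasterization) versions can exploit.

```python
import math
from typing import List

def ramp_kernel(n: int) -> float:
    """Spatial-domain Ram-Lak (ramp) filter for unit detector spacing."""
    if n == 0:
        return 0.25
    if n % 2 == 0:
        return 0.0
    return -1.0 / (math.pi * n) ** 2

def filter_projection(p: List[float]) -> List[float]:
    """Convolve one projection with the ramp kernel (direct O(m^2) convolution)."""
    m = len(p)
    return [sum(ramp_kernel(k - j) * p[j] for j in range(m)) for k in range(m)]

def fbp(sinogram: List[List[float]], size: int) -> List[List[float]]:
    """sinogram[a][s]: projection at angle pi * a / len(sinogram), detector bin s,
    with bin len(sinogram[0]) // 2 on the rotation axis. Returns a size x size image."""
    n_angles, n_bins = len(sinogram), len(sinogram[0])
    centre_bin = n_bins // 2
    mid = (size - 1) / 2
    filtered = [filter_projection(p) for p in sinogram]
    img = [[0.0] * size for _ in range(size)]
    for a, q in enumerate(filtered):
        theta = math.pi * a / n_angles
        c, s = math.cos(theta), math.sin(theta)
        for row in range(size):
            y = mid - row                          # image y axis points up
            for col in range(size):
                x = col - mid
                t = x * c + y * s + centre_bin     # detector coordinate seen by this pixel
                i = math.floor(t)
                w = t - i
                if 0 <= i < n_bins - 1:            # linear interpolation between bins i, i+1
                    img[row][col] += (1.0 - w) * q[i] + w * q[i + 1]
    scale = math.pi / n_angles                     # d(theta) for the discrete angle sum
    return [[v * scale for v in r] for r in img]
```

A point object on the rotation axis gives identical projections at every angle, and the reconstruction of such a sinogram peaks at the image centre. This makes it a convenient sanity check for the geometry.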

Mar, 12

Parallel spatial data structures for interactive rendering

The main question explored in this thesis is how to define novel parallel random-access data structures for surface and image spatial data with efficient construction, storage, and query memory access patterns. Our main contribution is a set of parallel-efficient methods to evaluate irregular, sparse or even implicit geometries and textures in different applications: a method […]

CUDA
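A common building block for constructing spatial data structures in parallel (for example, linear quadtrees, octrees or BVHs) is to sort elements by their Morton (Z-order) code. The code interleaves the bits of the quantized coordinates, so nearby points tend to end up adjacent in memory. Here is a minimal sketch of 2D Morton encoding with the standard bit-spreading trick, assuming non-negative coordinates that fit in 16 bits. This is a general technique offered for illustration, not necessarily one of the thesis's methods.

```python
def part1by1(n: int) -> int:
    """Spread the low 16 bits of n so that a zero bit sits between each original bit."""
    n &= 0x0000FFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n

def morton2d(x: int, y: int) -> int:
    """Interleave bits: x occupies the even bit positions, y the odd ones."""
    return part1by1(x) | (part1by1(y) << 1)
```

Each code depends only on its own point, so all codes can be computed concurrently and then ordered with a parallel radix sort. The resulting order is a common starting point for building a hierarchy bottom-up.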

Recent source codes

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management

MSCCL++: A GPU-driven communication stack for scalable AI applications

Benchmark compute shader of Unity against InteropUnityCUDA