high performance computing on graphics processing units: hgpu.org

Posts

Feb, 1

Productive and Efficient Computational Science Through Domain-specific Abstractions

In an ideal world, scientific applications are computationally efficient, maintainable and composable and allow scientists to work very productively. We argue that these goals are achievable for a specific application field by choosing suitable domain-specific abstractions that encapsulate domain knowledge with a high degree of expressiveness. This thesis demonstrates the design and composition of domain-specific […]

CUDA • OpenCL

Feb, 1

Performance Analysis and Optimization of a Distributed Processing Framework for Data Mining Accelerated with Graphics Processing Units

In this age, a huge amount of data is generated every day by human interactions with services. Discovering patterns in this data is very important for making business decisions. Because of its size, processing this data requires very intensive computation. Thus, many frameworks have been developed using Central Processing Units (CPU) […]

CUDA

Jan, 30

On Vectorization of Deep Convolutional Neural Networks for Vision Tasks

We have recently witnessed many ground-breaking results in machine learning and computer vision, generated by using deep convolutional neural networks (CNNs). While the success mainly stems from the large volume of training data and the deep network architectures, the vector processing hardware (e.g. GPU) undisputedly plays a vital role in modern CNN implementations to support […]

CUDA

Jan, 30

OpenCL Implementation of LiDAR Data Processing

When designing a safety system, the faster the response time, the better the system's reflexes to hazards. As commercial interest in autonomous and assisted vehicles grows, the number one concern is safety. If the system cannot react as fast as or faster than an average human, then the public will deem it […]

OpenCL

Jan, 30

Different Optimization Strategies and Performance Evaluation of Reduction on Multicore CUDA Architecture

The objective of this paper is to apply different optimization strategies on a multicore GPU architecture. For performance evaluation we use the parallel reduction algorithm. GPU on-chip shared memory is much faster than local and global memory. Shared memory latency is roughly 100x lower than non-cached global memory latency (make sure that there are no bank […]

CUDA
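To make the parallel reduction mentioned in this excerpt concrete, here is a minimal CPU-side sketch of the pairwise (tree) pattern that a GPU block reduction typically follows. It is an illustration only, not the paper's code: the function name `tree_reduce` is hypothetical, the Python list stands in for a block's shared-memory buffer, and the inner loop stands in for threads that would run concurrently with a barrier between steps.

```python
def tree_reduce(values):
    """Sum values with the pairwise (tree) pattern of a GPU block reduction.

    Hypothetical illustration: `buf` plays the role of shared memory and
    each inner-loop index `i` plays the role of one GPU thread.
    """
    buf = list(values)
    n = len(buf)
    if n == 0:
        return 0

    # Pad to the next power of two so every step halves the active range.
    size = 1
    while size < n:
        size *= 2
    buf.extend([0] * (size - n))

    # Sequential addressing: "thread" i adds the element `stride` slots away.
    # Reads come from [stride, 2*stride) and writes go to [0, stride), so the
    # additions within one step are independent of each other.
    stride = size // 2
    while stride > 0:
        for i in range(stride):
            buf[i] += buf[i + stride]
        # A real kernel would synchronize the block's threads here.
        stride //= 2

    return buf[0]


if __name__ == "__main__":
    print(tree_reduce(range(1, 101)))
```

The number of steps grows with the logarithm of the input size, which is why this pattern suits a fast on-chip buffer: every step re-reads partial sums written in the previous one.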

Jan, 30

Accelerate micromagnetic simulations with GPU programming in MATLAB

A finite-difference micromagnetic simulation code written in MATLAB is presented with Graphics Processing Unit (GPU) acceleration. The high performance of the GPU is demonstrated in comparison with a typical Central Processing Unit (CPU) based code. The speed-up of GPU over CPU is shown to be greater than 30 for problems with larger sizes on […]

CUDA

Jan, 30

Design Space Exploration of OpenCL Applications on Heterogeneous Parallel Platforms

Parallel programming is a skill that software engineers can no longer do without, since multi- and many-core architectures have been widely adopted for general-purpose computing platforms. In 2006 Intel introduced the first multi-core processor on the consumer market and, at the same time, NVIDIA unveiled CUDA, a programming paradigm to exploit Graphics Processing Units (GPUs) […]

CUDA • OpenCL

Jan, 28

maxDNN: An Efficient Convolution Kernel for Deep Learning with Maxwell GPUs

This paper describes maxDNN, a computationally efficient convolution kernel for deep learning with the NVIDIA Maxwell GPU. maxDNN reaches 96.3% computational efficiency on typical deep learning network architectures using a single kernel. The design combines ideas from cuda-convnet2 with the Maxas SGEMM assembly code. We only address forward propagation (FPROP) operation of the network, but […]

CUDA

Jan, 28

Accelerating Polynomial Homotopy Continuation on a Graphics Processing Unit with Double Double and Quad Double Arithmetic

Numerical continuation methods apply predictor-corrector algorithms to track a solution path defined by a family of systems, the so-called homotopy. The systems we consider are defined by polynomials in several variables with complex coefficients. For larger dimensions and degrees, the numerical conditioning worsens and hardware double precision often becomes insufficient to reach the end of […]

CUDA
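As a rough illustration of the double double arithmetic named in this post's title, the sketch below represents a number as an unevaluated sum of two hardware doubles and adds two such numbers. It is not taken from the paper: the names `two_sum` and `dd_add` are hypothetical, and `dd_add` is the simple variant of double double addition, chosen for brevity over the more accurate forms.

```python
def two_sum(a, b):
    """Error-free addition (Knuth's TwoSum).

    Returns (s, e) where s is the rounded sum of a and b and e is the
    rounding error, so that s + e equals a + b exactly.
    """
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def dd_add(x, y):
    """Add two double double numbers given as (hi, lo) pairs.

    Hypothetical helper using the simple variant: the high parts are added
    without error, the low parts are folded into the error term, and the
    result is renormalized into a new (hi, lo) pair.
    """
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return two_sum(s, e)


if __name__ == "__main__":
    # In plain double precision the small term is lost entirely.
    print(1.0 + 1e-20 == 1.0)
    # In double double form it survives in the low part.
    print(dd_add((1.0, 0.0), (1e-20, 0.0)))
```

Quad double arithmetic extends the same idea to four doubles per number, at a correspondingly higher cost per operation.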

Jan, 28

GPU Programming – Speeding Up the 3D Surface Generator VESTA

The novel "Volume-Enclosing Surface exTraction Algorithm" (VESTA) generates triangular isosurfaces from computed tomography volumetric images and/or three-dimensional (3D) simulation data. Here, we present various benchmarks for GPU-based code implementations of both VESTA and the current state-of-the-art Marching Cubes Algorithm (MCA). One major result of this study is that VESTA runs significantly faster than the MCA.

CUDA

Jan, 28

On Longest Repeat Queries Using GPU

Repeat finding in strings has important applications in subfields such as computational biology. The challenge of finding the longest repeats covering particular string positions was recently proposed and solved by Ileri et al., using optimal O(n) total time and space, where n is the string size. However, their solution can only find […]

CUDA

Jan, 28

The GPU vs Phi Debate: Risk Analytics Using Many-Core Computing

The risk of reinsurance portfolios covering globally occurring natural catastrophes, such as earthquakes and hurricanes, is quantified by employing simulations. These simulations are computationally intensive and require large amounts of data to be processed. The use of many-core hardware accelerators, such as the Intel Xeon Phi and the NVIDIA Graphics Processing Unit (GPU), is desirable […]

CUDA

Recent source codes

Specx: Speculative task-based runtime system

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

KISim: Kubernetes Intelligent Scheduling Simulator

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler
