high performance computing on graphics processing units: hgpu.org

Posts

Apr, 20

A Convolutional Neural Network Cascade for Face Detection

In real-world face detection, large visual variations, such as those due to pose, expression, and lighting, demand an advanced discriminative model to accurately differentiate faces from the backgrounds. Consequently, effective models for the problem tend to be computationally prohibitive. To address these two conflicting challenges, we propose a cascade architecture built on convolutional neural networks […]

CUDA

Apr, 20

Verification of Producer-Consumer Synchronization in GPU Programs

Previous efforts to formally verify code written for GPUs have focused solely on kernels written within the traditional data-parallel GPU programming model. No previous work has considered the higher-performance, but more complex, warp-specialized kernels based on the producer-consumer named barriers available on current hardware. In this work we present the first formal operational semantics for […]

CUDA

Apr, 20

Fluid Simulation and Generating Textures with Reaction-Diffusion Systems on Surfaces in the GPU

In recent years, many researchers have used the Navier-Stokes equations and Reaction-Diffusion systems for fluid simulation and for the creation of textures on surfaces, respectively. For this purpose it is necessary to obtain information about operators defined on surfaces. We obtained the metric information of the distortion caused by the parametrization of Catmull-Clark subdivision surfaces. […]

CUDA

Apr, 17

Optimizing ASP.NET with C++ AMP on the GPU

This whitepaper is intended for Microsoft Windows developers who are considering writing high-performance parallel code in Amazon Web Services (AWS) using the Microsoft C++ Accelerated Massive Parallelism (C++ AMP) library. This paper describes an ASP.NET Model-View-Controller (MVC) web application written in C# that invokes C++ functions running on the graphics processing unit (GPU) for matrix […]

Apr, 17

Unsafe Floating-point to Unsigned Integer Casting Check for GPU Programs

Numerical programs usually include type-casting instructions that convert data between types. Identifying unsafe type-casting is important for preventing undefined program behavior, which causes serious problems such as security vulnerabilities and non-reproducible results. While many tools have been proposed for handling sequential programs, to the best of our knowledge there is no tool geared toward GPUs. In […]

CUDA
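
To make the notion of an unsafe cast concrete, here is a minimal sketch of a range check for converting a floating-point value to a 32-bit unsigned integer. It is not the tool described in the abstract. It assumes the C and C++ rule that the conversion truncates toward zero and is undefined when the truncated value does not fit the target type; the function names are hypothetical.

```python
import math

# One past the largest value a 32-bit unsigned integer can hold.
U32_LIMIT = 2 ** 32


def is_safe_float_to_u32(x: float) -> bool:
    """Return True if truncating x toward zero yields a value in [0, 2**32 - 1].

    Assumption: this mirrors the C/C++ rule that a float-to-integer
    conversion is undefined when the truncated value is not representable.
    """
    if math.isnan(x) or math.isinf(x):
        return False
    # Values in (-1, 0) truncate to 0, so they are still representable.
    return -1.0 < x < U32_LIMIT


def checked_float_to_u32(x: float) -> int:
    """Convert x to an unsigned 32-bit value, raising instead of misbehaving."""
    if not is_safe_float_to_u32(x):
        raise OverflowError(f"{x!r} is not representable as a 32-bit unsigned integer")
    return int(x)  # int() truncates toward zero, like the C conversion
```

A checker in this spirit would flag inputs such as `-1.0`, `4294967296.0`, or NaN before they reach the cast, while letting `-0.5` through because it truncates to zero.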

Apr, 17

Arbitrary-Precision Arithmetics on the GPU

The majority of computer applications employ numerical data types with a fixed amount of precision for their computations. Their limited numerical range and precision are sufficient for most use cases. However, for some purposes, such as cryptography or geometrical computations, the required range and precision can become arbitrarily large. Numerical types that can handle such […]

CUDA
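
As an illustration of how a number can grow beyond a fixed-width type, the following sketch stores a non-negative integer as a list of 32-bit limbs and adds two such numbers with carry propagation. The limb layout (little-endian, base 2^32) and all function names are assumptions chosen for the example; the paper's actual representation is not given in the excerpt.

```python
BASE = 1 << 32  # each limb holds 32 bits


def to_limbs(n: int) -> list:
    """Split a non-negative integer into little-endian 32-bit limbs."""
    limbs = []
    while True:
        limbs.append(n % BASE)
        n //= BASE
        if n == 0:
            break
    return limbs


def from_limbs(limbs: list) -> int:
    """Reassemble an integer from little-endian 32-bit limbs."""
    return sum(limb << (32 * i) for i, limb in enumerate(limbs))


def add_limbs(a: list, b: list) -> list:
    """Add two limb arrays, propagating the carry from low to high limbs."""
    result = []
    carry = 0
    for i in range(max(len(a), len(b))):
        limb_a = a[i] if i < len(a) else 0
        limb_b = b[i] if i < len(b) else 0
        total = limb_a + limb_b + carry
        result.append(total % BASE)  # low 32 bits stay in this limb
        carry = total // BASE        # overflow moves to the next limb
    if carry:
        result.append(carry)
    return result
```

The carry chain is sequential, which is one reason such arithmetic is not trivially parallel; a GPU implementation would need to handle carries differently than this serial loop does.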

Apr, 17

Deep convolutional networks for pancreas segmentation in CT imaging

Automatic organ segmentation is an important prerequisite for many computer-aided diagnosis systems. The high anatomical variability of organs in the abdomen, such as the pancreas, prevents many segmentation methods from achieving the high accuracies reported for other organs such as the liver, heart or kidneys. Recently, the availability of large annotated training sets and […]

CUDA

Apr, 17

NBODY6++GPU: Ready for the gravitational million-body problem

Accurate direct N-body simulations help to obtain detailed information about the dynamical evolution of star clusters. They also enable comparisons with analytical models and Fokker-Planck or Monte-Carlo methods. NBODY6 is a well-known direct N-body code for star clusters, and NBODY6++ is the extended version designed for large particle number simulations by supercomputers. We present NBODY6++GPU, […]

CUDA
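
For readers unfamiliar with the term, "direct" N-body means that every pairwise gravitational interaction is summed explicitly. The sketch below shows only that O(N^2) force sum. It does not reproduce the integrator or any other part of NBODY6++GPU; the function name, the optional softening parameter, and the unit choice G = 1 are assumptions made for the example.

```python
import math


def accelerations(positions, masses, G=1.0, softening=0.0):
    """Direct-summation gravitational acceleration on each body.

    positions: list of [x, y, z]; masses: list of floats.
    softening is a hypothetical smoothing length that avoids a
    division by zero when two bodies coincide.
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Vector from body i to body j.
            dx = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = sum(d * d for d in dx) + softening ** 2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * masses[j] * dx[k] * inv_r3
    return acc
```

Each body's sum is independent of the others, which is what makes this inner loop a natural candidate for GPU offloading.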

Apr, 17

2nd International Conference on Communication and Signal Processing (ICCSP), 2015

Topics: Antennas, RF and Microwave Communications; Audio / Speech Processing and Coding; Array Signal Processing; Bio Signal Processing; Cognitive Radio and Cognitive Networks; Digital Signal Processing; Mobile and Cellular Communications; MIMO and Space Time Communications; Optical Communication; OFDM and CDMA Communication Receivers; Satellite Communication; Statistical Signal Processing; Signal Processing for Communications; Signal Processing for Security […]

Apr, 15

Collaborative Diffusion on the GPU for Path-Finding in Games

Exploiting the processing power available on the GPU in many machines, we investigate the performance of parallelised versions of pathfinding algorithms in typical game environments. We describe a parallel implementation of a collaborative diffusion algorithm that is shown to find short paths in real-time across a range of graph sizes and provide a comparison […]

CUDA
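
The general idea behind collaborative diffusion can be sketched on a small grid: the goal cell holds a fixed value, every other free cell repeatedly takes a share of its neighbours' values, and an agent walks uphill toward the goal. This is a simplified serial sketch, not the paper's implementation. The four-neighbour averaging rule, the grid encoding (`#` for walls), and the function names are all assumptions.

```python
NEIGHBORS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def diffuse(grid, goal, iterations):
    """Spread the goal's value over free cells using double buffering.

    Each cell reads only the previous buffer, so all cells could be
    updated independently (for example, one GPU thread per cell).
    """
    rows, cols = len(grid), len(grid[0])
    field = [[0.0] * cols for _ in range(rows)]
    field[goal[0]][goal[1]] = 1.0
    for _ in range(iterations):
        nxt = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] == "#":
                    continue  # walls stay at zero
                if (r, c) == goal:
                    nxt[r][c] = 1.0  # the goal value is held fixed
                    continue
                total = 0.0
                for dr, dc in NEIGHBORS:
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols:
                        total += field[nr][nc]
                nxt[r][c] = total / 4.0
        field = nxt
    return field


def follow_gradient(field, start, goal, max_steps=1000):
    """Walk from start to the neighbour with the highest value until the goal."""
    rows, cols = len(field), len(field[0])
    path = [start]
    pos = start
    while pos != goal and len(path) <= max_steps:
        best = pos
        for dr, dc in NEIGHBORS:
            nr, nc = pos[0] + dr, pos[1] + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                if field[nr][nc] > field[best[0]][best[1]]:
                    best = (nr, nc)
        if best == pos:
            break  # no uphill neighbour: the field has not reached this cell
        pos = best
        path.append(pos)
    return path
```

For example, `follow_gradient(diffuse(["....."," ####.".strip(), "....."], (2, 0), 100), (0, 0), (2, 0))` walks around the wall in the middle row to reach the goal. The number of diffusion iterations must be at least the path length, otherwise the start cell still holds zero and the agent cannot move.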

Apr, 14

GPU Accelerated Randomized Singular Value Decomposition and Its Application in Image Compression

In this paper, we present a GPU-accelerated implementation of the randomized Singular Value Decomposition (SVD) algorithm on a large matrix to rapidly approximate the top-k dominant singular values and corresponding singular vectors. The fundamental idea of randomized SVD is to condense a large matrix into a small dense matrix by random sampling while keeping the important […]

CUDA

Apr, 14

A Parallel Tree Pattern Query Processing Algorithm for Graph Databases using a GPGPU

Large amounts of data are modeled and stored as graphs in order to express complex data relationships. Consequently, query processing on graph structures is becoming an important component in real-world applications. The most commonly used query format is that of tree pattern queries. We present a new parallel SIMD algorithm, GGQ (GPU Graph data base […]

CUDA

