Posts
Oct, 11
Meta-programming and Multi-stage Programming for GPGPUs
GPGPUs and other accelerators are becoming a mainstream asset for high-performance computing. Improving the programmability of such hardware is essential to enable users to discover, master and subsequently use accelerators in day-to-day simulations. Furthermore, tools for high-level programming of parallel architectures are becoming an effective way to simplify the exploitation of such systems. For this […]
Oct, 11
GPU Accelerated Multi-Block Lattice Boltzmann Solver for Viscous Flow Problems
We developed a lattice Boltzmann solver that can be used to solve low Reynolds number flow problems. We then modified it to run on a Graphics Processing Unit using the Compute Unified Device Architecture (CUDA), a parallel computing platform and programming model created by NVIDIA. Comparison of the results that we obtained on Graphics […]
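As a rough, assumed illustration (not code from the solver in this post), the sketch below shows what a CUDA collision kernel for a simple D2Q9 BGK lattice Boltzmann model can look like; the kernel name, array layout, and parameters are placeholders.

    // Minimal sketch: BGK collision step for a D2Q9 lattice, one thread per node.
    __constant__ float w[9]  = {4.f/9, 1.f/9, 1.f/9, 1.f/9, 1.f/9,
                                1.f/36, 1.f/36, 1.f/36, 1.f/36};
    __constant__ int   cx[9] = {0, 1, 0, -1, 0, 1, -1, -1, 1};
    __constant__ int   cy[9] = {0, 0, 1, 0, -1, 1, 1, -1, -1};

    __global__ void collide(float* f, int nx, int ny, float tau)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= nx || y >= ny) return;
        int idx = y * nx + x;

        // Macroscopic density and velocity from the distributions.
        float rho = 0.f, ux = 0.f, uy = 0.f;
        for (int i = 0; i < 9; ++i) {
            float fi = f[i * nx * ny + idx];   // structure-of-arrays layout (assumed)
            rho += fi;
            ux  += fi * cx[i];
            uy  += fi * cy[i];
        }
        ux /= rho;  uy /= rho;

        // BGK relaxation towards the local equilibrium distribution.
        float usq = ux * ux + uy * uy;
        for (int i = 0; i < 9; ++i) {
            float cu  = cx[i] * ux + cy[i] * uy;
            float feq = w[i] * rho * (1.f + 3.f * cu + 4.5f * cu * cu - 1.5f * usq);
            f[i * nx * ny + idx] -= (f[i * nx * ny + idx] - feq) / tau;
        }
    }

A real solver pairs this with a streaming step, boundary conditions and careful memory-layout tuning, which is where most of the GPU-specific effort goes.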
Oct, 11
Performance Analysis of an Astrophysical Simulation Code on the Intel Xeon Phi Architecture
We have developed the astrophysical simulation code XFLAT to study neutrino oscillations in supernovae. XFLAT is designed to utilize multiple levels of parallelism through MPI, OpenMP, and SIMD instructions (vectorization). It can run on both CPUs and Xeon Phi co-processors based on the Intel Many Integrated Core (MIC) architecture. We analyze the performance of XFLAT […]
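To make the three levels of parallelism concrete, here is a minimal, assumed sketch (not XFLAT code): MPI distributes work across ranks, OpenMP threads share each rank's loop, and the simd clause asks the compiler to vectorize the innermost iterations.

    #include <cstdio>
    #include <mpi.h>
    #include <vector>

    int main(int argc, char** argv)
    {
        MPI_Init(&argc, &argv);                 // level 1: MPI ranks across nodes / co-processors
        int rank = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int n = 1 << 20;                  // local chunk owned by this rank (placeholder size)
        std::vector<double> a(n, 1.0), b(n, 2.0);
        double local_sum = 0.0;

        // level 2: OpenMP threads share the loop;
        // level 3: the simd clause vectorizes the innermost iterations.
        #pragma omp parallel for simd reduction(+:local_sum)
        for (int i = 0; i < n; ++i)
            local_sum += a[i] * b[i];

        double global_sum = 0.0;
        MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0) std::printf("dot product = %f\n", global_sum);

        MPI_Finalize();
        return 0;
    }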
Oct, 11
Accelerating the D3Q19 Lattice Boltzmann Model with OpenACC and MPI
Multi-GPU implementations of the lattice Boltzmann method are of practical interest as they allow the study of turbulent flows in large-scale simulations at high Reynolds numbers. Although programming GPUs, and power-efficient accelerators in general, typically delivers high performance, the lack of portability in their low-level programming models implies significant effort for maintainability and porting of […]
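The portability argument typically amounts to annotating existing loop nests instead of rewriting them in a low-level model. The sketch below is an assumed, generic stencil update rather than the paper's D3Q19 propagation: one OpenACC directive offloads the loop, while MPI exchanges halo layers between the sub-domains owned by different GPUs. The arrays are assumed to live in an enclosing acc data region, and filling the halo buffers is omitted.

    #include <mpi.h>

    // left/right are the neighbouring MPI ranks of this sub-domain (placeholders).
    void step(const double* f_old, double* f_new,
              double* halo_send, double* halo_recv,
              int nx, int ny, int left, int right)
    {
        // The directive is the only accelerator-specific line: the compiler
        // generates the GPU kernel for the bulk update.
        #pragma acc parallel loop collapse(2) present(f_old, f_new)
        for (int y = 1; y < ny - 1; ++y)
            for (int x = 1; x < nx - 1; ++x)
                f_new[y * nx + x] = 0.25 * (f_old[y * nx + x - 1] + f_old[y * nx + x + 1] +
                                            f_old[(y - 1) * nx + x] + f_old[(y + 1) * nx + x]);

        // Exchange one halo row with the neighbouring ranks/GPUs.
        #pragma acc update self(halo_send[0:nx])
        MPI_Sendrecv(halo_send, nx, MPI_DOUBLE, right, 0,
                     halo_recv, nx, MPI_DOUBLE, left,  0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        #pragma acc update device(halo_recv[0:nx])
    }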
Oct, 11
GPU acceleration of preconditioned solvers for ill-conditioned linear systems
In this work we study the implementations of deflation and preconditioning techniques for solving ill-conditioned linear systems using iterative methods. Solving such systems can be a time-consuming process because of the jumps in the coefficients caused by large differences in material properties. We have developed implementations of the iterative methods with these preconditioning techniques on […]
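For context, the sketch below shows where a preconditioner enters a conjugate gradient iteration, using a simple Jacobi (diagonal) preconditioner on a dense symmetric positive definite system. It is an assumed illustration only: the deflation and preconditioning variants studied in the paper, and their GPU implementations, are considerably more elaborate, and all names here are placeholders.

    #include <cmath>
    #include <cstddef>
    #include <vector>

    using Vec = std::vector<double>;
    using Mat = std::vector<Vec>;

    double dot(const Vec& a, const Vec& b) {
        double s = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
        return s;
    }

    // Jacobi-preconditioned CG: M = diag(A), applied as z = M^{-1} r.
    Vec pcg(const Mat& A, const Vec& b, int max_it, double tol)
    {
        std::size_t n = b.size();
        Vec x(n, 0.0), r = b, z(n), p(n), Ap(n);

        for (std::size_t i = 0; i < n; ++i) z[i] = r[i] / A[i][i];
        p = z;
        double rz = dot(r, z);

        for (int k = 0; k < max_it && std::sqrt(dot(r, r)) > tol; ++k) {
            for (std::size_t i = 0; i < n; ++i) {                 // Ap = A * p
                Ap[i] = 0.0;
                for (std::size_t j = 0; j < n; ++j) Ap[i] += A[i][j] * p[j];
            }
            double alpha = rz / dot(p, Ap);
            for (std::size_t i = 0; i < n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
            for (std::size_t i = 0; i < n; ++i) z[i] = r[i] / A[i][i];   // apply preconditioner
            double rz_new = dot(r, z);
            for (std::size_t i = 0; i < n; ++i) p[i] = z[i] + (rz_new / rz) * p[i];
            rz = rz_new;
        }
        return x;
    }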
Oct, 8
Introducing CURRENNT: The Munich Open-Source CUDA RecurREnt Neural Network Toolkit
In this article, we introduce CURRENNT, an open-source parallel implementation of deep recurrent neural networks (RNNs) supporting graphics processing units (GPUs) through NVIDIA’s Compute Unified Device Architecture (CUDA). CURRENNT supports uni- and bidirectional RNNs with Long Short-Term Memory (LSTM) cells, which overcome the vanishing gradient problem. To our knowledge, CURRENNT is the first publicly […]
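For reference, a standard LSTM cell without peephole connections computes the following at each time step $t$ (the exact variant implemented in CURRENNT may differ in such details); $\sigma$ is the logistic sigmoid and $\odot$ the element-wise product:

    $i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$
    $f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$
    $o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$
    $\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)$
    $c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$
    $h_t = o_t \odot \tanh(c_t)$

The additive update of the cell state $c_t$ is what allows gradients to propagate over many time steps, which is how LSTM cells mitigate the vanishing gradient problem.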
Oct, 8
GPU-Based Computation of 2D Least Median of Squares with Applications to Fast and Robust Line Detection
The 2D Least Median of Squares (LMS) is a popular tool in robust regression because of its high breakdown point: up to half of the input data can be contaminated with outliers without affecting the accuracy of the LMS estimator. The complexity of 2D LMS estimation has been shown to be $\Omega(n^2)$ where $n$ is […]
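As a point of reference for the objective being minimised, the sketch below is an assumed, brute-force approximation of the 2D LMS line: it evaluates candidate lines through every pair of points and keeps the one with the smallest median squared residual. The exact estimator and the GPU algorithm of the paper are considerably more sophisticated; all names here are illustrative.

    #include <algorithm>
    #include <limits>
    #include <utility>
    #include <vector>

    struct Line { double a, b; };   // y = a * x + b

    double median_sq_residual(const std::vector<std::pair<double, double>>& pts,
                              double a, double b)
    {
        std::vector<double> r2;
        r2.reserve(pts.size());
        for (const auto& p : pts) {
            double r = p.second - (a * p.first + b);
            r2.push_back(r * r);
        }
        std::nth_element(r2.begin(), r2.begin() + r2.size() / 2, r2.end());
        return r2[r2.size() / 2];
    }

    Line approx_lms(const std::vector<std::pair<double, double>>& pts)
    {
        Line best{0.0, 0.0};
        double best_med = std::numeric_limits<double>::infinity();
        for (std::size_t i = 0; i < pts.size(); ++i)
            for (std::size_t j = i + 1; j < pts.size(); ++j) {
                if (pts[i].first == pts[j].first) continue;   // skip vertical candidates
                double a = (pts[j].second - pts[i].second) /
                           (pts[j].first  - pts[i].first);
                double b = pts[i].second - a * pts[i].first;
                double m = median_sq_residual(pts, a, b);
                if (m < best_med) { best_med = m; best = {a, b}; }
            }
        return best;
    }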
Oct, 8
Kinematic Modelling of Disc Galaxies using Graphics Processing Units
With large-scale Integral Field Spectroscopy (IFS) surveys of thousands of galaxies currently underway or planned, the astronomical community is in need of methods, techniques and tools that will allow the analysis of huge amounts of data. We focus on the kinematic modelling of disc galaxies and investigate the potential use of massively parallel architectures, such […]
Oct, 8
Solving the Quadratic Assignment Problem on a heterogeneous environment (CPUs and GPUs) with the application of the Level 2 Reformulation and Linearization Technique
The Quadratic Assignment Problem (QAP) is a classic combinatorial optimization problem, classified as NP-hard and widely studied. The problem consists of assigning N facilities to N locations in a one-to-one fashion, with the aim of minimizing the displacement costs between the facilities. The application of the Reformulation and Linearization Technique (RLT) to the QAP […]
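To make the objective concrete, the sketch below (an assumed illustration, not the paper's method) evaluates the QAP cost of an assignment and searches all N! permutations exhaustively, which is feasible only for very small N; the paper instead solves a level-2 RLT relaxation on CPUs and GPUs. The flow/dist names are placeholders.

    #include <algorithm>
    #include <limits>
    #include <numeric>
    #include <vector>

    // Cost of one assignment: sum over facility pairs of flow * distance
    // between their assigned locations; perm[i] = location of facility i.
    double qap_cost(const std::vector<std::vector<double>>& flow,
                    const std::vector<std::vector<double>>& dist,
                    const std::vector<int>& perm)
    {
        double cost = 0.0;
        for (std::size_t i = 0; i < perm.size(); ++i)
            for (std::size_t j = 0; j < perm.size(); ++j)
                cost += flow[i][j] * dist[perm[i]][perm[j]];
        return cost;
    }

    std::vector<int> qap_brute_force(const std::vector<std::vector<double>>& flow,
                                     const std::vector<std::vector<double>>& dist)
    {
        std::vector<int> perm(flow.size()), best;
        std::iota(perm.begin(), perm.end(), 0);
        double best_cost = std::numeric_limits<double>::infinity();
        do {
            double c = qap_cost(flow, dist, perm);
            if (c < best_cost) { best_cost = c; best = perm; }
        } while (std::next_permutation(perm.begin(), perm.end()));
        return best;
    }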
Oct, 8
Exploiting Task-Parallelism on GPU Clusters via OmpSs and rCUDA Virtualization
OmpSs is a task-parallel programming model consisting of a reduced collection of OpenMP-like directives, a front-end compiler, and a runtime system. This directive-based programming interface helps developers accelerate the execution of their applications, e.g., in a cluster equipped with graphics processing units (GPUs), with low programming effort. On the other hand, the virtualization package rCUDA provides […]
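As an assumed illustration of this directive-based, task-parallel style, the sketch below uses standard OpenMP task dependences; OmpSs offers a very similar syntax (in()/out()/inout() clauses on tasks) on top of its own compiler and runtime. The function names and data are placeholders.

    void factor(double* blk)                    { blk[0] *= 0.5; }   // placeholder work
    void update(const double* blk, double* out) { out[0] += blk[0]; }

    void blocked_step(double* A, double* B, double* C)
    {
        #pragma omp parallel
        #pragma omp single
        {
            #pragma omp task depend(inout: A[0])
            factor(A);

            // Both updates depend on A but not on each other,
            // so the runtime may execute them concurrently.
            #pragma omp task depend(in: A[0]) depend(inout: B[0])
            update(A, B);
            #pragma omp task depend(in: A[0]) depend(inout: C[0])
            update(A, C);

            #pragma omp taskwait
        }
    }

In the setting described above, rCUDA's role is to let the CUDA work launched from such tasks execute on GPUs located in other nodes of the cluster, without changes to the application code.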
Oct, 6
CVC: The Contourlet Video Compression algorithm for real-time applications
Nowadays, real-time video communication over the internet through video conferencing applications has become an invaluable tool in everyone’s professional and personal life. This trend underlines the need for video coding algorithms that provide acceptable quality at low bitrates and can support various resolutions within the same stream in order to cope with limitations on computational […]
Oct, 6
MAGMA Embedded: Towards a Dense Linear Algebra Library for Energy Efficient Extreme Computing
Embedded computing, not only in large systems like drones and hybrid vehicles but also in small portable devices like smart phones and watches, is becoming more extreme in order to meet ever-increasing demands for extended and improved functionality. This, combined with the typical constraints of low power consumption and small size, makes the design of numerical libraries […]