high performance computing on graphics processing units: hgpu.org

Posts

Dec, 1

Auxiliary Image Regularization for Deep CNNs with Noisy Labels

Precisely labeled data sets with a sufficient number of samples are important for training deep convolutional neural networks (CNNs). However, many available real-world data sets contain erroneously labeled samples, and label errors in the training samples make it a daunting task to learn a well-performing deep CNN model. In this work, we consider […]

CUDA

Dec, 1

A General Framework for Constrained Bayesian Optimization using Information-based Search

We present an information-theoretic framework for solving global black-box optimization problems that also have black-box constraints. Of particular interest to us is to efficiently solve problems with decoupled constraints, in which subsets of the objective and constraint functions may be evaluated independently. For example, when the objective is evaluated on a CPU and the constraints […]

CUDA

Dec, 1

Efficient Static and Dynamic Memory Management Techniques for Multi-GPU Systems

There are four trends in modern high-performance computing (HPC) that have led to an increased need for efficient memory management techniques for heterogeneous systems (such as one fitted with GPUs). First, the average size of datasets for HPC applications is rapidly increasing. Read-only input matrices that used to be on the order of megabytes or […]

CUDA

Dec, 1

Bridging OpenCL and CUDA: A Comparative Analysis and Translation

Heterogeneous systems are widening their user-base, and heterogeneous computing is becoming popular in supercomputing. Among others, OpenCL and CUDA are the most popular programming models for heterogeneous systems. Although OpenCL inherited many features from CUDA and they have almost the same platform model, they are not compatible with each other. In this paper, we present […]

CUDA • OpenCL

Dec, 1

Neural GPUs Learn Algorithms

Learning an algorithm from examples is a fundamental problem that has been widely studied. Recently it has been addressed using neural networks, in particular by Neural Turing Machines (NTMs). These are fully differentiable computers that use backpropagation to learn their own programming. Despite their appeal, NTMs have a weakness that is caused by their sequential […]

CUDA

Nov, 29

Reordering GPU Kernel Launches to Enable Efficient Concurrent Execution

Contemporary GPUs allow concurrent execution of small computational kernels in order to prevent idling of GPU resources. Despite the potential concurrency between independent kernels, the order in which kernels are issued to the GPU will significantly influence the application performance. A technique for deriving suitable kernel launch orders is therefore presented, with the aim of […]

CUDA

Nov, 29

Design, Implementation and Performance Evaluation of a Stochastic Gradient Descent Algorithm on CUDA

Stochastic Gradient Descent (SGD), a stochastic approximation of Gradient Descent, is an algorithm used in a range of settings, for example linear regression and logistic regression. After the Netflix Prize, SGD also started to be used in recommender systems to compute matrix factorization. Considering the large amounts of data that this kind of system […]

CUDA

Nov, 29

Semantic Segmentation of Colon Glands with Deep Convolutional Neural Networks and Total Variation Segmentation

Segmentation of histopathology sections is a ubiquitous requirement in digital pathology, and due to the large variability of biological tissue, machine learning techniques have shown superior performance over standard image processing methods. As part of the GlaS@MICCAI2015 colon gland segmentation challenge, we present a learning-based algorithm to segment glands in tissue of benign and malignant […]

CUDA

Nov, 29

A Problem-Based Learning Approach to GPU Computing

Compared to CPUs, modern GPUs exhibit a high ratio of computing performance per watt, and so current supercomputer designs often include multiple racks of GPUs in order to achieve high teraflop counts at minimal energy cost. GPU programming is thus becoming increasingly important, and yet it remains a challenging task. This paper describes a course […]

OpenCL

Nov, 29

Orchestrating Multiple Data-Parallel Kernels on Multiple Devices

Traditionally, programmers and software tools have focused on mapping a single data-parallel kernel onto a heterogeneous computing system consisting of multiple general-purpose processors (CPUs) and graphics processing units (GPUs). These methodologies break down as application complexity grows to contain multiple communicating data-parallel kernels. This paper introduces MKMD, an automatic system for mapping multiple kernels across […]

OpenCL

Nov, 25

Acceleration of Agent-Based Pandemic Modeling on Multiple GPUs

Epidemiology computation models are crucial for the assessment and control of public health crises. Agent-based simulations of pandemic influenza are useful for forecasting the spread of infectious disease in order to help public health policy makers during emergencies. In such emergencies, decisions are required for public health preparedness in cycles of less than a day, and […]

CUDA

Nov, 25

Efficient Resource Sharing Through GPU Virtualization on Accelerated High Performance Computing Systems

The High Performance Computing (HPC) field is witnessing a widespread adoption of Graphics Processing Units (GPUs) as co-processors for conventional homogeneous clusters. The adoption of the prevalent Single-Program Multiple-Data (SPMD) programming paradigm for GPU-based parallel processing brings the challenge of resource underutilization, given the asymmetrical processor/co-processor distribution. In other words, under SPMD, balanced CPU/GPU distribution […]

CUDA

* * *

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems
