high performance computing on graphics processing units: hgpu.org

Posts

Apr, 25

2nd International Conference on Robotics and Computer Vision (ICRCV 2015), 2015

Submission Deadline: 2015-07-10

Topics:
• Evolutionary Robotics
• Distributed Sensor Networks
• Robot Surgery
• Search and Rescue Robots
• Biorobotics
• Humanoid Robotics
• Autonomous Vehicles
• Entertainment Robots
• Rehabilitation Robotics
• Micro/Nano Robotics
• Underwater Robots
• Service Robotics
• Sensors and Early Vision
• Color and Texture
• Segmentation and Grouping
[…]

Apr, 23

A Framework for General Sparse Matrix-Matrix Multiplication on GPUs and Heterogeneous Processors

General sparse matrix-matrix multiplication (SpGEMM) is a fundamental building block for numerous applications such as the algebraic multigrid (AMG) method, breadth-first search and the shortest path problem. Compared to other sparse BLAS routines, an efficient parallel SpGEMM implementation must handle additional irregularity in three respects: (1) the number of nonzero entries in the resulting sparse […]

CUDA • OpenCL
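The first source of irregularity the abstract names is the number of nonzero entries in the result. A minimal sequential sketch of the classic row-by-row (Gustavson) formulation over CSR matrices shows why: the length of each output row is only known once that row has been accumulated. This is plain Python for illustration, with hypothetical names (`spgemm_gustavson` and the `(row_ptr, col_idx, values)` tuple layout); it is not the paper's GPU framework.

```python
from typing import Dict, List, Tuple

# Hypothetical CSR layout for this sketch: (row_ptr, col_idx, values).
CSR = Tuple[List[int], List[int], List[float]]


def spgemm_gustavson(a: CSR, b: CSR) -> CSR:
    """Row-by-row (Gustavson) SpGEMM, C = A * B, for CSR inputs.

    Row i of C is the sum of a_ik * (row k of B) over the nonzeros a_ik of
    row i of A. Its nonzero count is unknown until the row is accumulated,
    so each row is built in a hash map and appended once complete.
    """
    a_ptr, a_col, a_val = a
    b_ptr, b_col, b_val = b
    c_ptr: List[int] = [0]
    c_col: List[int] = []
    c_val: List[float] = []
    for i in range(len(a_ptr) - 1):
        acc: Dict[int, float] = {}  # column index -> partial sum for row i of C
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_col[p], a_val[p]
            # Scale row k of B by a_ik and merge it into the accumulator.
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_col[q]
                acc[j] = acc.get(j, 0.0) + a_ik * b_val[q]
        # Emit row i with sorted column indices, then close it in row_ptr.
        for j in sorted(acc):
            c_col.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_col))
    return c_ptr, c_col, c_val
```

For example, multiplying A = [[1, 0, 2], [0, 3, 0]] by B = [[0, 4], [5, 0], [6, 7]] gives C = [[12, 18], [15, 0]], stored as two nonzeros in row 0 and one in row 1. Because those counts only emerge during accumulation, parallel implementations typically either count each row's nonzeros in a separate symbolic pass or allocate an upper bound before the numeric pass.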

Apr, 23

Multi-swarm PSO algorithm for the Quadratic Assignment Problem: a massive parallel implementation on the OpenCL platform

This paper presents a multi-swarm PSO algorithm for the Quadratic Assignment Problem (QAP) implemented on the OpenCL platform. Our work was motivated by the results of time-efficiency tests performed on a single-swarm implementation, which clearly showed that the benefits of a parallel execution platform can be fully exploited if the processed population is large. The described […]

OpenCL
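Whatever encoding the particles use, every candidate solution is ultimately scored by the QAP objective, and with a large population that evaluation is the work a parallel platform spreads out. A minimal sketch, assuming the standard Koopmans-Beckmann form (facility `i` is placed at location `perm[i]`, and the cost sums flow times distance over all facility pairs); the function names are hypothetical. The brute-force search is only feasible for tiny instances because it enumerates all n! permutations, which is why heuristics such as PSO are used instead.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def qap_cost(flow: List[List[int]], dist: List[List[int]], perm: Sequence[int]) -> int:
    """Koopmans-Beckmann QAP objective: sum of flow[i][j] * dist[perm[i]][perm[j]]."""
    n = len(perm)
    return sum(flow[i][j] * dist[perm[i]][perm[j]] for i in range(n) for j in range(n))


def qap_brute_force(flow: List[List[int]], dist: List[List[int]]) -> Tuple[List[int], int]:
    """Exact optimum by exhaustive search; only practical for very small n."""
    n = len(flow)
    best = min(permutations(range(n)), key=lambda p: qap_cost(flow, dist, p))
    return list(best), qap_cost(flow, dist, best)
```

With three facilities on three locations in a row (distances 1 between neighbours, 2 between the ends), placing the facility with the two largest flows in the middle minimizes the cost; in a PSO setting, `qap_cost` is the fitness each particle would be evaluated with.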

Apr, 23

A High-resolution approach for Tsunami impact simulation on graphics processing units

Having learned a great deal about both the problem and its solutions over the course of this project, the author considers the method undertaken in this report unsatisfactory for delivering a performance improvement over alternative approaches. First, the domain transfers reduce performance. For larger simulations these prove to […]

OpenCL

Apr, 23

Multi-GPU Graph Analytics

We present a multi-GPU graph processing library that allows programmers to easily extend single-GPU graph algorithms to achieve scalable performance on large graph datasets with billions of edges. Our design only requires users to specify a few algorithm-dependent blocks, hiding most multi-GPU related implementation details. Our design effectively overlaps computation and data transfer and implements […]

CUDA
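The basic pattern behind extending a single-device graph algorithm to several devices can be sketched with a level-synchronous BFS in which each simulated "device" owns a contiguous block of vertices and discovered vertices are routed to their owner between levels. This is a hypothetical illustration in plain Python (`partitioned_bfs` and its block partitioning are assumptions), not the library's API or partitioning scheme.

```python
from typing import List


def partitioned_bfs(adj: List[List[int]], source: int, num_parts: int) -> List[int]:
    """Level-synchronous BFS over a graph split across `num_parts` simulated devices.

    Vertices are block-partitioned: part p owns ids [p * chunk, (p + 1) * chunk).
    Each level has a local expansion phase and an exchange phase in which
    discovered vertices are routed to the part that owns them.
    Returns the BFS depth of every vertex (-1 if unreachable).
    """
    n = len(adj)
    chunk = (n + num_parts - 1) // num_parts

    def owner(v: int) -> int:
        return v // chunk

    depth = [-1] * n  # only the owning part ever writes depth[v]
    depth[source] = 0
    frontiers: List[List[int]] = [[] for _ in range(num_parts)]
    frontiers[owner(source)].append(source)
    level = 0
    while any(frontiers):
        # Expansion: each part scans the edges of the frontier vertices it owns
        # and queues every neighbour for that neighbour's owner (possibly itself).
        outboxes: List[List[int]] = [[] for _ in range(num_parts)]
        for part in range(num_parts):
            for u in frontiers[part]:
                for v in adj[u]:
                    outboxes[owner(v)].append(v)
        # Exchange + filter: each owner keeps only vertices not yet visited.
        level += 1
        frontiers = [[] for _ in range(num_parts)]
        for part in range(num_parts):
            for v in outboxes[part]:
                if depth[v] == -1:
                    depth[v] = level
                    frontiers[part].append(v)
    return depth
```

On real hardware the outboxes would be copied between GPUs; the abstract says the described library overlaps such transfers with computation, which this sequential sketch does not attempt.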

Apr, 23

Convolutional Neural Network-Based Image Representation for Visual Loop Closure Detection

Deep convolutional neural networks (CNN) have recently been shown, in many computer vision and pattern recognition applications, to outperform state-of-the-art solutions that use traditional hand-crafted features by a significant margin. However, this impressive performance has yet to be fully exploited in robotics. In this paper, we focus on one specific problem that can benefit from the […]

CUDA
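The abstract is truncated before it describes the method. A common way to use a global image representation for loop closure is to compare the current frame's descriptor against those of earlier frames and report the best match above a similarity threshold. The sketch below assumes cosine similarity, a hypothetical `threshold` and an `exclude_recent` window that skips temporally adjacent frames; plain float vectors stand in for CNN features, and none of this is claimed to be the paper's pipeline.

```python
import math
from typing import List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two descriptors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_loop_closure(
    history: List[Sequence[float]],
    query: Sequence[float],
    threshold: float = 0.9,
    exclude_recent: int = 10,
) -> Optional[int]:
    """Return the index of the most similar earlier frame, or None if no match.

    Hypothetical parameters: `threshold` is the minimum similarity for a match,
    and the last `exclude_recent` frames are skipped because consecutive
    images are trivially similar.
    """
    candidates = history[: max(0, len(history) - exclude_recent)]
    best_index: Optional[int] = None
    best_score = threshold
    for i, descriptor in enumerate(candidates):
        score = cosine_similarity(descriptor, query)
        if score >= best_score:
            best_index, best_score = i, score
    return best_index
```

The linear scan is the simplest possible search; with many stored frames, an approximate nearest-neighbour index would normally replace it.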

Apr, 21

A Survey of Techniques for Modeling and Improving Reliability of Computing Systems

Recent trends of aggressive technology scaling have greatly exacerbated the occurrence and impact of faults in computing systems. This has made 'reliability' a first-order design constraint. To address the challenges of reliability, several techniques have been proposed. This paper surveys architectural techniques for improving the resilience of computing systems. We especially focus on […]

Apr, 20

Caffe con Troll: Shallow Ideas to Speed Up Deep Learning

We present Caffe con Troll (CcT), a fully compatible end-to-end version of the popular framework Caffe with rebuilt internals. We built CcT to examine the performance characteristics of training and deploying general-purpose convolutional neural networks across different hardware architectures. We find that, by employing standard batching optimizations for CPU training, we achieve up to one […]

CUDA

Apr, 20

An efficient midpoint-radius representation format to deal with symmetric fuzzy numbers

This paper proposes a novel representation for symmetric fuzzy numbers that uses the midpoint-radius approach instead of the conventional lower-upper representation. A theoretical analysis based on the alpha-cut concept shows that the proposed format requires half the operations and memory of the traditional one. In addition, a novel technique involving radius increments is introduced, […]

CUDA
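One plausible reading of the "half the operations and memory" claim: for a symmetric fuzzy number, every alpha-cut is centred on the same midpoint, so a midpoint-radius format stores one midpoint plus one radius per alpha-level, whereas the lower-upper format stores two endpoints per level. The minimal sketch below, with hypothetical class names and only addition implemented, makes that count concrete. It ignores floating-point rounding, which a real implementation has to control.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LowerUpperFuzzy:
    """Conventional format: both endpoints of each alpha-cut (2L numbers for L levels)."""
    lower: List[float]
    upper: List[float]

    def __add__(self, other: "LowerUpperFuzzy") -> "LowerUpperFuzzy":
        # Two additions per alpha-level: 2L in total.
        return LowerUpperFuzzy(
            [a + b for a, b in zip(self.lower, other.lower)],
            [a + b for a, b in zip(self.upper, other.upper)],
        )


@dataclass
class MidRadFuzzy:
    """Midpoint-radius format: shared midpoint plus one radius per alpha-cut (L + 1 numbers)."""
    mid: float
    radius: List[float]

    def __add__(self, other: "MidRadFuzzy") -> "MidRadFuzzy":
        # One addition for the midpoint plus one per alpha-level: L + 1 in total.
        return MidRadFuzzy(
            self.mid + other.mid,
            [a + b for a, b in zip(self.radius, other.radius)],
        )

    def to_lower_upper(self) -> LowerUpperFuzzy:
        """Expand each alpha-cut [mid - r, mid + r] into explicit endpoints."""
        return LowerUpperFuzzy(
            [self.mid - r for r in self.radius],
            [self.mid + r for r in self.radius],
        )
```

Adding two numbers in midpoint-radius form and then expanding gives the same alpha-cuts as expanding first and adding endpoints, while storage and addition cost drop from 2L to L + 1. Multiplication needs more care in either format and is not shown.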

Apr, 20

A Convolutional Neural Network Cascade for Face Detection

In real-world face detection, large visual variations, such as those due to pose, expression, and lighting, demand an advanced discriminative model to accurately differentiate faces from backgrounds. Consequently, effective models for the problem tend to be computationally prohibitive. To address these two conflicting challenges, we propose a cascade architecture built on convolutional neural networks […]

CUDA
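The cascade idea can be sketched independently of the networks themselves: cheap early stages discard most candidate windows, so more expensive later stages score only the survivors. In this minimal sketch, `run_cascade` is a hypothetical name and the per-stage scorers and thresholds are stand-ins for CNNs of increasing cost; the paper's actual stages are not modeled.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

W = TypeVar("W")  # a candidate window (e.g. an image crop); opaque to the cascade


def run_cascade(
    windows: Sequence[W],
    stages: Sequence[Tuple[Callable[[W], float], float]],
) -> Tuple[List[W], List[int]]:
    """Apply (scorer, threshold) stages in order, keeping windows that pass each one.

    Returns the windows accepted by every stage and, for each stage, how many
    windows it had to score, which shows how early rejection saves work.
    """
    survivors = list(windows)
    evaluated: List[int] = []
    for score, threshold in stages:
        evaluated.append(len(survivors))
        survivors = [w for w in survivors if score(w) >= threshold]
    return survivors, evaluated
```

For instance, if a cheap first stage keeps half of 100 windows, the expensive second stage scores only 50 of them; ordering stages from cheapest to most discriminative is what makes the cascade pay off.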

Apr, 20

Verification of Producer-Consumer Synchronization in GPU Programs

Previous efforts to formally verify code written for GPUs have focused solely on kernels written within the traditional data-parallel GPU programming model. No previous work has considered the higher-performance, but more complex, warp-specialized kernels based on producer-consumer named barriers available on current hardware. In this work we present the first formal operational semantics for […]

CUDA

Apr, 20

Fluid Simulation and Generating Textures with Reaction-Diffusion Systems on Surfaces in the GPU

In recent years, many researchers have used the Navier-Stokes equations and Reaction-Diffusion systems for fluid simulation and for the creation of textures on surfaces, respectively. For this purpose it is necessary to obtain information about operators defined on surfaces. We obtained the metric information of the distortion caused by the parametrization of Catmull-Clark subdivision surfaces. […]

CUDA
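The abstract does not say which reaction-diffusion system is used, so the sketch below takes the Gray-Scott model as a common example and integrates it with explicit Euler steps on a flat periodic grid. The diffusion rates and feed/kill parameters are illustrative assumptions. It deliberately omits what the paper addresses, namely the metric distortion introduced by the Catmull-Clark surface parametrization; each cell's update depends only on its neighbours, which is the kind of stencil that maps naturally to one GPU thread per cell.

```python
from typing import List, Tuple

Grid = List[List[float]]


def laplacian(g: Grid, i: int, j: int) -> float:
    """5-point discrete Laplacian with periodic wrap-around (unit grid spacing)."""
    n, m = len(g), len(g[0])
    return (
        g[(i - 1) % n][j] + g[(i + 1) % n][j]
        + g[i][(j - 1) % m] + g[i][(j + 1) % m]
        - 4.0 * g[i][j]
    )


def gray_scott_step(
    u: Grid,
    v: Grid,
    du: float = 0.16,
    dv: float = 0.08,
    feed: float = 0.035,
    kill: float = 0.065,
    dt: float = 1.0,
) -> Tuple[Grid, Grid]:
    """One explicit Euler step of the Gray-Scott model (illustrative parameters).

    du/dt = Du * lap(u) - u v^2 + F (1 - u)
    dv/dt = Dv * lap(v) + u v^2 - (F + k) v
    The inputs are not modified; new grids are returned.
    """
    n, m = len(u), len(u[0])
    new_u = [[0.0] * m for _ in range(n)]
    new_v = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            uvv = u[i][j] * v[i][j] ** 2  # reaction term
            new_u[i][j] = u[i][j] + dt * (du * laplacian(u, i, j) - uvv + feed * (1.0 - u[i][j]))
            new_v[i][j] = v[i][j] + dt * (dv * laplacian(v, i, j) + uvv - (feed + kill) * v[i][j])
    return new_u, new_v
```

Starting from u = 1, v = 0 everywhere (a steady state) and seeding a small patch of v, repeated steps let the reaction and diffusion terms grow texture-like patterns; on a surface, the Laplacian would have to be replaced by an operator that accounts for the parametrization's metric.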


Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

SYCL Container

Most viewed papers (last 30 days)