high performance computing on graphics processing units: hgpu.org

Posts

Aug, 11

GPU-Disasm: A GPU-based x86 Disassembler

Static binary code analysis and reverse engineering are crucial operations for malware analysis, binary-level software protections, debugging, and patching, among many other tasks. Faster binary code analysis tools are necessary for tasks such as analyzing the multitude of new malware samples gathered every day. Binary code disassembly is a core functionality of such tools which […]

CUDA

Aug, 10

Places205-VGGNet Models for Scene Recognition

VGGNets have turned out to be effective for object recognition in still images. However, directly adapting the VGGNet models trained on the ImageNet dataset does not yield good performance for scene recognition. This report describes our implementation of training the VGGNets on the large-scale Places205 dataset. Specifically, we train three VGGNet models, […]

CUDA

Aug, 10

Practical Algorithms for Finding Extremal Sets

The minimal sets within a collection of sets are defined as the ones which do not have a proper subset within the collection, and the maximal sets are the ones which do not have a proper superset within the collection. Identifying extremal sets is a fundamental problem with a wide range of applications in SAT solvers, […]

CUDA
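The definition of minimal and maximal sets quoted in the abstract above can be illustrated with a small brute-force sketch. This is not the paper's algorithm (the abstract is truncated before any method is described); it is a quadratic pairwise comparison, and the function name `extremal_sets` and the sample family are hypothetical.

```python
def extremal_sets(collection):
    """Return (minimal, maximal) sets of a collection.

    Brute-force illustration only: every set is compared against every
    other set, so the cost is quadratic in the number of sets.
    """
    # frozenset makes the sets hashable; dict.fromkeys drops duplicates
    # while preserving the original order.
    unique = list(dict.fromkeys(frozenset(s) for s in collection))

    # A set is minimal if no other set in the collection is a proper subset of it.
    minimal = [s for s in unique if not any(t < s for t in unique)]
    # A set is maximal if no other set in the collection is a proper superset of it.
    maximal = [s for s in unique if not any(t > s for t in unique)]
    return minimal, maximal


# Hypothetical example family of sets.
family = [{1, 2}, {1}, {2, 3}, {1, 2, 3}, {4}]
minimal, maximal = extremal_sets(family)
print(minimal)
print(maximal)
```

Here `{4}` is both minimal and maximal, since it has neither a proper subset nor a proper superset in the family.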

Aug, 10

CRINK: Automatic CUDA code generation for affine C programs

Parallel programming has largely evolved as an efficient solution to a large number of compute-intensive applications. Graphics Processing Units (GPUs), traditionally designed to process computer graphics, are now widely applied to process large chunks of data in parallel in many computationally expensive applications. While developing parallel programs to run on parallel computing platforms, such as […]

CUDA

Aug, 10

Visual, Spatial and Temporal Quality in Video-Based Reconstruction of People: Achieving, Prototyping and Evaluating

Capturing, recreating and representing a high fidelity virtual representation of the dynamic human form has long been a target for a diverse range of applications including tele-presence, games, film and TV special effects. The complexity of the challenge, to achieve a lifelike, faithful and believable representation, is such that a wide range of techniques and […]

OpenGL

Aug, 10

Accelerating the pre-processing stages of JPEG encoder on a heterogenous system using OpenCL

Color space conversion and downsampling are among the major computationally intensive steps in typical image and video codec standards, and accelerating these steps will significantly improve the performance of these applications. In this paper, we describe the parallel implementation of the color space conversion and downsampling as pre-processing steps for the JPEG encoder in a […]

OpenCL
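The two pre-processing steps named in the abstract above, color space conversion and downsampling, can be sketched as follows. This is a plain sequential illustration, not the paper's OpenCL implementation. It assumes the full-range RGB to YCbCr conversion used by JFIF and a 2x2 averaging filter for chroma (4:2:0 style); the abstract does not state which variants the authors use, and the function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to YCbCr (assumed: JFIF full-range constants)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def downsample_2x2(plane):
    """Halve a plane in both dimensions by averaging each 2x2 block.

    Assumes the plane is a non-empty list of equal-length rows with even
    height and width.
    """
    height, width = len(plane), len(plane[0])
    return [
        [
            (plane[i][j] + plane[i][j + 1]
             + plane[i + 1][j] + plane[i + 1][j + 1]) / 4.0
            for j in range(0, width, 2)
        ]
        for i in range(0, height, 2)
    ]


# Hypothetical usage: one white pixel and one 2x2 chroma plane.
print(rgb_to_ycbcr(255, 255, 255))
print(downsample_2x2([[1, 3], [5, 7]]))
```

Each output value depends only on one pixel (conversion) or one 2x2 block (downsampling), so the work items are independent of each other, which is what makes these steps natural candidates for data-parallel execution.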

Aug, 7

Towards Distortion-Predictable Embedding of Neural Networks

Current research in Computer Vision has shown that Convolutional Neural Networks (CNN) give state-of-the-art performance in many classification tasks and Computer Vision problems. The embedding of CNN, which is the internal representation produced by the last layer, can indirectly learn topological and relational properties. Moreover, by using a suitable loss function, CNN models can learn […]

CUDA

Aug, 7

Modern Platform for Parallel Algorithms Testing: Java on Intel Xeon Phi

Parallel algorithms are a popular method of increasing system performance. Apart from showing their properties using asymptotic analysis, proof-of-concept implementation and practical experiments are often required. In order to speed up the development and provide a simple and easily accessible testing environment that enables execution of reliable experiments, the paper proposes a platform with a multi-core computational accelerator: […]

Aug, 7

DenseCut: Densely Connected CRFs for Realtime GrabCut

Figure-ground segmentation from bounding box input, provided either automatically or manually, has been extremely popular in the last decade and has influenced various applications. A lot of research has focused on high-quality segmentation, using complex formulations which often lead to slow techniques and hamper practical usage. In this paper we demonstrate a very fast segmentation […]

CUDA

Aug, 7

Optimising Reconfigurable Systems for Real-time Applications

This thesis addresses the problem of designing real-time reconfigurable systems. The first contribution of this thesis is to propose novel data structures and memory architectures for accelerating real-time proximity queries, with potential application to robotic surgery. We optimise performance while maintaining accuracy by several techniques including mixed precision, function transformation and streaming data flow. Significant […]

CUDA

Aug, 7

Behavioral Spherical Harmonics for Long-Range Agents’ Interaction

We introduce behavioral spherical harmonics (BSH), a novel approach to efficiently and compactly represent the direction-dependent behavior of agents. BSH is based on spherical harmonics to project the directional information of a group of multiple agents to a vector of few coefficients; thus, BSH drastically reduces the complexity of the directional evaluation, as it requires […]

OpenCL • OpenGL

Aug, 7

A Survey Of Techniques for Architecting DRAM Caches

Recent trends of increasing core-count and memory/bandwidth-wall have led to major overhauls in chip architecture. In the face of increasing cache capacity demands, researchers have now explored DRAM, which was conventionally considered synonymous with main memory, for designing large last level caches. Efficient integration of DRAM caches in mainstream computing systems, however, also presents several challenges […]

