high performance computing on graphics processing units: hgpu.org

Posts

Aug, 28

Boosting Java Performance using GPGPUs

Heterogeneous programming is becoming the norm for achieving better performance by running portions of code on the most appropriate hardware resource. Currently, significant engineering effort goes into enabling existing programming languages to perform heterogeneous execution, mainly on GPUs. In this paper we describe Jacc, an experimental framework which allows […]

CUDA

Aug, 28

VisPy: Harnessing The GPU For Fast, High-Level Visualization

The growing availability of large, multidimensional data sets has created demand for high-performance, interactive visualization tools. VisPy leverages the GPU to provide fast, interactive, and beautiful visualizations in a high-level API. Here we introduce the main features, architecture, and techniques used in VisPy.

OpenGL

Aug, 28

High-Speed Object Detection: Design, Study and Implementation of a Detection Framework using Channel Features and Boosting

In this thesis we design, implement and study a high-speed object detection framework. Our baseline detector uses integral channel features as object representation and AdaBoost as supervised learning algorithm. We suggest the implementation of two approximation techniques for speeding up the baseline detector and show their effectiveness by performing experiments on both detection quality and […]

OpenCL

Aug, 28

Deep Convolutional Neural Networks for Smile Recognition

This thesis describes the design and implementation of a smile detector based on deep convolutional neural networks. It starts with a summary of neural networks, the difficulties of training them and new training methods, such as Restricted Boltzmann Machines or autoencoders. It then provides a literature review of convolutional neural networks and recurrent neural networks. […]

CUDA

Aug, 28

A Parallel Algorithm to Test Chordality of Graphs

We present a simple parallel algorithm to test chordality of graphs which is based on the parallel Lexicographical Breadth-First Search algorithm. In total, the algorithm takes O(N) time on an N-thread machine and performs O(N^2) work, where N is the number of vertices in a graph. Our implementation of the algorithm uses a GPU environment […]

CUDA
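
For orientation, here is a minimal sequential sketch of the classical LexBFS-based chordality test. It is not the parallel GPU algorithm from the paper; the function names `lex_bfs` and `is_chordal` and the adjacency-dictionary input format are assumptions made for this illustration. It relies on the standard result that a graph is chordal exactly when the reverse of a LexBFS order is a perfect elimination ordering.

```python
def lex_bfs(adj):
    """Return a LexBFS visit order for an undirected graph.

    adj maps each vertex to the set of its neighbours (hypothetical input format).
    Uses simple partition refinement, O(N^2) overall.
    """
    classes = [list(adj)]  # ordered vertex classes, highest priority first
    order = []
    while classes:
        v = classes[0].pop(0)
        if not classes[0]:
            classes.pop(0)
        order.append(v)
        refined = []
        for cls in classes:
            # Neighbours of v move ahead of non-neighbours within each class.
            inside = [u for u in cls if u in adj[v]]
            outside = [u for u in cls if u not in adj[v]]
            if inside:
                refined.append(inside)
            if outside:
                refined.append(outside)
        classes = refined
    return order


def is_chordal(adj):
    """Check whether the reverse LexBFS order is a perfect elimination ordering."""
    order = lex_bfs(adj)
    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        earlier = [u for u in adj[v] if pos[u] < pos[v]]
        if not earlier:
            continue
        # The most recently visited earlier neighbour must be adjacent
        # to every other earlier neighbour of v.
        parent = max(earlier, key=pos.get)
        if any(u != parent and u not in adj[parent] for u in earlier):
            return False
    return True
```

A 4-cycle fails the test, while the same cycle with one chord added passes.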

Aug, 27

CudaChain: A Practical GPU-accelerated 2D Convex Hull Algorithm

This paper presents a practical GPU-accelerated convex hull algorithm and a novel Sorting-based Preprocessing Approach (SPA) for planar point sets. The proposed algorithm consists of two stages: (1) two rounds of preprocessing performed on the GPU and (2) the finalization of calculating the expected convex hull on the CPU. We first discard the interior points […]

CUDA

Aug, 27

gScan: Accelerating Graham Scan on the GPU

This paper presents a fast implementation of the Graham scan on the GPU. The proposed algorithm is composed of two stages: (1) two rounds of preprocessing performed on the GPU and (2) the finalization of finding the convex hull on the CPU. We first discard the interior points that lie inside a quadrilateral formed by […]

CUDA
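
As a reference point for the CPU finalization stage, the sketch below shows the textbook sequential Graham scan. It does not reproduce the paper's GPU preprocessing, and the names `cross` and `graham_scan` are assumptions chosen for this illustration. Points are sorted by polar angle around the lowest point using exact cross products, so integer inputs avoid floating-point angle comparisons.

```python
from functools import cmp_to_key


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def graham_scan(points):
    """Return the convex hull in counter-clockwise order, starting at the pivot."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    # Pivot: lowest point, leftmost on ties, so all other angles lie in [0, pi).
    pivot = min(pts, key=lambda p: (p[1], p[0]))

    def dist2(p):
        return (p[0] - pivot[0]) ** 2 + (p[1] - pivot[1]) ** 2

    def by_angle(a, b):
        c = cross(pivot, a, b)
        if c != 0:
            return -1 if c > 0 else 1
        # Collinear with the pivot: nearer point first.
        return (dist2(a) > dist2(b)) - (dist2(a) < dist2(b))

    rest = sorted((p for p in pts if p != pivot), key=cmp_to_key(by_angle))
    hull = [pivot]
    for p in rest:
        # Pop points that would make a right turn or lie on a straight line.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull
```

Interior and edge-collinear points are dropped, so a square with extra points inside and on its border yields only the four corners.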

Aug, 27

Adaptive Multi-GPU Exchange Monte Carlo for the 3D Random Field Ising Model

The study of disordered spin systems through Monte Carlo simulations has proven to be a hard task due to the adverse energy landscape present at the low temperature regime, making it difficult for the simulation to escape from a local minimum. Replica based algorithms such as the Exchange Monte Carlo (also known as parallel tempering) […]

CUDA
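
For readers unfamiliar with parallel tempering, the sketch below shows the standard replica-exchange acceptance rule: two replicas at inverse temperatures beta_i and beta_j with energies E_i and E_j swap with probability min(1, exp((beta_i - beta_j)(E_i - E_j))). This is the textbook rule only, not the adaptive multi-GPU scheme of the paper; the function names and the list-based replica layout are assumptions made for illustration.

```python
import math
import random


def swap_probability(beta_i, beta_j, energy_i, energy_j):
    """Metropolis acceptance probability for exchanging two replicas."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_neighbour_swaps(betas, energies, rng=random):
    """Try one exchange between each pair of neighbouring temperatures.

    betas and energies are parallel lists (hypothetical layout); the energies
    list is modified in place to reflect accepted swaps, and the number of
    accepted swaps is returned.
    """
    accepted = 0
    for k in range(len(betas) - 1):
        p = swap_probability(betas[k], betas[k + 1], energies[k], energies[k + 1])
        if rng.random() < p:
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
            accepted += 1
    return accepted
```

A swap that moves the lower-energy configuration to the colder replica is always accepted; the opposite move is accepted with exponentially decreasing probability.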

Aug, 27

Accelerated Deep Learning using Intel Xeon Phi

Deep learning, a sub-topic of machine learning inspired by biology, has recently attracted wide attention in industry and the research community. State-of-the-art applications in the area of computer vision and speech recognition (among others) are built using deep learning algorithms. In contrast to traditional algorithms, where the developer fully instructs the application what to do, […]

Aug, 27

MemcachedGPU: Scaling-up Scale-out Key-value Stores

This paper tackles the challenges of obtaining more efficient data center computing while maintaining low latency, low cost, programmability, and the potential for workload consolidation. We introduce GNoM, a software framework enabling energy-efficient, latency bandwidth optimized UDP network and application processing on GPUs. GNoM handles the data movement and task management to facilitate the development […]

CUDA

Aug, 24

First International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC’15), 2015

With Exascale systems on the horizon at the same time that conventional von Neumann architectures are suffering from rising power densities, we are facing an era with power, energy-efficiency, and cooling as first-class constraints for scalable HPC. FPGAs can tailor the hardware to the application, avoiding overheads of general-purpose architectures, for example, through customized datapaths and memory […]

Aug, 24

Performance Evaluations of Document-Oriented Databases using GPU and Cache Structure

Document-oriented databases are popular databases in which users can store their documents in a schema-less manner and run search queries over them. They have been widely used for web applications that process large collections of documents because of their high scalability and rich functionality. One of the major functions of document-oriented databases is a string […]

CUDA
