high performance computing on graphics processing units: hgpu.org

Posts

Apr, 23

8th International Conference on Biology, Environment and Chemistry (ICBEC), 2017

The 8th International Conference on Biology, Environment and Chemistry (ICBEC 2017) will be held in Busan, South Korea, during October 11-13, 2017. ICBEC 2017 is sponsored by the Hong Kong Chemical, Biological & Environmental Engineering Society (HKICBEES). It is one of the leading international conferences for presenting novel and fundamental advances in the fields of […]

Apr, 23

2nd International Conference on Communication and Information Systems (ICCIS), 2017

ICCIS 2017 will be a perfect platform to share experience, foster collaborations across industry and academia, and evaluate emerging technologies across the globe. Publication: Peer-reviewed and presented papers in ICCIS 2017 will be published in the conference proceedings, which will be submitted for Ei Compendex and Scopus indexing. Submission Methods: Full Paper (publication and oral […]

Apr, 20

Capability Models for Manycore Memory Systems: A Case-Study with Xeon Phi KNL

Increasingly complex memory systems and on-chip interconnects are developed to mitigate the data movement bottlenecks in manycore processors. One example of such a complex system is the Xeon Phi KNL CPU with three different types of memory, fifteen memory configuration options, and a complex on-chip mesh network connecting up to 72 cores. Users require a […]

Apr, 20

Benchmarking OpenCL, OpenACC, OpenMP, and CUDA: programming productivity, performance, and energy consumption

Many modern parallel computing systems are heterogeneous at the node level. Such nodes may comprise general-purpose CPUs and accelerators (such as GPUs or Intel Xeon Phi) that provide high performance with suitable energy-consumption characteristics. However, exploiting the available performance of heterogeneous architectures may be challenging. There are various parallel programming frameworks (such as OpenMP, […]

CUDA, OpenCL

Apr, 20

Exploration of cyber-physical systems for GPGPU computer vision-based detection of biological viruses

This work presents a method for computer vision-based detection of biological viruses in PAMONO sensor images and, related to this, methods to explore cyber-physical systems such as those consisting of the PAMONO sensor, the detection software, and processing hardware. The focus is especially on an exploration of Graphics Processing Units (GPU) hardware for "General-Purpose […]

OpenCL

Apr, 20

A hybrid CPU-GPU parallelization scheme of variable neighborhood search for inventory optimization problems

In this paper, we study various parallelization schemes for the Variable Neighborhood Search (VNS) metaheuristic on a CPU-GPU system via OpenMP and OpenACC. A hybrid parallel VNS method is applied to recent benchmark problem instances for the multi-product dynamic lot sizing problem with product returns and recovery, which appears in reverse logistics and is known […]

Apr, 20

Evaluation of GPU-based track-triggering for the CMS detector at CERN’s HL-LHC

In this work we present an evaluation of GPUs as a possible L1 Track Trigger for the High Luminosity LHC, effective after Long Shutdown 3 around 2025. The novelty lies in presenting an implementation based on calculations done entirely in software, in contrast to currently discussed solutions relying on specialized hardware, such as FPGAs and […]

CUDA

Apr, 17

Random Finite Set Based Bayesian Filtering with OpenCL in a Heterogeneous Platform

While most filtering approaches based on random finite sets have focused on improving performance, in this paper, we argue that computation times are very important in order to enable real-time applications such as pedestrian detection. Towards this goal, this paper investigates the use of OpenCL to accelerate the computation of random finite set-based Bayesian filtering […]

OpenCL

Apr, 17

Investigation of heterogeneous computing through novel parallel programming platforms

The computational landscape is dominated by the use of a very high number of CPU resources; this has however provided diminishing returns in recent years, pushing for a paradigm shift in the choice for computational systems. The following work was aimed at determining the maturity of heterogeneous computer systems in terms of computational performance and […]

OpenCL

Apr, 17

Parallel Multi Channel Convolution using General Matrix Multiplication

Convolutional neural networks (CNNs) have emerged as one of the most successful machine learning technologies for image and video processing. The most computationally intensive parts of CNNs are the convolutional layers, which convolve multi-channel images with multiple kernels. A common approach to implementing convolutional layers is to expand the image into a column matrix (im2col) […]

CUDA
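
The abstract above is cut off, so the paper's own method is not shown here. As an illustration of only the common im2col approach it mentions, the following is a minimal sketch in plain Python (no GPU, no external libraries). The function names `im2col`, `matmul`, and `conv_im2col` are hypothetical, stride 1 and no padding are assumed, and the naive `matmul` merely stands in for an optimized GEMM routine.

```python
def im2col(image, kh, kw):
    """Unroll every kh x kw patch of a C x H x W image into one column.

    Returns (matrix, out_h, out_w), where matrix has C*kh*kw rows and
    out_h*out_w columns. Assumes stride 1 and no padding.
    """
    channels, height, width = len(image), len(image[0]), len(image[0][0])
    out_h, out_w = height - kh + 1, width - kw + 1
    rows = []
    for c in range(channels):
        for i in range(kh):
            for j in range(kw):
                # One row per (channel, kernel row, kernel column) offset,
                # one entry per output position in row-major order.
                rows.append([image[c][y + i][x + j]
                             for y in range(out_h)
                             for x in range(out_w)])
    return rows, out_h, out_w


def matmul(a, b):
    """Naive matrix product; a placeholder for an optimized GEMM call."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)]
            for row in a]


def conv_im2col(image, kernels):
    """Multi-channel convolution (cross-correlation) as one matrix product.

    image:   C x H x W nested lists
    kernels: M x C x kh x kw nested lists
    returns: M x out_h x out_w nested lists
    """
    kh, kw = len(kernels[0][0]), len(kernels[0][0][0])
    columns, out_h, out_w = im2col(image, kh, kw)
    # Flatten each kernel in the same (channel, row, column) order
    # that im2col uses for its rows.
    kernel_matrix = [[v for channel in k for row in channel for v in row]
                     for k in kernels]
    flat = matmul(kernel_matrix, columns)
    # Reshape each flat output row back into an out_h x out_w feature map.
    return [[out[y * out_w:(y + 1) * out_w] for y in range(out_h)]
            for out in flat]


# Toy input: a 2-channel 3x3 image and a single 2-channel 2x2 kernel.
image = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
         [[1, 0, 1], [0, 1, 0], [1, 0, 1]]]
kernels = [[[[1, 0], [0, 1]],
            [[1, 1], [1, 1]]]]
result = conv_im2col(image, kernels)
print(result)  # [[[8, 10], [14, 16]]]
```

The sketch also shows the usual trade-off of this approach: the column matrix repeats each input pixel up to kh*kw times, so memory use grows with the kernel size in exchange for a single large matrix product.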

Apr, 17

GPU implementation of the Rosenbluth generation method for static Monte Carlo simulations

We present a parallel version of the Rosenbluth Self-Avoiding Walk generation method implemented on Graphics Processing Units (GPUs) using CUDA libraries. The method scales almost linearly with the number of CUDA cores, and its efficiency is limited only by the hardware. The method is introduced in two realizations: on a cubic lattice and in real space. We find […]

CUDA

Apr, 17

CBinfer: Change-Based Inference for Convolutional Neural Networks on Video Data

Extracting per-frame features using convolutional neural networks for real-time processing of video data is currently mainly performed on powerful GPU-accelerated workstations and compute clusters. However, there are many applications such as smart surveillance cameras that require or would benefit from on-site processing. To this end, we propose and evaluate a novel algorithm for change-based evaluation […]

CUDA

* * *

Recent source codes

Specx: Speculative task-based runtime system

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

KISim: Kubernetes Intelligent Scheduling Simulator

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler
