high performance computing on graphics processing units: hgpu.org

Posts

Jan, 23

DeepBach: a Steerable Model for Bach chorales generation

The composition of polyphonic chorale music in the style of J.S. Bach has represented a major challenge in automatic music composition over the last decades. The art of Bach chorales composition involves combining four-part harmony with characteristic rhythmic patterns and typical melodic movements to produce musical phrases which begin, evolve and end (cadences) in a […]

CUDA

Jan, 23

GPGPU Performance Estimation with Core and Memory Frequency Scaling

Graphics Processing Units (GPUs) support dynamic voltage and frequency scaling (DVFS) in order to balance computational performance and energy consumption. However, there is still no simple and accurate way to estimate the performance of a given GPU kernel under different frequency settings on real hardware, which is important for choosing the best frequency configuration for energy saving. This paper reveals […]

CUDA

Jan, 23

A task-driven implementation of a simple numerical solver for hyperbolic conservation laws

This article describes the implementation of an all-in-one numerical procedure within the runtime StarPU. In order to limit the complexity of the method, for the sake of clarity of the presentation of the non-classical task-driven programming environment, we have limited the numerics to first order in space and time. Results show that the task distribution […]

CUDA

Jan, 23

Multi-core parallelism in a column-store

The research reported in this thesis addresses several challenges of improving the efficiency and effectiveness of parallel processing of analytical database queries on modern multi- and many-core systems, using an open-source column-oriented analytical database management system, MonetDB, for validation. In contrast to the existing work we also broaden the research from focusing on individual operators […]

Jan, 19

Deep Learning for Computational Chemistry

The rise and fall of artificial neural networks is well documented in the scientific literature of both computer science and computational chemistry. Yet almost two decades later, we are now seeing a resurgence of interest in deep learning, a machine learning algorithm based on multilayer neural networks. Within the last few years, we have seen […]

Jan, 19

OpenNMT: Open-Source Toolkit for Neural Machine Translation

We describe an open-source toolkit for neural machine translation (NMT). The toolkit prioritizes efficiency, modularity, and extensibility with the goal of supporting NMT research into model architectures, feature representations, and source modalities, while maintaining competitive performance and reasonable training requirements. The toolkit consists of modeling and translation support, as well as detailed pedagogical documentation about […]

CUDA

Jan, 19

Xeon Phi: A comparison between the newly introduced MIC architecture and a standard CPU through three types of problems

As Moore's law continues, processors keep getting more cores packed together on the chip. This thesis is an empirical study of the rather newly introduced Intel Many Integrated Core (IMIC) architecture found in the Intel Xeon Phi. With roughly 60 cores connected by a high performance on-die interconnect, the Intel Xeon Phi makes an […]

Jan, 19

Parallelizing Exact and Approximate String Matching via Inclusive Scan on a GPU

In this study, to substantially improve the runtimes of exact and approximate string matching algorithms, we propose a tribrid parallel method for bit-parallel algorithms such as the Shift-Or and Wu-Manber algorithms. Our underlying idea is to interpret bit-parallel algorithms as inclusive-scan operations, which allow these bit-parallel algorithms to run efficiently on a graphics processing unit […]

CUDA
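
The abstract above mentions interpreting bit-parallel algorithms such as Shift-Or as inclusive-scan operations. The following is a minimal CPU-side sketch of that general idea, not the paper's implementation. It assumes a non-empty pattern, and the function names (`build_masks`, `shift_or_sequential`, `shift_or_scan`) are hypothetical. Each text character is treated as a state update `S -> (S << 1) | mask`, and two such updates compose associatively, which is the property a parallel scan needs.

```python
from itertools import accumulate


def build_masks(pattern):
    """Shift-Or masks: bit j of masks[c] is 0 iff pattern[j] == c."""
    m = len(pattern)
    full = (1 << m) - 1
    masks = {}
    for j, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, full) & ~(1 << j)
    return masks, full


def shift_or_sequential(text, pattern):
    """Classic Shift-Or: one state update per text character."""
    masks, full = build_masks(pattern)
    m = len(pattern)  # assumed >= 1
    top = 1 << (m - 1)
    state, hits = full, []
    for i, ch in enumerate(text):
        state = ((state << 1) | masks.get(ch, full)) & full
        if state & top == 0:  # bit m-1 cleared: full pattern ends at i
            hits.append(i - m + 1)
    return hits


def shift_or_scan(text, pattern):
    """Same result, expressed as an inclusive scan over (shift, mask) pairs."""
    masks, full = build_masks(pattern)
    m = len(pattern)  # assumed >= 1
    top = 1 << (m - 1)

    def combine(a, b):
        # Apply update a, then update b:
        # ((S << k1 | m1) << k2) | m2 == (S << (k1 + k2)) | (m1 << k2) | m2
        (k1, m1), (k2, m2) = a, b
        return (min(k1 + k2, m), ((m1 << k2) | m2) & full)

    elems = [(1, masks.get(ch, full)) for ch in text]
    hits = []
    # accumulate runs sequentially here; because combine is associative,
    # a parallel inclusive scan could produce the same prefixes.
    for i, (k, mask) in enumerate(accumulate(elems, combine)):
        state = ((full << k) | mask) & full  # apply prefix to initial state
        if state & top == 0:
            hits.append(i - m + 1)
    return hits
```

Both functions return the start indices of all matches, including overlapping ones. The scan version is slower on a CPU; it only illustrates why the recurrence can be evaluated with a parallel prefix operation.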

Jan, 19

Light Loss-Less Data Compression, with GPU Implementation

There is no doubt that data compression is very important in computer engineering. However, most lossless data compression and decompression algorithms are very hard to parallelize, because they use dictionaries updated sequentially. The main contribution of this paper is to present a new lossless data compression method that we call Light Loss-Less (LLL) compression. It […]

CUDA

Jan, 16

An OpenCL(TM) Deep Learning Accelerator on Arria 10

Convolutional neural nets (CNNs) have become a practical means to perform vision tasks, particularly in the area of image classification. FPGAs are well known to be able to perform convolutions efficiently; however, most recent efforts to run CNNs on FPGAs have shown limited advantages over other devices such as GPUs. Previous approaches on FPGAs have […]

OpenCL

Jan, 16

Using efficient parallelization in Graphic Processing Units to parameterize stochastic fire propagation models

Fire propagation is a major concern in the world in general and in Argentinian northwestern Patagonia in particular where every year hundreds of hectares are affected by both natural and anthropogenic forest fires. We developed an efficient cellular automata model in Graphic Processing Units (GPUs) to simulate fire propagation. The graphical advantages of GPUs were […]

CUDA

Jan, 16

Application of GPU Computing to Some Urban Traffic Problems

The present work studies and proposes GPU-based parallel algorithms and implementations for the problem of macroscopic assignment of urban traffic on large-scale networks, promoting an in-depth investigation on each sub-problem that must be efficiently solved during the traffic assignment process. Among the main contributions of this work, there are: 1) the first GPU-based algorithm for […]

OpenCL


