Nov, 22

GPURepair: Automated Repair of GPU Kernels

This paper presents a tool for repairing errors in GPU kernels written in CUDA or OpenCL due to data races and barrier divergence. Our novel extension to prior work can also remove barriers that are deemed unnecessary for correctness. We implement these ideas in our tool called GPURepair, which uses GPUVerify as the verification oracle […]
Nov, 15

Adaptive Data Migration in Load-Imbalanced HPC Applications

Distributed parallel applications need to maximize and maintain computer resource utilization and be portable across different machines. Balanced execution of some applications requires more effort than others because their data distribution changes over time. Data re-distribution at runtime requires elaborate schemes that are expensive and may benefit particular applications. This dissertation discusses a solution for […]
Nov, 15

Runtime Performances Benchmark for Knowledge Graph Embedding Methods

This paper wants to focus on providing a characterization of the runtime performances of state-of-the-art implementations of KGE alghoritms, in terms of memory footprint and execution time. Despite the rapidly growing interest in KGE methods, so far little attention has been devoted to their comparison and evaluation; in particular, previous work mainly focused on performance […]
Nov, 15

FastSVC: Fast Cross-Domain Singing Voice Conversion with Feature-wise Linear Modulation

This paper presents FastSVC, a light-weight cross-domain sing voice conversion (SVC) system, which is able to achieve high conversion performance, with inference speed 4x faster than real time on CPUs. FastSVC uses Conformer based phoneme recognizer to extract singer-agnostic linguistic features from singing signals. A feature-wise linear modulation based generator is used to synthesize waveform […]
Nov, 15

Exploring the acceleration of Nekbone on reconfigurable architectures

Hardware technological advances are struggling to match scientific ambition, and a key question is how we can use the transistors that we already have more effectively. This is especially true for HPC, where the tendency is often to throw computation at a problem whereas codes themselves are commonly bound, at-least to some extent, by other […]
Nov, 15

Automatic GPU optimization through higher-order functions in functional languages

Over recent years, graphics processing units (GPUs) have become popular devices to use in procedures that exhibit data-parallelism. Due to high parallel capability, running procedures on a GPU can result in an execution time speedup ranging from a couple times faster to several orders of magnitude faster, compared to executing serially on a central processing […]
Nov, 8

Design and Performance Evaluation of Optimizations for OpenCL FPGA Kernels

The use of FPGAs in heterogeneous systems are valuable because they can be used to architect custom hardware to accelerate a particular application or domain. However, they are notoriously difficult to program. The development of high level synthesis tools like OpenCL make FPGA development more accessible, but not without its own challenges. The synthesized hardware […]
Nov, 8

Transparent Compiler and Runtime Specializations for Accelerating Managed Languages on FPGAs

In recent years, heterogeneous computing has emerged as the vital way to increase computers’ performance and energy efficiency by combining diverse hardware devices, such as Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs). The rationale behind this trend is that different parts of an application can be offloaded from the main CPU to […]
Nov, 8

AMGCL – A C++ library for efficient solution of large sparse linear systems

AMGCL is a header-only C++ library for the solution of large sparse linear systems with algebraic multigrid. The method may be used as a black-box solver for computational problems in various fields, since it does not require any information about the underlying geometry. AMGCL provides an efficient, flexible, and extensible implementation of several iterative solvers […]
Nov, 8

TopicBERT for Energy Efficient Document Classification

Prior research notes that BERT’s computational cost grows quadratically with sequence length thus leading to longer training times, higher GPU memory constraints and carbon emissions. While recent work seeks to address these scalability issues at pre-training, these issues are also prominent in fine-tuning especially for long sequence tasks like document classification. Our work thus focuses […]
Nov, 8

Tinker-HP: Accelerating Molecular Dynamics Simulations of Large Complex Systems with Advanced Point Dipole Polarizable Force Fields using GPUs and Multi-GPUs systems

We present the extension of the Tinker-HP package (Lagardère et al., Chem. Sci., 2018,9, 956-972) to the use of Graphics Processing Unit (GPU) cards to accelerate molecular dynamics simulations using polarizable many-body force fields. The new high-performance module allows for an efficient use of single- and multi-GPU architectures ranging from research laboratories to modern pre-exascale […]
Nov, 1

Designing a Modern Skeleton Programming Framework for Parallel and Heterogeneous Systems

Today’s society is increasingly software-driven and dependent on powerful computer technology. Therefore it is important that advancements in the low-level processor hardware are made available for exploitation by a growing number of programmers of differing skill level. However, as we are approaching the end of Moore’s law, hardware designers are finding new and increasingly complex […]

* * *

* * *

HGPU group © 2010-2021 hgpu.org

All rights belong to the respective authors

Contact us: