20481

Posts

Apr, 12

LUDA: Boost LSM Key Value Store Compactions with GPUs

Log-Structured-Merge (LSM) tree-based key value stores are facing critical challenges of fully leveraging the dramatic performance improvements of the underlying storage devices, which makes the compaction operations of LSM key value stores become CPU-bound, and slow compactions significantly degrade key value store performance. To address this issue, we propose LUDA, an LSM key value store […]
Apr, 5

Deep Learning for Compilers

Constructing compilers is hard. Optimising compilers are multi-million dollar projects spanning years of development, yet remain unable to fully exploit the available performance, and are prone to bugs. The rapid transition to heterogeneous parallelism and diverse architectures has raised demand for aggressively-optimising compilers to an all time high, leaving compiler developers struggling to keep up. […]
Apr, 5

PyMatting: A Python Library for Alpha Matting

An important step of many image editing tasks is to extract specific objects from an image in order to place them in a scene of a movie or compose them onto another background. Alpha matting describes the problem of separating the objects in the foreground from the background of an image given only a rough […]
Apr, 5

Parallelization of the Honeybee Search Algorithm for Object Tracking

Object tracking refers to the relocation of specific objects in consecutive frames of a video sequence. Presently, this visual task is still considered an open research issue, and the computer science community attempted solutions from the standpoint of methodologies, algorithms, criteria, benchmarks, and so on. This article introduces a GPU-parallelized swarm algorithm, called the Honeybee […]
Apr, 5

Faster than FAST: GPU-Accelerated Frontend for High-Speed VIO

The recent introduction of powerful embedded graphics processing units (GPUs) has allowed for unforeseen improvements in real-time computer vision applications. It has enabled algorithms to run onboard, well above the standard video rates, yielding not only higher information processing capability, but also reduced latency. This work focuses on the applicability of efficient low-level, GPU hardware-specific […]
Apr, 5

Understanding GPU-Based Lossy Compression for Extreme-Scale Cosmological Simulations

To help understand our universe better, researchers and scientists currently run extreme-scale cosmology simulations on leadership supercomputers. However, such simulations can generate large amounts of scientific data, which often result in expensive costs in data associated with data movement and storage. Lossy compression techniques have become attractive because they significantly reduce data size and can […]
Mar, 29

Large-Scale Data Computing Performance Comparisons on SYCL Heterogeneous Parallel Processing Layer Implementations

Today, many big data applications require massively parallel tasks to compute complicated mathematical operations. To perform parallel tasks, platforms like CUDA (Compute Unified Device Architecture) and OpenCL (Open Computing Language) are widely used and developed to enhance the throughput of massively parallel tasks. There is also a need for high-level abstractions and platform-independence over those […]
Mar, 29

Characterizing Optimizations to Memory Access Patterns using Architecture-Independent Program Features

High-performance computing developers are faced with the challenge of optimizing the performance of OpenCL workloads on diverse architectures. The Architecture-Independent Workload Characterization (AIWC) tool is a plugin for the Oclgrind OpenCL simulator that gathers metrics of OpenCL programs that can be used to understand and predict program performance on an arbitrary given hardware architecture. However, […]
Mar, 29

SOL: Effortless Device Support for AI Frameworks without Source Code Changes

Modern high performance computing clusters heavily rely on accelerators to overcome the limited compute power of CPUs. These supercomputers run various applications from different domains such as simulations, numerical applications or artificial intelligence (AI). As a result, vendors need to be able to efficiently run a wide variety of workloads on their hardware. In the […]
Mar, 29

ProGraML: Graph-based Deep Learning for Program Optimization and Analysis

The increasing complexity of computing systems places a tremendous burden on optimizing compilers, requiring ever more accurate and aggressive optimizations. Machine learning offers significant benefits for constructing optimization heuristics but there remains a gap between what state-of-the-art methods achieve and the performance of an optimal heuristic. Closing this gap requires improvements in two key areas: […]
Mar, 29

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

Masked language modeling (MLM) pre-training methods such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens. While they produce good results when transferred to downstream NLP tasks, they generally require large amounts of compute to be effective. As an alternative, we propose a […]
Mar, 22

Fireiron: A Scheduling Language for High-Performance Linear Algebra on GPUs

Achieving high-performance GPU kernels requires optimizing algorithm implementations to the targeted GPU architecture. It is of utmost importance to fully use the compute and memory hierarchy, as well as available specialised hardware. Currently, vendor libraries like cuBLAS and cuDNN provide the best performing implementations of GPU algorithms. However the task of the library programmer is […]

Recent source codes

* * *

* * *

HGPU group © 2010-2020 hgpu.org

All rights belong to the respective authors

Contact us: