high performance computing on graphics processing units: hgpu.org

Posts

Nov, 22

Dynamic adaptation and distribution of binaries to heterogeneous architectures

Real-time multimedia workloads require progressively more processing power. Modern many-core architectures provide enough processing power to satisfy the requirements of many real-time multimedia workloads. When even they are unable to satisfy processing power requirements, network distribution can provide many workloads with even more computing power. In this thesis, we present solutions that can be […]

CUDA

OpenCL

Nov, 22

Efficient Shallow Water Simulations on GPUs

For some classes of problems, NVIDIA CUDA abstraction and hardware properties combine with problem characteristics to limit the specific problem instances that can be effectively accelerated. As a real-world example, a two-dimensional correlation-based template-matching MATLAB application is considered. While this problem has a well-known solution for the common case of linear image filtering-small fixed […]

CUDA

Nov, 22

Dynamic Heterogeneous Scheduling Decisions Using Historical Runtime Data

Heterogeneous systems often employ processing units with a wide spectrum of performance capabilities. Allowing individual applications to make greedy local scheduling decisions leads to imbalance, with underutilization of some devices and excessive contention for others. If we instead allow the system to make global scheduling decisions and assign some applications to a slower device, we […]

OpenCL
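The contrast drawn in the abstract above, greedy per-application choices versus a system-wide assignment, can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the paper's method: the function names, the `speeds` list (work units per second per device), and the earliest-finish-time rule are all hypothetical choices made for illustration.

```python
def greedy_local(jobs, speeds):
    """Every job independently picks the fastest device, ignoring its queue.

    jobs   -- list of work amounts (hypothetical units)
    speeds -- work units per second for each device (hypothetical)
    Returns the makespan (time at which the last job finishes).
    """
    fastest = max(range(len(speeds)), key=lambda d: speeds[d])
    finish = [0.0] * len(speeds)
    for work in jobs:
        finish[fastest] += work / speeds[fastest]
    return max(finish)


def global_schedule(jobs, speeds):
    """A scheduler with a global view places each job on the device
    where it would finish earliest, even if that device is slower."""
    finish = [0.0] * len(speeds)
    for work in sorted(jobs, reverse=True):
        device = min(range(len(speeds)),
                     key=lambda d: finish[d] + work / speeds[d])
        finish[device] += work / speeds[device]
    return max(finish)


if __name__ == "__main__":
    jobs = [4, 4, 4, 4]      # four equal jobs
    speeds = [2.0, 1.0]      # device 0 is twice as fast as device 1
    print(greedy_local(jobs, speeds))     # all jobs queue on device 0
    print(global_schedule(jobs, speeds))  # one job is moved to device 1
```

In this toy case, sending one job to the slower device shortens the overall makespan, which is the effect the abstract describes.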

Nov, 22

Application of GPGPU for Acceleration of Short DNA Sequence Alignment in Unipro UGENE Project

A dramatic increase in available sequencing datasets has resulted in the need for fast sequence alignment methods. Plenty of novel methods were proposed to perform fast alignment of NGS data, and some of them appeared to be rather effective; however, a relatively small number of existing alignment tools use Graphics Processing Units (GPUs) to […]

OpenCL

Nov, 22

Parallelizing Multicore Cache Simulations using Heterogeneous Computing on General Purpose and Graphics Processors

Traditional trace-driven memory system simulation is a very time-consuming process, and the advent of multi-cores only exacerbates the problem. We propose a framework for accelerating trace-driven multi-core cache simulations by utilizing the capabilities of modern many-core Graphics Processing Units (GPUs). A straightforward approach in this direction is to rely on the inherent parallelism […]

CUDA

Nov, 22

GPU-based Multi-start Local Search Algorithms

In practice, combinatorial optimization problems are complex and computationally time-intensive. Local search algorithms are powerful heuristics that can significantly reduce the computation time needed to explore the solution space. In these algorithms, the multi-start model may improve the quality and robustness of the solutions obtained. However, solving large and time-intensive optimization problems […]

CUDA
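The multi-start model mentioned in the abstract above can be sketched on the CPU as follows. This is an illustrative sketch only, assuming a generic minimization problem; `local_search`, `multi_start`, and the toy cost table are hypothetical names and data, not taken from the paper. Each start is independent of the others, which is what makes the model a natural candidate for parallel execution.

```python
def local_search(cost, neighbors, start):
    """Steepest-descent hill climbing: move to the best neighbor
    until no neighbor improves on the current solution."""
    current = start
    while True:
        best = min(neighbors(current), key=cost, default=current)
        if cost(best) >= cost(current):
            return current  # local minimum reached
        current = best


def multi_start(cost, neighbors, starts):
    """Run one local search per starting point and keep the best result.
    The searches do not share state, so they could run in parallel."""
    return min((local_search(cost, neighbors, s) for s in starts), key=cost)


if __name__ == "__main__":
    # Toy landscape with several local minima (hypothetical data).
    values = [5, 3, 4, 6, 2, 1, 7, 0, 8]
    cost = lambda i: values[i]
    neighbors = lambda i: [j for j in (i - 1, i + 1) if 0 <= j < len(values)]

    print(local_search(cost, neighbors, 0))            # stuck in a local minimum
    print(multi_start(cost, neighbors, [0, 3, 8]))     # best of three starts
```

A single search started at index 0 stops at the first local minimum it meets, while the multi-start variant reaches the global minimum from one of its other starting points.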

Nov, 22

Using Graphics Processors for a High Performance Normalization of Gene Expressions

Ultra-high-density oligonucleotide microarrays allow several million genetic markers to be observed in a single experiment. Current bioinformatics software for gene expression quantile data normalization is unable to process such huge datasets. In parallel with this perception, the huge volume of molecular data produced by current high-throughput technologies in modern molecular biology […]

CUDA
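For readers unfamiliar with the quantile normalization named in the abstract above, the textbook procedure can be sketched in a few lines. This is a plain CPU sketch of the standard technique, not the paper's GPU implementation; the function name and the column-major input layout are assumptions, and ties are not averaged here, which full implementations usually handle.

```python
def quantile_normalize(columns):
    """Give every column (sample) the same distribution.

    columns -- list of equal-length lists, one per sample (assumed layout)
    Step 1: sort each column.
    Step 2: average the sorted values across columns, rank by rank.
    Step 3: replace each original value by the mean for its rank.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[r] for col in sorted_cols) / len(columns)
                  for r in range(n)]

    result = []
    for col in columns:
        # Indices of this column ordered from smallest to largest value.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = rank_means[rank]
        result.append(out)
    return result


if __name__ == "__main__":
    samples = [[5, 2, 3], [4, 1, 6]]
    print(quantile_normalize(samples))
```

The sorting step dominates the cost for each sample, and the samples are sorted independently of one another.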

Nov, 22

Dataflow-Based Implementation of Layered Sensing Applications

This report describes a new dataflow-based technology and associated design tools for high-productivity design, analysis, and optimization of layered sensing software for signal processing systems. Our approach provides novel capabilities, based on the principles of task-level dataflow analysis, for exploring and optimizing interactions across application behavior, operational context, high-performance embedded processing platforms, and implementation […]

CUDA

Nov, 22

Experiences with Achieving Portability across Heterogeneous Architectures

The increasing computational needs of parallel applications inevitably require portability across popular parallel architectures, which are becoming heterogeneous. The lack of a common parallel framework results in divergent code bases, difficulty in porting, higher maintenance cost, and thus difficulty achieving optimal performance on target architectures. Our paper examines two representative parallel applications and describes code […]

CUDA

Nov, 22

Superconducting proximity effect in graphene under inhomogeneous strain

The interplay between quantum Hall states and Cooper pairs is usually hindered by the suppression of the superconducting state due to the strong magnetic fields needed to observe the quantum Hall effect. From this point of view graphene is special since it allows the creation of strong pseudo-magnetic fields due to strain. We show that […]

Nov, 21

Online Adaptive Code Generation and Tuning

In this paper, we present a runtime compilation and tuning framework for parallel programs. We extend our prior work on our auto-tuner, Active Harmony, for tunable parameters that require code generation (for example, different unroll factors). For such parameters, our auto-tuner generates and compiles new code on the fly. Effectively, we merge traditional feedback-directed optimization and […]

Nov, 21

Issues in Heterogeneous GPU Clusters

In this paper, we discuss networking issues arising in the design, analysis, and use of clusters equipped with graphics processing units for scientific computing. The adoption of graphics accelerators in clusters used for high-performance scientific computing is a fairly recent phenomenon and promises to be an important trend now and into the foreseeable future. After […]

Recent source codes

OpScanner

Atlas CLI: Machine Learning (ML) Lifecycle & Transparency Manager

transformers_tvm: Implementation of Encoder Decoder transformer on TVM

INT v.s. FP: A framework to compare low-bit integer and float-point formats

AutoDock-GPU: AutoDock for GPUs and other accelerators

NCCLX: collective communication framework

Tutoring LLM into a Better CUDA Optimizer

Adaptivity in AdaptiveCpp: Optimizing Performance by Leveraging Runtime Information During JIT-Compilation

Kernel Library for LLM Serving

Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs
