high performance computing on graphics processing units: hgpu.org

Posts

Jul, 15

Concurrent Manipulation of Dynamic Data Structures in OpenCL

With the emergence of general purpose GPU (GPGPU) programming, concurrent data processing of large arrays of data has gained a significant boost in performance. However, due to the memory architecture between the host and GPU device and other limitations in the instructions available on GPUs, the implementation of dynamic data structures, like linked list and […]

OpenCL

Jul, 15

Recovering Historical Climate Records using Artificial Neural Networks in GPU

This article presents a parallel implementation of Artificial Neural Networks over Graphics Processing Units, and its application for recovering historical climate records from the Digi-Clima project. Several strategies are introduced to handle large volumes of historical pluviometer records, and the parallel deployment is described. The experimental evaluation demonstrates that the proposed approach is useful for […]

CUDA

Jul, 15

Adaptive parallelism mapping in dynamic environments using machine learning

Modern-day hardware platforms are parallel and diverse, ranging from mobiles to data centers. Mainstream parallel applications execute in the same system, competing for resources. This resource contention may lead to a drastic degradation in a program’s performance. In addition, the execution environment, composed of workloads and hardware resources, is dynamic and unpredictable. Efficient matching […]

Jul, 15

6th International Conference on Applied Physics and Mathematics (ICAPM 2016), 2016

Submission Methods: Please log in to the Electronic Submission System (.pdf): http://www.easychair.org/conferences/?conf=icapm2016 Paper Publication: Papers accepted by ICAPM 2016 will be published in one of the following publications after the review process. * International Journal of Engineering and Technology (IJET, ISSN: 1793-8236); Indexing: Chemical Abstracts Services (CAS), DOAJ, Engineering & Technology Digital Library, Google Scholar, Ulrich Periodicals Directory, […]

Jul, 13

GPU-accelerated discontinuous Galerkin methods on hybrid meshes

We present a time-explicit discontinuous Galerkin (DG) solver for the time-domain acoustic wave equation on hybrid meshes containing vertex-mapped hexahedral, wedge, pyramidal and tetrahedral elements. Discretely energy-stable formulations are presented for both Gauss-Legendre and Gauss-Legendre-Lobatto (Spectral Element) nodal bases for the hexahedron. Stable timestep restrictions for hybrid meshes are derived by bounding the spectral radius […]

CUDA • OpenCL

Jul, 13

A Study of Data Partitioning on OpenCL-based FPGAs

Much research effort has been devoted to accelerating relational database applications on FPGAs, due to their high energy efficiency and high throughput. Most of the existing studies are based on hardware description languages (HDLs). Recently, FPGA vendors have started to develop OpenCL SDKs for much better programmability. In this paper, we investigate the […]

OpenCL

Jul, 13

Evaluating the capabilities of the Xeon Phi platform in the context of software-only, thread-level speculation

Intel Xeon Phi accelerators are one of the newest devices used in the field of parallel computing. However, there are comparatively few studies concerning their performance when using most of the existing parallelization techniques. One of them is thread-level speculation, a technique that optimistically tries to extract parallelism of loops without the need of a […]

Jul, 13

Optimization, Specification and Verification of the Prefix Sum Program in an OpenCL Environment

The Prefix Sum is an algorithm used as a building block for various other algorithms, for example radix sort, quicksort and lexically comparing strings. Implementing the Prefix Sum algorithm on the CPU is trivial, but a parallel approach with OpenCL is more complicated. An implementation in OpenCL has been made, and optimized to minimize branch […]

OpenCL
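
The contrast the excerpt above draws between a trivial CPU prefix sum and a more involved parallel one can be illustrated with a minimal sketch. This is not the implementation from the listed work: the function names `sequential_scan` and `stepwise_scan` are hypothetical, and the second function only simulates, in plain Python, the step-wise (Hillis-Steele style) scan that is commonly used as a starting point for GPU versions.

```python
def sequential_scan(values):
    """Inclusive prefix sum computed with a single running total (CPU style)."""
    total = 0
    out = []
    for v in values:
        total += v
        out.append(total)
    return out


def stepwise_scan(values):
    """Inclusive prefix sum in about log2(n) data-parallel steps.

    Assumption: this is a Hillis-Steele style scan simulated on the CPU,
    not the optimized OpenCL kernel described in the listed work.
    """
    data = list(values)
    offset = 1
    while offset < len(data):
        # Each output element depends only on the previous step's data,
        # so on a GPU every index could be handled by its own work-item.
        data = [data[i] + data[i - offset] if i >= offset else data[i]
                for i in range(len(data))]
        offset *= 2
    return data


if __name__ == "__main__":
    sample = [3, 1, 7, 0, 4, 1, 6, 3]
    print(sequential_scan(sample))  # [3, 4, 11, 11, 15, 16, 22, 25]
    print(stepwise_scan(sample))    # same result, computed in 3 steps
```

The step-wise version performs more additions in total than the sequential loop, which is one reason practical GPU implementations tend to need further optimization.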

Jul, 13

PLB-HeC: A Profile-based Load-Balancing algorithm for Heterogeneous CPU-GPU Clusters

The use of GPU clusters for scientific applications in areas such as physics, chemistry and bioinformatics is becoming more widespread. These clusters frequently have different types of processing devices, such as CPUs and GPUs, which can themselves be heterogeneous. To use these devices in an efficient manner, it is crucial to find the right amount […]

CUDA

Jul, 10

Characterizing and Optimizing Irregular Applications on Graphics Processing Units

In recent years, GPGPUs have experienced tremendous growth as general-purpose and high-throughput computing devices. Applications from various domains achieve significant speedups using GPGPUs. However, irregular applications do not perform well due to the mismatches between irregular problem structures and SIMD-like GPU architectures. The lack of in-depth characterization and quantifying the ways in which irregular applications […]

CUDA

Jul, 10

Contributions to Music Semantic Analysis and Its Acceleration Techniques

Digitalized music production exploded in the past decade. The huge amount of data drives the development of effective and efficient methods for automatic music analysis and retrieval. This thesis focuses on performing semantic analysis of music, in particular mood and genre classification, with low-level and mid-level features since the mood and genre are among […]

Jul, 10

Many-Core Compiler Fuzzing

We address the compiler correctness problem for many-core systems through novel applications of fuzz testing to OpenCL compilers. Focusing on two methods from prior work, random differential testing and testing via equivalence modulo inputs (EMI), we present several strategies for random generation of deterministic, communicating OpenCL kernels, and an injection mechanism that allows EMI testing […]

OpenCL
