
Posts

Jan, 28

A Heterogeneous Inference Framework for a Deep Neural Network

Artificial intelligence (AI) is one of the most promising technologies based on machine learning algorithms. In this paper, we propose a workflow for the implementation of deep neural networks. This workflow attempts to combine the flexibility of high-level synthesis (HLS)-based networks with the architectural control features of hardware description language (HDL)-based flows. The architecture consists […]
Jan, 14

Preliminary report: Initial evaluation of StdPar implementations on AMD GPUs for HPC

Until recently, AMD platforms did not support offloading C++17 PSTL (StdPar) programs to the GPU. Our previous work highlights how StdPar is able to achieve good performance across NVIDIA and Intel GPU platforms. In that work, we acknowledged AMD’s past efforts such as HCC, which unfortunately is deprecated and does not support newer hardware platforms. Recent […]
Jan, 14

Code Generation for a Variety of Accelerators for a Graph DSL

Sparse graphs are ubiquitous in real and virtual worlds. With the phenomenal growth in semi-structured and unstructured data, sizes of the underlying graphs have witnessed a rapid growth over the years. Analyzing such large structures necessitates parallel processing, which is challenged by the intrinsic irregularity of sparse computation, memory access, and communication. It would be […]
Dec, 31

Gaiwan: a Size-Polymorphic Typesystem for GPU Programs

General-purpose computing on graphics processing units (GPGPU) is increasingly used for number crunching tasks such as analyzing time series data. GPUs are a good fit for these tasks as they can execute many computations in parallel. To leverage this parallelism, the programmer is forced to carefully divide their input data into data blocks that are […]
Nov, 19

GPU Auto-tuning Framework for Optimal Performance and Power Consumption

An auto-tuning framework for GPU devices is presented for tuning OpenCL application kernels. The GPU tuner employs a multi-objective optimization methodology to improve the performance and power consumption of applications. It efficiently explores a user-defined solution space comprising possible tunable algorithmic and hardware counter variations through code transformations. The methodology targets GPU code […]
Nov, 12

On the Three P’s of Parallel Programming for Heterogeneous Computing: Performance, Productivity, and Portability

As FPGAs and GPUs continue to make inroads into high-performance computing (HPC), the need for languages and frameworks that offer performance, productivity, and portability across heterogeneous platforms, such as FPGAs and GPUs, continues to grow. OpenCL and SYCL have emerged as frameworks that offer cross-platform functional portability between FPGAs and GPUs. While functional portability across […]
Nov, 12

An Evaluation of Directive-Based Parallelization on the GPU Using a Parboil Benchmark

Heterogeneous architectures consisting of both central processing units and graphics processing units are common in contemporary computer systems. For that reason, several programming models have been developed to exploit available parallelism, such as low-level CUDA and OpenCL, and directive-based OpenMP and OpenACC. In this paper we explore and evaluate the applicability of OpenACC, which is […]
Nov, 5

Applying the Midas Touch of Reproducibility to High-Performance Computing

With the serial performance of CPUs improving exponentially through the 1980s and 1990s and then plateauing by the mid-2000s, the high-performance computing community has seen parallel computing become ubiquitous, which, in turn, has led to a proliferation of parallel programming models. This diversity in hardware platform and programming model has forced programmers to port their […]
Nov, 5

A Comparison of the Performance of the Molecular Dynamics Simulation Package GROMACS Implemented in the SYCL and CUDA Programming Models

For many years, systems running Nvidia-based GPU architectures have dominated the heterogeneous supercomputer landscape. However, recently GPU chipsets manufactured by Intel and AMD have cut into this market and can now be found in some of the world’s fastest supercomputers. The June 2023 edition of the TOP500 list of supercomputers ranks the Frontier supercomputer at […]
Oct, 15

Strega: An HTTP Server for FPGAs

The computer architecture landscape is being reshaped by the new opportunities, challenges and constraints brought by the cloud. On the one hand, high-level applications profit from specialised hardware to boost their performance and reduce deployment costs. On the other hand, cloud providers maximise the CPU time allocated to client applications by offloading infrastructure tasks to […]
Oct, 1

Beehive SPIR-V Toolkit: A Composable and Functional API for Runtime SPIR-V Code Generation

The Standard Portable Intermediate Representation (SPIR-V) is a low-level binary format designed for representing shaders and compute kernels that can be consumed by OpenCL for computing kernels, and Vulkan for graphics rendering. As a binary representation, SPIR-V is meant to be used by compilers and runtime systems, and is usually performed by C/C++ programs and […]
Sep, 24

Evaluating the performance portability of SYCL across CPUs and GPUs on bandwidth-bound applications

In this paper, we evaluate the portability of the SYCL programming model on some of the latest CPUs and GPUs from a wide range of vendors, utilizing the two main compilers: DPC++ and hipSYCL/OpenSYCL. Both compilers currently support GPUs from all three major vendors; we evaluate performance on the Intel(R) Data Center GPU Max 1100, […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
