
Posts

Jul, 14

On the Portability of CPU-Accelerated Applications via Automated Source-to-Source Translation

Over the past decade, accelerator-based supercomputers have grown from 0% to 42% performance share on the TOP500. Ideally, GPU-accelerated code on such systems should be "write once, run anywhere," regardless of the GPU device (or, for that matter, any parallel device, e.g., CPU or FPGA). In practice, however, portability can be significantly more limited due […]
Jul, 10

GPU-based Parallel Computation Support for Stan

This paper details an extensible OpenCL framework that allows Stan to utilize heterogeneous compute devices. It includes GPU-optimized routines for the Cholesky decomposition, its derivative, other matrix algebra primitives and some commonly used likelihoods, with more additions planned for the near future. Stan users can now benefit from speedups offered by GPUs with little effort […]
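
For context, the Cholesky decomposition that the framework offloads factors a symmetric positive-definite matrix A into L·Lᵀ with L lower triangular. The following plain C++ reference sketch is illustrative only (it is not the paper's OpenCL code); it shows the arithmetic that the GPU routines parallelize:

    #include <vector>
    #include <cmath>

    // Illustrative Cholesky-Crout factorization: A = L * L^T for a symmetric
    // positive-definite n x n matrix stored row-major. Stan's OpenCL routines
    // parallelize this column-by-column update on the GPU; this CPU sketch
    // only shows the computation being offloaded.
    std::vector<double> cholesky(const std::vector<double>& A, int n) {
        std::vector<double> L(n * n, 0.0);
        for (int j = 0; j < n; ++j) {
            double diag = A[j * n + j];
            for (int k = 0; k < j; ++k) diag -= L[j * n + k] * L[j * n + k];
            L[j * n + j] = std::sqrt(diag);
            for (int i = j + 1; i < n; ++i) {          // rows below the diagonal
                double s = A[i * n + j];
                for (int k = 0; k < j; ++k) s -= L[i * n + k] * L[j * n + k];
                L[i * n + j] = s / L[j * n + j];
            }
        }
        return L;
    }
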
Jun, 27

ReSYCLator: Transforming CUDA C++ source code into SYCL

CUDA, while very popular, is not as flexible with respect to target devices as OpenCL. While parallel algorithm research might address problems first with a CUDA C++ solution, those results are not easily portable to a target not directly supported by CUDA. In contrast, a SYCL C++ solution can operate on the larger variety of […]
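
To illustrate the kind of rewrite such a source-to-source tool performs (a hypothetical example, not taken from the paper and not the tool's exact output), a trivial CUDA vector-add kernel and a hand-written SYCL counterpart might look like:

    #include <CL/sycl.hpp>

    // CUDA original (for reference):
    //   __global__ void add(const float* a, const float* b, float* c, int n) {
    //       int i = blockIdx.x * blockDim.x + threadIdx.x;
    //       if (i < n) c[i] = a[i] + b[i];
    //   }
    //
    // A SYCL equivalent using buffers and accessors; a translator such as
    // ReSYCLator automates this mapping.
    void add(cl::sycl::queue& q, const float* a, const float* b, float* c, int n) {
        cl::sycl::buffer<float, 1> ba(a, cl::sycl::range<1>(n));
        cl::sycl::buffer<float, 1> bb(b, cl::sycl::range<1>(n));
        cl::sycl::buffer<float, 1> bc(c, cl::sycl::range<1>(n));
        q.submit([&](cl::sycl::handler& h) {
            auto A = ba.get_access<cl::sycl::access::mode::read>(h);
            auto B = bb.get_access<cl::sycl::access::mode::read>(h);
            auto C = bc.get_access<cl::sycl::access::mode::write>(h);
            h.parallel_for<class vec_add>(cl::sycl::range<1>(n),
                [=](cl::sycl::id<1> i) { C[i] = A[i] + B[i]; });
        });
        q.wait();
    }
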
Jun, 9

Temporospatial Epidemic Simulations Using Heterogeneous Computing

Discrete Event Simulation (DES) is widely used for analysis of complex temporospatial epidemic models. In such simulations, a sizeable fraction (50%-90%) of simulation runtime is typically spent solving the equations used to model epidemic progression. General Purpose Graphics Processing Units (GPGPUs) hold considerable potential to reduce the time spent solving epidemic equations. However, the significant differences […]
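
For a concrete sense of the equations involved, one explicit-Euler step of a standard SIR compartmental model is sketched below; this is purely an illustration of the per-region update that a DES-based epidemic simulation evaluates many times, not the models used in the paper:

    // One explicit-Euler step of a basic SIR model:
    //   dS/dt = -beta*S*I/N,  dI/dt = beta*S*I/N - gamma*I,  dR/dt = gamma*I.
    // In a DES-based epidemic simulation, updates like this are evaluated for
    // many regions per event, which is the equation-solving work targeted for
    // GPU offload.
    struct SIR { double S, I, R; };

    SIR sir_step(SIR x, double beta, double gamma, double dt) {
        const double N = x.S + x.I + x.R;
        const double newInfections = beta * x.S * x.I / N;
        const double newRecoveries = gamma * x.I;
        return { x.S - dt * newInfections,
                 x.I + dt * (newInfections - newRecoveries),
                 x.R + dt * newRecoveries };
    }
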
Jun, 2

Heterogeneous Resource-Elastic Scheduling for CPU+FPGA Architectures

Heterogeneous computing is a key strategy to meet the requirements of many compute-intensive applications. However, CPU+FPGA platforms are currently underutilized, as scheduling is often constrained to a run-to-completion model or to accelerating a single application at a time. To address this, this paper proposes heterogeneous resource-elastic scheduling for maximizing the utilization of both CPU […]
May, 26

A Hybrid Framework for Fast and Accurate GPU Performance Estimation through Source-Level Analysis and Trace-Based Simulation

This paper proposes a hybrid framework for fast and accurate performance estimation of OpenCL kernels running on GPUs. The kernel execution flow is statically analyzed, and an execution trace is then generated via a loop-based bidirectional branch search. The trace is then dynamically simulated to perform a dummy execution of the kernel to obtain the […]
May, 19

Automatic Virtualization of Accelerators

Applications are migrating en masse to the cloud, while accelerators such as GPUs, TPUs, and FPGAs proliferate in the wake of Moore’s Law. These technological trends are incompatible. Cloud applications run on virtual platforms, but traditional I/O virtualization techniques have not provided production-ready solutions for accelerators. As a result, cloud providers expose accelerators by using […]
May, 8

FPGA-based acceleration of a particle simulation High Performance Computing application

This thesis studies the possibility of introducing FPGAs into the world of High Performance Computing (HPC) systems. Such systems are hybrid platforms that exploit the massively parallel computation of GPUs in order to reach very high performance. Nevertheless, GPU-based systems are power-hungry, and their power consumption is so large that […]
Mar, 17

Novel Data-Partitioning Algorithms for Performance and Energy Optimization of Data-Parallel Applications on Modern Heterogeneous HPC Platforms

Heterogeneity has turned into one of the most profound and challenging characteristics of today’s HPC environments. Modern HPC platforms have become highly heterogeneous owing to the tight integration of multicore CPUs and accelerators (such as Graphics Processing Units, Intel Xeon Phis, or Field-Programmable Gate Arrays) empowering them to maximize the dominant objectives of performance and […]
Mar, 17

CLTestCheck: Measuring Test Effectiveness for GPU Kernels

The massive parallelism and energy efficiency of GPUs, along with advances in their programmability through the OpenCL and CUDA programming models, have made them attractive for general-purpose computations across many application domains. Techniques for testing GPU kernels have emerged recently to aid the construction of correct GPU software. However, there exists no means of measuring quality and […]
Mar, 17

Performance Optimization of Memory Intensive Applications on FPGA Accelerator

Hardware accelerators are a fundamental part of modern high performance computing (HPC) systems due to their performance capabilities. The two most commonly used accelerators are GPUs and FPGAs. Despite the easier programmability and better memory performance of GPUs, FPGAs generally perform equally well for computationally challenging applications while dramatically reducing energy consumption. Furthermore, with […]
Mar, 10

Energy Efficient Parallel K-Means Clustering for an Intel Hybrid Multi-Chip Package

FPGA devices have proven to be good candidates for accelerating applications from different research areas. For instance, machine learning applications such as K-Means clustering usually rely on large amounts of data being processed, and, despite the performance offered by other architectures, FPGAs can offer better energy efficiency. With that in mind, Intel® […]
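
To make the data-parallel structure of the workload concrete, one Lloyd-style K-Means iteration (point assignment plus centroid update) is sketched below; this is a generic CPU reference, not the Intel hybrid CPU+FPGA implementation discussed in the paper:

    #include <vector>
    #include <limits>

    // One Lloyd iteration of K-Means for n points of dimension d (row-major).
    // The per-point nearest-centroid search is the embarrassingly parallel part
    // that FPGA/GPU implementations accelerate.
    void kmeans_step(const std::vector<double>& pts, int n, int d,
                     std::vector<double>& centroids, int k,
                     std::vector<int>& assign) {
        // Assignment: each point picks its nearest centroid (squared distance).
        for (int i = 0; i < n; ++i) {
            double best = std::numeric_limits<double>::max();
            for (int c = 0; c < k; ++c) {
                double dist = 0.0;
                for (int j = 0; j < d; ++j) {
                    double diff = pts[i * d + j] - centroids[c * d + j];
                    dist += diff * diff;
                }
                if (dist < best) { best = dist; assign[i] = c; }
            }
        }
        // Update: recompute each centroid as the mean of its assigned points.
        std::vector<double> sum(k * d, 0.0);
        std::vector<int> count(k, 0);
        for (int i = 0; i < n; ++i) {
            ++count[assign[i]];
            for (int j = 0; j < d; ++j) sum[assign[i] * d + j] += pts[i * d + j];
        }
        for (int c = 0; c < k; ++c)
            if (count[c] > 0)
                for (int j = 0; j < d; ++j)
                    centroids[c * d + j] = sum[c * d + j] / count[c];
    }
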

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
