Posts

Aug, 5

OpenCLIPER: an OpenCL-based C++ Framework for Overhead-Reduced Medical Image Processing and Reconstruction on Heterogeneous Devices

Medical image processing is often limited by the computational cost of the involved algorithms. Whereas dedicated computing devices (GPUs in particular) exist and do provide significant efficiency boosts, they have an extra cost of use in terms of housekeeping tasks (device selection and initialization, data streaming, synchronization with the CPU and others), which may hinder […]
Aug, 5

CRUM: Checkpoint-Restart Support for CUDA’s Unified Memory

Unified Virtual Memory (UVM) was recently introduced on NVIDIA GPUs. Through software and hardware support, UVM provides a coherent shared memory across the entire heterogeneous node, migrating data as appropriate. The older CUDA programming style is akin to older large-memory UNIX applications which used to directly load and unload memory segments. Newer CUDA programs […]
Jul, 28

Elementary functions: towards automatically generated, efficient, and vectorizable implementations

Elementary mathematical functions are pervasive in many high performance computing programs. However, although the mathematical libraries (libms), on which these programs rely, generally provide several flavors of the same function, these are fixed at implementation time. Hence this monolithic characteristic of libms is an obstacle for the performance of programs relying on them, because they […]
Jul, 28

Optimization of OpenCL applications on FPGA

Since Moore’s Law is over, specialized accelerators have become increasingly popular over the years. FPGAs are one such accelerator, and their "reconfigurable hardware" capabilities make them really promising. FPGAs are programmed with HDLs, which is hard and time-consuming, so many high-level alternatives (such as HLS, OpenCL, SystemC, …) have emerged to provide […]
Jul, 28

Smoothed-Particle Hydrodynamics Models: Implementation Features on GPUs

Parallel implementation features of self-gravitating gas dynamics modeling on multiple GPUs using the GPU-Direct technology are considered. The parallel algorithm for solving the self-gravitating gas dynamics problem, based on a hybrid OpenMP-CUDA parallel programming model, is described in detail. The gas-dynamic forces are calculated by the modified SPH method (Smoothed Particle Hydrodynamics) while the N-body […]
Jul, 28

gSMat: A Scalable Sparse Matrix-based Join for SPARQL Query Processing

Resource Description Framework (RDF) has been widely used to represent information on the web, while SPARQL is a standard query language to manipulate RDF data. A given SPARQL query often contains many joins, which are the efficiency bottleneck of query processing. Besides, real RDF datasets often exhibit strong data sparsity, which indicates […]
Jul, 28

Block-Size Independence for GPU Programs

Optimizing GPU programs by tuning execution parameters is essential to realizing the full performance potential of GPU hardware. However, many of these optimizations do not ensure correctness and subtle errors can enter while optimizing a GPU program. Further, lack of formal models and the presence of non-trivial transformations prevent verification of optimizations. In this work, […]
Jul, 21

Spatial: A Language and Compiler for Application Accelerators

Industry is increasingly turning to reconfigurable architectures like FPGAs and CGRAs for improved performance and energy efficiency. Unfortunately, adoption of these architectures has been limited by their programming models. HDLs lack abstractions for productivity and are difficult to target from higher level languages. HLS tools are more productive, but offer an ad-hoc mix of software […]
Jul, 21

Abelian: A Compiler for Graph Analytics on Distributed, Heterogeneous Platforms

The trend towards processor heterogeneity and distributed-memory has significantly increased the complexity of parallel programming. In addition, the mix of applications that need to run on parallel platforms today is very diverse, and includes graph applications that typically have irregular memory accesses and unpredictable control-flow. To simplify the programming of graph applications on such platforms, […]
Jul, 21

cuPentBatch – A batched pentadiagonal solver for NVIDIA GPUs

We introduce cuPentBatch – our own pentadiagonal solver for NVIDIA GPUs. The development of cuPentBatch has been motivated by applications involving numerical solutions of parabolic partial differential equations, which we describe. Our solver is written with batch processing in mind (as necessitated by parameter studies of various physical models). In particular, our solver is directed […]
Jul, 21

ARC: Adaptive Ray-tracing with CUDA, a New Ray Tracing Code for Parallel GPUs

We present the methodology of a photon-conserving, spatially-adaptive, ray-tracing radiative transfer algorithm, designed to run on multiple parallel Graphics Processing Units (GPUs). Each GPU has thousands of computing cores, making them ideally suited to the task of tracing independent rays. This ray-tracing implementation has speed competitive with approximate momentum methods, even with thousands of ionization sources, […]
Jul, 21

LeFlow: Enabling Flexible FPGA High-Level Synthesis of Tensorflow Deep Neural Networks

Recent work has shown that Field-Programmable Gate Arrays (FPGAs) play an important role in the acceleration of Machine Learning applications. Initial specification of machine learning applications is often done using a high-level Python-oriented framework such as Tensorflow, followed by a manual translation to either C or RTL for synthesis using vendor tools. This manual translation […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors