Posts

Jul, 10

Design and Implementation of CNN-FPGA accelerator based on Open Computing Language

Convolutional neural networks (CNNs) are widely used in applications such as face and speech recognition, image retrieval and classification, and automated driving. As a result, CNN accelerators have become a popular research topic. Graphics processing units (GPUs) are often employed as CNN accelerators, and they are referred to […]
Jun, 19

Securing GPU via Region-based Bounds Checking

Graphics processing units (GPUs) have become essential general-purpose computing platforms for accelerating a wide range of workloads, such as deep learning, scientific, and high-performance computing (HPC) applications. However, recent memory corruption attacks, such as buffer overflows, have exposed security vulnerabilities in GPUs. We demonstrate that out-of-bounds writes are reproducible on an Nvidia GPU, which can enable […]
May, 22

GPU Ray Tracing with Monte Carlo Methods

Monte Carlo methods are a family of techniques for obtaining numerical results through simulation with random samples. The basic idea is to generate a sequence of random numbers, execute the same algorithm on each of them (individually or in groups), and then combine the resulting outputs to obtain the final result. […]
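
As a minimal illustration of the idea described above (not the ray tracer from the post), the sketch below estimates π: it draws random points in the unit square, applies the same test to each one, and combines the per-sample outcomes into a single estimate. The function name `estimate_pi` and its parameters are assumptions made for this example.

```python
import random


def estimate_pi(num_samples: int, seed: int = 0) -> float:
    """Monte Carlo estimate of pi (illustrative sketch, not from the post)."""
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    inside = 0
    for _ in range(num_samples):
        # Same algorithm applied to every random sample:
        # does the point (x, y) fall inside the quarter circle of radius 1?
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # Combine the outcomes: the fraction of points inside approximates pi / 4
    return 4.0 * inside / num_samples


if __name__ == "__main__":
    print(estimate_pi(100_000))
```

Because every sample is processed independently, the loop body can be distributed across many threads and the partial counts summed at the end, which is what makes this pattern a natural fit for data-parallel hardware such as GPUs.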
May, 8

Experience of Migrating a Parallel Graph Coloring Program from CUDA to SYCL

We describe the experience of converting a CUDA implementation of a parallel graph coloring algorithm to SYCL. Our goal is for this work to be useful to application and compiler developers by providing a detailed description of migration paths between CUDA and SYCL. We describe how CUDA functions are mapped to SYCL functions. Evaluating […]
May, 1

Improving performance of SYCL applications on CPU architectures using LLVM-directed compilation flow

The wide adoption of SYCL as an open-standard API for accelerating C++ software in domains such as HPC, Automotive, Artificial Intelligence, Machine Learning, and other areas necessitates efficient compiler and runtime support for a growing number of different platforms. Existing SYCL implementations provide support for various devices like CPUs, GPUs, DSPs, FPGAs, etc., typically via […]
Apr, 17

Performance study on GPU offloading techniques using the Gauss matrix inverse algorithm

Inverting matrices is a crucial part of many algorithms in linear algebra, computer graphics and data analysis. Many libraries provide algorithms for this, but none of them can be called from within a GPU context. GPUs and accelerators are becoming increasingly prevalent in high-performance computers. Without a ready-to-use implementation, scientists need to […]
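
For readers unfamiliar with the algorithm named in the title, here is a minimal CPU-side sketch of Gauss-Jordan matrix inversion with partial pivoting. It is plain Python, not the GPU implementation studied in the post; the function name `invert` and the singularity tolerance of `1e-12` are assumptions made for this example.

```python
def invert(matrix: list[list[float]]) -> list[list[float]]:
    """Gauss-Jordan inversion with partial pivoting (illustrative sketch)."""
    n = len(matrix)
    # Build the augmented matrix [A | I]
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Partial pivoting: use the row with the largest |value| in this column
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:  # assumed tolerance
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot element becomes 1
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                if factor != 0.0:
                    aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # The left half is now I, so the right half holds A^{-1}
    return [row[n:] for row in aug]


if __name__ == "__main__":
    print(invert([[4, 7], [2, 6]]))
```

Within one column step, the updates to the other rows are independent of each other, so that inner loop is the natural candidate for parallelization when an algorithm like this is moved to a GPU.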
Apr, 10

Optimizing Performance and Energy Efficiency in Massively Parallel Systems

Heterogeneous systems are becoming increasingly relevant due to their performance and energy efficiency, and they are present in all types of computing platforms, from embedded devices and servers to HPC nodes in large data centers. Their complexity means that they are usually used under the task paradigm and the host-device programming model. This strongly penalizes accelerator […]
Mar, 6

Integrating SkePU’s algorithmic skeletons with GPI on a cluster

As processors’ clock speeds flattened out in the early 2000s, multi-core processors became more prevalent, and so did parallel programming. However, this programming paradigm introduces additional complexity, and the SkePU framework was created to combat it. SkePU does this by offering a single-threaded interface that executes the user’s code in parallel in accordance with a chosen […]
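
To illustrate the general idea of an algorithmic skeleton (the user writes ordinary single-threaded code and the skeleton decides how to run it in parallel), here is a minimal hypothetical Map skeleton. It is not SkePU’s API, which is C++; the names `map_skeleton` and `square` are invented for this sketch, and a thread pool stands in for whatever backend a real framework would select.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_skeleton(user_fn: Callable[[T], R], data: Sequence[T], workers: int = 4) -> list[R]:
    """Hypothetical Map skeleton: applies a sequential user function to every element in parallel."""
    # The caller never manages threads; the skeleton owns the parallel execution.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, so output lines up with data.
        return list(pool.map(user_fn, data))


def square(x: int) -> int:
    # Ordinary single-threaded user code, unaware of any parallelism
    return x * x


if __name__ == "__main__":
    print(map_skeleton(square, [1, 2, 3, 4]))  # [1, 4, 9, 16]
```

In CPython, threads will not speed up CPU-bound work because of the global interpreter lock; a real speedup would need a process pool or a native backend. Choosing that backend without changing the user’s code is exactly the kind of decision a skeleton framework hides from the programmer.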
Mar, 6

Enabling On-Device Smartphone GPU based Training: Lessons Learned

Deep Learning (DL) has shown impressive performance in many mobile applications. Most existing work has focused on reducing the computational and resource overheads of running Deep Neural Network (DNN) inference on resource-constrained mobile devices. However, the other aspect of DNN operations, i.e., training (forward and backward passes) on smartphone GPUs, has received little attention thus […]
Jan, 23

A tool set for random number generation on GPUs in R

We introduce the R package clrng, which leverages the gpuR package and can generate random numbers in parallel on a Graphics Processing Unit (GPU) with the clRNG (OpenCL) library. Parallel processing with GPUs can speed up computationally intensive tasks and, when combined with R, can largely offset R’s downsides in terms of […]
Jan, 23

Multi-hetero Acceleration by GPU and FPGA for Astrophysics Simulation on oneAPI Environment

GPU (Graphics Processing Unit) computing is one of the most popular acceleration methods for various high-performance computing applications. For scientific computations based on multiple physical phenomena, however, a single-device GPU solution is insufficient, because phenomena that do not share a single timescale or degree of parallelism cannot be handled by a simple GPU-only approach. We have been […]
Jan, 16

Dopia: Online Parallelism Management for Integrated CPU/GPU Architectures

Recent desktop and mobile processors often integrate the CPU and GPU on the same die. The limited memory bandwidth of these integrated architectures can hurt the performance of data-parallel workloads when all computational resources are active. The combination of active CPU and GPU cores that achieves maximum performance depends on the workload’s characteristics, making manual […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
