26805

Posts

May, 22

Blockchain Goes Green? Part II: Characterizing the Performance and Cost of Blockchains on the Cloud and at the Edge

While state-of-the-art permissioned blockchains can achieve thousands of transactions per second on commodity hardware with x86/64 architecture, their performance when running on different architectures is not clear. The goal of this work is to characterize the performance and cost of permissioned blockchains on different hardware systems, which is important as diverse application domains are adopting […]
May, 22

GPU Ray Tracing with Monte Carlo Methods

Monte Carlo methods are various techniques aimed at obtaining numerical results through simulations with random samples: the base idea of Monte Carlo methods is to generate a sequence of random numbers and execute the same algorithm on each one of them or in groups, then the resulting outputs are combined to obtain the final result. […]
May, 22

The Application of AI Technology in GPU Scheduling Algorithm Optimization

With the rapid development of integrated circuit technology, GPU computing capabilities continue to improve. Due to the continuous improvement and improvement of GPU programming capabilities, functions, and performance, GPUs have been widely used in the field of high-tech general-purpose computers. This article is aimed at studying the optimization of GPU scheduling algorithm based on AI […]
May, 22

Real-time semantic segmentation on FPGAs for autonomous vehicles with hls4ml

In this paper, we investigate how field programmable gate arrays can serve as hardware accelerators for real-time semantic segmentation tasks relevant for autonomous driving. Considering compressed versions of the ENet convolutional neural network architecture, we demonstrate a fully-on-chip deployment with a latency of 4.9 ms per image, using less than 30% of the available resources […]
May, 22

AnySeq/GPU: A Novel Approach for Faster Sequence Alignment on GPUs

In recent years, the rapidly increasing number of reads produced by next-generation sequencing (NGS) technologies has driven the demand for efficient implementations of sequence alignments in bioinformatics. However, current state-of-the-art approaches are not able to leverage the massively parallel processing capabilities of modern GPUs with close-to-peak performance. We present AnySeq/GPU-a sequence alignment library that augments […]
May, 15

GPU-based JSON data processing using structural indexes

In recent years, large amounts of data are being increasingly generated and stored every day. Big data is often processed by different software systems, which require a common data interchange format. JavaScript Object Notation, or JSON, is one of the most popular data exchange formats and is widely used in web and data-intensive applications. Unfortunately, […]
May, 15

SYCLops: A SYCL Specific LLVM to MLIR Converter

There is a growing need for higher level abstractions for device kernels in heterogeneous environments, and the multi-level nature of the MLIR infrastructure perfectly addresses this requirement. As SYCL begins to gain industry adoption for heterogeneous applications and MLIR continues to develop, we present SYCLops: a converter capable of translating SYCL specific LLVM IR to […]
May, 15

NEPTUNE: Network- and GPU-aware Management of Serverless Functions at the Edge

Nowadays a wide range of applications is constrained by low-latency requirements that cloud infrastructures cannot meet. Multi-access Edge Computing (MEC) has been proposed as the reference architecture for executing applications closer to users and reduce latency, but new challenges arise: edge nodes are resource-constrained, the workload can vary significantly since users are nomadic, and task […]
May, 15

Can We Run in Parallel? Automating Loop Parallelization for TornadoVM

With the advent of multi-core systems, GPUs and FPGAs, loop parallelization has become a promising way to speed-up program execution. In order to stay up with time, various performance-oriented programming languages provide a multitude of constructs to allow programmers to write parallelizable loops. Correspondingly, researchers have developed techniques to automatically parallelize loops that do not […]
May, 15

Productive Performance Engineering for Weather and Climate Modeling with Python

Earth system models are developed with a tight coupling to target hardware, often containing highly-specialized code predicated on processor characteristics. This coupling stems from using imperative languages that hard-code computation schedules and layout. In this work, we present a detailed account of optimizing the Finite Volume Cubed-Sphere (FV3) weather model, improving productivity and performance. By […]
May, 8

FPGA Acceleration of Structured-Mesh-Based Explicit and Implicit Numerical Solvers using SYCL

We explore the design and development of structured-mesh based solvers on current Intel FPGA hardware using the SYCL programming model. Two classes of applications are targeted : (1) stencil applications based on explicit numerical methods and (2) multidimensional tridiagonal solvers based on implicit methods. Both classes of solvers appear as core modules in a wide-range […]
May, 8

Experience of Migrating a Parallel Graph Coloring Program from CUDA to SYCL

We describe the experience of converting a CUDA implementation of a parallel graph coloring algorithm to SYCL. The goals are for our work to be useful to application and compiler developers by providing a detailed description of migration paths between CUDA and SYCL. We will describe how CUDA functions are mapped to SYCL functions. Evaluating […]

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: