Posts
Dec, 18
Application Performance Profiling on Intel GPUs with Oneprof and Onetrace
Modern supercomputing applications are complex programs built on optimized frameworks and accelerated on GPUs. As such, dedicated tools for profiling GPU kernel utilization and performance are needed to support development of these applications, which in turn accelerates progress for the scientific computing and machine learning communities. This paper presents the Oneprof and Onetrace tools from […]
Dec, 18
Principles for Automated and Reproducible Benchmarking
The diversity in processor technology used by High Performance Computing (HPC) facilities is growing, and so applications must be written in such a way that they can attain high levels of performance across a range of different CPUs, GPUs, and other accelerators. Measuring application performance across this wide range of platforms becomes crucial, but there […]
Dec, 18
cuSZ-I: High-Fidelity Error-Bounded Lossy Compression for Scientific Data on GPUs
Error-bounded lossy compression is a critical technique for significantly reducing scientific data volumes. Compared to CPU-based scientific compressors, GPU-accelerated compressors exhibit substantially higher throughputs, which can thus better adapt to GPU-based scientific simulation applications. However, a critical limitation still lies in all existing GPU-accelerated error-bounded lossy compressors: they suffer from low compression ratios, which strictly […]
Dec, 10
Edge AI for Internet of Energy: Challenges and Perspectives
The digital landscape of the Internet of Energy (IoE) is on the brink of a revolutionary transformation with the integration of edge Artificial Intelligence (AI). This comprehensive review elucidates the promise and potential that edge AI holds for reshaping the IoE ecosystem. Commencing with a meticulously curated research methodology, the article delves into the myriad […]
Dec, 10
Compiler-centric across-stack deep learning acceleration
Optimizing the deployment of Deep Neural Networks (DNNs) is hard. Despite deep learning approaches increasingly providing state-of-the-art solutions to a variety of difficult problems, such as computer vision and natural language processing, DNNs can be prohibitively expensive, for example, in terms of inference time or memory usage. Effective exploration of the design space requires a […]
Dec, 10
Understanding the Topics and Challenges of GPU Programming by Classifying and Analyzing Stack Overflow Posts
GPUs have cemented their position in computer systems, not restricted to graphics but also extensively used for general-purpose computing. With this comes a rapidly expanding population of developers using GPUs for programming. However, programming with GPUs is notoriously difficult due to their unique architecture and constant evolution. A large number of developers have encountered problems […]
Dec, 10
Efficiently Processing Large Relational Joins on GPUs
With the growing interest in Machine Learning (ML), Graphics Processing Units (GPUs) have become key elements of any computing infrastructure. Their widespread deployment in data centers and the cloud raises the question of how to use them beyond ML use cases, with growing interest in employing them in a database context. In this paper, we […]
Dec, 10
GenVectorX: A performance-portable SYCL library for Lorentz Vectors operations
The Large Hadron Collider (LHC) at CERN will see an upgraded hardware configuration which will bring a new era of physics data taking and related computational challenges. To this end, it is necessary to exploit the ever increasing variety of computational architectures, featuring GPUs from multiple vendors and new accelerators. Performance portable frameworks, like SYCL, […]
Dec, 3
A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures
In recent years, the field of Deep Learning has seen many disruptive and impactful advancements. Given the increasing complexity of deep neural networks, the need for efficient hardware accelerators has become more and more pressing to design heterogeneous HPC platforms. The design of Deep Learning accelerators requires a multidisciplinary approach, combining expertise from several areas, […]
Dec, 3
Testing and Mutation Testing for GPU Kernels
Increasing GPU performance and a maturing computational platform make it possible to handle general-purpose computing jobs traditionally run on the CPU. As with CPU programs, testing is used to verify the correctness of GPU programs. However, the quality of the tests may remain unknown, which inspires us to […]
Dec, 3
A Review of the Parallelization Strategies for Iterative Algorithms
Iteration-based algorithms have been widely used and achieved excellent results in many fields. However, in the big data era, data that needs to be processed is enormous in terms of both depth (the dimensionality of data) and breadth (the volume of data). Due to the slowdown of Moore’s Law, the computing power of single-core CPUs […]
Dec, 3
CuPBoP-AMD: Extending CUDA to AMD Platforms
The proliferation of artificial intelligence applications has underscored the need for increased portability among graphics processing units (GPUs) from different vendors. With CUDA as one of the most popular GPU programming languages, CuPBoP (CUDA for Parallelized and Broad-range Processors) aims to provide NVIDIA’s proprietary CUDA language support to a variety of GPU and CPU platforms […]