28897

Posts

Dec, 24

FCBench: Cross-Domain Benchmarking of Lossless Compression for Floating-Point Data

While both the database and high-performance computing (HPC) communities utilize lossless compression methods to minimize floating-point data size, a disconnect persists between them. Each community designs and assesses methods in a domain-specific manner, making it unclear if HPC compression techniques can benefit database applications or vice versa. With the HPC community increasingly leaning towards in-situ […]
Dec, 24

Experiences Building an MLIR-based SYCL Compiler

Similar to other programming models, compilers for SYCL, the open programming model for heterogeneous computing based on C++, would benefit from access to higher-level intermediate representations. The loss of high-level structure and semantics caused by premature lowering to low-level intermediate representations and the inability to reason about host and device code simultaneously present major challenges […]
Dec, 24

Binary Code Summarization: Benchmarking ChatGPT/GPT-4 and Other Large Language Models

Binary code summarization, while invaluable for understanding code semantics, is challenging due to its labor-intensive nature. This study delves into the potential of large language models (LLMs) for binary code comprehension. To this end, we present BinSum, a comprehensive benchmark and dataset of over 557K binary functions and introduce a novel method for prompt synthesis […]
Dec, 24

Comparative Performance and Scalability Analysis of GPU-accelerated Database Operations

This Master’s thesis investigates the performance dynamics of database operations – V-Search, Fuzzy Search, and Join – implemented on both Central Processing Units (CPU) and Graphics Processing Units (GPU). With the ever-increasing demand for efficient data processing, it has become crucial to understand and optimize the use of different hardware platforms for executing diverse database […]
Dec, 18

Scalable Tuning of (OpenMP) GPU Applications via Kernel Record and Replay

HPC is a heterogeneous world in which host and device code are interleaved throughout the application. Given the significant performance advantage of accelerators, device code execution time is becoming the new bottleneck. Tuning the accelerated parts is consequently highly desirable but often impractical due to the large overall application runtime which includes unrelated host parts. […]
Dec, 18

Precision and Performance Analysis of C Standard Math Library Functions on GPUs

With the advent of GPU computing, executing large program sections on accelerators has become increasingly important. Efforts are being made to support the C standard library, LIBC, on GPUs via LLVM machinery. Therefore, the C standard math library, LIBM, must be supported on GPUs. So far, LLVM frontends, such as Clang, have relied on GPU […]
Dec, 18

Application Performance Profiling on Intel GPUs with Oneprof and Onetrace

Modern supercomputing applications are complex programs built on optimized frameworks and accelerated on GPUs. As such, dedicated tools for profiling GPU kernel utilization and performance are needed to support development of these applications, which in turn accelerates progress for the scientific computing and machine learning communities. This paper presents the Oneprof and Onetrace tools from […]
Dec, 18

Principles for Automated and Reproducible Benchmarking

The diversity in processor technology used by High Performance Computing (HPC) facilities is growing, and so applications must be written in such a way that they can attain high levels of performance across a range of different CPUs, GPUs, and other accelerators. Measuring application performance across this wide range of platforms becomes crucial, but there […]
Dec, 18

cuSZ-I: High-Fidelity Error-Bounded Lossy Compression for Scientific Data on GPUs

Error-bounded lossy compression is a critical technique for significantly reducing scientific data volumes. Compared to CPU-based scientific compressors, GPU-accelerated compressors exhibit substantially higher throughputs, which can thus better adapt to GPU-based scientific simulation applications. However, a critical limitation still lies in all existing GPU-accelerated error-bounded lossy compressors: they suffer from low compression ratios, which strictly […]
Dec, 10

Understanding the Topics and Challenges of GPU Programming by Classifying and Analyzing Stack Overflow Posts

GPUs have cemented their position in computer systems, not restricted to graphics but also extensively used for general-purpose computing. With this comes a rapidly expanding population of developers using GPUs for programming. However, programming with GPUs is notoriously difficult due to their unique architecture and constant evolution. A large number of developers have encountered problems […]
Dec, 10

Efficiently Processing Large Relational Joins on GPUs

With the growing interest in Machine Learning (ML), Graphic Processing Units (GPUs) have become key elements of any computing infrastructure. Their widespread deployment in data centers and the cloud raises the question of how to use them beyond ML use cases, with growing interest in employing them in a database context. In this paper, we […]
Dec, 10

GenVectorX: A performance-portable SYCL library for Lorentz Vectors operations

The Large Hadron Collider (LHC) at CERN will see an upgraded hardware configuration which will bring a new era of physics data taking and related computational challenges. To this end, it is necessary to exploit the ever increasing variety of computational architectures, featuring GPUs from multiple vendors and new accelerators. Performance portable frameworks, like SYCL, […]

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: