Posts
Jan, 7
Domain-Specific Code Language Models: Unraveling the Potential for HPC Codes and Tasks
With easier access to powerful compute resources, there is a growing trend in AI for software development to develop larger language models (LLMs) to address a variety of programming tasks. Even LLMs applied to tasks from the high-performance computing (HPC) domain are huge in size and demand expensive compute resources for training. This is partly […]
Dec, 31
Adding fault tolerance to OpenCL: Through redundant heterogeneous computing
The ever-increasing demand for computing has led to the need for specialized heterogeneous hardware, and the frameworks required to utilize them. Besides the traditional central processing units, more and more programs will make use of specialized hardware to accelerate computations. However, the increase in computing also leads to shorter mean time between failures. In this […]
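The excerpt stops before it describes the mechanism, so what follows is only a generic illustration of redundant execution. It is not the paper's OpenCL design. It is a minimal Python sketch that assumes each "device" is a plain function running the same computation, and it cross-checks the results before anything uses them. `redundant_execute`, `cpu_scale`, `gpu_scale` and `faulty_gpu_scale` are hypothetical names.

```python
import math
from typing import Callable, Sequence

Vector = list[float]


def redundant_execute(devices: Sequence[Callable[[Vector], Vector]],
                      data: Vector, rel_tol: float = 1e-9) -> Vector:
    """Run the same computation on every device and cross-check the results."""
    # Each device gets its own copy of the input so one cannot corrupt another.
    results = [device(list(data)) for device in devices]
    reference = results[0]
    for idx, other in enumerate(results[1:], start=1):
        same = len(other) == len(reference) and all(
            math.isclose(a, b, rel_tol=rel_tol) for a, b in zip(reference, other))
        if not same:
            raise RuntimeError(f"device {idx} disagrees with device 0: possible fault")
    return reference


# Hypothetical stand-ins for the same kernel running on a CPU and on a GPU.
def cpu_scale(v: Vector) -> Vector:
    return [2.0 * x for x in v]


def gpu_scale(v: Vector) -> Vector:
    return [x + x for x in v]


def faulty_gpu_scale(v: Vector) -> Vector:
    out = [x + x for x in v]
    out[0] += 1.0  # simulated transient error
    return out
```

With only two devices, a mismatch can be detected but not corrected. Outvoting a faulty result would take at least three.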
Dec, 31
Optimization of Ported CFD Kernels on Intel Data Center GPU Max 1550 using oneAPI ESIMD
We describe our experience porting FUN3D’s CUDA-optimized kernels to Intel oneAPI SYCL. We faced several challenges. Foremost among them was the suboptimal performance of the oneAPI code on Intel’s new data center GPU, due primarily to high register spills, memory latency, and poor vectorization. We addressed these issues by implementing […]
Dec, 31
Performance Evaluation of Heterogeneous GPU Programming Frameworks for Hemodynamic Simulations
Preparing for the deployment of large scientific and engineering codes on upcoming exascale systems with GPU-dense nodes is made challenging by the unprecedented diversity of device architectures and heterogeneous programming models. In this work, we evaluate the process of porting a massively parallel, fluid dynamics code written in CUDA to SYCL, HIP, and Kokkos with […]
Dec, 31
Enabling Quantum Computer Simulations on AMD GPUs: a HIP Backend for Google’s qsim
Quantum computer simulators play a critical role in supporting the development and validation of quantum algorithms and hardware. This study focuses on porting Google’s qsim, a quantum computer simulator, to AMD Graphics Processing Units (GPUs). We leverage the existing qsim CUDA backend and harness the HIPIFY tool to provide a qsim HIP backend tailored for […]
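To show what a HIPIFY-style port involves at the source level, here is a toy Python sketch. It renames CUDA runtime identifiers to their HIP counterparts (for example `cudaMalloc` to `hipMalloc`) and swaps the runtime header. This is a deliberate simplification. The real HIPIFY tools (hipify-clang and hipify-perl) cover far more of the API. `toy_hipify` is a hypothetical name and is not part of qsim or HIPIFY.

```python
import re

# Matches the CUDA runtime header include, with <...> or "..." delimiters.
_HEADER = re.compile(r"#include\s*[<\"]cuda_runtime\.h[>\"]")
# Matches runtime identifiers such as cudaMalloc, cudaMemcpyHostToDevice, cudaError_t.
_RUNTIME_IDENT = re.compile(r"\bcuda([A-Z]\w*)")


def toy_hipify(cuda_source: str) -> str:
    """Rename CUDA runtime identifiers to their HIP equivalents (toy sketch)."""
    hip_source = _HEADER.sub("#include <hip/hip_runtime.h>", cuda_source)
    return _RUNTIME_IDENT.sub(r"hip\1", hip_source)
```

A regex pass like this can run on every file mechanically. That is also why ported code usually still needs manual review and tuning for the target GPU afterwards.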
Dec, 31
Gaiwan: a Size-Polymorphic Typesystem for GPU Programs
General-purpose computing on graphics processing units (GPGPU) is increasingly used for number-crunching tasks such as analyzing time series data. GPUs are a good fit for these tasks as they can execute many computations in parallel. To leverage this parallelism, the programmer is forced to carefully divide their input data into data blocks that are […]
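The manual blocking the abstract describes can be pictured with a minimal Python sketch. It assumes a one-dimensional series and a fixed block size, and each block stands in for the chunk that one GPU thread block would process. `split_into_blocks` and `blocked_sum` are hypothetical helpers. The sketch shows the problem Gaiwan addresses, not Gaiwan's own type system.

```python
from typing import Sequence


def split_into_blocks(data: Sequence[float], block_size: int) -> list[list[float]]:
    """Divide the input into consecutive blocks of at most `block_size` elements."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [list(data[i:i + block_size]) for i in range(0, len(data), block_size)]


def blocked_sum(data: Sequence[float], block_size: int) -> float:
    """Two-level reduction: one partial sum per block, then a final combine."""
    partials = [sum(block) for block in split_into_blocks(data, block_size)]
    return sum(partials)
```

When the length is not a multiple of the block size, the last block is shorter than the others. Here the programmer has to handle that size bookkeeping by hand.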
Dec, 24
KeSCo: Compiler-based Kernel Scheduling for Multi-task GPU Applications
Graphics Processing Units (GPUs) now dominate a wide spectrum of computing domains, and multi-tasking is increasingly used in complex applications. To achieve higher performance, multi-task programs require cumbersome source-level programming effort to exploit inter-kernel concurrency. Although existing works schedule kernels automatically to enable inter-kernel concurrency, they all inevitably […]
Dec, 24
FCBench: Cross-Domain Benchmarking of Lossless Compression for Floating-Point Data
While both the database and high-performance computing (HPC) communities utilize lossless compression methods to minimize floating-point data size, a disconnect persists between them. Each community designs and assesses methods in a domain-specific manner, making it unclear if HPC compression techniques can benefit database applications or vice versa. With the HPC community increasingly leaning towards in-situ […]
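One widely used idea behind lossless floating-point compression is to XOR each value's bit pattern with the previous value's, as in the Gorilla time-series database. Similar neighboring values then leave long runs of zero bits that a compressor can encode cheaply. The Python sketch below shows only that transform, with no bit packing. It is not taken from FCBench, and the function names are hypothetical.

```python
import struct


def float_to_bits(x: float) -> int:
    """Reinterpret an IEEE 754 double as a 64-bit unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_float(bits: int) -> float:
    """Reinterpret a 64-bit unsigned integer as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def xor_encode(values: list[float]) -> list[int]:
    """XOR each value's bits with the previous value's bits (the first with 0)."""
    residuals, prev = [], 0
    for v in values:
        bits = float_to_bits(v)
        residuals.append(bits ^ prev)
        prev = bits
    return residuals


def xor_decode(residuals: list[int]) -> list[float]:
    """Invert xor_encode exactly, so the round trip is lossless."""
    values, prev = [], 0
    for r in residuals:
        prev ^= r  # recovers the original bit pattern
        values.append(bits_to_float(prev))
    return values


def leading_zeros(residual: int) -> int:
    """Leading zero bits in a 64-bit residual (64 for an exact repeat)."""
    return 64 - residual.bit_length()
```

For example, encoding `[12.0, 12.0, 12.5]` gives a zero residual for the repeated value. The residual for 12.5 has its sign and exponent bits cleared, because those bits match 12.0.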
Dec, 24
Experiences Building an MLIR-based SYCL Compiler
Similar to other programming models, compilers for SYCL, the open programming model for heterogeneous computing based on C++, would benefit from access to higher-level intermediate representations. The loss of high-level structure and semantics caused by premature lowering to low-level intermediate representations and the inability to reason about host and device code simultaneously present major challenges […]
Dec, 24
Binary Code Summarization: Benchmarking ChatGPT/GPT-4 and Other Large Language Models
Binary code summarization, while invaluable for understanding code semantics, is challenging due to its labor-intensive nature. This study delves into the potential of large language models (LLMs) for binary code comprehension. To this end, we present BinSum, a comprehensive benchmark and dataset of over 557K binary functions and introduce a novel method for prompt synthesis […]
Dec, 24
Comparative Performance and Scalability Analysis of GPU-accelerated Database Operations
This Master’s thesis investigates the performance dynamics of database operations – V-Search, Fuzzy Search, and Join – implemented on both Central Processing Units (CPU) and Graphics Processing Units (GPU). With the ever-increasing demand for efficient data processing, it has become crucial to understand and optimize the use of different hardware platforms for executing diverse database […]
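The excerpt does not define its Fuzzy Search operation. Assuming it means matching records within a Levenshtein edit distance of a query, a minimal CPU reference in Python might look like the following. `levenshtein` and `fuzzy_search` are hypothetical names. Each record's distance is computed independently, which makes the per-record loop the natural candidate for GPU parallelization.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via row-by-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fuzzy_search(records: Iterable[str], query: str, max_distance: int) -> list[str]:
    """Return every record within `max_distance` edits of the query."""
    return [r for r in records if levenshtein(r, query) <= max_distance]
```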
Dec, 18
Scalable Tuning of (OpenMP) GPU Applications via Kernel Record and Replay
HPC is a heterogeneous world in which host and device code are interleaved throughout the application. Given the significant performance advantage of accelerators, device code execution time is becoming the new bottleneck. Tuning the accelerated parts is consequently highly desirable but often impractical due to the large overall application runtime which includes unrelated host parts. […]
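The record-and-replay idea can be sketched in Python, assuming a "kernel" is just a function whose inputs can be snapshotted. The recorder captures a deep copy of the arguments when the kernel is launched. A tuner can then replay only that kernel under different settings without rerunning the rest of the application. `KernelRecorder`, `saxpy` and its `chunk` parameter are hypothetical stand-ins. The paper itself targets OpenMP offload kernels.

```python
import copy
import functools
import time
from typing import Any, Callable


class KernelRecorder:
    """Records a kernel's inputs, then replays the kernel in isolation."""

    def __init__(self) -> None:
        self.captures: dict[str, tuple[tuple, dict]] = {}

    def record(self, kernel: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(kernel)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Snapshot the inputs before the kernel mutates them in place.
            self.captures[kernel.__name__] = (copy.deepcopy(args), copy.deepcopy(kwargs))
            return kernel(*args, **kwargs)
        return wrapper

    def replay(self, kernel: Callable[..., Any], **tuning: Any) -> tuple[Any, float]:
        # Copy again so every replay starts from the same recorded state.
        args, kwargs = copy.deepcopy(self.captures[kernel.__name__])
        start = time.perf_counter()
        result = kernel(*args, **{**kwargs, **tuning})
        return result, time.perf_counter() - start


def saxpy(a: float, x: list[float], y: list[float], chunk: int = 1) -> list[float]:
    """Hypothetical kernel: y = a*x + y, processed in chunks of `chunk` elements."""
    for start in range(0, len(x), chunk):
        for i in range(start, min(start + chunk, len(x))):
            y[i] = a * x[i] + y[i]
    return y


if __name__ == "__main__":
    recorder = KernelRecorder()
    traced_saxpy = recorder.record(saxpy)
    traced_saxpy(2.0, [1.0, 2.0, 3.0], [1.0, 1.0, 1.0])  # the "full application" run
    for chunk in (1, 2, 4):
        result, seconds = recorder.replay(saxpy, chunk=chunk)
        print(chunk, result, f"{seconds:.6f}s")
```

Each replay costs only the kernel's own runtime. Sweeping tuning parameters this way avoids paying for the unrelated host parts of the application on every trial.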