28467

Posts

Jul, 24

Creating a Dataset Supporting Translation Between OpenMP Fortran and C++ Code

In this study, we present a novel dataset for training machine learning models translating between OpenMP Fortran and C++ code. To ensure reliability and applicability, the dataset is initially refined using a meticulous code similarity test. The effectiveness of our dataset is assessed using both quantitative (CodeBLEU) and qualitative (human evaluation) methods. We demonstrate how […]
Jul, 24

ProtoX: A First Look

We present a first look at ProtoX, a code generation framework for stencil and pointwise operations that occur frequently in the numerical solution of partial differential equations. ProtoX has Proto as its library frontend and SPIRAL as the backend. Proto is a C++ based domain specific library which optimizes the algorithms used to compute the […]
Jul, 24

qecGPT: decoding Quantum Error-correcting Codes with Generative Pre-trained Transformers

We propose a general framework for decoding quantum error-correcting codes with generative modeling. The model utilizes autoregressive neural networks, specifically Transformers, to learn the joint probability of logical operators and syndromes. This training is in an unsupervised way, without the need for labeled training data, and is thus referred to as pre-training. After the pre-training, […]
Jul, 24

Maximizing Parallelism and GPU Utilization For Direct GPU Compilation Through Ensemble Execution

GPUs are renowned for their exceptional computational acceleration capabilities achieved through massive parallelism. However, utilizing GPUs for computation requires manual identification of code regions suitable for offloading, data transfer management, and synchronization. Recent advancements have capitalized on the LLVM/OpenMP portable target offloading interface, elevating GPU acceleration to new heights. This approach, known as the direct […]
Jul, 24

eGPU: A 750 MHz Class Soft GPGPU for FPGA

This paper introduces the eGPU, a SIMT soft processor designed for FPGAs. Soft processors typically achieve modest operating frequencies, a fraction of the headline performance claimed by modern FPGA families, and obtain correspondingly modest performance results. We propose a GPGPU architecture structured specifically to take advantage of both the soft logic and embedded features of […]
Jul, 16

Towards Intelligent Runtime Framework for Distributed Heterogeneous Systems

Scientific applications strive for increased memory and computing performance, requiring massive amounts of data and time to produce results. Applications utilize large-scale, parallel computing platforms with advanced architectures to accommodate their needs. However, developing performance-portable applications for modern, heterogeneous platforms requires lots of effort and expertise in both the application and systems domains. This is […]
Jul, 16

Mystique: Enabling Accurate and Scalable Generation of Production AI Benchmarks

Building large AI fleets to support the rapidly growing DL workloads is an active research topic for modern cloud providers. Generating accurate benchmarks plays an essential role in designing the fast-paced software and hardware solutions in this space. Two fundamental challenges to make this scalable are (i) workload representativeness and (ii) the ability to quickly […]
Jul, 16

Tile-based Lightweight Integer Compression in GPU

GPUs are increasingly used for high-performance and interactive data analytics workloads due to their capability to accelerate computation using massive parallelism. A key constraint of GPU-based data analytics today is the limited memory capacity in GPU devices. Data compression is a powerful technique that can mitigate the capacity limitation in two ways: (1) fitting more […]
Jul, 16

Miriam: Exploiting Elastic Kernels for Real-time Multi-DNN Inference on Edge GPU

Many applications such as autonomous driving and augmented reality, require the concurrent running of multiple deep neural networks (DNN) that poses different levels of real-time performance requirements. However, coordinating multiple DNN tasks with varying levels of criticality on edge GPUs remains an area of limited study. Unlike server-level GPUs, edge GPUs are resource-limited and lack […]
Jul, 16

Improving the Performance, Portability, and Productivity of Hardware Accelerators

With the end of Moore’s Law and Dennard’s scaling, attention is shifting to new ways of enhancing computer performance. Improving microprocessor performance is becoming increasingly complex, whereas computational power demands still grow tremendously fast. In recent years, we are witnessing a paradigm change: rather than using one single chip, the CPU, for computing everything, computers […]
Jul, 9

Safe, Seamless, And Scalable Integration Of Asynchronous GPU Streams In PETSc

Leveraging Graphics Processing Units (GPUs) to accelerate scientific software has proven to be highly successful, but in order to extract more performance, GPU programmers must overcome the high latency costs associated with their use. One method of reducing or hiding this latency cost is to use asynchronous streams to issue commands to the GPU. While […]
Jul, 9

Modeling Parallel Programs using Large Language Models

Parallel software codes in high performance computing (HPC) continue to grow in complexity and scale as we enter the exascale era. A diverse set of emerging hardware and programming paradigms make developing, optimizing, and maintaining parallel software burdensome for developers. One way to alleviate some of these burdens is with automated development and analysis tools. […]

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: