Posts
Nov, 27
A Hybrid Multi-GPU Implementation of Simplex Algorithm with CPU Collaboration
The simplex algorithm has been successfully used for many years in solving linear programming (LP) problems. Due to the intensive computations required (especially for solving large LP problems), parallel approaches have also been extensively studied. The computational power provided by modern GPUs as well as the rapid development of multicore CPU systems […]
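The paper's hybrid multi-GPU implementation is not shown in the excerpt; as a minimal serial sketch of the underlying algorithm, here is a dense tableau simplex with Dantzig pivoting in NumPy (assuming a maximization problem with b >= 0, so the slack basis is feasible). The function name `simplex_max` and the example LP are illustrative, not from the paper.

```python
import numpy as np

def simplex_max(c, A, b):
    # Maximize c @ x subject to A x <= b, x >= 0, assuming b >= 0
    # (so the all-slack basis is an initial feasible point).
    m, n = A.shape
    T = np.zeros((m + 1, n + m + 1))
    T[:m, :n] = A
    T[:m, n:n + m] = np.eye(m)          # slack variables
    T[:m, -1] = b
    T[-1, :n] = -c                      # objective row
    basis = list(range(n, n + m))
    while True:
        col = int(np.argmin(T[-1, :-1]))
        if T[-1, col] >= -1e-9:
            break                       # no negative reduced cost: optimal
        ratios = [T[i, -1] / T[i, col] if T[i, col] > 1e-9 else np.inf
                  for i in range(m)]
        row = int(np.argmin(ratios))
        if ratios[row] == np.inf:
            raise ValueError("problem is unbounded")
        T[row] /= T[row, col]           # pivot
        for i in range(m + 1):
            if i != row:
                T[i] -= T[i, col] * T[row]
        basis[row] = col
    x = np.zeros(n)
    for i, v in enumerate(basis):
        if v < n:
            x[v] = T[i, -1]
    return x, T[-1, -1]
```

For example, maximizing x + 2y subject to x + y <= 4 and x <= 2 yields the optimum at (0, 4) with objective value 8. Parallel variants distribute exactly the pivot and row-update loops above across devices.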
Nov, 27
Assessing Opportunities of SYCL and Intel oneAPI for Biological Sequence Alignment
Background and objectives. The field of computational biology has been growing steadily over the years. The interest in researching and developing computational tools for the acquisition, storage, organization, analysis, and visualization of biological data creates the need for new hardware architectures and new software tools that can process big data in acceptable times. In this sense, […]
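The excerpt does not show the SYCL kernels themselves; as a hedged illustration of the sequence-alignment workload such tools accelerate, here is the classic Smith-Waterman local-alignment recurrence in plain Python (scoring parameters are illustrative defaults, not the paper's):

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=2):
    # Fill the local-alignment score matrix H using the recurrence
    # H[i][j] = max(0, diagonal + substitution, up - gap, left - gap)
    # and return the best local score found anywhere in the matrix.
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] - gap, H[i][j - 1] - gap)
            best = max(best, H[i][j])
    return best
```

GPU implementations (in SYCL, CUDA, or oneAPI) parallelize the anti-diagonals of H, since all cells on one anti-diagonal depend only on previous diagonals.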
Nov, 20
Training a Vision Transformer from scratch in less than 24 hours with 1 GPU
Transformers have become central to recent advances in computer vision. However, training a vision Transformer (ViT) model from scratch can be resource-intensive and time-consuming. In this paper, we aim to explore approaches to reduce the training costs of ViT models. We introduce some algorithmic improvements to enable training a ViT model from scratch […]
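The paper's specific improvements are not listed in the excerpt; as context for what a ViT consumes, here is a hedged NumPy sketch of the patch-embedding step that turns an image into the token sequence a Transformer trains on (the projection here is a random, untrained linear map; `patch_embed` and its defaults are illustrative):

```python
import numpy as np

def patch_embed(img, patch=16, dim=192, seed=0):
    # Split a (C, H, W) image into non-overlapping patch tokens and
    # project each flattened patch with a random (untrained) linear map.
    C, H, W = img.shape
    gh, gw = H // patch, W // patch
    x = img.reshape(C, gh, patch, gw, patch)          # carve the patch grid
    x = x.transpose(1, 3, 0, 2, 4).reshape(gh * gw, C * patch * patch)
    rng = np.random.default_rng(seed)
    W_e = rng.standard_normal((C * patch * patch, dim)) / np.sqrt(C * patch * patch)
    return x @ W_e                                    # (num_patches, dim)
```

For a standard 3x224x224 input with 16x16 patches this produces 196 tokens, and the cost of everything downstream scales with that token count, which is one common lever for cutting training cost.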
Nov, 20
Hardware Checkpointing and Productive Debugging Flows for FPGAs
As FPGAs become larger and more complex, productive debugging is becoming more challenging. In this work, we detail a new debugging flow based on hardware checkpointing that provides full visibility and controllability while maintaining reasonable execution speed. Hardware checkpointing is useful not only for debugging but also enables several other capabilities such as live migration, […]
Nov, 20
Challenges and Techniques for Transparent Acceleration of Unmodified Big Data Applications
The ever-increasing demand for high-performance Big Data analytics and data processing has paved the way for heterogeneous hardware accelerators, such as Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs), to be integrated into modern Big Data platforms. Currently, this integration comes at the cost of programmability, as the end-user Application Programming Interface (API) […]
Nov, 20
Going green: optimizing GPUs for energy efficiency through model-steered auto-tuning
Graphics Processing Units (GPUs) have revolutionized the computing landscape over the past decade. However, the growing energy demands of data centres and computing facilities equipped with GPUs come with significant capital and environmental costs. The energy consumption of GPU applications greatly depends on how well they are optimized. Auto-tuning is an effective and commonly applied […]
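The model-steered tuner itself is not shown in the excerpt; the skeleton of any auto-tuner is a search over kernel configurations against a measured objective. A hedged sketch, with a synthetic stand-in for an on-hardware energy measurement (both `fake_energy` and the parameter space are assumptions for illustration):

```python
import itertools

def autotune(param_space, measure_energy):
    # Exhaustively evaluate every configuration and keep the lowest-energy
    # one. Real auto-tuners (including model-steered ones) replace this
    # brute-force loop with search guided by a performance/energy model.
    best_cfg, best_e = None, float("inf")
    for values in itertools.product(*param_space.values()):
        cfg = dict(zip(param_space.keys(), values))
        e = measure_energy(cfg)
        if e < best_e:
            best_cfg, best_e = cfg, e
    return best_cfg, best_e

# Synthetic energy model standing in for a power-meter reading (assumption).
def fake_energy(cfg):
    return abs(cfg["block_x"] - 128) + 0.1 * cfg["unroll"]

space = {"block_x": [32, 64, 128, 256], "unroll": [1, 2, 4]}
```

Tuning for energy rather than runtime only changes `measure_energy`; the search machinery is identical.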
Nov, 20
TorchOpt: An Efficient Library for Differentiable Optimization
Recent years have witnessed a boom in differentiable optimization algorithms. These algorithms exhibit different execution patterns, and their execution needs massive computational resources that go beyond a single CPU and GPU. Existing differentiable optimization libraries, however, cannot support efficient algorithm development and multi-CPU/GPU execution, making the development of differentiable optimization algorithms often cumbersome and […]
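TorchOpt's actual API is not shown in the excerpt; to illustrate what "differentiable optimization" means here, the following hand-derived sketch differentiates through one unrolled inner gradient step to tune a learning rate (the gradient that a library would obtain by autodiff is written out analytically; the quadratic inner problem and all names are illustrative):

```python
def meta_tune_lr(w0=0.0, target=1.0, lr=0.2, meta_lr=0.5, steps=200):
    # Inner problem: f(w) = 0.5 * (w - target)^2, one gradient step:
    #   w1 = w0 - lr * (w0 - target)
    # Meta-loss on the unrolled step: L(lr) = 0.5 * (w1 - target)^2
    # dL/dlr, derived by hand (what autodiff computes through the unroll):
    #   dL/dlr = (w1 - target) * (-(w0 - target))
    for _ in range(steps):
        w1 = w0 - lr * (w0 - target)
        grad_lr = (w1 - target) * (-(w0 - target))
        lr -= meta_lr * grad_lr          # meta-gradient descent on lr
    return lr
```

For this quadratic, lr converges to 1.0, the step size that lands exactly on the target in one inner step. Libraries in this space automate the unrolling and gradient bookkeeping, and distributing it is what demands multi-CPU/GPU execution.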
Nov, 13
Capturing the Memory Topology of GPUs
Optimizing program code is essential for High-Performance Computing and beyond. Given the trend in recent years of employing graphics cards as system accelerators, and the growing importance of GPUs overall, optimizing GPU code is crucial in order to achieve the best possible performance of a […]
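The paper's measurement method is not given in the excerpt; a standard way to probe a memory hierarchy is a pointer-chasing microbenchmark, where the access pattern is a random cyclic permutation so the hardware cannot predict or prefetch the next address. A hedged sketch of just the pattern construction (on a GPU the timed chase loop would run in a CUDA kernel):

```python
import random

def make_chase(n, seed=0):
    # Build `next` indices forming one random cycle over all n slots, so a
    # chase visits every slot exactly once per lap in unpredictable order.
    order = list(range(1, n))
    random.Random(seed).shuffle(order)
    order = [0] + order
    nxt = [0] * n
    for i in range(n):
        nxt[order[i]] = order[(i + 1) % n]
    return nxt

def lap(nxt):
    # Follow the chain for len(nxt) steps; a real benchmark times this loop
    # for growing array sizes and reads cache/memory levels off the latency curve.
    i, visited = 0, set()
    for _ in range(len(nxt)):
        visited.add(i)
        i = nxt[i]
    return i, visited
```

Sweeping the array size and plotting latency per hop exposes the capacity of each cache level as a step in the curve.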
Nov, 13
A Study on Neural-based Code Summarization in Low-resource Settings
Automated software engineering with deep learning techniques has been explored comprehensively thanks to breakthroughs in code representation learning. Many code intelligence approaches have been proposed for the downstream tasks of this field in recent years, yielding significant performance gains. Code summarization has been the central research topic among these downstream tasks because of […]
Nov, 13
pyGSL: A Graph Structure Learning Toolkit
We introduce pyGSL, a Python library that provides efficient implementations of state-of-the-art graph structure learning models along with diverse datasets to evaluate them on. The implementations are written in GPU-friendly ways, allowing one to scale to much larger network tasks. A common interface is introduced for algorithm unrolling methods, unifying implementations of recent state-of-the-art techniques […]
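pyGSL's API is not shown in the excerpt; as a hedged illustration of the graph structure learning task itself, here is a naive baseline (not a pyGSL method) that estimates an undirected adjacency matrix by thresholding sample correlations of node signals, which is the kind of estimate learned and unrolled models aim to improve on:

```python
import numpy as np

def correlation_graph(X, tau=0.5):
    # X: (num_signals, num_nodes). Estimate an undirected adjacency matrix
    # by thresholding absolute sample correlations between node signals.
    C = np.corrcoef(X, rowvar=False)
    A = (np.abs(C) > tau).astype(float)
    np.fill_diagonal(A, 0.0)          # no self-loops
    return A
```

Given signals where two nodes are near-duplicates and a third is independent noise, the estimate recovers exactly the edge between the correlated pair.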
Nov, 13
iGniter: Interference-Aware GPU Resource Provisioning for Predictable DNN Inference in the Cloud
GPUs are essential to accelerating the latency-sensitive deep neural network (DNN) inference workloads in cloud datacenters. To fully utilize GPU resources, spatial sharing of GPUs among co-located DNN inference workloads becomes increasingly compelling. However, GPU sharing inevitably brings severe performance interference among co-located inference workloads, as motivated by an empirical measurement study of DNN inference […]
Nov, 13
Multi-GPU thermal lattice Boltzmann simulations using OpenACC and MPI
We assess the performance of the hybrid Open Accelerator (OpenACC) and Message Passing Interface (MPI) approach for thermal lattice Boltzmann (LB) simulations accelerated on multiple graphics processing units (GPUs). OpenACC accelerates computation on a single GPU, and MPI synchronizes information between multiple GPUs. With a single GPU, the two-dimensional (2D) simulation achieved 1.93 billion […]
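The synchronization the excerpt attributes to MPI is typically a ghost-cell (halo) exchange between subdomains. A hedged sketch of that pattern, simulated serially in NumPy rather than with MPI ranks (each array's first and last cells are ghosts; the stencil is a simple averaging stand-in for the LB streaming/collision update, not the paper's kernel):

```python
import numpy as np

def halo_exchange(subdomains):
    # Copy each periodic neighbor's boundary value into this rank's ghost
    # cells, as MPI_Sendrecv would between GPUs before each time step.
    n = len(subdomains)
    for r in range(n):
        left, right = subdomains[(r - 1) % n], subdomains[(r + 1) % n]
        subdomains[r][0] = left[-2]    # left ghost <- left neighbor's last interior cell
        subdomains[r][-1] = right[1]   # right ghost <- right neighbor's first interior cell

def step(subdomains):
    # One time step: synchronize halos, then update interior cells only.
    halo_exchange(subdomains)
    for u in subdomains:
        u[1:-1] = 0.5 * (u[:-2] + u[2:])
```

In the hybrid scheme, the interior update of each subdomain runs as an OpenACC kernel on its own GPU, and only the thin halo layers cross GPU boundaries through MPI.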