Posts

Apr, 14

A Systematic Literature Survey of Sparse Matrix-Vector Multiplication

Sparse matrix-vector multiplication (SpMV) is a crucial computing kernel with widespread applications in iterative algorithms. Over the past decades, research on SpMV optimization has made remarkable strides, giving rise to various optimization contributions. However, a comprehensive and systematic literature survey that introduces, analyzes, discusses, and summarizes the advancements of SpMV in recent years is currently […]
Apr, 14

QArray: a GPU-accelerated constant capacitance model simulator for large quantum dot arrays

Semiconductor quantum dot arrays are a leading architecture for the development of quantum technologies. Over the years, the constant capacitance model has served as a fundamental framework for simulating, understanding, and navigating the charge stability diagrams of small quantum dot arrays. However, while the size of the arrays keeps growing, solving the constant capacitance model […]
Apr, 14

Balancing Tracking Granularity and Parallelism in Many-Task Systems: The Horizons Approach

Reducing the need for users to manually manage the details of work and data distribution is an important goal of high-level many-task runtime systems. For distributed memory platforms this means that the runtime system has to keep track of both fine-grained task dependencies and data residency meta-information. The amount of such meta-information is proportional to […]
Apr, 14

OpenMP offload at the Exascale using Intel GPU Max 1550: evaluation of STREAmS compressible solver

Nearly 20 years after the birth of general purpose GPU computing, the HPC landscape is now dominated by GPUs. After years of undisputed dominance by NVIDIA, new players have entered the arena in a convincing manner, namely AMD and more recently Intel, whose devices currently power the first two clusters in the Top500 ranking. Unfortunately, […]
Apr, 14

High Performance Privacy Preserving AI

Artificial intelligence (AI) depends on data. In sensitive domains – such as healthcare, security, finance, and many more – there is therefore tension between unleashing the power of AI and maintaining the confidentiality and security of the relevant data. This book – intended for researchers in academia and R&D engineers in industry – explains how […]
Apr, 7

94% on CIFAR-10 in 3.29 Seconds on a Single GPU

CIFAR-10 is among the most widely used datasets in machine learning, facilitating thousands of research projects per year. To accelerate research and reduce the cost of experiments, we introduce training methods for CIFAR-10 which reach 94% accuracy in 3.29 seconds, 95% in 10.4 seconds, and 96% in 46.3 seconds, when run on a single NVIDIA […]
Apr, 7

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

Determining the maximum usage of random-access memory (RAM) both on the motherboard and on a graphics processing unit (GPU) over the lifetime of a computing task can be extremely useful for troubleshooting points of failure as well as optimizing memory utilization, especially within a high-performance computing (HPC) setting. While there are tools for tracking compute […]
Apr, 7

Speed, power and cost implications for GPU acceleration of Computational Fluid Dynamics on HPC systems

Computational Fluid Dynamics (CFD) is the simulation of fluid flow undertaken with the use of computational hardware. The underlying equations are computationally challenging to solve and necessitate high performance computing (HPC) to resolve in a practical timeframe when a reasonable level of fidelity is required. The simulations are memory intensive, having previously been limited to […]
Apr, 7

Seer: Predictive Runtime Kernel Selection for Irregular Problems

Modern GPUs are designed for regular problems and suffer from load imbalance when processing irregular data. Prior to our work, a domain expert selects the best kernel to map fine-grained irregular parallelism to a GPU. We instead propose Seer, an abstraction for producing a simple, reproducible, and understandable decision tree selector model which performs runtime […]
Apr, 7

Using Intel oneAPI for Multi-hybrid Acceleration Programming with GPU and FPGA Coupling

Intel oneAPI is a programming framework that supports various accelerators such as GPUs, FPGAs, and multi-core CPUs, with a focus on HPC applications. Users can apply their code written in a single language, DPC++, to this heterogeneous programming environment. However, in practice, it is not easy to apply that code to different accelerators, especially non-Intel devices […]
Mar, 24

LOOPer: A Learned Automatic Code Optimizer For Polyhedral Compilers

While polyhedral compilers have shown success in implementing advanced code transformations, they still have challenges in selecting the most profitable transformations that lead to the best speedups. This has motivated the use of machine learning to build cost models to guide the search for polyhedral optimizations. State-of-the-art polyhedral compilers have demonstrated a viable proof-of-concept of […]
Mar, 24

Full-Scale File System Acceleration on GPU

Modern HPC and AI Computing solutions regularly use GPUs as their main source of computational power. This creates a significant imbalance for storage operations for GPU applications, as every such storage operation has to be signalled to and handled by the CPU. In GPU4FS, we propose a radical solution to this imbalance: Move the file […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
