Posts

Jan, 27

Good things come in small packages: Should we adopt Lite-GPUs in AI infrastructure?

To match the booming demand of generative AI workloads, GPU designers have so far been trying to pack more and more compute and memory into single complex and expensive packages. However, there is growing uncertainty about the scalability of individual GPUs, and thus of AI clusters, as state-of-the-art GPUs are already displaying packaging, yield, and cooling […]
Jan, 27

Exploring data flow design and vectorization with oneAPI for streaming applications on CPU+GPU

In recent times, oneAPI has emerged as a competitive framework for optimizing streaming applications on heterogeneous CPU+GPU architectures, since it provides portability and performance thanks to the SYCL programming language and efficient parallel libraries such as oneTBB. However, this approach opens up a wealth of implementation alternatives for this type of application: from how to design […]
Jan, 27

Adaptive Optimization Techniques for High-Performance Computing

The dataset sizes and computing needs of increasingly prevalent high-performance computing (HPC) applications have grown exponentially over the last decade. Moreover, modern computing architectures are evolving with different paradigms, and accelerators have become indispensable parts of computing. Consequently, the imperative for performance optimization for HPC applications and intelligent resource management for evolving architectures has become […]
Jan, 27

Dissecting the NVIDIA Hopper Architecture through Microbenchmarking and Multiple Level Analysis

Modern GPUs, with their specialized hardware like tensor cores, are essential for demanding AI and deep learning applications. This study presents a comprehensive, multi-level microbenchmarking analysis of the NVIDIA Hopper GPU architecture, delving into its performance characteristics and novel features. We benchmark Hopper’s memory subsystem latency and throughput, comparing its L2 partitioned cache behavior and […]
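For context, the memory-latency side of such microbenchmarking studies typically relies on a pointer-chasing kernel, in which each load depends on the previous one, so that cycles per iteration approximate the latency of whichever cache level the working set fits in. Below is a minimal sketch of that technique; the array size, stride, and kernel name are illustrative assumptions, not the paper's actual harness.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Pointer-chasing latency probe: each load depends on the previous
    // one, so cycles/iteration approximates the latency of whichever
    // cache level the working set fits in.
    __global__ void chase(const unsigned *next, int iters,
                          long long *cycles, unsigned *sink) {
        unsigned idx = 0;
        long long start = clock64();
        for (int i = 0; i < iters; ++i)
            idx = next[idx];            // serialized dependent loads
        *cycles = clock64() - start;
        *sink = idx;                    // keep the chain alive past the optimizer
    }

    int main() {
        const int n = 4096, iters = 100000;   // 16 KiB working set: L1-resident
        unsigned *next, *sink; long long *cycles;
        cudaMallocManaged(&next, n * sizeof(unsigned));
        cudaMallocManaged(&cycles, sizeof(long long));
        cudaMallocManaged(&sink, sizeof(unsigned));
        // Odd stride visits every element and defeats simple prefetching.
        for (int i = 0; i < n; ++i) next[i] = (i + 33) % n;
        chase<<<1, 1>>>(next, iters, cycles, sink);   // one thread: pure latency
        cudaDeviceSynchronize();
        printf("%.1f cycles/load\n", (double)*cycles / iters);
    }

Varying n so the chain fits in L1, L2, or DRAM recovers the per-level latencies this kind of study reports.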
Jan, 20

Code Generation for Cryptographic Kernels using Multi-word Modular Arithmetic on GPU

Fully homomorphic encryption (FHE) and zero-knowledge proofs (ZKPs) are emerging as solutions for data security in distributed environments. However, the widespread adoption of these encryption techniques is hindered by their significant computational overhead, primarily resulting from core cryptographic operations that involve large integer arithmetic. This paper presents a formalization of multi-word modular arithmetic (MoMA), which […]
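As a concrete illustration of the multi-word idea, the sketch below implements the simplest building block, a two-limb (128-bit) modular addition, where a big integer is represented as two 64-bit machine words with explicit carry and borrow propagation. MoMA's formalization also covers multi-word multiplication and reduction; the type and function names here are illustrative, not from the paper.

    #include <cstdint>

    // A 128-bit operand split into two 64-bit limbs (machine words).
    struct u128 { uint64_t lo, hi; };

    // (a + b) mod m, assuming a, b < m and m < 2^127 so the raw sum
    // cannot overflow 128 bits and one conditional subtraction suffices.
    __device__ u128 add_mod(u128 a, u128 b, u128 m) {
        u128 s;
        s.lo = a.lo + b.lo;
        uint64_t carry = (s.lo < a.lo);        // carry out of the low limb
        s.hi = a.hi + b.hi + carry;
        bool ge = (s.hi > m.hi) || (s.hi == m.hi && s.lo >= m.lo);
        if (ge) {                              // reduce: s -= m
            uint64_t borrow = (s.lo < m.lo);   // borrow into the high limb
            s.lo -= m.lo;
            s.hi -= m.hi + borrow;
        }
        return s;
    }

Multiplication follows the same limb-wise pattern, with partial products (e.g., via the __umul64hi intrinsic) feeding the carry chain before reduction.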
Jan, 20

GSParLib: A multi-level programming interface unifying OpenCL and CUDA for expressing stream and data parallelism

The evolution of Graphics Processing Units (GPUs) has allowed the industry to overcome long-standing problems and challenges. Many of these belong to the stream processing domain, whose central aspect is continuously receiving and processing data from streaming data producers such as cameras and sensors. Nonetheless, programming GPUs is challenging because it requires deep knowledge of many-core programming, […]
Jan, 20

A User’s Guide to KSig: GPU-Accelerated Computation of the Signature Kernel

The signature kernel is a positive definite kernel for sequential and temporal data that has become increasingly popular in machine learning applications due to its powerful theoretical guarantees, strong empirical performance, and a variety of recently introduced scalable variations. In this chapter, we give a short introduction to KSig, a Scikit-Learn compatible Python package that implements various GPU-accelerated […]
Jan, 20

Boosting Performance of Iterative Applications on GPUs: Kernel Batching with CUDA Graphs

Graphics Processing Units (GPUs) have become the standard for accelerating scientific applications on heterogeneous systems. However, as GPUs get faster, one potential performance bottleneck in GPU-accelerated applications is the overhead of launching many fine-grained kernels. CUDA Graphs addresses these performance challenges by enabling a graph-based execution model that captures operations as nodes and dependence […]
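For readers new to the API, the sketch below shows the stream-capture route to kernel batching: a fixed sequence of small kernels is recorded into a graph once, instantiated, and then replayed with one launch per iteration, amortizing the per-kernel launch overhead the abstract refers to. Kernel names and launch shapes are placeholders; the three-argument cudaGraphInstantiate is the CUDA 12 signature (CUDA 11 takes two extra error-reporting arguments).

    #include <cuda_runtime.h>

    __global__ void step_a(float *x, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] += 1.0f;               // stand-in for fine-grained work
    }
    __global__ void step_b(float *x, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] *= 0.5f;
    }

    void run_iterations(float *x, int n, int iters) {
        cudaStream_t stream;
        cudaStreamCreate(&stream);

        // Record the kernel sequence into a graph instead of executing it.
        cudaGraph_t graph;
        cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
        step_a<<<(n + 255) / 256, 256, 0, stream>>>(x, n);
        step_b<<<(n + 255) / 256, 256, 0, stream>>>(x, n);
        cudaStreamEndCapture(stream, &graph);

        cudaGraphExec_t exec;
        cudaGraphInstantiate(&exec, graph, 0); // CUDA 12 signature

        // One launch submits the whole batch, amortizing per-kernel overhead.
        for (int i = 0; i < iters; ++i)
            cudaGraphLaunch(exec, stream);
        cudaStreamSynchronize(stream);

        cudaGraphExecDestroy(exec);
        cudaGraphDestroy(graph);
        cudaStreamDestroy(stream);
    }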
Jan, 20

Keras Sig: Efficient Path Signature Computation on GPU in Keras 3

In this paper, we introduce Keras Sig, a high-performance Pythonic library designed to compute path signatures for deep learning applications. Entirely built in Keras 3, Keras Sig leverages seamless integration with the most widely used deep learning backends, such as PyTorch, JAX, and TensorFlow. Inspired by Kidger and Lyons (2021), we propose a novel approach reshaping […]
Jan, 13

CGP-Tuning: Structure-Aware Soft Prompt Tuning for Code Vulnerability Detection

Large language models (LLMs) have been proposed as powerful tools for detecting software vulnerabilities, where task-specific fine-tuning is typically employed to provide vulnerability-specific knowledge to the LLMs. However, traditional full-parameter fine-tuning is inefficient for modern, complex LLMs, which contain billions of parameters. Soft prompt tuning has been suggested as a more efficient […]
Jan, 13

SCALE: Ahead-Of-Time Compilation of CUDA for AMD GPUs

SCALE is a new solution by Spectral Compute that empowers developers to write code once and deploy it across a range of GPU hardware platforms without modifying the original code. Designed to extend CUDA’s capabilities to AMD GPUs, SCALE maintains CUDA compatibility while introducing novel features that streamline GPU programming. This demo paper presents SCALE’s […]
Jan, 13

Validation of GPU Computation in Decentralized, Trustless Networks

Verifying computational processes in decentralized networks poses a fundamental challenge, particularly for Graphics Processing Unit (GPU) computations. Our investigation reveals significant limitations in existing approaches: exact recomputation fails due to computational non-determinism across GPU nodes, Trusted Execution Environments (TEEs) require specialized hardware, and Fully Homomorphic Encryption (FHE) faces prohibitive computational costs. To address these challenges, […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
