Posts

Jan, 20

Code Generation for Cryptographic Kernels using Multi-word Modular Arithmetic on GPU

Fully homomorphic encryption (FHE) and zero-knowledge proofs (ZKPs) are emerging as solutions for data security in distributed environments. However, the widespread adoption of these encryption techniques is hindered by their significant computational overhead, primarily resulting from core cryptographic operations that involve large integer arithmetic. This paper presents a formalization of multi-word modular arithmetic (MoMA), which […]
Jan, 20

GSParLib: A multi-level programming interface unifying OpenCL and CUDA for expressing stream and data parallelism

The evolution of Graphics Processing Units (GPUs) has allowed the industry to overcome long-lasting problems and challenges. Many belong to the stream processing domain, whose central aspect is continuously receiving and processing data from streaming data producers such as cameras and sensors. Nonetheless, programming GPUs is challenging because it requires deep knowledge of many-core programming, […]
Jan, 20

A User’s Guide to KSig: GPU-Accelerated Computation of the Signature Kernel

The signature kernel is a positive definite kernel for sequential and temporal data that has become increasingly popular in machine learning applications due to powerful theoretical guarantees, strong empirical performance, and recently introduced various scalable variations. In this chapter, we give a short introduction to KSig, a Scikit-Learn compatible Python package that implements various GPU-accelerated […]
Jan, 20

Boosting Performance of Iterative Applications on GPUs: Kernel Batching with CUDA Graphs

Graphics Processing Units (GPUs) have become the standard in accelerating scientific applications on heterogeneous systems. However, as GPUs are getting faster, one potential performance bottleneck with GPU-accelerated applications is the overhead from launching several fine-grained kernels. CUDA Graph addresses these performance challenges by enabling a graph-based execution model that captures operations as nodes and dependence […]
Jan, 20

Keras Sig: Efficient Path Signature Computation on GPU in Keras 3

In this paper we introduce Keras Sig, a high-performance Pythonic library designed to compute the path signature for deep learning applications. Built entirely in Keras 3, Keras Sig integrates seamlessly with the most widely used deep learning backends, such as PyTorch, JAX and TensorFlow. Inspired by Kidger and Lyons (2021), we proposed a novel approach reshaping […]
Jan, 13

CGP-Tuning: Structure-Aware Soft Prompt Tuning for Code Vulnerability Detection

Large language models (LLMs) have been proposed as powerful tools for detecting software vulnerabilities, where task-specific fine-tuning is typically employed to provide vulnerability-specific knowledge to the LLMs for this purpose. However, traditional full-parameter fine-tuning is inefficient for modern, complex LLMs, which contain billions of parameters. Soft prompt tuning has been suggested as a more efficient […]
Jan, 13

SCALE-Ahead-Of-Time Compilation of CUDA for AMD GPUs

SCALE is a new solution by Spectral Compute that empowers developers to write code once and deploy it across a range of GPU hardware platforms without modifying the original code. Designed to extend CUDA’s capabilities to AMD GPUs, SCALE maintains CUDA compatibility while introducing novel features that streamline GPU programming. This demo paper presents SCALE’s […]
Jan, 13

Validation of GPU Computation in Decentralized, Trustless Networks

Verifying computational processes in decentralized networks poses a fundamental challenge, particularly for Graphics Processing Unit (GPU) computations. Our investigation reveals significant limitations in existing approaches: exact recomputation fails due to computational non-determinism across GPU nodes, Trusted Execution Environments (TEEs) require specialized hardware, and Fully Homomorphic Encryption (FHE) faces prohibitive computational costs. To address these challenges, […]
Jan, 13

LeetDecoding: A PyTorch Library for Exponentially Decaying Causal Linear Attention with CUDA Implementations

The machine learning and data science community has made significant, though scattered, progress in accelerating transformer-based large language models (LLMs), and one promising approach is to replace the original causal attention in a generative pre-trained transformer (GPT) with exponentially decaying causal linear attention. In this paper, we present LeetDecoding, which is the first Python package […]
Jan, 13

Data Parallel Visualization and Rendering on the RAMSES Supercomputer with ANARI

3D visualization and rendering in HPC are very heterogeneous applications, though fundamentally the tasks involved are well-defined and do not differ much from application to application. The Khronos Group’s ANARI standard seeks to consolidate 3D rendering across sci-vis applications. This paper makes an effort to convey challenges of 3D rendering and visualization with ANARI in […]
Jan, 6

Finding Missed Code Size Optimizations in Compilers using LLMs

Compilers are complex, and significant effort has been expended on testing them. Techniques such as random program generation and differential testing have proved highly effective and have uncovered thousands of bugs in production compilers. The majority of effort has been expended on validating that a compiler produces correct code for a given input, while less […]
Jan, 6

Enhancing Deployment-Time Predictive Model Robustness for Code Analysis and Optimization

Supervised machine learning techniques have shown promising results in code analysis and optimization problems. However, a learning-based solution can be brittle because minor changes in hardware or application workloads — such as facing a new CPU architecture or code pattern — may jeopardize decision accuracy, ultimately undermining model robustness. We introduce Prom, an open-source library […]

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
