
Posts

Feb, 12

DeepSeek-Coder: When the Large Language Model Meets Programming – The Rise of Code Intelligence

The rapid development of large language models has revolutionized code intelligence in software development. However, the predominance of closed-source models has restricted extensive research and development. To address this, we introduce the DeepSeek-Coder series, a range of open-source code models with sizes from 1.3B to 33B, trained from scratch on 2 trillion tokens. These models […]
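Since the checkpoints are open-source, a minimal sketch of running the smallest one for code completion via Hugging Face transformers follows; the model ID (deepseek-ai/deepseek-coder-1.3b-base), the prompt, and the generation settings are illustrative assumptions, not details taken from the paper.

# Sketch: code completion with an open DeepSeek-Coder checkpoint.
# Model ID and settings are assumptions; requires a CUDA-capable GPU.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-coder-1.3b-base"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True).cuda()

prompt = "# Return the n-th Fibonacci number\ndef fib(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))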
Feb, 12

Training DNN Models over Heterogeneous Clusters with Optimal Performance

Adjusting batch sizes and adaptively tuning other hyperparameters can significantly speed up deep neural network (DNN) training. Despite the ubiquity of heterogeneous clusters, existing adaptive DNN training techniques solely consider homogeneous environments. Optimizing distributed DNN training over heterogeneous clusters is technically challenging, and directly adapting existing techniques results in low utilization and poor performance. To […]
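One core idea in adaptive training on mixed hardware is to split the global batch across workers in proportion to measured throughput rather than evenly. The toy sketch below illustrates that heuristic; the function, device names, and rates are assumptions for illustration, not the paper's actual algorithm.

# Toy sketch of throughput-proportional batch sizing across a
# heterogeneous cluster. Heuristic and names are illustrative assumptions.
def allocate_batches(global_batch, throughputs):
    """Split a global batch across workers in proportion to the
    samples/sec each (heterogeneous) device sustains."""
    total = sum(throughputs.values())
    alloc = {w: max(1, round(global_batch * t / total))
             for w, t in throughputs.items()}
    # Absorb rounding drift on the fastest worker so sizes sum exactly.
    fastest = max(throughputs, key=throughputs.get)
    alloc[fastest] += global_batch - sum(alloc.values())
    return alloc

# Example: an A100 paired with two slower GPUs.
print(allocate_batches(512, {"a100": 900.0, "v100": 450.0, "t4": 150.0}))
# -> {'a100': 307, 'v100': 154, 't4': 51}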
Feb, 12

Out-of-kernel tuning and optimizations for portable large-scale docking experiments on GPUs

Virtual screening is an early stage in the drug discovery process that selects the most promising candidates. In the urgent computing scenario, finding a solution in the shortest time frame is critical. Any improvement in the performance of a virtual screening application translates into an increase in the number of candidates evaluated, thereby raising the […]
Feb, 12

Evaluating the Wide Area Classroom After 24,000 HPC Students

As of 2023 we have taught more than 24,000 students over the course of 106 events using the Wide Area Classroom, a novel distributed teaching platform. This has been a successful effort as gauged by several important metrics. We describe both the technical and logistical structure of these events as well as the specific HPC curricula […]
Feb, 4

Gallatin: A General-Purpose GPU Memory Manager

Dynamic memory management is critical for efficiently porting modern data processing pipelines to GPUs. However, building a general-purpose dynamic memory manager on GPUs is challenging due to the massive parallelism and weak memory coherence. Existing state-of-the-art GPU memory managers, Ouroboros and Reg-Eff, employ traditional data structures such as arrays and linked lists to manage memory […]
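To illustrate why list-based bookkeeping is the usual starting point, here is a toy single-threaded free-list allocator in Python. It is a conceptual sketch only: it deliberately ignores the massive parallelism and weak memory coherence that make the GPU version hard, which is exactly the gap Gallatin targets.

# Toy free-list allocator over a flat "heap", illustrating the kind of
# array/list bookkeeping traditional managers use. Single-threaded by
# design; a real GPU manager must handle concurrent allocation.
class FreeListAllocator:
    def __init__(self, heap_size):
        self.free = [(0, heap_size)]  # sorted list of (offset, size) holes

    def alloc(self, size):
        for i, (off, sz) in enumerate(self.free):
            if sz >= size:                      # first fit
                remainder = (off + size, sz - size)
                self.free[i:i + 1] = [remainder] if remainder[1] else []
                return off
        raise MemoryError("out of heap")

    def free_block(self, off, size):
        self.free.append((off, size))           # naive: no coalescing
        self.free.sort()

heap = FreeListAllocator(1 << 20)
a = heap.alloc(4096)
b = heap.alloc(256)
heap.free_block(a, 4096)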
Feb, 4

Deductive verification for SYCL

A heterogeneous computing system is a system composed of different types of computing units. SYCL is a software development framework for writing programs for such systems. It is built around kernels: the code inside a kernel executes in parallel, and different kernels can be executed concurrently on multiple computing units. […]
Feb, 4

LeftoverLocals: Listening to LLM Responses Through Leaked GPU Local Memory

This paper describes LeftoverLocals: a vulnerability that allows data recovery from GPU memory created by another process on Apple, Qualcomm, and AMD GPUs. LeftoverLocals impacts the security posture of GPU applications, with particular significance to LLMs and ML models that run on impacted GPUs. By recovering local memory, an optimized GPU memory region, we built […]
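The published proof of concept pairs a writer kernel with a "listener" kernel that reads local memory it never initialized. A rough PyOpenCL rendition of the listener half is sketched below; the kernel body and buffer sizes are assumptions, and on unaffected or patched platforms the dump is expected to contain only zeros.

# Sketch of a LeftoverLocals-style "listener": copy uninitialized
# __local memory to a global buffer, where values left behind by another
# process's kernel may survive on affected GPUs. Sizes are assumptions.
import numpy as np
import pyopencl as cl

src = """
__kernel void listen(__global uint *out, __local uint *scratch) {
    // scratch is never written: whatever we read is leftover local memory.
    out[get_global_id(0)] = scratch[get_local_id(0)];
}
"""

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
prog = cl.Program(ctx, src).build()

n, wg = 1024, 256
out = cl.Buffer(ctx, cl.mem_flags.WRITE_ONLY, n * 4)
prog.listen(queue, (n,), (wg,), out, cl.LocalMemory(wg * 4))

dump = np.empty(n, dtype=np.uint32)
cl.enqueue_copy(queue, dump, out)
print("nonzero leftover words:", np.count_nonzero(dump))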
Feb, 4

Towards a GPU-Parallelization of the neXtSIM-DG Dynamical Core

The cryosphere plays a significant role in Earth’s climate system. Therefore, an accurate simulation of sea ice is of great importance to improve climate projections. To enable higher resolution simulations, graphics processing units (GPUs) have become increasingly attractive as they offer higher floating point peak performance and better energy efficiency compared to CPUs. However, making […]
Feb, 4

High-order thread-safe lattice Boltzmann model for HPC turbulent flow simulations

We present a highly optimized thread-safe lattice Boltzmann model in which the non-equilibrium part of the distribution function is locally reconstructed via the recursivity of Hermite polynomials. Such a procedure allows the explicit incorporation of non-equilibrium moments of the distribution up to the order supported by the lattice. Thus, the proposed approach increases accuracy and stability at […]
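For reference, recursive Hermite regularization rebuilds the non-equilibrium populations from a truncated Hermite expansion, along these lines (standard notation from the regularized-LBM literature, not necessarily this paper's exact symbols):

\[
f_i^{\mathrm{neq}} \;\approx\; w_i \sum_{n=2}^{N} \frac{1}{n!\, c_s^{2n}}\, \mathbf{a}_1^{(n)} : \mathbf{\mathcal{H}}_i^{(n)},
\]

where \(w_i\) are the lattice weights, \(c_s\) is the lattice sound speed, \(\mathbf{\mathcal{H}}_i^{(n)}\) is the rank-\(n\) Hermite tensor polynomial evaluated at the discrete velocity \(\mathbf{c}_i\), and the coefficients \(\mathbf{a}_1^{(n)}\) are computed recursively from lower orders, up to the highest order \(N\) supported by the lattice.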
Jan, 28

Assessing the Impact of Compiler Optimizations on GPUs Reliability

Graphics Processing Unit (GPU) compilers have evolved to support general-purpose programming languages across multiple architectures. The NVIDIA CUDA Compiler (NVCC) passes through many compilation levels before generating machine code and applies complex optimizations to improve performance. These optimizations modify how the software is mapped to the underlying hardware; thus, as we show in this […]
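A quick way to observe these mapping changes is to compile the same kernel at different device-code optimization levels and compare the generated SASS. A sketch follows; the file name is a placeholder, and -Xptxas simply forwards the optimization level to the device-code optimizer.

# Sketch: build one CUDA kernel at two optimization levels and dump the
# SASS for comparison. Requires nvcc and cuobjdump on PATH; kernel.cu is
# a placeholder input file.
import subprocess

for level in ("-O0", "-O3"):
    binary = f"kernel{level}"
    subprocess.run(["nvcc", "kernel.cu", "-Xptxas", level, "-o", binary],
                   check=True)
    sass = subprocess.run(["cuobjdump", "-sass", binary],
                          check=True, capture_output=True, text=True).stdout
    with open(f"{binary}.sass", "w") as f:
        f.write(sass)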
Jan, 28

Lessons Learned Migrating CUDA to SYCL: A HEP Case Study with ROOT RDataFrame

The world’s largest particle accelerator, located at CERN, produces petabytes of data that need to be analysed efficiently, to study the fundamental structures of our universe. ROOT is an open-source C++ data analysis framework, developed for this purpose. Its high-level data analysis interface, RDataFrame, currently only supports CPU parallelism. Given the increasing heterogeneity in computing […]
Jan, 28

Application of performance portability solutions for GPUs and many-core CPUs to track reconstruction kernels

Next-generation High-Energy Physics (HEP) experiments face significant computational challenges, in terms of both data volume and processing power. Using compute accelerators, such as GPUs, is one of the promising ways to provide the necessary computational power to meet the challenge. The current programming models for compute accelerators often involve using architecture-specific programming […]
