Posts
Aug, 18
Anatomizing Deep Learning Inference in Web Browsers
Web applications have increasingly adopted Deep Learning (DL) through in-browser inference, wherein DL inference is performed directly within Web browsers. The actual performance of in-browser inference and its impact on the quality of experience (QoE) remain unexplored, and urgently require new QoE measurements beyond traditional ones, which mainly focus on page load time. To bridge this […]
Aug, 14
HIPRT: A Ray Tracing Framework in HIP
We present HIPRT, an open-source ray tracing framework in HIP. HIPRT provides a versatile, cross-platform solution for professional rendering on contemporary many-core architectures. The core of the framework relies on the bounding volume hierarchy (BVH) with scalable construction algorithms and efficient ray traversal, employing hardware acceleration on AMD GPUs. From a user perspective, we aim […]
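As background for readers new to BVH-based ray tracing, the sketch below shows a generic stack-based BVH traversal in C++. All types and helpers here are illustrative placeholders, not HIPRT's actual API; the primitive intersection test is stubbed out.

    // Generic iterative BVH traversal sketch; illustrative only, not the HIPRT API.
    struct Ray  { float o[3], d[3]; float tMax; };
    struct Node {                       // internal node or leaf of the hierarchy
        float bboxMin[3], bboxMax[3];
        int   left, right;              // child indices; right < 0 marks a leaf
        int   firstPrim, primCount;
    };

    // Slab test of a ray against a node's axis-aligned bounding box.
    static bool intersectAABB(const Ray& r, const Node& n) {
        float t0 = 0.0f, t1 = r.tMax;
        for (int a = 0; a < 3; ++a) {
            float inv = 1.0f / r.d[a];
            float tn  = (n.bboxMin[a] - r.o[a]) * inv;
            float tf  = (n.bboxMax[a] - r.o[a]) * inv;
            if (tn > tf) { float tmp = tn; tn = tf; tf = tmp; }
            if (tn > t0) t0 = tn;
            if (tf < t1) t1 = tf;
            if (t0 > t1) return false;
        }
        return true;
    }

    // Placeholder primitive test; a real tracer would use e.g. Moller-Trumbore.
    static bool intersectPrim(const Ray&, int /*prim*/, float& /*t*/) { return false; }

    // Returns the index of the closest primitive hit, or -1 for a miss.
    int traverse(const Node* nodes, const Ray& ray) {
        int   stack[64];                // fixed-depth traversal stack, as on GPUs
        int   top = 0, hit = -1;
        float tBest = ray.tMax;
        stack[top++] = 0;               // push the root node
        while (top > 0) {
            const Node& n = nodes[stack[--top]];
            if (!intersectAABB(ray, n)) continue;
            if (n.right < 0) {          // leaf: test its primitives
                for (int i = 0; i < n.primCount; ++i) {
                    float t;
                    if (intersectPrim(ray, n.firstPrim + i, t) && t < tBest) {
                        tBest = t;
                        hit   = n.firstPrim + i;
                    }
                }
            } else {                    // internal: descend into both children
                stack[top++] = n.left;
                stack[top++] = n.right;
            }
        }
        return hit;
    }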
Aug, 14
A Comprehensive Deep Learning Library Benchmark and Optimal Library Selection
Deploying deep learning (DL) on mobile devices has been a notable trend in recent years. To support fast on-device DL inference, DL libraries play a role as critical as that of algorithms and hardware. Unfortunately, no prior work has dived deep into the ecosystem of modern DL libraries or provided quantitative results on their performance. In […]
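To illustrate the kind of measurement such a benchmark relies on, here is a minimal latency-timing harness in C++. The workload is a stand-in; the paper benchmarks real DL libraries.

    // Minimal latency-measurement harness sketch; not the paper's benchmark suite.
    #include <algorithm>
    #include <chrono>
    #include <cstdio>
    #include <vector>

    // Stand-in workload; a real benchmark would call a DL library's inference API.
    static float dummy_inference(const std::vector<float>& x) {
        float acc = 0.0f;
        for (float v : x) acc += v * v;
        return acc;
    }

    int main() {
        std::vector<float> input(1 << 20, 1.0f);
        std::vector<double> samples;
        volatile float sink = 0.0f;         // keep the compiler from eliding work
        for (int i = 0; i < 50; ++i) {
            auto t0 = std::chrono::steady_clock::now();
            sink = dummy_inference(input);
            auto t1 = std::chrono::steady_clock::now();
            samples.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
        }
        std::sort(samples.begin(), samples.end());
        std::printf("median latency: %.3f ms\n", samples[samples.size() / 2]);
        return 0;
    }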
Aug, 14
Acceleration for the many, not the few
Although specialized hardware promises orders-of-magnitude performance gains, its uptake has been limited by how challenging such devices are to program. Hardware accelerators present challenges programmers are not used to, exposing details of the hardware that are often hidden and requiring new programming styles to use them effectively. Existing programming models often involve learning […]
Aug, 14
Evaluating Operators in Deep Neural Networks for Improving Performance Portability of SYCL
SYCL is a portable programming model for heterogeneous computing, so achieving reasonable performance portability with SYCL is important. Toward the goal of better understanding and improving the performance portability of SYCL for machine learning workloads, we have been developing benchmarks for basic operators in deep neural networks (DNNs). These operators could be offloaded to […]
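As a concrete example of offloading a basic DNN operator with SYCL, here is a minimal SYCL 2020 sketch of elementwise ReLU using unified shared memory. This is a generic illustration, not the authors' benchmark code.

    // Minimal SYCL 2020 sketch of a ReLU operator offloaded to a device.
    #include <sycl/sycl.hpp>
    #include <cstdio>

    int main() {
        sycl::queue q;                                // default-selected device
        const size_t n = 1 << 20;
        float* x = sycl::malloc_shared<float>(n, q);  // USM visible to host and device
        for (size_t i = 0; i < n; ++i) x[i] = (i % 2 ? 1.0f : -1.0f);

        // Elementwise ReLU: one work-item per element.
        q.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
            x[i] = x[i] > 0.0f ? x[i] : 0.0f;
        }).wait();

        std::printf("x[0]=%f x[1]=%f\n", x[0], x[1]);
        sycl::free(x, q);
        return 0;
    }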
Aug, 14
In-Situ Techniques on GPU-Accelerated Data-Intensive Applications
The computational power of High-Performance Computing (HPC) systems is constantly increasing; however, their input/output (I/O) performance grows relatively slowly and their storage capacity is limited. This imbalance presents significant challenges for applications such as Molecular Dynamics (MD) and Computational Fluid Dynamics (CFD), which generate massive amounts of data for further visualization or analysis. At […]
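The in-situ idea in its simplest form: reduce each timestep to a small summary as it is produced, rather than writing raw snapshots for offline analysis. A minimal C++ sketch, with a stand-in simulation step:

    // In-situ reduction sketch: summarize each timestep instead of dumping raw data.
    #include <algorithm>
    #include <cstdio>
    #include <vector>

    int main() {
        const size_t n = 1 << 20;             // field size per timestep (~8 MB)
        std::vector<double> field(n, 0.0);
        for (int step = 0; step < 10; ++step) {
            // Stand-in for one simulation step updating the field.
            for (size_t i = 0; i < n; ++i) field[i] = 0.5 * field[i] + 0.001 * step;

            // In-situ analysis: reduce the full field to two numbers per step,
            // avoiding the I/O cost of persisting raw data for offline analysis.
            double sum = 0.0, mx = field[0];
            for (double v : field) { sum += v; mx = std::max(mx, v); }
            std::printf("step %d: mean=%.6f max=%.6f\n", step, sum / n, mx);
        }
        return 0;
    }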
Aug, 4
LO-SpMM: Low-cost Search for High-performance SpMM Kernels on GPUs
As deep neural networks (DNNs) become increasingly large and complicated, pruning techniques have been proposed to lower the memory footprint and make inference more efficient. The most critical kernel for executing pruned sparse DNNs on GPUs is Sparse-dense Matrix Multiplication (SpMM). Although advanced tensor compilers can generate high-performance SpMM implementations, they often […]
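For context, SpMM computes C = A * B where A is sparse (here in CSR format) and B, C are dense; the scalar reference loop below is the baseline that GPU kernels and tensor compilers aim to outperform.

    // Reference SpMM: C = A * B, with A sparse in CSR and B, C dense (row-major).
    #include <vector>

    void spmm_csr(int m, int n,
                  const std::vector<int>& rowPtr,   // size m+1
                  const std::vector<int>& colIdx,   // size nnz
                  const std::vector<float>& vals,   // size nnz
                  const std::vector<float>& B,      // k x n dense
                  std::vector<float>& C) {          // m x n dense, zero-initialized
        for (int i = 0; i < m; ++i)                 // each sparse row of A
            for (int p = rowPtr[i]; p < rowPtr[i + 1]; ++p) {
                int   k = colIdx[p];                // nonzero A(i,k)
                float a = vals[p];
                for (int j = 0; j < n; ++j)         // accumulate into dense row C(i,:)
                    C[i * n + j] += a * B[k * n + j];
            }
    }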
Aug, 4
Springald: GPU-Accelerated Window-Based Aggregates Over Out-of-Order Data Streams
An increasing number of application domains require high-throughput processing to extract insights from massive data streams. The Data Stream Processing (DSP) paradigm provides formal approaches to analyze structured data streams, treated as special, unbounded relations. The most widely used class of stateful operators in DSP performs sliding-window aggregation, which continuously extracts insights from […]
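A minimal picture of sliding-window aggregation: keep a running sum over the last W tuples, evicting the oldest as each new one arrives. The toy C++ sketch below ignores out-of-order arrivals and GPU offload, which are exactly the concerns systems like Springald address.

    // Toy count-based sliding-window sum: O(1) update per arriving tuple.
    #include <cstdio>
    #include <deque>

    int main() {
        const size_t W = 4;                 // window size in tuples
        std::deque<double> window;
        double sum = 0.0;
        const double stream[] = {3, 1, 4, 1, 5, 9, 2, 6};
        for (double v : stream) {
            window.push_back(v);
            sum += v;
            if (window.size() > W) {        // evict the oldest tuple
                sum -= window.front();
                window.pop_front();
            }
            if (window.size() == W)
                std::printf("window sum = %.1f\n", sum);
        }
        return 0;
    }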
Aug, 4
Lectures on Parallel Computing
These lecture notes are designed to accompany an imaginary, virtual, undergraduate, one- or two-semester course on the fundamentals of Parallel Computing, as well as to serve as background and reference for graduate courses on High-Performance Computing, parallel algorithms, and shared-memory multiprocessor programming. They introduce theoretical concepts and tools for expressing, analyzing, and judging parallel algorithms […]
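As a small taste of the subject, here is a parallel reduction using the C++17 execution policies; a course along these lines would ask how its speedup scales with the number of cores.

    // Parallel reduction with C++17 execution policies.
    // Requires a parallel STL implementation (e.g., GCC's, which links against TBB).
    #include <cstdio>
    #include <execution>
    #include <numeric>
    #include <vector>

    int main() {
        std::vector<double> v(1 << 24, 1.0);
        double s = std::reduce(std::execution::par, v.begin(), v.end(), 0.0);
        std::printf("sum = %.0f\n", s);     // expect 16777216
        return 0;
    }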
Aug, 4
Data-driven Forecasting of Deep Learning Performance on GPUs
Deep learning kernels exhibit predictable memory-access and compute patterns, making GPUs’ parallel architecture well-suited for their execution. Software and runtime systems for GPUs are optimized to better utilize the streaming multiprocessors, on-chip cache, and off-chip high-bandwidth memory. As deep learning models and GPUs evolve, access to newer GPUs is often limited, raising questions about […]
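One simple analytical baseline for such forecasting, distinct from the paper's data-driven approach, is a roofline estimate: predicted kernel time is the maximum of compute time and memory-transfer time given a GPU's peak throughput numbers. A sketch with illustrative, assumed peak figures:

    // Roofline-style lower-bound estimate for a kernel's runtime on a GPU.
    // The peak numbers below are illustrative placeholders, not a specific GPU's specs.
    #include <algorithm>
    #include <cstdio>

    int main() {
        const double flops      = 2.0 * 4096 * 4096 * 4096;  // e.g., a 4096^3 GEMM
        const double bytes      = 3.0 * 4096 * 4096 * 4;     // read A, B; write C (fp32)
        const double peak_flops = 19.5e12;                   // assumed peak FP32 FLOP/s
        const double peak_bw    = 1.55e12;                   // assumed HBM bytes/s

        double t_compute = flops / peak_flops;               // compute-bound time
        double t_memory  = bytes / peak_bw;                  // memory-bound time
        std::printf("estimated time >= %.3f ms\n",
                    1e3 * std::max(t_compute, t_memory));
        return 0;
    }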
Aug, 4
Scheduling Deep Learning Jobs in Multi-Tenant GPU Clusters via Wise Resource Sharing
Deep learning (DL) has demonstrated significant success across diverse fields, leading to the construction of dedicated GPU clusters for high-quality training services. Efficient scheduler designs for such clusters are vital to reduce operational costs and enhance resource utilization. While recent schedulers have shown impressive results in optimizing DL job performance and cluster […]
Jul, 28
Data-driven Performance Optimization for Data-intensive Applications
Data-intensive applications have attracted considerable attention from researchers in information sciences and enterprises, as these applications have made evolutionary breakthroughs in scientific fields and are extremely valuable for boosting productivity in businesses. Recently, with the high-speed growth of newly generated data, researchers have begun to leverage the useful knowledge hidden in such huge volume […]