Posts
Dec, 10
Edge AI for Internet of Energy: Challenges and Perspectives
The digital landscape of the Internet of Energy (IoE) is on the brink of a revolutionary transformation with the integration of edge Artificial Intelligence (AI). This comprehensive review elucidates the promise and potential that edge AI holds for reshaping the IoE ecosystem. Commencing with a meticulously curated research methodology, the article delves into the myriad […]
Dec, 10
Compiler-centric across-stack deep learning acceleration
Optimizing the deployment of Deep Neural Networks (DNNs) is hard. Despite deep learning approaches increasingly providing state-of-the-art solutions to a variety of difficult problems, such as computer vision and natural language processing, DNNs can be prohibitively expensive, for example, in terms of inference time or memory usage. Effective exploration of the design space requires a […]
Dec, 3
A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures
In recent years, the field of Deep Learning has seen many disruptive and impactful advancements. Given the increasing complexity of deep neural networks, efficient hardware accelerators have become increasingly necessary in the design of heterogeneous HPC platforms. The design of Deep Learning accelerators requires a multidisciplinary approach, combining expertise from several areas, […]
Dec, 3
Testing and Mutation Testing for GPU Kernels
Increasing GPU performance and a maturing computational platform make it possible for GPUs to handle general-purpose computing jobs traditionally run on the CPU. As with CPU programs, we use testing to verify the correctness of GPU programs. However, the quality of the tests may remain unknown, which inspires us to […]
Dec, 3
A Review of the Parallelization Strategies for Iterative Algorithms
Iteration-based algorithms have been widely used and achieved excellent results in many fields. However, in the big data era, data that needs to be processed is enormous in terms of both depth (the dimensionality of data) and breadth (the volume of data). Due to the slowdown of Moore’s Law, the computing power of single-core CPUs […]
Dec, 3
CuPBoP-AMD: Extending CUDA to AMD Platforms
The proliferation of artificial intelligence applications has underscored the need for increased portability among graphics processing units (GPUs) from different vendors. With CUDA as one of the most popular GPU programming languages, CuPBoP (CUDA for Parallelized and Broad-range Processors) aims to provide NVIDIA’s proprietary CUDA language support to a variety of GPU and CPU platforms […]
Dec, 3
RDMA-Based Algorithms for Sparse Matrix Multiplication on GPUs
Sparse matrix multiplication is an important kernel for large-scale graph processing and other data-intensive applications. In this paper, we implement various asynchronous, RDMA-based sparse times dense (SpMM) and sparse times sparse (SpGEMM) algorithms, evaluating their performance running in a distributed memory setting on GPUs. Our RDMA-based implementations use the NVSHMEM communication library for direct, asynchronous […]
Nov, 27
GT4Py: High Performance Stencils for Weather and Climate Applications using Python
All major weather and climate applications are currently developed using languages such as Fortran or C++. This is typical in the domain of high performance computing (HPC), where efficient execution is an important concern. Unfortunately, this approach leads to implementations that intermix optimizations for specific hardware architectures with the high-level numerical methods that are typical […]
Nov, 27
Accelerating bioinformatics applications on CUDA-enabled multi-GPU systems
A wide range of bioinformatics applications have to deal with a continuously growing amount of data generated by high-throughput sequencing techniques. Exclusively CPU-based workstations fail to keep up with the task. Instead of employing dozens of CPU cluster nodes to increase the computational power, massively parallel accelerators like modern CUDA-enabled GPUs can be used to […]
Nov, 27
Evaluation of FPGA-based high performance computing platforms
High performance computing is a topic that has risen to prominence in the era of digitalization, AI and automation. The search for more cost- and time-effective ways to implement HPC work is therefore an extensively researched subject. One part of this is to have hardware that is capable to improve on these […]
Nov, 27
Frameworks in Medical Image Analysis with Deep Neural Networks
In recent years, medical image analysis based on deep neural networks has become quite powerful and has achieved performance comparable to that of experts. Consequently, the integration of these tools into the clinical routine as clinical decision support systems is highly desired. The benefits of automatic image analysis for clinicians are massive, ranging from improved diagnostic as well […]
Nov, 27
FIKIT: Priority-Based Real-time GPU Multi-tasking Scheduling with Kernel Identification
Highly parallelized workloads like machine learning training, inference and general HPC tasks are greatly accelerated using GPU devices. In a cloud computing cluster, sharing a GPU’s computation power among multiple tasks is in high demand, since there are always more task requests than GPUs available. Existing GPU sharing solutions focus on reducing task-level […]