Posts
Dec, 10
Understanding the Topics and Challenges of GPU Programming by Classifying and Analyzing Stack Overflow Posts
GPUs have cemented their position in computer systems: they are no longer restricted to graphics but are also used extensively for general-purpose computing. With this comes a rapidly expanding population of developers programming GPUs. However, programming GPUs is notoriously difficult due to their unique architecture and constant evolution. A large number of developers have encountered problems […]
Dec, 10
Efficiently Processing Large Relational Joins on GPUs
With the growing interest in Machine Learning (ML), Graphics Processing Units (GPUs) have become key elements of any computing infrastructure. Their widespread deployment in data centers and the cloud raises the question of how to use them beyond ML use cases, with growing interest in employing them in a database context. In this paper, we […]
Dec, 3
A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures
In recent years, the field of Deep Learning has seen many disruptive and impactful advancements. Given the increasing complexity of deep neural networks, the need for efficient hardware accelerators has become increasingly pressing in the design of heterogeneous HPC platforms. The design of Deep Learning accelerators requires a multidisciplinary approach, combining expertise from several areas, […]
Dec, 3
Testing and Mutation Testing for GPU Kernels
Increasing GPU performance and a maturing computational platform make it possible to handle general-purpose computing jobs traditionally computed by the CPU. As with CPU programs, we use testing to verify the correctness of GPU programs. However, the quality of the tests may remain unknown, which inspires us to […]
Dec, 3
A Review of the Parallelization Strategies for Iterative Algorithms
Iteration-based algorithms have been widely used and achieved excellent results in many fields. However, in the big data era, data that needs to be processed is enormous in terms of both depth (the dimensionality of data) and breadth (the volume of data). Due to the slowdown of Moore’s Law, the computing power of single-core CPUs […]
Dec, 3
CuPBoP-AMD: Extending CUDA to AMD Platforms
The proliferation of artificial intelligence applications has underscored the need for increased portability among graphics processing units (GPUs) from different vendors. With CUDA as one of the most popular GPU programming languages, CuPBoP (CUDA for Parallelized and Broad-range Processors) aims to provide NVIDIA’s proprietary CUDA language support to a variety of GPU and CPU platforms […]
Dec, 3
RDMA-Based Algorithms for Sparse Matrix Multiplication on GPUs
Sparse matrix multiplication is an important kernel for large-scale graph processing and other data-intensive applications. In this paper, we implement various asynchronous, RDMA-based sparse times dense (SpMM) and sparse times sparse (SpGEMM) algorithms, evaluating their performance running in a distributed memory setting on GPUs. Our RDMA-based implementations use the NVSHMEM communication library for direct, asynchronous […]
Nov, 27
GT4Py: High Performance Stencils for Weather and Climate Applications using Python
All major weather and climate applications are currently developed using languages such as Fortran or C++. This is typical in the domain of high performance computing (HPC), where efficient execution is an important concern. Unfortunately, this approach leads to implementations that intermix optimizations for specific hardware architectures with the high-level numerical methods that are typical […]
Nov, 27
Accelerating bioinformatics applications on CUDA-enabled multi-GPU systems
A wide range of bioinformatics applications have to deal with a continuously growing amount of data generated by high-throughput sequencing techniques. Exclusively CPU-based workstations fail to keep up with the task. Instead of employing dozens of CPU cluster nodes to increase the computational power, massively parallel accelerators like modern CUDA-enabled GPUs can be used to […]
Nov, 27
Evaluation of FPGA-based high performance computing platforms
High performance computing has risen to prominence in the era of digitalization, AI and automation. The search for more cost- and time-effective ways to implement HPC work is therefore an extensively researched subject. One part of this is to have hardware that is capable of improving on these […]
Nov, 27
Frameworks in Medical Image Analysis with Deep Neural Networks
In recent years, deep neural network based medical image analysis has become quite powerful and has achieved performance comparable to that of experts. Consequently, the integration of these tools into the clinical routine as clinical decision support systems is highly desired. The benefits of automatic image analysis for clinicians are massive, ranging from improved diagnostic as well […]
Nov, 27
FIKIT: Priority-Based Real-time GPU Multi-tasking Scheduling with Kernel Identification
Highly parallelized workloads like machine learning training, inference and general HPC tasks are greatly accelerated using GPU devices. In a cloud computing cluster, sharing a GPU’s computation power among multiple tasks is in high demand, since there are always more task requests than GPUs available. Existing GPU sharing solutions focus on reducing task-level […]

