Posts
Apr, 19
Performance Exploration of Selected Manually and Automatically Parallelized Codes on GPUs
General-Purpose computing on GPUs (GPGPU) provides the opportunity to utilize the tremendous computational power of graphics accelerators for a wider set of problems. These devices leverage massive parallelism to achieve high performance; however, creating highly parallelized code that is optimized for the characteristics of GPUs is no simple task. The polyhedron model is used successfully […]
Apr, 19
GPU-Accelerated Numerical Simulations of the Knudsen Gas on Time-Dependent Domains
We consider the long-time behaviour of a free-molecular gas in a time-dependent vessel with absorbing boundary, in any space dimension. We first show, at the theoretical level, that the convergence towards equilibrium heavily depends on the initial data and on the time evolution law of the vessel. Subsequently, we describe a numerical strategy to simulate […]
Apr, 19
Algorithm Construction for GPGPU
Today every personal computer and almost every work-related computer has a GPU powerful enough to be used as a supplementary computational device. One framework that enables this is OpenCL. We asked how one writes efficient algorithms for these GPGPU devices. We found that there are two major ways to run […]
Apr, 19
Multi-level Parallelism for Time- and Cost-efficient Parallel Discrete Event Simulation on GPUs
Developing complex technical systems requires a systematic exploration of the given design space in order to identify optimal system configurations. However, studying the effects and interactions of even a small number of system parameters often requires an extensive number of simulation runs. This in turn results in excessive runtime demands which severely hamper thorough design […]
Apr, 19
GPU computing in medical physics: A review
The graphics processing unit (GPU) has emerged as a competitive platform for computing massively parallel problems. Many computing applications in medical physics can be formulated as data-parallel tasks that exploit the capabilities of the GPU for reducing processing times. The authors review the basic principles of GPU computing as well as the main performance optimization […]
Apr, 18
Auto-tuning Dense Vector and Matrix-Vector Operations for Fermi GPUs
In this paper, we consider the automatic performance tuning of dense vector and matrix-vector operations on GPUs. Such operations form the backbone of level 1 and level 2 routines in the Basic Linear Algebra Subroutines (BLAS) library and are therefore of great importance in many scientific applications. As examples, we develop single-precision CUDA kernels for […]
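The paper's kernels are single-precision CUDA, but the core auto-tuning loop it describes can be sketched in plain Python: enumerate candidate configurations (here a hypothetical row-block size standing in for a thread-block size), time each on a representative input, and keep the fastest. The names `matvec_blocked` and `autotune` are illustrative, not from the paper.

```python
import time

def matvec_blocked(A, x, block):
    """Dense matrix-vector product, processing `block` rows at a time.
    A stand-in for a GPU kernel whose block size is being tuned."""
    n = len(A)
    y = [0.0] * n
    for start in range(0, n, block):
        for i in range(start, min(start + block, n)):
            y[i] = sum(a * b for a, b in zip(A[i], x))
    return y

def autotune(A, x, candidates=(1, 2, 4, 8, 16, 32)):
    """Time each candidate block size and return the fastest one."""
    best, best_t = None, float("inf")
    for block in candidates:
        t0 = time.perf_counter()
        matvec_blocked(A, x, block)
        elapsed = time.perf_counter() - t0
        if elapsed < best_t:
            best, best_t = block, elapsed
    return best

A = [[float(i + j) for j in range(64)] for i in range(64)]
x = [1.0] * 64
print(autotune(A, x))
```

In a real tuner the timed kernel runs on the GPU and each candidate is launched several times to average out noise; the selection logic stays the same.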
Apr, 18
Maximize Performance on GPUs Using the Rake-based Optimization: A Case Study
In this paper, we analyze the trade-offs encountered when minimizing the total execution time of rake-based applications on GPUs. We use clustering data streams as a case study, and present a rake-based implementation for it, making it more efficient in terms of memory usage. In order to maximize performance for different problem sizes and […]
Apr, 18
MetaCL – A Model-Based Approach to Programming Heterogeneous Architectures Using OpenCL
With demand for high-performance computing at an all-time high, especially from the scientific/numerical analysis community, leveraging the power of existing heterogeneous architectures has become increasingly desirable. The attempt to use GPUs for non-graphics computations has bred programming models and innovative architectures that have trended towards a general-purpose computing platform. The latest generation of programming tools […]
Apr, 18
OpenCL vs. OpenMP: A Programmability Debate
OpenCL and OpenMP are the most commonly used programming models for homogeneous multi-core processors. They are also fundamentally different in their approach to parallelization, in terms of granularity level, explicit/implicit constructs, and usability. In this paper, we compare these two models in terms of programmability, with a special focus on performance and productivity. For our […]
Apr, 18
High-Performance Matrix-Vector Multiplication on the GPU
In this paper, we develop a high-performance GPU kernel for one of the most popular dense linear algebra operations, the matrix-vector multiplication. The target hardware is the most recent Nvidia Tesla 20-series (Fermi architecture), which is designed from the ground up for scientific computing. We show that it is essentially a matter of fully utilizing […]
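The usual GPU decomposition for dense matrix-vector multiplication assigns (at least) one thread per output element; each thread computes one row's dot product independently. A minimal Python sketch of that decomposition (not the paper's Fermi-optimized kernel, which also tiles and coalesces memory accesses):

```python
def matvec_row(A, x, i):
    # Each GPU thread computes one entry of y; here, one call per entry.
    return sum(a_ij * x_j for a_ij, x_j in zip(A[i], x))

def matvec(A, x):
    # On the GPU, this loop over rows is what runs in parallel.
    return [matvec_row(A, x, i) for i in range(len(A))]

A = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, 1.0]
print(matvec(A, x))  # → [3.0, 7.0]
```

The row-per-thread view captures the parallelism; "fully utilizing" the hardware, as the paper puts it, is then a matter of how those rows are mapped to warps and how A is laid out in memory.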
Apr, 17
Implementation of Massive Artificial Neural Networks with CUDA
People have always been amazed by the inner workings of the human brain. The brain is capable of solving a variety of problems that are unsolvable by any computer. It is capable of detecting minute changes in light, sound, or smell. It is capable of instantly recognizing a face or accurately reading handwritten text. The brain […]
Apr, 17
Monte Carlo Modeling of Electron Transport Using CUDA Technology
Statistical algorithms are presented for modeling the interaction processes between electrons and matter. A software implementation has been developed for hybrid supercomputers making use of NVIDIA CUDA technology. Standard Monte Carlo schemes are modified for effectively exploiting the parallel computing capabilities of graphical processors. The model of individual collisions (MIC) is used to describe the […]
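A building block common to Monte Carlo transport codes like this one is sampling the free-flight distance between collisions from an exponential distribution, s = −ln(u)/Σ, where Σ is the total interaction cross-section. A minimal serial sketch (the cross-section value and function names are illustrative, not taken from the paper; the GPU version runs one such particle history per thread):

```python
import math
import random

def free_path(sigma_total, rng):
    """Sample a free-flight distance: s = -ln(u) / sigma_total,
    with u uniform in (0, 1]."""
    u = 1.0 - rng.random()  # shift to (0, 1] so log(0) cannot occur
    return -math.log(u) / sigma_total

def mean_free_path_estimate(sigma_total, n, seed=0):
    """Monte Carlo estimate of the mean free path, 1 / sigma_total."""
    rng = random.Random(seed)
    return sum(free_path(sigma_total, rng) for _ in range(n)) / n

# With sigma_total = 2.0, the sample mean should approach 0.5.
est = mean_free_path_estimate(sigma_total=2.0, n=100_000)
print(est)
```

The modification the abstract alludes to, adapting such schemes for GPUs, mostly concerns giving each thread its own independent random-number stream and keeping the per-particle state small.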