Posts
Apr, 18
MetaCL – A Model-Based Approach to Programming Heterogeneous Architectures Using OpenCL
With demand for high-performance computing at an all-time high, especially from the scientific/numerical analysis community, leveraging the power of existing heterogeneous architectures has become increasingly desirable. The attempt to use GPUs for non-graphics computations has bred programming models and innovative architectures that have trended towards a general-purpose computing platform. The latest generation of programming tools […]
Apr, 18
OpenCL vs. OpenMP: A Programmability Debate
OpenCL and OpenMP are the most commonly used programming models for homogeneous multi-core processors. They are also fundamentally different in their approach to parallelization, in terms of granularity level, explicit/implicit constructs, and usability. In this paper, we compare these two models in terms of programmability, with a special focus on performance and productivity. For our […]
Apr, 18
High-Performance Matrix-Vector Multiplication on the GPU
In this paper, we develop a high-performance GPU kernel for one of the most popular dense linear algebra operations, the matrix-vector multiplication. The target hardware is the most recent Nvidia Tesla 20-series (Fermi architecture), which is designed from the ground up for scientific computing. We show that it is essentially a matter of fully utilizing […]
Apr, 17
Implementation of Massive Artificial Neural Networks with CUDA
People have always been amazed by the inner workings of the human brain. The brain is capable of solving a variety of problems that are unsolvable by any computer. It is capable of detecting minute changes of light, sound or smell. It is capable of instantly recognizing a face, accurately reading handwritten text, etc. The brain […]
Apr, 17
Monte Carlo Modeling of Electron Transport Using CUDA Technology
Statistical algorithms are presented for modeling the interaction processes between electrons and matter. A software implementation has been developed for hybrid supercomputers making use of NVIDIA CUDA technology. Standard Monte Carlo schemes are modified for effectively exploiting the parallel computing capabilities of graphical processors. The model of individual collisions (MIC) is used to describe the […]
Apr, 17
Auto-tuning interactive ray tracing using an analytical GPU architecture model
This paper presents a method for auto-tuning interactive ray tracing on GPUs using a hardware model. Getting full performance from modern GPUs is a challenging task. Workloads which require a guaranteed performance over several runs must select parameters for the worst performance of all runs. Our method uses an analytical GPU performance model to predict […]
Apr, 17
Exact diagonalization of the Hubbard model on graphics processing units
We solve the Hubbard model with the exact diagonalization method on a graphics processing unit (GPU). We benchmark our GPU program against a sequential CPU code by using the Lanczos algorithm to solve for the ground state energy in two cases: a one-dimensional ring and a two-dimensional square lattice. In the one-dimensional case, we obtain speedups […]
Apr, 17
A pilgrimage to gravity on GPUs
In this short review we present the developments over the last five decades that have led to the use of Graphics Processing Units (GPUs) for astrophysical simulations. Since the introduction of NVIDIA’s Compute Unified Device Architecture (CUDA) in 2007, the GPU has become a valuable tool for N-body simulations and is so popular these days […]
Apr, 17
High Performance Stencil Code Algorithms for GPGPUs
In this paper we investigate how stencil computations can be implemented on state-of-the-art general purpose graphics processing units (GPGPUs). Stencil codes can be found at the core of many numerical solvers and physical simulation codes and are therefore of particular interest to scientific computing research. GPGPUs have gained a lot of attention recently because of […]
Apr, 16
Solving incompressible two-phase flows on multi-GPU clusters
We present a fully multi-GPU-based double-precision solver for the three-dimensional two-phase incompressible Navier-Stokes equations. It is able to simulate the interaction of two fluids like air and water based on a level-set approach. High-order finite difference schemes and Chorin’s projection approach for space and time discretization are applied. An in-depth performance analysis shows a realistic […]
Apr, 16
Auto-Tuning of Level 1 and Level 2 BLAS for GPUs
The use of high performance libraries for dense linear algebra operations is of great importance in many numerical scientific applications. The most common operations form the backbone of the Basic Linear Algebra Subroutines (BLAS) library. In this paper, we consider the performance and auto-tuning of level 1 and level 2 BLAS routines on GPUs. As […]
Apr, 16
Acceleration of CFD and data analysis using graphics processors
Graphics processing units function well as high performance computing devices for scientific computing. The non-standard processor architecture and high memory bandwidth allow graphics processing units (GPUs) to provide some of the best performance in terms of FLOPS per dollar. Recently these capabilities became accessible for general purpose computations with the CUDA programming environment on NVIDIA […]