Posts

Apr, 18

Auto-tuning Dense Vector and Matrix-Vector Operations for Fermi GPUs

In this paper, we consider the automatic performance tuning of dense vector and matrix-vector operations on GPUs. Such operations form the backbone of level 1 and level 2 routines in the Basic Linear Algebra Subroutines (BLAS) library and are therefore of great importance in many scientific applications. As examples, we develop single-precision CUDA kernels for […]
Apr, 18

Maximize Performance on GPUs Using the Rake-based Optimization: A Case Study

In this paper, we analyze the trade-offs encountered when minimizing the total execution time using rake-based applications on GPUs. We use clustering data streams as a case study and present a rake-based implementation for it that is more efficient in terms of memory usage. In order to maximize performance for different problem sizes and […]
Apr, 18

MetaCL – A Model-Based Approach to Programming Heterogeneous Architectures Using OpenCL

With demand for high-performance computing at an all-time high, especially from the scientific/numerical analysis community, leveraging the power of existing heterogeneous architectures has become increasingly desirable. The attempt to use GPUs for non-graphics computations has bred programming models and innovative architectures that have trended towards a general-purpose computing platform. The latest generation of programming tools […]
Apr, 18

OpenCL vs. OpenMP: A Programmability Debate

OpenCL and OpenMP are the most commonly used programming models for homogeneous multi-core processors. They are also fundamentally different in their approach to parallelization, in terms of granularity level, explicit/implicit constructs, and usability. In this paper, we compare these two models in terms of programmability, with a special focus on performance and productivity. For our […]
Apr, 18

High-Performance Matrix-Vector Multiplication on the GPU

In this paper, we develop a high-performance GPU kernel for one of the most popular dense linear algebra operations, the matrix-vector multiplication. The target hardware is the most recent Nvidia Tesla 20-series (Fermi architecture), which is designed from the ground up for scientific computing. We show that it is essentially a matter of fully utilizing […]
Apr, 17

Implementation of Massive Artificial Neural Networks with CUDA

People have always been amazed by the inner workings of the human brain. The brain is capable of solving a variety of problems that no computer can solve. It is capable of detecting minute changes in light, sound, or smell. It is capable of instantly recognizing a face, accurately reading handwritten text, etc. The brain […]
Apr, 17

Monte Carlo Modeling of Electron Transport Using CUDA Technology

Statistical algorithms are presented for modeling the interaction processes between electrons and matter. A software implementation has been developed for hybrid supercomputers making use of NVIDIA CUDA technology. Standard Monte Carlo schemes are modified for effectively exploiting the parallel computing capabilities of graphical processors. The model of individual collisions (MIC) is used to describe the […]
Apr, 17

Auto-tuning interactive ray tracing using an analytical GPU architecture model

This paper presents a method for auto-tuning interactive ray tracing on GPUs using a hardware model. Getting full performance from modern GPUs is a challenging task. Workloads which require a guaranteed performance over several runs must select parameters for the worst performance of all runs. Our method uses an analytical GPU performance model to predict […]
Apr, 17

Exact diagonalization of the Hubbard model on graphics processing units

We solve the Hubbard model with the exact diagonalization method on a graphics processing unit (GPU). We benchmark our GPU program against a sequential CPU code by using the Lanczos algorithm to solve the ground state energy in two cases: a one-dimensional ring and a two-dimensional square lattice. In the one-dimensional case, we obtain speedups […]
Apr, 17

A pilgrimage to gravity on GPUs

In this short review we present the developments over the last 5 decades that have led to the use of Graphics Processing Units (GPUs) for astrophysical simulations. Since the introduction of NVIDIA’s Compute Unified Device Architecture (CUDA) in 2007 the GPU has become a valuable tool for N-body simulations and is so popular these days […]
Apr, 17

High Performance Stencil Code Algorithms for GPGPUs

In this paper we investigate how stencil computations can be implemented on state-of-the-art general purpose graphics processing units (GPGPUs). Stencil codes can be found at the core of many numerical solvers and physical simulation codes and are therefore of particular interest to scientific computing research. GPGPUs have gained a lot of attention recently because of […]
Apr, 16

Solving incompressible two-phase flows on multi-GPU clusters

We present a fully multi-GPU-based double-precision solver for the three-dimensional two-phase incompressible Navier-Stokes equations. It is able to simulate the interaction of two fluids like air and water based on a level-set approach. High-order finite difference schemes and Chorin’s projection approach for space and time discretization are applied. An in-depth performance analysis shows a realistic […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
