Posts
Feb, 17
GPU Programming with CUDA: A brief overview
In this paper we describe the architecture of an NVIDIA GPU, as well as the CUDA programming model. The basic statements are explained. We also provide an example of CUDA code, explaining its execution workflow on a GPU device.
Feb, 17
Optimizing Performance of Stencil Code with SPL Conqueror
A standard technique to numerically solve elliptic partial differential equations on structured grids is to discretize them via finite differences and then to apply an efficient geometric multi-grid solver. Unfortunately, finding the optimal choice of multi-grid components and parameters is challenging and platform-dependent, especially in cases where domain knowledge is incomplete. Auto-tuning is a […]
Feb, 17
Interactive Design Exploration for Constrained Meshes
In architectural design, surface shapes are commonly subject to geometric constraints imposed by material, fabrication or assembly. Rationalization algorithms can convert a freeform design into a form feasible for production, but often require design modifications that might not comply with the design intent. In addition, they only offer limited support for exploring alternative feasible shapes, […]
Feb, 17
Efficient pseudo-random number generation for monte-carlo simulations using graphic processors
A hybrid approach based on the combination of three Tausworthe generators and one linear congruential generator for pseudo-random number generation in GPU programming, as suggested in the NVIDIA CUDA library, has been used for Monte Carlo sampling. On each GPU thread, a random seed is generated on the fly in a simple way using the quick and dirty […]
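As a rough illustration of the combination described in this abstract, the following Python sketch XORs three Tausworthe steps with one linear congruential step. It assumes the shift and mask constants commonly published for the "hybrid Tausworthe" generator in NVIDIA's GPU Gems 3; the names `taus_step`, `lcg_step`, and `HybridTaus` are illustrative, and the per-thread seeding scheme used in the paper is not reproduced because the abstract is truncated.

```python
MASK32 = 0xFFFFFFFF  # emulate 32-bit unsigned arithmetic


def taus_step(z, s1, s2, s3, m):
    """One step of a single Tausworthe component (32-bit state)."""
    b = (((z << s1) & MASK32) ^ z) >> s2
    return (((z & m) << s3) & MASK32) ^ b


def lcg_step(z, a=1664525, c=1013904223):
    """One step of a 32-bit linear congruential generator."""
    return (a * z + c) & MASK32


class HybridTaus:
    """Three Tausworthe components XORed with one LCG (assumed constants)."""

    def __init__(self, z1, z2, z3, z4):
        # Assumption: the three Tausworthe seeds should be larger than 128
        # so that no component starts in a degenerate state.
        self.z1, self.z2, self.z3, self.z4 = (
            z1 & MASK32, z2 & MASK32, z3 & MASK32, z4 & MASK32
        )

    def next_u32(self):
        self.z1 = taus_step(self.z1, 13, 19, 12, 4294967294)
        self.z2 = taus_step(self.z2, 2, 25, 4, 4294967288)
        self.z3 = taus_step(self.z3, 3, 11, 17, 4294967280)
        self.z4 = lcg_step(self.z4)
        return self.z1 ^ self.z2 ^ self.z3 ^ self.z4

    def next_float(self):
        # Scale by 2^-32 to map the 32-bit output into [0, 1).
        return self.next_u32() * 2.3283064365386963e-10


if __name__ == "__main__":
    gen = HybridTaus(129, 130, 131, 12345)
    print([gen.next_float() for _ in range(3)])
```

On a GPU, each thread would hold its own four-word state; this sketch keeps the state in a single object to show only the update rule.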
Feb, 17
Resolution of Linear Algebra for the Discrete Logarithm Problem using GPU and Multi-core Architectures
In cryptanalysis, solving the discrete logarithm problem (DLP) is key to assessing the security of many public-key cryptosystems. The index-calculus methods, which attack the DLP in multiplicative subgroups of finite fields, require solving large sparse systems of linear equations modulo large primes. This article deals with how we can run this computation on GPU- and […]
Feb, 17
Fast American Basket Option Pricing on a multi-GPU Cluster
This article presents a multi-GPU adaptation of a specific Monte Carlo and classification-based method for pricing American basket options, due to Picazo. The first part describes how to combine fine- and coarse-grained parallelization to price American basket options. A dynamic strategy of kernel calibration is proposed. Doing so, our implementation on a reasonable size […]
Feb, 16
Towards Porting a Real-World Seismological Application to the Intel MIC Architecture
This whitepaper aims to discuss first experiences with porting an MPI-based real-world geophysical application to the new Intel Many Integrated Core (MIC) architecture. The selected code SeisSol is an application written in Fortran that can be used to simulate earthquake rupture and radiating seismic wave propagation in complex 3-D heterogeneous materials. The PRACE prototype cluster […]
Feb, 16
Direct Numerical Simulation and Large Eddy Simulation on a Turbulent Wall-Bounded Flow Using Lattice Boltzmann Method and Multiple GPUs
Direct numerical simulation (DNS) and large eddy simulation (LES) were performed on the wall-bounded flow at Re_tau = 180 using the lattice Boltzmann method (LBM) and multiple Graphics Processing Units (GPUs). In the DNS, 8 K20M GPUs were adopted. The maximum number of meshes is 6.7×10^7, which results in a non-dimensional mesh size of Delta+ = 1.41 for […]
Feb, 16
Cuda K-Nn: application to the segmentation of the retinal vasculature within SD-OCT volumes of mice
In this work, the speed of a GPU-based CUDA k-NN implementation is compared with that of the ANN implementation on three sets of medical imaging data. The results show that with higher-dimensional data, the CUDA-based k-NN approach could achieve a speedup of up to two orders of magnitude. Otherwise, ANN would be a better implementation to […]
Feb, 16
Application of the Characteristic Basis Function Method using CUDA
The Characteristic Basis Function Method (CBFM) is a popular technique for efficiently solving the Method of Moments (MoM) matrix equations. In this work, we address the adaptation of this method to a relatively new computing infrastructure provided by NVIDIA, the Compute Unified Device Architecture (CUDA), and take into account some of the limitations which appear […]
Feb, 16
LDetector: A Low Overhead Race Detector For GPU Programs
Data race detection is an important problem in GPU programming. The paper presents a novel solution. It uses compiler support to privatize shared data and then parallelizes the race checking at run time. It has two distinct features. First, there is no per-access monitoring, so the race detection has a low overhead and […]
Feb, 15
ADBIS workshop on GPUs In Databases, GID 2014
The high performance of modern Graphics Processing Units may be utilized not only for graphics-related applications but also for general computing. This computing power has been utilized in new variants of many algorithms from almost every computer science domain. Unfortunately, while other application domains strongly benefit from utilizing GPUs, database-related applications seem not […]