Posts
Feb, 17
A Similarity-Based Analysis Tool for Scientific Application Porting
Porting applications to a new system is a nontrivial job in the HPC field. It is a very time-consuming, labor-intensive process, and the quality of the results will depend critically on the experience of the experts involved. In order to ease the porting process, we propose a methodology to address an important aspect of software […]
Feb, 17
The battle of the giants: a case study of GPU vs FPGA optimisation for real-time image processing
This paper focuses on a thorough comparison of the two main hardware targets for real-time optimization of a computer vision algorithm: GPU and FPGA. Using a complex case-study algorithm for threaded isle detection, the implementations on both hardware targets are compared in terms of resulting time performance, code translation effort, hardware cost, power efficiency […]
Feb, 17
Petascale elliptic solvers for anisotropic PDEs on GPU clusters
Memory-bound applications such as solvers for large sparse systems of equations remain a challenge for GPUs. Fast solvers should be based on numerically efficient algorithms and implemented such that global memory access is minimised. To solve systems with up to one trillion (10¹²) unknowns, the code has to make efficient use of several million […]
Feb, 17
GPU Programming with CUDA: A brief overview
In this paper we describe the architecture of an NVIDIA GPU, as well as the CUDA programming model. The basic statements are explained. We also provide an example of CUDA code, explaining its execution workflow on a GPU device.
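As a CPU-only illustration of the CUDA execution model (not the paper's own example), the following Python sketch emulates a one-dimensional kernel launch. Each emulated thread computes its global index from its block index, the block size and its thread index, as `blockIdx.x * blockDim.x + threadIdx.x` does in CUDA C, and skips out-of-range elements. The vector-addition kernel and the function names are hypothetical, and the sequential loops merely stand in for the parallel hardware.

```python
from typing import List


def vec_add_kernel(block_idx: int, block_dim: int, thread_idx: int,
                   a: List[float], b: List[float], c: List[float]) -> None:
    """Body executed by one emulated CUDA thread (hypothetical example kernel)."""
    # Same formula as blockIdx.x * blockDim.x + threadIdx.x in CUDA C.
    i = block_idx * block_dim + thread_idx
    # The grid size is rounded up, so threads in the last block may be idle.
    if i < len(c):
        c[i] = a[i] + b[i]


def launch_1d(n: int, block_dim: int, a: List[float], b: List[float]) -> List[float]:
    """Emulate a <<<grid_dim, block_dim>>> launch sequentially on the CPU."""
    c = [0.0] * n
    grid_dim = (n + block_dim - 1) // block_dim  # ceil(n / block_dim) blocks
    for block_idx in range(grid_dim):            # blocks may run in any order on a GPU
        for thread_idx in range(block_dim):      # threads of a block run concurrently
            vec_add_kernel(block_idx, block_dim, thread_idx, a, b, c)
    return c
```

For `n = 10` and `block_dim = 4`, three blocks (12 threads) are launched and the last two threads do nothing, which is why the bounds check is needed.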
Feb, 17
Optimizing Performance of Stencil Code with SPL Conqueror
A standard technique to numerically solve elliptic partial differential equations on structured grids is to discretize them via finite differences and then to apply an efficient geometric multi-grid solver. Unfortunately, finding the optimal choice of multi-grid components and parameters is challenging and platform-dependent, especially in cases where domain knowledge is incomplete. Auto-tuning is a […]
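The abstract names the standard pipeline (finite-difference discretization followed by geometric multi-grid) but not a specific configuration. As a minimal sketch under assumed choices, not the paper's setup, the following Python code solves the 1D Poisson problem −u″ = f on [0, 1] with zero boundary values using V-cycles built from weighted Jacobi smoothing (ω = 2/3), full-weighting restriction and linear interpolation. The smoother, the sweep counts `pre`/`post` and ω are examples of the components and parameters an auto-tuner would search over; all function names are hypothetical.

```python
import math
from typing import List

Vec = List[float]


def residual(u: Vec, f: Vec, h: float) -> Vec:
    """r = f - A u for the 3-point stencil (A u)_i = (2u_i - u_{i-1} - u_{i+1}) / h^2."""
    n = len(u) - 1
    r = [0.0] * (n + 1)  # boundary entries stay zero (Dirichlet)
    for i in range(1, n):
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r


def jacobi(u: Vec, f: Vec, h: float, sweeps: int, omega: float = 2.0 / 3.0) -> Vec:
    """Weighted Jacobi smoothing; damps high-frequency error components."""
    n = len(u) - 1
    for _ in range(sweeps):
        new = u[:]
        for i in range(1, n):
            jac = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
            new[i] = (1.0 - omega) * u[i] + omega * jac
        u = new
    return u


def restrict(r: Vec) -> Vec:
    """Full weighting: coarse point j coincides with fine point 2j."""
    nc = (len(r) - 1) // 2
    rc = [0.0] * (nc + 1)
    for j in range(1, nc):
        rc[j] = 0.25 * r[2 * j - 1] + 0.5 * r[2 * j] + 0.25 * r[2 * j + 1]
    return rc


def prolong(ec: Vec) -> Vec:
    """Linear interpolation from the coarse grid back to the fine grid."""
    nc = len(ec) - 1
    e = [0.0] * (2 * nc + 1)
    for j in range(nc + 1):
        e[2 * j] = ec[j]
    for j in range(nc):
        e[2 * j + 1] = 0.5 * (ec[j] + ec[j + 1])
    return e


def v_cycle(u: Vec, f: Vec, h: float, pre: int = 2, post: int = 2) -> Vec:
    n = len(u) - 1
    if n == 2:  # coarsest grid: one interior unknown, solved exactly
        return [u[0], 0.5 * (u[0] + u[2] + h * h * f[1]), u[2]]
    u = jacobi(u, f, h, pre)                                # pre-smoothing
    rc = restrict(residual(u, f, h))                        # coarse-grid residual
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h, pre, post)   # solve A_2h e = r
    u = [ui + ei for ui, ei in zip(u, prolong(ec))]         # coarse-grid correction
    return jacobi(u, f, h, post)                            # post-smoothing


def solve_poisson(f: Vec, cycles: int = 10) -> Vec:
    """Solve -u'' = f, u(0) = u(1) = 0; len(f) - 1 must be a power of 2 (>= 2)."""
    n = len(f) - 1
    u = [0.0] * (n + 1)
    for _ in range(cycles):
        u = v_cycle(u, f, 1.0 / n)
    return u


def manufactured_rhs(n: int) -> Vec:
    """Right-hand side whose exact solution is u(x) = sin(pi x), for checking."""
    return [math.pi ** 2 * math.sin(math.pi * i / n) for i in range(n + 1)]
```

With `f = manufactured_rhs(64)`, `solve_poisson(f)` should approach sin(πx) up to the O(h²) discretization error of the finite-difference stencil.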
Feb, 17
Interactive Design Exploration for Constrained Meshes
In architectural design, surface shapes are commonly subject to geometric constraints imposed by material, fabrication or assembly. Rationalization algorithms can convert a freeform design into a form feasible for production, but often require design modifications that might not comply with the design intent. In addition, they only offer limited support for exploring alternative feasible shapes, […]
Feb, 17
Efficient pseudo-random number generation for monte-carlo simulations using graphic processors
A hybrid approach to pseudo-random number generation in GPU programming, combining three Tausworthe generators and one linear congruential generator as suggested in the NVIDIA CUDA library, has been used for Monte Carlo sampling. On each GPU thread, a random seed is generated on the fly in a simple way using the quick and dirty […]
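The following Python sketch ports a widely used hybrid generator of this kind: three Tausworthe components with L'Ecuyer's taus88 parameters, XOR-combined with a 32-bit LCG (multiplier 1664525, increment 1013904223), as in the NVIDIA GPU Gems 3 random-number chapter. We assume, but the abstract does not confirm, that this is the variant the paper uses. We also assume the truncated "quick and dirty" refers to the Numerical Recipes quick-and-dirty LCG with the same constants; `seed_for_thread` and its `base_seed` parameter are hypothetical illustrations of per-thread seeding.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF  # emulate 32-bit unsigned arithmetic


def taus_step(z: int, s1: int, s2: int, s3: int, m: int) -> int:
    """One step of a Tausworthe (LFSR-type) component generator."""
    b = (((z << s1) & MASK32) ^ z) >> s2
    return (((z & m) << s3) & MASK32) ^ b


def lcg_step(z: int, a: int = 1664525, c: int = 1013904223) -> int:
    """32-bit linear congruential step z <- a*z + c (mod 2^32)."""
    return (a * z + c) & MASK32


class HybridTaus:
    """Three Tausworthe components XOR-combined with one LCG (per-thread state)."""

    def __init__(self, z1: int, z2: int, z3: int, z4: int) -> None:
        # Tausworthe components degenerate for small seeds: need z1 > 1, z2 > 7, z3 > 15.
        if z1 <= 1 or z2 <= 7 or z3 <= 15:
            raise ValueError("seed too small for a Tausworthe component")
        self.z1, self.z2, self.z3, self.z4 = z1, z2, z3, z4

    def next_uint32(self) -> int:
        self.z1 = taus_step(self.z1, 13, 19, 12, 4294967294)
        self.z2 = taus_step(self.z2, 2, 25, 4, 4294967288)
        self.z3 = taus_step(self.z3, 3, 11, 17, 4294967280)
        self.z4 = lcg_step(self.z4)
        return self.z1 ^ self.z2 ^ self.z3 ^ self.z4

    def next_float(self) -> float:
        """Uniform sample in [0, 1): the 32-bit output scaled by 2^-32."""
        return self.next_uint32() * 2.0 ** -32


def seed_for_thread(thread_id: int, base_seed: int = 12345) -> Tuple[int, int, int, int]:
    """Hypothetical per-thread seeding with the quick-and-dirty LCG."""
    state = (base_seed + thread_id) & MASK32
    seeds = []
    for _ in range(4):
        state = lcg_step(state)
        # Keep every seed above the Tausworthe minimums.
        seeds.append(state if state > 128 else state + 128)
    return (seeds[0], seeds[1], seeds[2], seeds[3])
```

On a GPU, each thread would hold its own four-word state (for example `HybridTaus(*seed_for_thread(tid))`), so no synchronization is needed between threads drawing samples.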
Feb, 17
Resolution of Linear Algebra for the Discrete Logarithm Problem using GPU and Multi-core Architectures
In cryptanalysis, solving the discrete logarithm problem (DLP) is key to assessing the security of many public-key cryptosystems. The index-calculus methods, that attack the DLP in multiplicative subgroups of finite fields, require solving large sparse systems of linear equations modulo large primes. This article deals with how we can run this computation on GPU- and […]
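The abstract does not name the solver, but index-calculus linear algebra is commonly handled by iterative methods such as Lanczos or block Wiedemann, whose dominant cost is repeated sparse matrix-vector products modulo a large prime. A minimal Python sketch of that kernel, assuming a compressed sparse row (CSR) layout and small integer entries as produced by relation collection; Python's arbitrary-precision integers stand in for the multi-word modular arithmetic a GPU implementation needs. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def to_csr(n_rows: int, entries: Sequence[Tuple[int, int, int]], p: int):
    """Build CSR arrays from (row, col, value) triples, reducing values mod p."""
    row_ptr = [0] * (n_rows + 1)
    col_idx: List[int] = []
    vals: List[int] = []
    for r, c, v in sorted(entries):      # row-major order
        row_ptr[r + 1] += 1              # count nonzeros per row
        col_idx.append(c)
        vals.append(v % p)               # Python % yields a value in [0, p)
    for r in range(n_rows):              # prefix sum -> row start offsets
        row_ptr[r + 1] += row_ptr[r]
    return row_ptr, col_idx, vals


def spmv_mod(row_ptr: List[int], col_idx: List[int], vals: List[int],
             x: List[int], p: int) -> List[int]:
    """y = A x (mod p); on a GPU, rows would be distributed across threads or warps."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += vals[k] * x[col_idx[k]]
        y.append(acc % p)                # one (lazy) reduction per row
    return y
```

Deferring the modular reduction to the end of each row is cheap here because the matrix coefficients are small; with fixed-width GPU words the accumulator size must be chosen so that it cannot overflow before reduction.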
Feb, 17
Fast American Basket Option Pricing on a multi-GPU Cluster
This article presents a multi-GPU adaptation of a specific Monte Carlo and classification-based method for pricing American basket options, due to Picazo. The first part describes how to combine fine- and coarse-grained parallelization to price American basket options. A dynamic strategy of kernel calibration is proposed. Doing so, our implementation on a reasonable size […]
Feb, 16
Towards Porting a Real-World Seismological Application to the Intel MIC Architecture
This whitepaper aims to discuss first experiences with porting an MPI-based real-world geophysical application to the new Intel Many Integrated Core (MIC) architecture. The selected code SeisSol is an application written in Fortran that can be used to simulate earthquake rupture and radiating seismic wave propagation in complex 3-D heterogeneous materials. The PRACE prototype cluster […]
Feb, 16
Direct Numerical Simulation and Large Eddy Simulation on a Turbulent Wall-Bounded Flow Using Lattice Boltzmann Method and Multiple GPUs
Direct numerical simulation (DNS) and large eddy simulation (LES) were performed on the wall-bounded flow at Re_τ = 180 using the lattice Boltzmann method (LBM) and multiple Graphics Processing Units (GPUs). In the DNS, eight K20M GPUs were used. The maximum number of mesh points is 6.7×10⁷, which results in a non-dimensional mesh size of Δ⁺ = 1.41 for […]
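The quoted mesh resolution can be related to the friction Reynolds number through the standard wall-unit definitions. This assumes, as is conventional for wall-bounded channel flow but not stated in the truncated abstract, that Re_τ is based on the channel half-height δ, with u_τ the friction velocity, ν the kinematic viscosity and Δx the lattice spacing:

```latex
\Delta^{+} = \frac{\Delta x\, u_\tau}{\nu}, \qquad
\mathrm{Re}_\tau = \frac{u_\tau\, \delta}{\nu}
\quad\Longrightarrow\quad
\Delta^{+} = \mathrm{Re}_\tau\,\frac{\Delta x}{\delta}
\quad\Longrightarrow\quad
\frac{\delta}{\Delta x} = \frac{\mathrm{Re}_\tau}{\Delta^{+}} = \frac{180}{1.41} \approx 128
```

Conversely, 128 lattice spacings per half-height give Δ⁺ = 180/128 ≈ 1.406, consistent with the quoted value of 1.41.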
Feb, 16
CUDA k-NN: application to the segmentation of the retinal vasculature within SD-OCT volumes of mice
In this work, the speed of a GPU-based CUDA k-NN implementation was compared with that of the ANN implementation on three sets of medical imaging data. The results show that for higher-dimensional data, the CUDA-based k-NN approach can achieve speedups of up to two orders of magnitude. Otherwise, ANN would be a better implementation to […]
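The abstract does not describe the CUDA kernel. A common GPU k-NN strategy is brute force: compute the distance from every query to every reference point, which parallelizes naturally, whereas ANN (the Approximate Nearest Neighbor library) relies on tree-based search structures. The following Python sketch shows only the brute-force classification logic, with a comment marking where a GPU version would parallelize; the function name and the toy vessel/background labels are hypothetical.

```python
import heapq
import math
from collections import Counter
from typing import Hashable, List, Sequence


def knn_classify(train: Sequence[Sequence[float]], labels: Sequence[Hashable],
                 queries: Sequence[Sequence[float]], k: int) -> List[Hashable]:
    """Brute-force k-NN majority vote using Euclidean distance."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training points")
    predictions = []
    for q in queries:  # a GPU version would typically assign queries to threads/blocks
        # Exhaustive distance computation: no index structure, trivially parallel.
        dists = [(math.dist(q, p), i) for i, p in enumerate(train)]
        nearest = heapq.nsmallest(k, dists)  # sorted by distance, ties broken by index
        votes = Counter(labels[i] for _, i in nearest)
        # most_common keeps first-encountered order on ties, i.e. the nearest label wins.
        predictions.append(votes.most_common(1)[0][0])
    return predictions
```

The cost grows linearly with both the number of reference points and the feature dimension, but every distance is independent, which is why a GPU can outperform tree-based search when the dimension is high enough that trees lose their pruning advantage.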