Posts
Nov, 24
Real-time Building Airflow Simulation Aided by GPU and FFD
Two recent methods for the fast simulation of building airflow are studied: the fast fluid dynamics (FFD) algorithm and the use of graphics processing units (GPUs) for scientific computing in building engineering. A Google SketchUp plug-in for the FFD program was also developed as a model-creation tool to enhance the accessibility of the operation […]
Nov, 23
LoGV: Low-overhead GPGPU Virtualization
Over the last few years, running high-performance computing applications in the cloud has become feasible. At the same time, GPGPUs are delivering unprecedented performance for HPC applications. Cloud providers thus face the challenge of integrating GPGPUs into their virtualized platforms, which has proven difficult for current virtualization stacks. In this paper, we present LoGV, […]
Nov, 23
TESLA GPUs versus MPI with OpenMP for the Forward Modeling of Gravity and Gravity Gradient of Large Prisms Ensemble
An implementation using CUDA on a single graphics processing unit (GPU) and on several GPUs is presented for the forward modeling of the gravitational fields of a three-dimensional volumetric ensemble composed of unit prisms of constant density. We compared the performance results obtained with the GPUs against a previous version coded in […]
Nov, 23
Graph grammar based multi-frontal direct solver for isogeometric FEM simulations on GPU
We present a multi-frontal direct solver for two-dimensional isogeometric finite element method simulations with NVIDIA CUDA and perform numerical experiments for linear, quadratic and cubic B-splines. We compare the computational cost O(N p^2) of the 2D parallel shared-memory implementation with the corresponding estimate O(N^1.5 p^3) for a standard 2D sequential implementation. We conclude the presentation with […]
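Dividing the two quoted estimates gives the expected scaling advantage of the parallel solver: O(N^1.5 p^3) / O(N p^2) = O(N^0.5 p). The sketch below simply evaluates that ratio. It assumes (our reading, not stated in the excerpt) that N is the number of degrees of freedom and p the B-spline order, and it ignores the hidden constants, so it indicates asymptotic scaling only, not a measured speedup; `cost_ratio` is a hypothetical helper name.

```python
def cost_ratio(N: int, p: int) -> float:
    """Ratio of the sequential estimate N^1.5 p^3 to the parallel estimate N p^2.

    Constants hidden by the O-notation are ignored (assumption), so this
    reflects scaling only. Algebraically it equals sqrt(N) * p.
    """
    sequential = N ** 1.5 * p ** 3
    parallel = N * p ** 2
    return sequential / parallel


if __name__ == "__main__":
    # Linear, quadratic and cubic B-splines on a hypothetical 10^6-unknown mesh.
    for p in (1, 2, 3):
        print(p, cost_ratio(10**6, p))
```

Because the ratio grows with both sqrt(N) and p, the estimated gap widens for finer meshes and higher-order splines.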
Nov, 23
Fast 4π track reconstruction in nuclear emulsion detectors based on GPU technology
Fast 4π solid-angle particle track recognition has long been a challenge in particle physics, especially with nuclear emulsion detectors. Recent advances in computing technology have opened the way to its realization. A fast 4π solid-angle particle track reconstruction based on GPU technology combined with multithreaded programming is reported […]
Nov, 23
Dynamic Partitioning-based JPEG Decompression on Heterogeneous Multicore Architectures
With the emergence of social networks and improvements in computational photography, billions of JPEG images are shared and viewed every day. Desktops, tablets and smartphones constitute the vast majority of hardware platforms used for displaying JPEG images. Although these platforms are heterogeneous multicores, no approach yet exists that is capable […]
Nov, 22
Accelerating Sequential Computer Vision Algorithms Using Commodity Parallel Hardware
Since 2004, CPU clock frequencies have not increased significantly. Computer vision applications demand ever more processing power and are limited by the performance of sequential processor architectures. The only way to get better performance from commodity hardware is to adopt parallel programming. Many other related research projects have considered […]
Nov, 22
An improved parallel contrast-aware halftoning
Digital image halftoning is a widely used technique. However, achieving high-fidelity tone reproduction and structure preservation at low computational cost remains a challenging problem. This paper presents a highly parallel algorithm that enables real-time use of serial structure-preserving error diffusion. The contrast-aware halftoning approach is one such technique, with superior structure preservation, […]
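To illustrate why error diffusion is hard to parallelize, here is a minimal sketch of the classic Floyd-Steinberg scheme. This is not the paper's contrast-aware or structure-preserving variant. Each pixel's quantization error is pushed to right and lower neighbours that have not been processed yet, so in a raster scan every pixel depends on its predecessors. The image is assumed to be grayscale floats in [0, 1] stored as a list of rows; `floyd_steinberg` is a hypothetical helper name.

```python
from typing import List


def floyd_steinberg(img: List[List[float]]) -> List[List[int]]:
    """Binarize a grayscale image (values in [0, 1]) by Floyd-Steinberg error diffusion."""
    h = len(img)
    w = len(img[0]) if h else 0
    buf = [row[:] for row in img]          # working copy that accumulates diffused error
    out = [[0] * w for _ in range(h)]
    for y in range(h):                     # strict raster order: this is the serial dependency
        for x in range(w):
            old = buf[y][x]
            new = 1 if old >= 0.5 else 0   # threshold quantization
            out[y][x] = new
            err = old - new
            # Distribute the error to unprocessed neighbours with weights 7, 3, 5, 1 (/16).
            if x + 1 < w:
                buf[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    buf[y + 1][x - 1] += err * 3 / 16
                buf[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    buf[y + 1][x + 1] += err * 1 / 16
    return out
```

With these weights, pixel (x, y) receives error only from (x-1, y) and from row y-1 up to column x+1, which is why parallel error-diffusion schemes often process rows as a staggered wavefront; whether the paper uses such a scheme is not stated in the excerpt.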
Nov, 22
An Optimal Offline Permutation Algorithm on the Hierarchical Memory Machine, with the GPU implementation
The Hierarchical Memory Machine (HMM) is a theoretical parallel computing model that captures the essence of computation on CUDA-enabled GPUs. Offline permutation is the task of copying the numbers stored in an array a of size n to an array b of the same size according to a permutation P given in advance. A conventional algorithm […]
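The task definition can be made concrete with a sequential sketch. It assumes the convention b[P[i]] = a[i] (element i moves to position P[i]); the excerpt does not say which direction P is applied, and `offline_permutation` is a hypothetical name, not the paper's algorithm. Every iteration is independent, so on a GPU one thread per element is the obvious mapping, but the writes to b[P[i]] land at scattered addresses, which generally defeats coalesced global-memory access on CUDA GPUs.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def offline_permutation(a: Sequence[T], P: Sequence[int]) -> List[T]:
    """Copy a into a new array b along permutation P, assuming b[P[i]] = a[i]."""
    n = len(a)
    # P is known in advance ("offline"); check that it is a permutation of 0..n-1.
    if len(P) != n or sorted(P) != list(range(n)):
        raise ValueError("P must be a permutation of 0..n-1")
    b: List[T] = [a[0]] * n if n else []   # placeholder values, all overwritten below
    for i in range(n):                     # each i is independent: one GPU thread per i
        b[P[i]] = a[i]                     # scattered write
    return b


if __name__ == "__main__":
    print(offline_permutation([10, 20, 30, 40], [2, 0, 3, 1]))
```

For the example above, element 10 moves to index 2, 20 to index 0, 30 to index 3 and 40 to index 1.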
Nov, 22
Optimization of the Oktay-Kronfeld Action Conjugate Gradient Inverter
Improving the Fermilab action to third order in heavy quark effective theory yields the Oktay-Kronfeld action, a promising candidate for precise calculations of the spectra of heavy quark systems and weak matrix elements relevant to searches for new physics. We have optimized the bi-stabilized conjugate gradient inverter in the SciDAC QOPQDP library and are developing […]
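For orientation, the textbook unpreconditioned BiCGStab iteration (van der Vorst's biconjugate gradient stabilized method, which we take the abstract's "bi-stabilized conjugate gradient inverter" to refer to) is sketched below in plain Python for a small dense real matrix. It is a generic illustration only: the QOPQDP inverter works on complex-valued lattice fields for the Oktay-Kronfeld action, and none of its optimizations are reproduced here. `bicgstab`, `matvec` and `dot` are hypothetical helper names.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(A: Matrix, x: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def bicgstab(A: Matrix, b: Vector, tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Solve A x = b with unpreconditioned BiCGStab, starting from x = 0.

    Stops when the relative residual drops below tol. No breakdown recovery:
    a zero denominator raises ZeroDivisionError in this sketch.
    """
    n = len(b)
    x = [0.0] * n
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
    r_hat = r[:]                       # fixed shadow residual
    rho_old = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    bnorm = dot(b, b) ** 0.5 or 1.0
    for _ in range(max_iter):
        rho = dot(r_hat, r)
        if rho == 0.0:                 # converged exactly (r = 0) or breakdown
            break
        beta = (rho / rho_old) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        v = matvec(A, p)
        alpha = rho / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if dot(s, s) ** 0.5 / bnorm < tol:       # half-step already good enough
            return [xi + alpha * pi for xi, pi in zip(x, p)]
        t = matvec(A, s)
        omega = dot(t, s) / dot(t, t)            # stabilizing (minimal-residual) step
        x = [xi + alpha * pi + omega * si for xi, pi, si in zip(x, p, s)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if dot(r, r) ** 0.5 / bnorm < tol:
            return x
        rho_old = rho
    return x
```

Unlike plain conjugate gradient, BiCGStab does not require A to be symmetric positive definite, which is why it is a common choice for non-Hermitian lattice Dirac operators.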
Nov, 22
Bohrium: Unmodified NumPy Code on CPU, GPU, and Cluster
In this paper we introduce Bohrium, a runtime system for mapping array operations onto a range of hardware platforms, from multi-core systems to clusters and GPU-enabled systems. As a result, the Bohrium runtime system enables NumPy code to utilize CPUs, GPUs, and clusters. Bohrium integrates seamlessly into NumPy through the implicit data parallelization of array […]
Nov, 21
Experience with Intel’s Many Integrated Core architecture in ATLAS software
Intel recently released the first commercial boards of its Many Integrated Core (MIC) Architecture. MIC is Intel’s solution for the domain of throughput computing, currently dominated by general purpose programming on graphics processors (GPGPU). MIC allows the use of the more familiar x86 programming model and supports standard technologies such as OpenMP, MPI, and Intel’s […]