Posts
Dec, 20
An Automatic Host and Device Memory Allocation Method for OpenMPC
The CUDA programming model provides better abstraction for GPU programming. However, it is still hard to write programs with CUDA because both specific techniques and knowledge of GPU architecture are required. Hence, many programming frameworks for CUDA have been developed. OpenMPC is one of them, based on OpenMP. OpenMPC is an easy-to-write framework for […]
Dec, 20
A Parallel Preconditioned Bi-Conjugate Gradient Stabilized Solver for the Poisson Problem
We present a parallel Preconditioned Bi-Conjugate Gradient Stabilized (BICGstab) solver for the Poisson problem. Given a real, nonsymmetric and positive definite coefficient matrix, the parallelized Preconditioned BICGstab solver is able to find a solution for that system by exploiting the massive compute power of today's GPUs. Comparing sequential CPU implementations and that algorithm, we achieve a […]
Dec, 20
IceCube's GPGPU cluster for extensive MC production
GPGPU computing offers extraordinary increases in pure processing power for parallelizable applications. In IceCube we use GPUs for ray-tracing of Cherenkov photons in the Antarctic ice as part of detector simulation. We report on how we implemented the mixed simulation production chain to include the processing on the GPGPU cluster for the IceCube Monte-Carlo production. […]
Dec, 20
Interactive Bi-scale Editing of Highly Glossy Materials
We present a new technique for bi-scale material editing using Spherical Gaussians (SGs). To represent large-scale appearances, an effective BRDF that is the average reflectance of small-scale details is used. The effective BRDF is calculated from the integral of the product of the Bidirectional Visible Normal Distribution Function (BVNDF) and BRDFs of small-scale geometry. Our method […]
Dec, 20
Sparse Matrix Multiplication using CUDA and Mex Interface
In recent years, developments in the architecture of graphics processing units (GPUs) have revolutionized the area of high-performance computing by offering massive parallelism and performance improvements in many applications, including matrix algebra. While it is possible to harness the power of GPUs for dense matrix computations, sparse matrix computations are still complex since […]
Dec, 18
Memory-Efficient Single-Pass GPU Rendering of Multi-fragment Effects
Rendering multi-fragment effects using GPUs is attractive for high speed. However, the efficiency is seriously compromised, because ordering fragments on GPUs is not easy and the GPU’s memory may not be large enough to store the whole scene geometry. Hitherto, existing methods have been unsuitable for large models or have required many passes for data […]
Dec, 18
Face Detection with Improved Local Binary Patterns in CUDA
As mobile computing and user interactivity become more ubiquitous, accurate and fast facial detection mechanisms are necessary. With the development of accessible parallel computing, it becomes possible to leverage the power of parallel algorithms to increase both the speed and accuracy of facial detection systems. In this paper, we propose and analyse one such system […]
Dec, 18
Implementation of Stereo Matching Using High Level Compiler for Parallel Computing Acceleration
Heterogeneous computing systems increase the performance of parallel computing in many domains of general-purpose computing with CPUs, GPUs and other accelerators. Along with hardware developments, software developments such as Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL) try to offer a simple and visualized tool for parallel computing. But it turns out to be […]
Dec, 18
Parallelisation of Shallow Water Simulation for Heterogeneous Architectures
This work presents the parallelisation of a shallow water simulation model. Two parallel implementations are developed. One is for a multi-core NUMA architecture, developed in OpenMP. The other one is for a many-core GPU-accelerated architecture and is developed in OpenCL. The parallelisation process is based on an iterative approach, starting off from a naive implementation. […]
Dec, 18
Alternating Maximization: Unifying Framework for 8 Sparse PCA Formulations and Efficient Parallel Codes
Given a multivariate data set, sparse principal component analysis (SPCA) aims to extract several linear combinations of the variables that together explain the variance in the data as much as possible, while controlling the number of nonzero loadings in these combinations. In this paper we consider 8 different optimization formulations for computing a single sparse […]
Dec, 18
Improved FCM algorithm for Clustering on Web Usage Mining
In this paper we note that the FCM clustering method is very sensitive to the initial center values, places overly high requirements on the data set, and cannot handle noisy data. The proposed method uses information entropy to initialize the cluster centers and introduces weighting parameters to adjust the location of cluster centers and to address noise problems. The navigation […]
Dec, 18
Implementation of 3D FFTs Across Multiple GPUs in Shared Memory Environments
In this paper, a novel implementation of the distributed 3D Fast Fourier Transform (FFT) on a multi-GPU platform using CUDA is presented. The 3D FFT is the core of many simulation methods, thus its fast calculation is critical. The main bottleneck of the distributed 3D FFT is the global data exchange which must be performed. […]