high performance computing on graphics processing units: hgpu.org

Posts

Dec, 20

An Automatic Host and Device Memory Allocation Method for OpenMPC

The CUDA programming model provides better abstraction for GPU programming. However, it is still hard to write programs with CUDA because both specific techniques and knowledge of the GPU architecture are required. Hence, many programming frameworks for CUDA have been developed. OpenMPC, which is based on OpenMP, is one of them. OpenMPC is an easy-to-write framework for […]

CUDA

Dec, 20

A Parallel Preconditioned Bi-Conjugate Gradient Stabilized Solver for the Poisson Problem

We present a parallel Preconditioned Bi-Conjugate Gradient Stabilized (BiCGSTAB) solver for the Poisson problem. Given a real, nonsymmetric and positive definite coefficient matrix, the parallelized Preconditioned BiCGSTAB solver is able to find a solution for that system by exploiting the massive compute power of today's GPUs. Comparing sequential CPU implementations with this algorithm, we achieve a […]

CUDA
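The sequential BiCGSTAB iteration that such a GPU solver parallelizes can be sketched as follows. This is a minimal pure-Python sketch, not the authors' code: the Jacobi (diagonal) preconditioner, the dense list-of-rows matrix, and the names `bicgstab_jacobi`, `matvec` and `dot` are assumptions for illustration, and breakdown handling is kept to a minimum.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    # Dense matrix-vector product; A is a list of rows.
    return [dot(row, x) for row in A]


def bicgstab_jacobi(A, b, tol=1e-10, max_iter=200):
    """Right-preconditioned BiCGSTAB with an assumed Jacobi preconditioner M = diag(A)."""
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # applying M^-1 is a simple scaling
    x = [0.0] * n
    r = list(b)       # r0 = b - A*x0 with x0 = 0
    r_hat = list(r)   # fixed shadow residual
    rho_old = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for _ in range(max_iter):
        rho = dot(r_hat, r)
        if rho == 0.0:
            break  # breakdown, or the exact solution was already reached
        beta = (rho / rho_old) * (alpha / omega)
        p = [rk + beta * (pk - omega * vk) for rk, pk, vk in zip(r, p, v)]
        p_hat = [d * pk for d, pk in zip(inv_diag, p)]
        v = matvec(A, p_hat)
        rv = dot(r_hat, v)
        if rv == 0.0:
            break
        alpha = rho / rv
        s = [rk - alpha * vk for rk, vk in zip(r, v)]
        s_hat = [d * sk for d, sk in zip(inv_diag, s)]
        t = matvec(A, s_hat)
        tt = dot(t, t)
        omega = dot(t, s) / tt if tt != 0.0 else 0.0
        x = [xk + alpha * ph + omega * sh for xk, ph, sh in zip(x, p_hat, s_hat)]
        r = [sk - omega * tk for sk, tk in zip(s, t)]
        if math.sqrt(dot(r, r)) / b_norm < tol or omega == 0.0:
            break
        rho_old = rho
    return x
```

For example, `bicgstab_jacobi([[4.0, 1.0, 0.0], [2.0, 5.0, 1.0], [0.0, 1.0, 3.0]], [6.0, 15.0, 11.0])` solves a small nonsymmetric system whose exact solution is [1, 2, 3]. Each iteration costs two matrix-vector products, two preconditioner applications and a few dot products and vector updates; these kernels are the natural candidates for GPU parallelization.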

Dec, 20

IceCube's GPGPU cluster for extensive MC production

GPGPU computing offers extraordinary increases in pure processing power for parallelizable applications. In IceCube we use GPUs for ray-tracing of Cherenkov photons in the Antarctic ice as part of the detector simulation. We report on how we implemented the mixed simulation production chain to include processing on the GPGPU cluster for IceCube Monte Carlo production. […]

CUDA

Dec, 20

Interactive Bi-scale Editing of Highly Glossy Materials

We present a new technique for bi-scale material editing using Spherical Gaussians (SGs). To represent large-scale appearance, an effective BRDF, defined as the average reflectance of the small-scale details, is used. The effective BRDF is calculated from the integral of the product of the Bidirectional Visible Normal Distribution Function (BVNDF) and the BRDFs of the small-scale geometry. Our method […]

OpenGL
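One property that makes SGs convenient for integrals of products like the one above is that the product of two SGs is again an SG. A minimal sketch, assuming the common parameterization G(v) = mu * exp(lambda * (v . p - 1)) with unit axis p, sharpness lambda and amplitude mu; the paper's own parameterization may differ, and the function names `sg_eval` and `sg_product` are hypothetical.

```python
import math


def sg_eval(v, axis, sharpness, amplitude):
    """Evaluate G(v) = amplitude * exp(sharpness * (v . axis - 1)) for unit vectors."""
    cos_angle = sum(a * b for a, b in zip(v, axis))
    return amplitude * math.exp(sharpness * (cos_angle - 1.0))


def sg_product(axis1, lam1, mu1, axis2, lam2, mu2):
    """Return (axis, sharpness, amplitude) of the SG equal to G1(v) * G2(v)."""
    # Exponents add: lam1*(v.p1 - 1) + lam2*(v.p2 - 1) = v.(lam1*p1 + lam2*p2) - lam1 - lam2.
    u = [lam1 * a + lam2 * b for a, b in zip(axis1, axis2)]
    lam_m = math.sqrt(sum(c * c for c in u))
    if lam_m == 0.0:
        # Opposite axes with equal sharpness: the product is a constant.
        return list(axis1), 0.0, mu1 * mu2 * math.exp(-lam1 - lam2)
    axis_m = [c / lam_m for c in u]
    mu_m = mu1 * mu2 * math.exp(lam_m - lam1 - lam2)
    return axis_m, lam_m, mu_m
```

Because products stay in closed form, a product of SG-represented terms can be integrated with the single-SG integral formula instead of by numerical quadrature.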

Dec, 20

Sparse Matrix Multiplication using CUDA and Mex Interface

In recent years, developments in the architecture of graphics processing units (GPUs) have revolutionized high performance computing by offering massive parallelism and performance improvements in many applications, including matrix algebra. While it is possible to harness the power of GPUs for dense matrix computations, sparse matrix computations are still complex since […]

CUDA
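The excerpt does not say which storage format the paper uses, or whether it multiplies a sparse matrix by a vector or by another matrix. As an illustration only, here is the common compressed sparse row (CSR) layout with a row-by-row sparse matrix-vector product; the function names `dense_to_csr` and `csr_spmv` are hypothetical.

```python
def dense_to_csr(A):
    """Convert a dense list-of-rows matrix to CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in A:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)
                col_idx.append(j)
        row_ptr.append(len(values))  # row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    return values, col_idx, row_ptr


def csr_spmv(values, col_idx, row_ptr, x):
    """y = A x for A in CSR form; the iterations over rows are independent."""
    y = []
    for i in range(len(row_ptr) - 1):  # a scalar GPU kernel would give each row its own thread
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y
```

Rows with very different numbers of nonzeros give such per-row threads unequal work, and the gathers through `col_idx` make memory access irregular, in contrast to the regular access pattern of dense matrix kernels.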

Dec, 18

Memory-Efficient Single-Pass GPU Rendering of Multi-fragment Effects

Rendering multi-fragment effects on GPUs is attractive because of their high speed. However, efficiency is seriously compromised, because ordering fragments on GPUs is not easy and the GPU’s memory may not be large enough to store the whole scene geometry. Hitherto, existing methods have been unsuitable for large models or have required many passes for data […]

OpenGL

Dec, 18

Face Detection with Improved Local Binary Patterns in CUDA

As mobile computing and user interactivity become more ubiquitous, accurate and fast facial detection mechanisms are necessary. With the development of accessible parallel computing, it becomes possible to leverage parallel algorithms to increase both the speed and the accuracy of facial detection systems. In this paper, we propose and analyse one such system […]

CUDA
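The basic 3x3 local binary pattern (LBP) operator, on which improved LBP variants build, can be sketched as below. This is the textbook operator, not the paper's improved version; the clockwise neighbour order and the function names are assumptions.

```python
# Neighbour offsets (row, col), clockwise from the top-left; neighbour k sets bit k.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """8-bit LBP code of interior pixel (r, c): bit k is 1 if neighbour k >= centre."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_image(img):
    """LBP codes for all interior pixels (the one-pixel border is skipped)."""
    h, w = len(img), len(img[0])
    return [[lbp_code(img, r, c) for c in range(1, w - 1)] for r in range(1, h - 1)]


def lbp_histogram(codes):
    """256-bin histogram of LBP codes, the usual texture descriptor for a region."""
    hist = [0] * 256
    for row in codes:
        for code in row:
            hist[code] += 1
    return hist
```

Each pixel's code depends only on its 3x3 neighbourhood, so the codes can be computed independently, for instance one GPU thread per pixel, before the per-region histograms are accumulated.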

Dec, 18

Implementation of Stereo Matching Using High Level Compiler for Parallel Computing Acceleration

Heterogeneous computing systems combining CPUs, GPUs and other accelerators increase the performance of parallel computing in many domains of general-purpose computing. Alongside hardware developments, software frameworks such as Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL) aim to offer simple and visual tools for parallel computing. But it turns out to be […]

CUDA • OpenCL

Dec, 18

Parallelisation of Shallow Water Simulation for Heterogeneous Architectures

This work presents the parallelisation of a shallow water simulation model. Two parallel implementations are developed. One is for a multi-core NUMA architecture, developed in OpenMP. The other one is for a many-core GPU-accelerated architecture and is developed in OpenCL. The parallelisation process is based on an iterative approach, starting off from a naive implementation. […]

CUDA • OpenCL

Dec, 18

Alternating Maximization: Unifying Framework for 8 Sparse PCA Formulations and Efficient Parallel Codes

Given a multivariate data set, sparse principal component analysis (SPCA) aims to extract several linear combinations of the variables that together explain as much of the variance in the data as possible, while controlling the number of nonzero loadings in these combinations. In this paper we consider 8 different optimization formulations for computing a single sparse […]

CUDA
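To illustrate the alternating-maximization idea in the title, here is a sketch for one assumed formulation: maximize ||Ax||_2 subject to ||x||_2 = 1 and at most s nonzero loadings, where A is the n-by-p data matrix. Writing ||Ax||_2 as the maximum of y^T A x over unit vectors y, the method alternates a y-step (y = Ax / ||Ax||) with an x-step (keep the s largest-magnitude entries of A^T y, then normalize). The excerpt does not state whether this matches one of the paper's eight formulations exactly, and the name `sparse_pc_am` is hypothetical.

```python
import math


def sparse_pc_am(A, s, iters=100, x0=None):
    """Alternating maximization for max ||A x||_2 s.t. ||x||_2 = 1, ||x||_0 <= s.

    A is an n-by-p data matrix given as a list of n rows; returns the loading vector x.
    """
    n, p = len(A), len(A[0])
    x = list(x0) if x0 is not None else [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        # y-step: the unit y maximizing y^T A x is A x / ||A x||.
        Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
        norm_Ax = math.sqrt(sum(val * val for val in Ax))
        if norm_Ax == 0.0:
            break
        y = [val / norm_Ax for val in Ax]
        # x-step: maximize (A^T y)^T x over unit s-sparse x by keeping the s largest |entries|.
        c = [sum(A[i][j] * y[i] for i in range(n)) for j in range(p)]
        keep = set(sorted(range(p), key=lambda j: abs(c[j]), reverse=True)[:s])
        x_new = [c[j] if j in keep else 0.0 for j in range(p)]
        norm_x = math.sqrt(sum(val * val for val in x_new))
        x_new = [val / norm_x for val in x_new]
        converged = max(abs(a - b) for a, b in zip(x, x_new)) < 1e-12
        x = x_new
        if converged:
            break
    return x
```

Both steps are dominated by the products A x and A^T y, which is what makes schemes of this kind easy to map onto parallel hardware.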

Dec, 18

Improved FCM algorithm for Clustering on Web Usage Mining

In this paper, we address the weaknesses of the FCM clustering method: it is very sensitive to the initial cluster centers, places overly high requirements on the data set, and cannot handle noisy data. The proposed method uses information entropy to initialize the cluster centers and introduces weighting parameters to adjust the locations of the cluster centers and to deal with noise. The navigation […]
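For reference, the standard fuzzy c-means (FCM) iteration alternates a membership update, u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1)), with a center update, c_i = sum_j u_ij^m x_j / sum_j u_ij^m. The sketch below is the unmodified algorithm with fuzzifier m = 2 and caller-supplied initial centers; it does not implement the paper's entropy-based initialization or weighting parameters, and the function name `fcm` is hypothetical.

```python
import math


def fcm(points, centers, m=2.0, iters=100, tol=1e-9):
    """Standard fuzzy c-means; returns (centers, memberships from the last iteration)."""
    centers = [list(c) for c in centers]
    dim = len(points[0])
    U = []
    for _ in range(iters):
        # Membership update: each row holds one point's memberships and sums to 1.
        U = []
        for x in points:
            d = [math.dist(x, c) for c in centers]
            if min(d) == 0.0:
                # The point sits on a center: give it full membership there.
                row = [1.0 if di == 0.0 else 0.0 for di in d]
                total = sum(row)
                row = [val / total for val in row]
            else:
                row = [1.0 / sum((di / dk) ** (2.0 / (m - 1.0)) for dk in d) for di in d]
            U.append(row)
        # Center update: weighted mean of the points with weights u_ij^m.
        new_centers = []
        for i in range(len(centers)):
            w = [U[j][i] ** m for j in range(len(points))]
            sw = sum(w)
            new_centers.append([sum(wj * pt[k] for wj, pt in zip(w, points)) / sw for k in range(dim)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, U
```

The result depends on the initial `centers` argument, which is the sensitivity the proposed entropy-based initialization targets.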

Dec, 18

Implementation of 3D FFTs Across Multiple GPUs in Shared Memory Environments

In this paper, a novel implementation of the distributed 3D Fast Fourier Transform (FFT) on a multi-GPU platform using CUDA is presented. The 3D FFT is the core of many simulation methods, thus its fast calculation is critical. The main bottleneck of the distributed 3D FFT is the global data exchange which must be performed. […]

CUDA
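A 3D DFT factors into 1D transforms along each axis in turn. In a slab decomposition, where each GPU holds a range of x-planes, the z and y passes are local, but the x pass needs data from every GPU, which is where a global data exchange comes in. The sketch checks the axis-by-axis approach against the direct triple sum on a tiny array. It uses a naive O(n^2) 1D DFT for brevity (a real implementation would call an FFT library such as cuFFT), and the slab layout and function names are assumptions.

```python
import cmath


def dft1d(v):
    """Naive O(n^2) 1D DFT, standing in for a real FFT call."""
    n = len(v)
    return [sum(v[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n)) for f in range(n)]


def dft3d_by_axes(a):
    """3D DFT of a[x][y][z] computed as 1D transforms along z, then y, then x."""
    nx, ny, nz = len(a), len(a[0]), len(a[0][0])
    # Pass 1 (z): each (x, y) pencil is contiguous and, in a slab layout, local to one GPU.
    b = [[dft1d(a[x][y]) for y in range(ny)] for x in range(nx)]
    # Pass 2 (y): still local, since every y lies inside the same x-slab.
    for x in range(nx):
        for z in range(nz):
            col = dft1d([b[x][y][z] for y in range(ny)])
            for y in range(ny):
                b[x][y][z] = col[y]
    # Pass 3 (x): spans all slabs, so a distributed code first needs a global
    # transpose (all-to-all exchange) to make these pencils local.
    for y in range(ny):
        for z in range(nz):
            col = dft1d([b[x][y][z] for x in range(nx)])
            for x in range(nx):
                b[x][y][z] = col[x]
    return b


def dft3d_direct(a):
    """Reference: the 3D DFT evaluated as a direct triple sum."""
    nx, ny, nz = len(a), len(a[0]), len(a[0][0])
    out = [[[0j] * nz for _ in range(ny)] for _ in range(nx)]
    for u in range(nx):
        for v in range(ny):
            for w in range(nz):
                out[u][v][w] = sum(
                    a[x][y][z] * cmath.exp(-2j * cmath.pi * (u * x / nx + v * y / ny + w * z / nz))
                    for x in range(nx) for y in range(ny) for z in range(nz)
                )
    return out
```

Only the last pass touches data owned by other devices, so the cost of the distributed transform hinges on how efficiently that one exchange is performed.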