high performance computing on graphics processing units: hgpu.org

Posts

Apr, 22

Preliminary implementation of two parallel programs for fractal image coding on GPUs

GPGPU (General-Purpose computing on Graphics Processing Units), which uses GPUs for general-purpose computations such as numerical calculations as well as for graphics processing, attracts a great deal of attention. In this paper, we implement fractal image coding algorithms on GPUs using CUDA (Compute Unified Device Architecture) and evaluate the effectiveness of the shared memory using […]

CUDA
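For readers new to fractal image coding, the following is a minimal sequential Python sketch of the usual inner loop: for one range block, try every candidate domain block and fit a least-squares contrast s and brightness o so that s*d + o approximates the range block. This is a generic textbook formulation, not the authors' CUDA implementation. It assumes that each domain block has already been averaged down to the range-block size and flattened into a list. The function names are hypothetical, and the eight block isometries and the quantization of s and o are omitted.

```python
def fit_affine(d, r):
    """Least-squares contrast s and brightness o so that s*d + o approximates r."""
    n = len(d)
    sd, sr = sum(d), sum(r)
    sdd = sum(x * x for x in d)
    sdr = sum(x * y for x, y in zip(d, r))
    denom = n * sdd - sd * sd
    # A flat domain block (zero variance) can only contribute brightness.
    s = (n * sdr - sd * sr) / denom if denom else 0.0
    o = (sr - s * sd) / n
    err = sum((s * x + o - y) ** 2 for x, y in zip(d, r))
    return s, o, err


def best_domain(range_block, domain_blocks):
    """Exhaustive search: return (index, s, o, err) of the best-matching domain block."""
    best = None
    for i, d in enumerate(domain_blocks):
        s, o, err = fit_affine(d, range_block)
        if best is None or err < best[3]:
            best = (i, s, o, err)
    return best


# Example: the second domain block matches exactly with s = 2, o = 0.
print(best_domain([2, 4, 6, 8], [[1, 1, 1, 1], [1, 2, 3, 4]]))
```

Every range block's search is independent, which is the parallelism a GPU version would exploit.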

Apr, 22

Parallel Zigzag Scanning and Huffman Coding for a GPU-based MPEG-2 Encoder

GPUs excel at parallel computations, so they are very efficient at calculating the discrete cosine transform (DCT) of spatial-domain images, as required for video encoding. The last steps of MPEG-2 compression, however, are inherently sequential, since they require serial processing of the resulting DCT coefficients. As that can easily become a bottleneck in GPU-based video […]
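To make the "serial processing" concrete, here is a minimal sequential Python sketch of the zigzag scan (the default MPEG-2 scan order for an 8×8 block, which matches the JPEG zigzag) followed by the (zero-run, level) pairing that precedes Huffman coding. It is not the paper's parallel method, and the function names are illustrative. For simplicity it treats all 64 coefficients uniformly, although MPEG-2 actually codes the DC coefficient of intra blocks separately.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes each anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals are traversed bottom-left to top-right.
        if s % 2 == 0:
            diag.reverse()
        order.extend(diag)
    return order


def run_level_pairs(coeffs):
    """Turn zigzag-ordered coefficients into (zero_run, level) pairs."""
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    return pairs  # trailing zeros are implied by the end-of-block code


block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 50, -3, 2
scan = [block[r][c] for r, c in zigzag_order()]
print(run_level_pairs(scan))  # [(0, 50), (0, -3), (1, 2)]
```

Each run length depends on every preceding coefficient in scan order, which is why this stage resists straightforward parallelization.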

Apr, 22

Scalable Clustering Using Graphics Processors

We present new algorithms for scalable clustering using graphics processors. Our basic approach is based on k-means. By changing the order of determining object labels, and exploiting the high computational power and pipeline of graphics processing units (GPUs) for distance computing and comparison, we speed up the k-means algorithm substantially. We introduce two strategies for […]

OpenGL
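For reference, the following is a minimal sequential Python sketch of plain Lloyd's k-means. The label step (distance computation and comparison against every centroid) is the part the abstract describes offloading to the GPU. The paper's reordering of how labels are determined is not reproduced here, and the function names are illustrative.

```python
import math


def assign_labels(points, centroids):
    """Label step: compute the distance to every centroid and keep the nearest."""
    labels = []
    for p in points:
        dists = [math.dist(p, c) for c in centroids]
        labels.append(min(range(len(centroids)), key=dists.__getitem__))
    return labels


def update_centroids(points, labels, centroids):
    """Update step: move each centroid to the mean of its points (keep it if empty)."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, lab in zip(points, labels):
        counts[lab] += 1
        for d in range(dim):
            sums[lab][d] += p[d]
    return [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
        for j in range(k)
    ]


def kmeans(points, centroids, iters=20):
    """Alternate label and update steps until labels stop changing."""
    labels = assign_labels(points, centroids)
    for _ in range(iters):
        centroids = update_centroids(points, labels, centroids)
        new_labels = assign_labels(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


print(kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)]))
```

The label step is independent per point, so it maps naturally onto many GPU threads. The update step is a reduction.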

Apr, 22

Accelerating the numerical simulation of magnetic field lines in tokamaks using the GPU

trip3d is a field line simulation code that numerically integrates a set of nonlinear magnetic field line differential equations. The code is used to study properties of magnetic islands and stochastic or chaotic field line topologies that are important for designing non-axisymmetric magnetic perturbation coils for controlling plasma instabilities in future machines. The code is […]

CUDA

Apr, 22

GPU accelerated simulations of 3D deterministic particle transport using discrete ordinates method

The Graphics Processing Unit (GPU), originally developed for real-time, high-definition 3D graphics in computer games, now provides great capability for solving scientific problems. The basis of particle transport simulation is the time-dependent, multi-group, inhomogeneous Boltzmann transport equation. The numerical solution of the Boltzmann equation involves the discrete ordinates (Sn) method and source iteration. […]

CUDA
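As an illustration of source iteration, the following minimal Python sketch solves the simplest instance: one energy group, a 1D slab, isotropic scattering and source, S2 Gauss-Legendre quadrature, diamond differencing, and vacuum boundaries. The paper treats the 3D, time-dependent, multi-group problem on GPUs, and none of that is reproduced. The function name and parameters are hypothetical. Each iteration lags the scattering source, performs one transport sweep per direction, and accumulates the scalar flux.

```python
import math


def sn_source_iteration(sigma_t, sigma_s, q, width, cells, tol=1e-10, max_iter=500):
    """Solve mu dpsi/dx + sigma_t psi = (sigma_s phi + q) / 2, phi = sum_m w_m psi_m.

    One group, 1D slab, S2 quadrature, diamond difference, vacuum boundaries.
    Returns (scalar flux per cell, number of iterations used).
    """
    h = width / cells
    mus = [1 / math.sqrt(3), -1 / math.sqrt(3)]  # S2 Gauss-Legendre directions
    wts = [1.0, 1.0]  # weights sum to 2 over mu in [-1, 1]
    phi = [0.0] * cells
    for it in range(max_iter):
        src = [(sigma_s * p + q) / 2 for p in phi]  # lagged scattering + external source
        new_phi = [0.0] * cells
        for mu, w in zip(mus, wts):
            a = abs(mu) / h
            order = range(cells) if mu > 0 else range(cells - 1, -1, -1)
            psi_in = 0.0  # vacuum boundary: no incoming flux
            for i in order:
                # Diamond difference: cell flux is the average of the edge fluxes.
                psi_out = ((a - sigma_t / 2) * psi_in + src[i]) / (a + sigma_t / 2)
                new_phi[i] += w * 0.5 * (psi_in + psi_out)
                psi_in = psi_out
        diff = max(abs(x - y) for x, y in zip(new_phi, phi))
        phi = new_phi
        if diff < tol:
            return phi, it + 1
    return phi, max_iter


phi, n_iter = sn_source_iteration(1.0, 0.5, 1.0, 40.0, 200)
print(n_iter, phi[0], phi[100])  # center flux near q / (sigma_t - sigma_s) = 2
```

Source iteration converges at a rate governed by the scattering ratio sigma_s / sigma_t, so strongly scattering problems need many sweeps. Those sweeps dominate the cost that a GPU implementation targets.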

Apr, 22

AMD Fusion Developer Summit 2011, AFDS 2011

Heterogeneous computing is moving into the mainstream, and a broader range of applications is already on the way. As the provider of world-class CPUs, GPUs, and APUs, AMD offers unique insight into these technologies and how they interoperate. Attend the AMD Fusion Developer Summit to learn about the opportunities that lie ahead.

OpenCL

Apr, 21

Pretty Good Accuracy in Matrix Multiplication with GPUs

With systems such as Roadrunner, there is a trend in supercomputing to offload parallel tasks to special-purpose coprocessors composed of many relatively simple scalar processors. The cheaper, commodity-class equivalent of such a processor would be the graphics card, potentially offering supercomputer power within the confines of a desktop PC. Graphics […]

CUDA

Apr, 21

Using graphics processors to accelerate the computation of the matrix inverse

We study the use of massively parallel architectures for computing a matrix inverse. Two different algorithms are reviewed: the traditional approach based on Gaussian elimination, and the Gauss-Jordan elimination alternative. Several high-performance implementations are presented and evaluated. The target architecture is a current general-purpose multicore processor (CPU) connected to a graphics processor (GPU). […]
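To show what distinguishes the Gauss-Jordan alternative, here is a minimal sequential Python sketch with partial pivoting. It is a textbook formulation, not one of the paper's CPU-GPU implementations. Gauss-Jordan reduces the augmented matrix [A | I] to [I | A^-1] in a single sweep that clears each pivot column both above and below the pivot. The Gaussian-elimination route, by contrast, typically factors A (for example, an LU factorization) and then inverts or solves with the triangular factors.

```python
def gauss_jordan_inverse(a):
    """Invert a square matrix (list of lists) by Gauss-Jordan elimination."""
    n = len(a)
    # Augment A with the identity: [A | I].
    m = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: swap up the row with the largest |entry| in this column.
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[piv][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        # Scale the pivot row so the pivot becomes 1.
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        # Clear this column in every other row, above and below the pivot.
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    # The right half of the augmented matrix now holds A^-1.
    return [row[n:] for row in m]


print(gauss_jordan_inverse([[4, 7], [2, 6]]))  # approximately [[0.6, -0.7], [-0.2, 0.4]]
```

Each elimination step updates all rows at once, which gives the regular, data-parallel work that suits a GPU.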

Apr, 21

Fast Variable Center-Biased Windowing for High-Speed Stereo on Programmable Graphics Hardware

We present a high-speed dense stereo algorithm that achieves both good quality results and very high disparity estimation throughput on the graphics processing unit (GPU). The key idea is a variable center-biased windowing approach, enabling an adaptive selection of the most suitable support patterns with varying sizes and shapes. As the fundamental construct for variable […]

Apr, 21

Automatically generating and tuning GPU code for sparse matrix-vector multiplication from a high-level representation

We propose a system-independent representation of sparse matrix formats that allows a compiler to generate efficient, system-specific code for sparse matrix operations. To show the viability of such a representation we have developed a compiler that generates and tunes code for sparse matrix-vector multiplication (SpMV) on GPUs. We evaluate our framework on six state-of-the-art matrix […]

CUDA • OpenCL

Apr, 21

Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures

Graphics processors are increasingly used in scientific applications because of their high computational power, which comes from hardware with multiple levels of parallelism and a memory hierarchy. Sparse matrix computations frequently arise in scientific applications, for example, when solving PDEs on unstructured grids. However, traditional sparse matrix algorithms are difficult to parallelize efficiently on GPUs due to irregular […]
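As background for both SpMV entries, here is a minimal Python sketch of sparse matrix-vector multiplication in the common CSR (compressed sparse row) format. It is a generic baseline, not the tuned kernels these papers generate. The row loop is independent per row, and a common GPU mapping assigns one thread or one warp per row. The gathers `x[col_idx[k]]` are the data-dependent, irregular memory accesses that make such kernels hard to tune. The function names are illustrative.

```python
def csr_from_dense(a):
    """Build CSR arrays (values, column indices, row pointers) from a dense matrix."""
    values, col_idx, row_ptr = [], [], [0]
    for row in a:
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)
                col_idx.append(j)
        row_ptr.append(len(values))  # row i spans values[row_ptr[i]:row_ptr[i + 1]]
    return values, col_idx, row_ptr


def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A x for A stored in CSR format."""
    y = []
    for i in range(len(row_ptr) - 1):  # rows are independent of one another
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]  # irregular gather from x
        y.append(acc)
    return y


A = [[4, 0, 1], [0, 0, 0], [0, 2, 3]]
print(spmv_csr(*csr_from_dense(A), [1, 2, 3]))  # [7.0, 0.0, 13.0]
```

Rows with very different nonzero counts leave some threads idle while others work. This load imbalance is one reason alternative formats and autotuning pay off on GPUs.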

Apr, 21

Assessment of GPU computational enhancement to a 2D flood model

This paper presents a study of the computational enhancement of a Graphics Processing Unit (GPU) enabled 2D flood model. The objectives are to demonstrate the significant speedup of a new GPU-enabled full dynamic wave flood model and to present the effect of model spatial resolution on its speedup. A 2D dynamic flood model based on […]

CUDA