high performance computing on graphics processing units: hgpu.org

Posts

Jan, 10

Generating, Optimizing, and Scheduling a Compiler Level Representation of Stream Parallelism

Stream parallelism is often cited as a powerful programming model for expressing parallel computation for multi-core and heterogeneous computers. It allows programmers to concisely describe the concurrency and communication requirements found in a program and it allows compilers and runtime systems to easily generate efficient code targeting parallel hardware. This type of stream parallelism is […]

OpenCL

Jan, 10

Graphics Processor Unit (GPU) Acceleration of Finite-Difference Frequency-Domain (FDFD) Method

Recently, many numerical methods developed for solving electromagnetic problems have benefited greatly from the hardware-accelerated scientific computing capability of graphics processing units (GPUs), and speed-ups of orders of magnitude have been reported. Among these methods, the finite-difference frequency-domain (FDFD) method can also be accelerated substantially by utilizing an […]

CUDA

Jan, 10

A Hybrid Circular Queue Method for Iterative Stencil Computations on GPUs

In this paper, we present a hybrid circular queue method that can significantly boost the performance of stencil computations on GPUs by carefully balancing the use of registers and shared memory. Unlike earlier methods that rely on circular queues predominantly implemented using indirectly addressable shared memory, our hybrid method exploits a new reuse pattern spanning across the […]

CUDA

Jan, 10

Parallelizing Kernel Polynomial Method Applying Graphics Processing Units

The Kernel Polynomial Method (KPM) is one of the fast diagonalization methods used to simulate quantum systems in condensed matter physics and chemistry. The algorithm is difficult to parallelize on a cluster computer or a supercomputer because of its fine-grained recursive calculations. This paper proposes an implementation of the […]

CUDA

Jan, 10

Acceleration of AES encryption on CUDA GPU

GPUs offer strong capability for highly parallel applications despite their low cost. Support for integer and logical instructions in the latest generation of GPUs makes cipher algorithms easier to implement. However, decisions such as parallel processing granularity and memory allocation impose a heavy burden on programmers. Therefore, this […]

CUDA

Jan, 9

Providing Source Code Level Portability Between CPU and GPU with MapCG

Graphics processing units (GPUs) have taken an important role in the general-purpose computing market in recent years. At present, the common approach to programming GPUs is to write GPU-specific code with low-level GPU APIs such as CUDA. Although this approach can achieve good performance, it creates serious portability issues as programmers […]

CUDA

OpenCL

Jan, 9

Gauge Fixing in Lattice QCD on GPUs

Quantum Chromodynamics (QCD) [1, 2] is the theory of the strong interaction, which is responsible for the hadron spectrum and therefore for all matter in our everyday life. QCD, being a quantum field theory and part of the standard model of elementary particles, describes the interactions between color-charged quarks and gluons. Hadrons, e.g., protons, neutrons […]

CUDA

Jan, 9

A new parallel tool for classification of remotely sensed imagery

In this paper, we describe a new tool for classification of remotely sensed images. Our processing chain is based on three main parts: (1) pre-processing, performed using morphological profiles which model both the spatial (high resolution) and the spectral (color) information available from the scenes; (2) classification, which can be performed in an unsupervised fashion using […]

CUDA

Jan, 9

Top-k Queries Processing With Uncertain Data on Graphics Processing Units

Top-k query processing in complex uncertain databases is semantically and computationally different from classical top-k processing. Score is not the only factor to consider: the interplay between score and membership uncertainty makes the computation complex. The powerful computing capability of the Graphics Processing Unit (GPU) is needed in the processing of this kind of […]

CUDA

Jan, 9

Designing Numerical Solvers for Next Generation High Performance Computing

High Performance Computing (HPC) is moving towards massive scales of parallelism. The shift in hardware towards large-scale on-chip parallelism requires rewriting existing solvers for various Computational Fluid Dynamics (CFD) problems. The aim of the project is to write and optimise novel solvers for various common CFD numerical problems that can take […]

CUDA

Jan, 9

LU Factorization for Accelerator-based Systems

Multicore architectures enhanced with multiple GPUs are likely to become mainstream High Performance Computing (HPC) platforms in the near future. In this paper, we present the design and implementation of an LU factorization using a tile algorithm that can fully exploit the potential of such platforms in spite of their complexity. We use a methodology derived […]

CUDA

Jan, 9

Neural Network Simulation: The recognition application

This paper presents the GPU mapping of the recognition algorithm of a Convolutional Neural Network (CNN). This work is based on a C implementation of the application. The mapping to the GPU was performed through different approaches, which are explained in detail. The improvements achieved by each approach are presented as well as the overall speed-up […]

CUDA