high performance computing on graphics processing units: hgpu.org

Posts

May, 3

IPMACC: Translating OpenACC API to OpenCL

In this paper, we introduce IPMACC, a framework for executing OpenACC for C applications over the OpenCL runtime. We use our framework to compare the performance of OpenACC and OpenCL. OpenACC API abstractions remove low-level control from programmers’ hands. To understand the low-level OpenCL optimizations that are not applicable in OpenACC, we compare highly-optimized OpenCL and […]

CUDA • OpenCL

May, 3

Efficient Implementation of Bi-directional Path Tracer on GPU

Most implementations of photo-realistic image rendering use standard unidirectional path tracing, which gives fast and accurate results for scenes without caustics or other hard cases. These hard cases are usually solved by a bidirectional path tracing algorithm. However, due to the complexity of bidirectional path tracing algorithms, their implementations almost exclusively target sequential CPUs. […]

CUDA

May, 3

Fine-Grained Synchronizations and Dataflow Programming on GPUs

The last decade has witnessed the blooming emergence of many-core platforms, especially graphics processing units (GPUs). With the exponential growth of cores in GPUs, utilizing them efficiently becomes a challenge. The data-parallel programming model assumes a single instruction stream for multiple concurrent threads (SIMT); therefore little support is offered to enforce thread ordering and […]

CUDA

May, 3

Massively Parallel kNN using CUDA on Spam-Classification

Email Spam-classification is a fundamental, unseen element of everyday life. As email communication becomes more prolific, and email systems become more robust, it becomes increasingly necessary for Spam-classification systems to run accurately and efficiently while remaining all but invisible to the user. We propose a massively parallel implementation of Spam-classification using the k-Nearest Neighbors (kNN) […]

CUDA
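As a rough illustration of the k-Nearest Neighbors classification named in the entry above, here is a minimal sequential sketch in Python. It is not the paper's CUDA implementation; the function name `knn_classify`, the Euclidean distance metric, and the toy two-feature vectors are all assumptions made for this example.

```python
import math
from collections import Counter


def knn_classify(train, labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training vector.
    # On a GPU, each of these distances could be computed by its own thread.
    dists = [(math.dist(x, query), y) for x, y in zip(train, labels)]
    # Keep the k closest neighbours and count their labels.
    dists.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Made-up feature vectors (hypothetical, e.g. two normalised keyword scores).
train = [(0.0, 0.1), (0.2, 0.0), (0.9, 1.0), (1.0, 0.8), (0.8, 0.9)]
labels = ["ham", "ham", "spam", "spam", "spam"]

print(knn_classify(train, labels, (0.85, 0.95)))
```

The distance computations are independent of one another, which is what makes kNN a natural candidate for a massively parallel implementation.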

May, 3

PyTransit: Fast and Easy Exoplanet Transit Modelling in Python

We present a fast and user-friendly exoplanet transit light curve modelling package PyTransit, implementing optimised versions of the Giménez and the Mandel & Agol transit models. The package offers an object-oriented Python interface to access the two models implemented natively in Fortran with OpenMP parallelisation. A partial OpenCL version of the quadratic Mandel-Agol model […]

OpenCL

Apr, 27

Parallel Genetic Algorithms on a GPU to Solve the Travelling Salesman Problem

The implementation of parallel genetic algorithms on a graphics processing unit (GPU) to solve Travelling Salesman Problem instances is presented. Two versions of parallel genetic algorithms are implemented: a Parallel Genetic Algorithm with Islands Model and a Parallel Genetic Algorithm with Elite Island; both versions were executed on a GPU. In both cases, each […]

CUDA

Apr, 27

Speculative Segmented Sum for Sparse Matrix-Vector Multiplication on Heterogeneous Processors

Sparse matrix-vector multiplication (SpMV) is a central building block for scientific software and graph applications. Recently, heterogeneous processors composed of different types of cores have attracted much attention because of their flexible core configuration and high energy efficiency. In this paper, we propose a compressed sparse row (CSR) format based SpMV algorithm utilizing both types of […]

CUDA • OpenCL
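For readers unfamiliar with the CSR format mentioned in the entry above, the following is a minimal sketch of a plain, sequential CSR SpMV in Python. It shows only the baseline row-by-row product, not the speculative segmented sum the paper proposes; the names `csr_spmv`, `values`, `col_idx`, and `row_ptr` are conventional choices assumed for this example.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in compressed sparse row form.

    values  : nonzero entries of A, listed row by row
    col_idx : column index of each entry in `values`
    row_ptr : row_ptr[i]..row_ptr[i+1] delimits row i inside `values`
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Only the stored nonzeros of this row contribute to y[row].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# A = [[10, 0, 0],
#      [ 0, 0, 0],
#      [ 2, 0, 3]]
values = [10.0, 2.0, 3.0]
col_idx = [0, 0, 2]
row_ptr = [0, 1, 1, 3]
x = [1.0, 2.0, 3.0]

print(csr_spmv(values, col_idx, row_ptr, x))  # [10.0, 0.0, 11.0]
```

Rows can hold very different numbers of nonzeros (the middle row here is empty), which is one reason load balancing SpMV across parallel cores is difficult.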

Apr, 27

GPU Accelerated framework for financial nested simulations

In this thesis we present a state-of-the-art approach to accelerate Monte Carlo valuations of embedded options. Due to regulations and improved risk management, nested simulations (scenarios in scenarios) are becoming increasingly important for institutional investors such as insurance companies, pension funds, and housing corporations. Preferably one wishes to use a framework in which multiple related problems […]

CUDA

Apr, 27

Parallel local search on GPU and CPU with OpenCL

Real-world optimization problems are very complex and NP-hard. The modeling of such problems is in constant evolution in terms of constraints and objectives, and their resolution is expensive in computation time. With all this change, even metaheuristics, well known for their efficiency, are beginning to be overtaken by the data explosion. Recently, thanks to the publication of […]

OpenCL

Apr, 27

Implementation and performance analysis of the AXPY, DOT, and SpMV functions on Intel Xeon Phi and NVIDIA Tesla using OpenCL

The present work is an analysis of the performance of the AXPY, DOT and SpMV functions using OpenCL. The code was tested on the NVIDIA Tesla S2050 GPU and Intel Xeon Phi 3120A coprocessor. Due to the nature of the AXPY function, only two versions were implemented, the routine to be executed by the CPU and […]

OpenCL
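The AXPY and DOT functions named in the entry above are standard BLAS level-1 operations: AXPY computes y ← αx + y and DOT computes the inner product of two vectors. The sketch below is a plain sequential Python reference, not the OpenCL kernels the paper evaluates; unlike BLAS, this `axpy` returns a new list rather than updating `y` in place, a simplification assumed for the example.

```python
def axpy(alpha, x, y):
    """Return alpha * x + y elementwise (BLAS AXPY, without in-place update)."""
    # Each output element depends only on its own inputs, so the loop is
    # trivially data-parallel.
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def dot(x, y):
    """Return the inner product of x and y (BLAS DOT)."""
    # Unlike AXPY, this needs a reduction over all the elementwise products.
    return sum(xi * yi for xi, yi in zip(x, y))


print(axpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))  # [12.0, 24.0, 36.0]
print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))           # 32.0
```

AXPY maps one output element to one work item, whereas DOT requires combining partial sums, so a parallel version typically needs a reduction step.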

Apr, 25

Algorithm 9xx: Sparse QR Factorization on the GPU

Sparse matrix factorization involves a mix of regular and irregular computation, which is a particular challenge when trying to obtain high performance on the highly parallel general-purpose computing cores available on graphics processing units (GPUs). We present a sparse multifrontal QR factorization method that meets this challenge, and is up to eleven times faster than a […]

CUDA

Apr, 25

A Study of Scheduling a Neuro-imaging Application On a Heterogeneous CPU-GPU Cluster

The ever-increasing complexity of scientific applications has led to the utilization of new HPC paradigms such as Graphics Processing Units (GPUs). However, modifying applications to run on GPUs is challenging. Furthermore, the speedup achieved by using GPUs has added a huge heterogeneity to HPC clusters. In this dissertation, we enabled NPAIRS, a neuro-imaging application, to […]

CUDA