high performance computing on graphics processing units: hgpu.org

Posts

Dec, 1

An Open-Source GPU-Accelerated Feature Extraction Tool

Extracting feature vectors from a speech audio signal is a computationally intensive task. MFCC and PLP features have remained the most popular choice for more than a decade. We made a GPU-accelerated implementation of the feature extraction process. The implementation produces features identical to those of the reference Hidden Markov Model Toolkit (HTK) but in a fraction of the […]

CUDA • OpenCL
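For readers unfamiliar with the pipeline being accelerated, below is a minimal NumPy sketch of MFCC extraction: pre-emphasis, Hamming-windowed framing, power spectrum, a triangular mel filterbank, log compression and a DCT-II. It is not the tool's implementation and is not claimed to reproduce HTK output exactly; the function names and default parameters (25 ms frames with a 10 ms hop at 16 kHz, 26 filters, 13 cepstra, pre-emphasis 0.97) are illustrative assumptions. Each frame is processed independently, which is the property a GPU implementation can exploit.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale: 1127 * ln(1 + f / 700)
    return 1127.0 * np.log1p(np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * np.expm1(np.asarray(m, dtype=float) / 1127.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced uniformly on the mel scale (hypothetical helper)."""
    n_bins = n_fft // 2 + 1
    # n_filters + 2 edge points: filter i spans points i, i+1, i+2
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bin_freqs = np.linspace(0.0, sample_rate / 2, n_bins)
    fbank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        left, center, right = hz_points[i:i + 3]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank

def dct_ii(x, n_out):
    """DCT-II over the last axis with sqrt(2/N) scaling, keeping n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    j = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (j + 0.5) / n)
    return np.sqrt(2.0 / n) * (x @ basis.T)

def mfcc_frames(signal, sample_rate, frame_len=400, hop=160, n_fft=512,
                n_filters=26, n_ceps=13, preemph=0.97):
    """Return an (n_frames, n_ceps) array; signal must hold >= frame_len samples."""
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis: y[t] = x[t] - a * x[t-1]
    emphasized = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    # Slice overlapping frames; every frame is independent (GPU-friendly)
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum of each zero-padded frame
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    # Mel filterbank energies, log-compressed (floored to avoid log(0))
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))
    return dct_ii(log_energies, n_ceps)
```

For one second of 16 kHz audio, `mfcc_frames(signal, 16000)` returns one row of 13 coefficients per 10 ms frame.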

Dec, 1

GPU Declarative Framework

This dissertation presents our novel declarative framework, called the Declarative Framework for GPUs (DEFG). GPUs are highly sophisticated computing devices, capable of computing at very high speeds. The framework makes the development of OpenCL-based GPU applications less complex and less time-consuming. The framework's approach is two-fold. First, we developed the DEFG domain-specific language in […]

OpenCL

Dec, 1

Application Synthesis and Optimization on Heterogeneous Parallel Processing Systems

Recently, hybrid systems consisting of general-purpose processors (CPUs) and accelerators such as graphics processing units (GPUs) have become a mainstream system architecture for achieving high performance and power efficiency. However, this growing trend forces programmers to address the issues and challenges of adapting legacy serial programs into heterogeneous parallel programs. To alleviate the burden […]

CUDA • OpenCL

Dec, 1

Performance Analysis of a High-level Abstractions-based Hydrocode on Future Computing Systems

In this paper we present research on applying a domain-specific high-level abstractions (HLA) development strategy with the aim of "future-proofing" a key class of high performance computing (HPC) applications that simulate hydrodynamics at AWE plc. We build on an existing high-level abstraction framework, OPS, which is being developed for the solution of multi-block […]

CUDA • OpenCL

Dec, 1

pyMIC: A Python Offload Module for the Intel Xeon Phi Coprocessor

Python has gained a lot of attention from the high performance computing community as an easy-to-use, elegant scripting language for rapid prototyping and development of flexible software. At the same time, there is an ever-growing need for more compute power to satisfy the demand for higher-accuracy simulation or more detailed modeling. The Intel Xeon […]

Nov, 29

A Framework for Composing High-Performance OpenCL from Python Descriptions

Parallel processors have become ubiquitous; most programmers today have access to parallel hardware such as multi-core processors and graphics processors. This has created an implementation gap, where efficiency programmers with knowledge of hardware details can attain high performance by exploiting parallel hardware, while productivity programmers with application-level knowledge may not understand low-level performance trade-offs. Ideally, […]

OpenCL

Nov, 29

Code Generation Compiler for the OpenMP 4.0 Accelerator Model onto OMPSS

OpenMP, a well-known shared-memory programming API, aims to make shared-memory multiprocessor programming easy through pragma directives. Until now, its interface has provided task- and iteration-level parallelism for general-purpose CPUs. However, its latest specification, version 4.0, includes an accelerator model. OmpSs is an OpenMP extended […]

CUDA • OpenCL

Nov, 29

A CUDA implementation of the High Performance Conjugate Gradient benchmark

The High Performance Conjugate Gradient (HPCG) benchmark has been recently proposed as a complement to the High Performance Linpack (HPL) benchmark currently used to rank supercomputers in the Top500 list. This new benchmark solves a large sparse linear system using a multigrid preconditioned conjugate gradient (PCG) algorithm. The PCG algorithm contains the computational and communication […]

CUDA
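To make the structure of the PCG iteration concrete, here is a minimal NumPy sketch of preconditioned conjugate gradient for a symmetric positive-definite system. As a deliberate simplification, it uses a Jacobi (diagonal) preconditioner in place of the multigrid preconditioner that HPCG specifies, and a dense matrix in place of a sparse one; the function name `pcg` and its parameters are hypothetical. The per-iteration work (one matrix-vector product, dot products and vector updates) is what a CUDA implementation would move onto the GPU.

```python
import numpy as np

def pcg(A, b, M_inv_diag, tol=1e-10, max_iter=1000):
    """Preconditioned CG for SPD A.

    M_inv_diag is the inverse of A's diagonal (Jacobi preconditioner),
    standing in for HPCG's multigrid preconditioner in this sketch.
    Returns (x, iterations).
    """
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x
    b_norm = np.linalg.norm(b)
    if np.linalg.norm(r) <= tol * b_norm:
        return x, 0
    z = M_inv_diag * r          # apply preconditioner: z = M^{-1} r
    p = z.copy()
    rz = r @ z
    for k in range(max_iter):
        Ap = A @ p              # matrix-vector product (SpMV in HPCG)
        alpha = rz / (p @ Ap)   # dot product
        x += alpha * p          # vector updates
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k + 1
        z = M_inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter
```

On a small 1D Poisson matrix the result can be checked against a direct solve with `np.linalg.solve`.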

Nov, 29

Runtime Comparison of CPU and GPU Using Portable Programming Models

Since increasing clock speeds are not enough to speed up computation, several alternatives exist. One of them is parallelism. For some problems it is possible to use the graphics processor as a massively parallel system and gain high speedups. Since NVIDIA introduced the Compute Unified Device Architecture and AMD switched to the OpenCL programming […]

CUDA • OpenCL

Nov, 29

Parallel kNN on GPU Architecture Using OpenCL

In data mining applications, one of the useful algorithms for classification is the kNN algorithm. kNN search is widely used in many research and industrial domains, such as 3-dimensional object rendering, content-based image retrieval, statistics and biology (gene classification). In spite of some improvements in the last decades, the computation time required by the […]

OpenCL
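As an illustration of why kNN parallelizes well, here is a minimal NumPy sketch of brute-force kNN classification. It is not the paper's OpenCL kernel; the function name `knn_classify`, squared Euclidean distance and the simple majority vote are assumptions. Every query-reference distance is computed independently, which is the data parallelism a GPU implementation would assign to separate work-items.

```python
import numpy as np
from collections import Counter

def knn_classify(train_x, train_y, queries, k):
    """Brute-force kNN classification; requires k <= len(train_x)."""
    # Squared Euclidean distances, shape (n_queries, n_train);
    # each entry is independent of all the others.
    diff = queries[:, None, :] - train_x[None, :, :]
    dist2 = np.einsum("qnd,qnd->qn", diff, diff)
    # Indices of the k smallest distances per query (in no particular order)
    nearest = np.argpartition(dist2, k - 1, axis=1)[:, :k]
    labels = []
    for row in nearest:
        # Majority vote among the k nearest labels
        votes = Counter(train_y[row].tolist())
        labels.append(votes.most_common(1)[0][0])
    return np.array(labels)
```

Using `argpartition` instead of a full sort selects the k nearest neighbours without ordering all distances, which mirrors the partial-selection step GPU kNN implementations typically try to keep cheap.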

Nov, 25

4th International Conference on Software and Computer Applications, ICSCA 2015

Submission Deadline: 2015-04-10 Topics: Software Engineering AI and Knowledge based software engineering Artificial Intelligence Aspect-orientation and feature interaction Business Process Reengineering & Science Communication Systems and Networks Component-Based Software Engineering Computer & Software Engineering Computer Animation and Design Contents Computer Game Development, User Modeling and Management Computer supported cooperative work Cost Modeling and Analysis Data […]

Nov, 25

Improving GPU Performance by Regrouping CPU-Memory Data

Fast, effective analysis of large, complex systems requires high-performance computing. The NVIDIA Compute Unified Device Architecture (CUDA)-assisted central processing unit (CPU) / graphics processing unit (GPU) computing platform has proven its potential for high-performance computing. In CPU/GPU computing, original data and instructions are copied from CPU main memory to […]

CUDA