Posts

Dec, 2

An Approach for Maximizing Performance on Heterogeneous Clusters of CPU and GPU

In recent years there has been significant enthusiasm for parallel computing on Graphics Processing Units (GPUs), which have become powerful and affordable hardware for data centers and research clusters. Our earlier research explored ways to exploit the parallel compute performance of the GPU alongside the CPU in the same […]
Dec, 2

Scalability and Optimization Strategies for GPU Enhanced Neural Networks (GeNN)

Simulation of spiking neural networks has traditionally been done on high-performance supercomputers or large-scale clusters. Exploiting the parallel nature of neural network computation algorithms, GeNN (GPU Enhanced Neural Network) provides a simulation environment that runs on general-purpose NVIDIA GPUs using a code-generation-based approach. GeNN allows users to design and simulate neural […]
Dec, 2

GPU accelerated feature algorithms for mobile devices

Mobile devices offer many new avenues for computer vision and in particular mobile augmented reality applications that have not been feasible with desktop computers. The motivation for this research is to improve mobile augmented reality applications so that natural features, instead of fiducial markers or pure location knowledge, can be used as anchor points for […]
Dec, 1

An Open-Source GPU-Accelerated Feature Extraction Tool

Extracting feature vectors from a speech audio signal is a computationally intensive task. Nevertheless, MFCC and PLP features have remained the most popular for more than a decade. We built a GPU-accelerated implementation of the feature extraction process. The implementation produces features identical to those of the reference Hidden Markov Model Toolkit (HTK) but in a fraction of the […]
Dec, 1

GPU Declarative Framework

This dissertation presents our novel declarative framework, called the Declarative Framework for GPUs (DEFG). GPUs are highly sophisticated computing devices, capable of computing at very high speeds. The framework makes the development of OpenCL-based GPU applications less complex and less time-consuming. The framework’s approach is two-fold. First, we developed the DEFG domain-specific language in […]
Dec, 1

Application Synthesis and Optimization on Heterogeneous Parallel Processing Systems

Recently, hybrid systems consisting of general-purpose processors (CPUs) and accelerators such as graphics processing units (GPUs) have become a mainstream system architecture design for achieving high performance and power efficiency. However, this growing trend is forcing programmers to address the challenges of adapting legacy serial programs into heterogeneous parallel programs. To alleviate the burden […]
Dec, 1

Performance Analysis of a High-level Abstractions-based Hydrocode on Future Computing Systems

In this paper we present research on applying a domain specific high-level abstractions (HLA) development strategy with the aim to "future-proof" a key class of high performance computing (HPC) applications that simulate hydrodynamics computations at AWE plc. We build on an existing high-level abstraction framework, OPS, that is being developed for the solution of multi-block […]
Dec, 1

pyMIC: A Python Offload Module for the Intel Xeon Phi Coprocessor

Python has gained a lot of attention from the high performance computing community as an easy-to-use, elegant scripting language for rapid prototyping and development of flexible software. At the same time, there is an ever-growing need for more compute power to satisfy the demand for higher-accuracy simulation or more detailed modeling. The Intel Xeon […]
Nov, 29

A Framework for Composing High-Performance OpenCL from Python Descriptions

Parallel processors have become ubiquitous; most programmers today have access to parallel hardware such as multi-core processors and graphics processors. This has created an implementation gap, where efficiency programmers with knowledge of hardware details can attain high performance by exploiting parallel hardware, while productivity programmers with application-level knowledge may not understand low-level performance trade-offs. Ideally, […]
Nov, 29

Code Generation Compiler for the OpenMP 4.0 Accelerator Model onto OMPSS

OpenMP, a well-known shared-memory programming API, aims to make shared-memory multiprocessor programming easy through pragma directives. Until now, its interface has covered task-level and iteration-level parallelism for general-purpose CPUs. However, the latest OpenMP 4.0 specification adds the accelerator model. OmpSs is an OpenMP extended […]
Nov, 29

A CUDA implementation of the High Performance Conjugate Gradient benchmark

The High Performance Conjugate Gradient (HPCG) benchmark has been recently proposed as a complement to the High Performance Linpack (HPL) benchmark currently used to rank supercomputers in the Top500 list. This new benchmark solves a large sparse linear system using a multigrid preconditioned conjugate gradient (PCG) algorithm. The PCG algorithm contains the computational and communication […]
Nov, 29

Runtime Comparison of CPU and GPU Using Portable Programming Models

Since increasing clock speeds is no longer enough to speed up computation, several alternative options exist. One of them is parallelism. For some problems it is possible to use the graphics processor as a massively parallel system and gain high speedups. Since NVIDIA introduced the unified device architecture and AMD switched to the OpenCL programming […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
