17771

Posts

Nov, 12

GPU computing and Many Integrated Core Computing (PDP), 2018

TOPICS:
* GPU computing, multi GPU processing, hybrid computing
* Programming models, programming frameworks, CUDA, OpenCL, communication libraries
* Mechanisms for mapping codes
* Task allocation
* Fault tolerance
* Performance analysis
* Many Integrated Core architecture, MIC
* Intel coprocessor, Xeon Phi
* Vectorization
* Applications: image processing, signal processing, linear algebra, numerical simulation, […]

Nov, 12

Scalable and massively parallel Monte Carlo photon transport simulations for heterogeneous computing platforms

We present a highly scalable Monte Carlo (MC) 3D photon transport simulation platform designed for heterogeneous computing systems. By developing a massively parallel MC algorithm using the OpenCL framework, this research extends our existing GPU-accelerated MC technique to a highly scalable, vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel […]
Nov, 12

Low-power System-on-Chip Processors for Energy Efficient High Performance Computing: The Texas Instruments Keystone II

The High Performance Computing (HPC) community recognizes energy consumption as a major problem. Extensive research is underway to identify means to increase energy efficiency of HPC systems including consideration of alternative building blocks for future systems. This thesis considers one such system, the Texas Instruments Keystone II, a heterogeneous Low-Power System-on-Chip (LPSoC) processor that combines […]
Nov, 7

Radeon PRO Solid State Graphics (SSG) API User Manual

The Radeon Pro SSG software library enables peer-to-peer (P2P) data transfers between the GPU and Radeon on-board SSD devices. It provides a way to read OS file data from SSDs into OpenCL, OpenGL and DirectX buffers with very low-latency P2P communication. The development kit version of this library supports only the Microsoft Windows 10 operating […]
Nov, 5

Data Coherence Analysis and Optimization for Heterogeneous Computing

Although heterogeneous computing has enabled impressive program speed-ups, knowledge about the architecture of the target device is still critical to reap full hardware benefits. Programming such architectures is complex and is usually done by means of specialized languages (e.g. CUDA, OpenCL). The cost of moving and keeping host/device data coherent may easily eliminate any performance […]
Oct, 31

Automatic Scan Parallelization in OpenMP

Prefix Scan (or simply scan) is an operator that computes all the partial sums of a vector. A scan operation results in a vector where each element is the sum of the preceding elements in the original vector up to the corresponding position. Scan is a key operation in many relevant problems like sorting, lexical […]
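
To make the scan definition in the excerpt above concrete, here is a minimal Python sketch of an inclusive scan. It is not the paper's OpenMP implementation: the function names `inclusive_scan` and `hillis_steele_scan` are illustrative, and the second function only simulates on one thread the step structure of the well-known Hillis-Steele parallel scan, where every element in a step could be updated independently.

```python
from itertools import accumulate


def inclusive_scan(values):
    """Sequential reference: element i is the sum of values[0..i]."""
    return list(accumulate(values))


def hillis_steele_scan(values):
    """Single-threaded simulation of the Hillis-Steele inclusive scan.

    Each pass of the while loop corresponds to one parallel step; the
    inner loop iterations are independent of each other because they
    read only from the snapshot taken before the step.
    """
    out = list(values)
    offset = 1
    while offset < len(out):
        prev = out[:]  # snapshot of the previous step (hypothetical double buffer)
        for i in range(offset, len(out)):
            out[i] = prev[i - offset] + prev[i]
        offset *= 2
    return out


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5]
    print(inclusive_scan(data))      # [3, 4, 8, 9, 14]
    print(hillis_steele_scan(data))  # [3, 4, 8, 9, 14]
```

The simulated version needs about log2(n) steps instead of n sequential additions, at the cost of more total work, which is the usual trade-off when a scan is mapped to many execution units.
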
Oct, 29

A Study of Time and Energy Efficient Algorithms for Parallel and Heterogeneous Computing

This PhD project is motivated by the need to achieve better, more energy-efficient computing through the use of parallelism and heterogeneous systems. Our contribution consists of both theoretical aspects and in-depth, comprehensive empirical studies that aim to provide more insight into parallel and heterogeneous computing. Our first problem is […]
Oct, 24

A Fast and Generic GPU-Based Parallel Reduction Implementation

Reduction operations are extensively employed in many computational problems. Given a finite set of numeric elements, a reduction combines all elements of that set into a single value using a combiner function. A parallel reduction, in turn, is a reduction performed concurrently when multiple execution units are available. The current […]
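
As a rough illustration of the reduction described in the excerpt above, the following Python sketch combines elements pairwise in rounds, which is the tree shape a parallel reduction typically follows. It is not the implementation from the paper: the name `tree_reduce` is hypothetical, the code runs on a single thread, and it assumes the combiner function is associative so that the pairing order does not change the result.

```python
import operator


def tree_reduce(values, combine):
    """Reduce values to one result by combining neighbours in rounds.

    Every pair within a round is independent, so on parallel hardware
    each round could be executed concurrently. Assumes `combine` is
    associative (for example addition, min, or max).
    """
    items = list(values)
    if not items:
        raise ValueError("cannot reduce an empty sequence")
    while len(items) > 1:
        # Combine elements (0,1), (2,3), ... in this round.
        paired = [combine(items[i], items[i + 1])
                  for i in range(0, len(items) - 1, 2)]
        if len(items) % 2:
            # An unpaired last element is carried over to the next round.
            paired.append(items[-1])
        items = paired
    return items[0]


if __name__ == "__main__":
    print(tree_reduce(range(1, 9), operator.add))  # 36
    print(tree_reduce([5, 2, 9], min))             # 2
```

Passing the combiner as a parameter is what makes the sketch generic: the same round structure serves sums, minima, maxima, or any other associative operation.
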
Oct, 24

Parallel Computing for the Inverse of SPD matrix

In this paper, we propose a high-performance parallel computing method for the inverse of a symmetric positive definite (SPD) matrix. By reusing the inverses of diagonal sub-blocks and combining this technique with the newest OpenCL parallel computing framework, this method can effectively improve the computation of the inverse of an SPD matrix. Computing the […]
Oct, 21

How to distribute most efficiently a computation intensive calculation on an Android device to external compute units with an Android API

Is transferring computation-intensive calculations to external compute units the next trend? This master’s thesis investigates whether it is worth the effort to transfer a matrix multiplication from an Android phone to a System-on-Chip (SoC), using Bluetooth or WebSocket as communication protocols. The SoC solution used in this work is an Intel Altera Cyclone V based […]
Oct, 3

FPGA implementation of a Convolutional Neural Network for "Wake up word" detection

The popularity of machine learning has increased dramatically in recent years, and the possible applications include web search, speech recognition, object detection, etc. A big part of this development is due to the use of Convolutional Neural Networks (CNNs), for which high-performance Graphics Processing Units (GPUs) have been the most popular devices. This […]
Oct, 3

Computing Treewidth on the GPU

We present a parallel algorithm for computing the treewidth of a graph on a GPU. We implement this algorithm in OpenCL, and experimentally evaluate its performance. Our algorithm is based on an O*(2^n)-time algorithm that explores the elimination orderings of the graph using a Held-Karp-like dynamic programming approach. We use Bloom filters to detect […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors