
Posts

Dec, 22

A time-energy performance analysis of MapReduce on heterogeneous systems with GPUs

Motivated by the explosion of Big Data analytics, performance improvements in low-power (wimpy) systems and the increasing energy efficiency of GPUs, this paper presents a time-energy performance analysis of MapReduce on heterogeneous systems with GPUs. We evaluate the time and energy performance of three MapReduce applications with diverse resource demands on a Hadoop-CUDA framework. As […]
Dec, 19

Investigation of the SYCL for OpenCL Programming Model

OpenCL and SYCL for OpenCL are open-standard programming models which enable development of parallel programs which target heterogeneous hardware: systems which contain both general-purpose CPUs and accelerator devices such as GPGPUs or Intel Xeon Phi cards. While OpenCL provides a C API, SYCL provides a C++ API and allows programmers to take advantage of many […]
Dec, 19

Autotuning Stencil Codes with Algorithmic Skeletons

The physical limitations of microprocessor design have forced the industry towards increasingly heterogeneous architectures to extract performance. This trend has not been matched with software tools to cope with such parallelism, leading to a growing disparity between the levels of available performance and the ability for application developers to exploit it. Algorithmic skeletons simplify parallel […]
Dec, 19

Study, Modelling and Implementation of the Level Set Method Used in Micromachining Processes

The main topic of the present thesis is the improvement of fabrication process simulation by means of the Level Set (LS) method. The LS is a mathematical approach used for evolving fronts according to a motion defined by certain laws. The main advantage of this method is that the front is embedded inside a higher […]
Dec, 19

Challenges Adapting CUDA PIC Codes to multiple GPUs

A Particle-In-Cell code is a common particle simulation method often used to simulate the behaviour of plasma. In this work, a parallel PIC code is developed in CUDA, with a focus on how to adapt the method for multiple GPUs. An electrostatic three dimensional PIC code is developed, with an FFT-based solver using the cuFFT […]
Dec, 19

Efficient Query Processing in Co-Processor-accelerated Databases

Advancements in hardware have shifted the bottleneck of modern database systems from disk IO to main memory access and processing power. Since the performance of modern processors is primarily limited by a fixed energy budget, hardware vendors are forced to specialize processors. Consequently, processors become increasingly heterogeneous, which has already become commodity in the form of accelerated […]
Dec, 15

Origami: A Convolutional Network Accelerator

Today advanced computer vision (CV) systems of ever increasing complexity are being deployed in a growing number of application scenarios with strong real-time and power constraints. Current trends in CV clearly show a rise of neural network-based algorithms, which have recently broken many object detection and localization records. These approaches are very flexible and can […]
Dec, 15

Adaptive algebraic multigrid on SIMD architectures

We present details of our implementation of the Wuppertal adaptive algebraic multigrid code DD-alpha AMG on SIMD architectures, with particular emphasis on the Intel Xeon Phi processor (KNC) used in QPACE 2. As a smoother, the algorithm uses a domain-decomposition-based solver code previously developed for the KNC in Regensburg. We optimized the remaining parts of […]
Dec, 15

A CUDA Kernel Scheduler Exploiting Static Data Dependencies

The CUDA execution model of Nvidia’s GPUs is based on the asynchronous execution of thread blocks, where each thread executes the same kernel in a data-parallel fashion. When threads in different thread blocks need to synchronise and communicate, the whole computation launched onto the GPU needs to be stopped and re-invoked in order to facilitate […]
Dec, 15

Run-time support for multi-level disjoint memory address spaces

High Performance Computing (HPC) systems have become widely used tools in many industry areas and research fields. Research to produce more powerful and efficient systems has grown in step with their popularity. As a consequence, the complexity of modern HPC architectures has increased in order to provide systems with the highest levels of performance. This […]
Dec, 15

Bigger Buffer k-d Trees on Multi-Many-Core Systems

A buffer k-d tree is a k-d tree variant for massively-parallel nearest neighbor search. While providing valuable speed-ups on modern many-core devices when both a large number of reference and query points are given, buffer k-d trees are limited by the number of points that can fit on a single device. In this work, […]
Dec, 15

Compressed Dynamic Mode Decomposition for Real-Time Object Detection

We introduce the method of compressive dynamic mode decomposition (cDMD) for robustly performing real-time foreground/background separation in high-definition video. The DMD method provides a regression technique for least-square fitting of video snapshots to a linear dynamical system. The method integrates two of the leading data analysis methods in use today: Fourier transforms and Principal Components. […]

HGPU group © 2010-2016 hgpu.org

All rights belong to the respective authors