14764

Posts

Oct, 27

Machine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection

High-level languages such as Java increase both productivity and portability with productive language features such as managed runtime, type safety, and precise exception semantics. Additionally, Java 8 provides parallel stream APIs with lambda expressions to facilitate parallel programming for mainstream users of multi-core CPUs and many-core GPUs. These high-level APIs avoid the complexity of writing […]
Oct, 18

Self-Adapting Parallel Framework for Long-Term Object Tracking

Object tracking is a crucial field in computer vision that has many uses in human-computer interaction, security and surveillance, video communication and compression, augmented reality, traffic control, etc. Many implementations are introduced in practice, and yet recent methods emphasize on tracking objects adaptively by learning the object’s perspectives and rediscovering it when it becomes untraceable, […]
Oct, 16

Sapporo2: A versatile direct N-body library

Astrophysical direct $N$-body methods have been one of the first production algorithms to be implemented using NVIDIA’s CUDA architecture. Now, almost seven years later, the GPU is the most used accelerator device in astronomy for simulating stellar systems. In this paper we present the implementation of the Sapporo2 $N$-body library, which allows researchers to use […]
Oct, 16

Density-based parallel skin lesion border detection with webCL

BACKGROUND: Dermoscopy is a highly effective and noninvasive imaging technique used in diagnosis of melanoma and other pigmented skin lesions. Many aspects of the lesion under consideration are defined in relation to the lesion border. This makes border detection one of the most important steps in dermoscopic image analysis. In current practice, dermatologists often delineate […]
Sep, 29

The Dynamical Kernel Scheduler – Part 1

Emerging processor architectures such as GPUs and Intel MICs provide a huge performance potential for high performance computing. However developing software using these hardware accelerators introduces additional challenges for the developer such as exposing additional parallelism, dealing with different hardware designs and using multiple development frameworks in order to use devices from different vendors. The […]
Sep, 26

Fast Exact Bayesian Inference for High-Dimensional Models

In this text, we present the principles that allow the tractable implementation of exact inference processes concerning a group of widespread classes of Bayesian generative models, which have until recently been deemed as intractable whenever formulated using high-dimensional joint distributions. We will demonstrate the usefulness of such a principled approach with an example of real-time […]
Sep, 17

SKMD: Single Kernel on Multiple Devices for Transparent CPU-GPU Collaboration

Heterogeneous computing on CPUs and GPUs has traditionally used fixed roles for each device: the GPU handles data parallel work by taking advantage of its massive number of cores while the CPU handles non data-parallel work, such as the sequential code or data transfer management. This work distribution can be a poor solution as it […]
Sep, 15

Efficient Convolutional Neural Networks for Pixelwise Classification on Heterogeneous Hardware Systems

This work presents and analyzes three convolutional neural network (CNN) models for efficient pixelwise classification of images. When using convolutional neural networks to classify single pixels in patches of a whole image, a lot of redundant computations are carried out when using sliding window networks. This set of new architectures solve this issue by either […]
Sep, 15

PENCIL: A Platform-Neutral Compute Intermediate Language for Accelerator Programming

Programming accelerators such as GPUs with low-level APIs and languages such as OpenCL and CUDA is difficult, error-prone, and not performance-portable. Automatic parallelization and domain specific languages (DSLs) have been proposed to hide complexity and regain performance portability. We present PENCIL, a rigorously-defined subset of GNU C99-enriched with additional language constructs-that enables compilers to exploit […]
Sep, 9

Parallel waveform extraction algorithms for the Cherenkov Telescope Array Real-Time Analysis

The Cherenkov Telescope Array (CTA) is the next generation observatory for the study of very high-energy gamma rays from about 20 GeV up to 300 TeV. Thanks to the large effective area and field of view, the CTA observatory will be characterized by an unprecedented sensitivity to transient flaring gamma-ray phenomena compared to both current […]
Sep, 8

Contributions to the Efficient Use of General Purpose Coprocessors: Kernel Density Estimation as Case Study

The high performance computing landscape is shifting from assemblies of homogeneous nodes towards heterogeneous systems, in which nodes consist of a combination of traditional out-oforder execution cores and accelerator devices. Accelerators, built around GPUs, many-core chips, or FPGAs, are used to offload compute-intensive tasks. These devices provide superior theoretical performance compared to traditional multi-core CPUs, […]
Sep, 5

Virtualizing Data Parallel Systems for Portability, Productivity, and Performance

Computer systems equipped with graphics processing units (GPUs) have become increasingly common over the last decade. In order to utilize the highly data parallel architecture of GPUs for general purpose applications, new programming models such as OpenCL and CUDA were introduced, showing that data parallel kernels on GPUs can achieve speedups by several orders of […]

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: