
Posts

May, 1

Transparent Acceleration for Heterogeneous Platforms With Compilation to OpenCL

Multi-accelerator platforms combine CPUs and different accelerator architectures within a single compute node. Such systems are capable of processing parallel workloads very efficiently while being more energy efficient than regular systems consisting of CPUs only. However, the architectures of such systems are diverse, forcing developers to port applications to each accelerator using different programming languages, […]
May, 1

The Risks of WebGL: Analysis, Evaluation and Detection

WebGL is a browser feature that enables JavaScript-based control of the graphics processing unit (GPU) to render interactive 3D and 2D graphics, without the use of plug-ins. Exploiting WebGL for attacks will affect billions of users since browsers serve as the main interaction mechanism with the world wide web. This paper explores the potential threats […]
Apr, 28

A mechanism for balancing accuracy and scope in cross-machine black-box GPU performance modeling

The ability to model, analyze, and predict execution time of computations is an important building block supporting numerous efforts, such as load balancing, performance optimization, and automated performance tuning for high performance, parallel applications. In today’s increasingly heterogeneous computing environment, this task must be accomplished efficiently across multiple architectures, including massively parallel coprocessors like GPUs. […]
Apr, 28

Chunkflow: Distributed Hybrid Cloud Processing of Large 3D Images by Convolutional Nets

It is now common to process volumetric biomedical images using 3D Convolutional Networks (ConvNets). This can be challenging for the teravoxel and even petavoxel images that are being acquired today by light or electron microscopy. Here we introduce chunkflow, a software framework for distributing ConvNet processing over local and cloud GPUs and CPUs. The image […]
Apr, 28

Wasserstein-Fisher-Rao Document Distance

As a fundamental problem of natural language processing, it is important to measure the distance between different documents. Among the existing methods, the Word Mover’s Distance (WMD) has shown remarkable success in document semantic matching for its clear physical insight as a parameter-free model. However, WMD is essentially based on the classical Wasserstein metric, thus […]
Apr, 28

GPU-based Efficient Join Algorithms on Hadoop

Growing data volumes have put tremendous pressure on query processing and storage, so many studies focus on using GPUs to accelerate the join operation, which is one of the most important operations in modern database systems. However, existing research on GPU-accelerated joins is not well suited to join operations on big […]
Apr, 20

Loop Perforation in OpenACC

High-level programming models such as OpenMP and OpenACC are used to accelerate loop-parallelizable applications. In such applications, a very large number of loop iterations are launched as threads on the accelerator, where every iteration executes the same code sequence (loop body or kernel) but on different data. In such workloads, similarities in the input lead […]
Apr, 20

A Comparative Study of Asynchronous Many-Tasking Runtimes: Cilk, Charm++, ParalleX and AM++

We evaluate and compare four contemporary and emerging runtimes for high-performance computing (HPC) applications: Cilk, Charm++, ParalleX and AM++. We compare along three bases: programming model, execution model and the implementation on an underlying machine model. The comparison study includes a survey of each runtime system’s programming models, their corresponding execution models, their stated features, and […]
Apr, 20

On Optimizing Complex Stencils on GPUs

Stencil computations are often the compute-intensive kernel in many scientific applications. With the increasing demand for computational accuracy, and the emergence of massively data-parallel high-bandwidth architectures like GPUs, stencils have steadily become more complex in terms of the stencil order, data accesses, and reuse patterns. Many prior efforts have focused on optimizing simpler stencil computations […]
Apr, 20

Real world applications of Artificial Intelligence on constrained hardware

These days the field of Artificial Intelligence (and its many subfields) is moving fast, and many new techniques are becoming available from its various subfields. However, many of these algorithms are only made to run on very powerful research workstations without considering how they can be used on real-world hardware, be it embedded hardware, powerful […]
Apr, 20

Concurrent query processing in a GPU-based database system

The unrivaled computing capabilities of modern GPUs meet the demand of processing massive amounts of data seen in many application domains. While traditional HPC systems support applications as standalone entities that occupy entire GPUs, there are GPU-based DBMSs where multiple tasks are meant to be run at the same time in the same device. To […]
Apr, 14

Accelerated Neural Networks on OpenCL Devices Using SYCL-DNN

Over the past few years machine learning has seen a renewed explosion of interest, following a number of studies showing the effectiveness of neural networks in a range of tasks which had previously been considered incredibly hard. Neural networks’ effectiveness in the fields of image recognition and natural language processing stems primarily from the vast […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
