Posts

Sep, 17

Efficient Kernel Fusion Techniques for Massive Video Data Analysis on GPGPUs

Kernels are executable code segments, and kernel fusion is a technique for combining the segments in a coherent manner to improve execution time. For the first time, we have developed a technique to fuse image processing kernels to be executed on GPGPUs for improving execution time and total throughput (amount of data processed in unit […]
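To make the idea of fusion concrete, here is a minimal CPU-side sketch in Python. It is not the paper's method: the two per-pixel operations (`brighten`, `threshold`) and the sample image are hypothetical, chosen only to show how two passes over the data collapse into one.

```python
# Hypothetical per-pixel "kernels"; names and operations are illustrative only.
def brighten(pixel, offset=10):
    # Add a constant and clamp to the 8-bit range.
    return min(pixel + offset, 255)

def threshold(pixel, cutoff=128):
    # Map each pixel to black or white.
    return 255 if pixel >= cutoff else 0

def run_unfused(image):
    # Two passes: the intermediate image is fully materialized between kernels.
    intermediate = [brighten(p) for p in image]
    return [threshold(p) for p in intermediate]

def run_fused(image):
    # One pass: each pixel goes through both operations with no intermediate buffer.
    return [threshold(brighten(p)) for p in image]

image = [0, 100, 120, 127, 200, 250]
assert run_unfused(image) == run_fused(image)
```

On a GPU the analogous saving would come from avoiding an extra kernel launch and a round trip through global memory for the intermediate image, which this sketch only imitates with a Python list.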
Sep, 17

SKMD: Single Kernel on Multiple Devices for Transparent CPU-GPU Collaboration

Heterogeneous computing on CPUs and GPUs has traditionally used fixed roles for each device: the GPU handles data-parallel work by taking advantage of its massive number of cores, while the CPU handles non-data-parallel work, such as the sequential code or data transfer management. This work distribution can be a poor solution as it […]
Sep, 17

CLTune: A Generic Auto-Tuner for OpenCL Kernels

This work presents CLTune, an auto-tuner for OpenCL kernels. It evaluates and tunes kernel performance of a generic, user-defined search space of possible parameter-value combinations. Example parameters include the OpenCL workgroup size, vector data-types, tile sizes, and loop unrolling factors. CLTune can be used in the following scenarios: 1) when there are too many tunable […]
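A search space of parameter-value combinations can be pictured as the Cartesian product of the allowed values. The Python sketch below enumerates such a space exhaustively; it does not use CLTune's API, and the parameter names, value lists, and the `estimated_runtime` cost model are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical search space; the parameter kinds mirror the examples in the abstract.
search_space = {
    "workgroup_size": [32, 64, 128],
    "vector_width": [1, 2, 4],
    "unroll_factor": [1, 2],
}

def estimated_runtime(config):
    # Stand-in cost model (an assumption); a real tuner would compile and time the kernel.
    return (abs(config["workgroup_size"] - 64) / 64
            + 1.0 / config["vector_width"]
            + 0.1 * config["unroll_factor"])

def exhaustive_search(space, cost):
    # Try every parameter-value combination and keep the cheapest one.
    names = list(space)
    best = None
    for values in product(*(space[name] for name in names)):
        config = dict(zip(names, values))
        score = cost(config)
        if best is None or score < best[1]:
            best = (config, score)
    return best

best_config, best_score = exhaustive_search(search_space, estimated_runtime)
```

Exhaustive enumeration grows multiplicatively with each added parameter (18 combinations here), which is why larger spaces are usually explored with a sampling or heuristic search strategy instead.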
Sep, 17

gSLICr: SLIC superpixels at over 250Hz

We introduce a parallel GPU implementation of the Simple Linear Iterative Clustering (SLIC) superpixel segmentation. Using a single graphics card, our implementation achieves speedups of up to 83x over the standard sequential implementation. Our implementation is fully compatible with the standard sequential implementation, and the software is now available online and is open source.
Sep, 17

Scalable Metropolis Monte Carlo for simulation of hard shapes

We design and implement HPMC, a scalable hard particle Monte Carlo simulation toolkit, and release it open source as part of HOOMD-blue. HPMC runs in parallel on many CPUs and many GPUs using domain decomposition. We employ BVH trees instead of cell lists on the CPU for fast performance, especially with large particle size disparity, […]
Sep, 15

Refinements in Syntactic Parsing

Syntactic parsing is one of the core tasks of natural language processing, with many applications in downstream NLP tasks, from machine translation and summarization to relation extraction and coreference resolution. Parsing performance on English texts, particularly well-edited newswire text, is generally regarded as quite good. However, state-of-the-art constituency parsers produce incorrect parses for more […]
Sep, 15

PENCIL: A Platform-Neutral Compute Intermediate Language for Accelerator Programming

Programming accelerators such as GPUs with low-level APIs and languages such as OpenCL and CUDA is difficult, error-prone, and not performance-portable. Automatic parallelization and domain specific languages (DSLs) have been proposed to hide complexity and regain performance portability. We present PENCIL, a rigorously-defined subset of GNU C99, enriched with additional language constructs, that enables compilers to exploit […]
Sep, 15

Efficient Convolutional Neural Networks for Pixelwise Classification on Heterogeneous Hardware Systems

This work presents and analyzes three convolutional neural network (CNN) models for efficient pixelwise classification of images. When convolutional neural networks are used to classify single pixels in patches of a whole image, sliding window networks carry out many redundant computations. This set of new architectures solves this issue by either […]
Sep, 15

linalg: Matrix Computations in Apache Spark

We describe matrix computations available in the cluster programming framework, Apache Spark. Out of the box, Spark comes with the mllib.linalg library, which provides abstractions and implementations for distributed matrices. Using these abstractions, we highlight the computations that were more challenging to distribute. When translating single-node algorithms to run on a distributed cluster, we observe […]
Sep, 15

A GPU-based Parallel Ant Colony Algorithm for Scientific Workflow Scheduling

The scientific workflow scheduling problem is a combinatorial optimization problem. In real applications, a scientific workflow generally has thousands of task nodes, so scheduling a large-scale workflow has huge computational overhead. In this paper, a parallel algorithm for scientific workflow scheduling is proposed so that the computing speed can be improved greatly. Our method used ant colony […]
Sep, 10

5th International Conference on Industrial Technology and Management (ICITM), 2016

Topics: Decision Analysis and Methods; E-Business and E-Commerce; Engineering Economy and Cost Analysis; Engineering Education and Training; Facilities Planning and Management; Global Manufacturing and Management; Human Factors; Information Processing and Engineering; Intelligent Systems; Manufacturing Systems; Operations Research; Production Planning and Control; Project Management; Quality Control and Management; Reliability and Maintenance Engineering; Safety, Security and Risk […]
Sep, 10

5th International Conference on Educational and Information Technology (ICEIT), 2016

Topics: Database Technology; Artificial Intelligence; Computer architecture; Software Engineering; Computer Graphics; Computer Application; Control Technology; Systems Engineering; Service learning; Learning models; Faculty development; Distance Education for Computers; Life-long education; Computer Education for Particular Group; Other Computer Education; Active learning; Computer Education for Graduates; Computer Education for Undergraduates; Network Technology; Communication Technology; Other Advanced Technology; Undergraduate […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
