
Posts

Nov, 27

2nd International Conference on Robotics and Automation Engineering (ICRAE), 2017

The ICRAE conference is an international forum for the presentation of technological advances and research results in the fields of Robotics and Automation Engineering. Researchers from across the world are welcome to attend and submit their best work to the ICRAE 2017 conference to exchange ideas about the latest theories, technology, data, and videos furthering the state-of-the-art […]
Nov, 27

2nd International Conference on Computational Intelligence and Applications (ICCIA), 2017

The aim of ICCIA 2017 is to present the latest research and results of scientists working on topics related to Computational Intelligence and Applications. This conference provides opportunities for delegates from different areas to exchange new ideas and application experiences face to face, to establish business or research relations and to find global partners for future […]
Nov, 25

A Metric for Performance Portability

The term "performance portability" has been informally used in computing to refer to a variety of notions which generally include: 1) the ability to run one application across multiple hardware platforms; and 2) achieving some notional level of performance on these platforms. However, there has been a noticeable lack of consensus on the precise meaning […]
Nov, 25

Fast and Energy-Efficient CNN Inference on IoT Devices

Convolutional Neural Networks (CNNs) exhibit remarkable performance in various machine learning tasks. As sensor-equipped internet of things (IoT) devices permeate into every aspect of modern life, it is increasingly important to run CNN inference, a computationally intensive application, on resource constrained devices. We present a technique for fast and energy-efficient CNN inference on mobile SoC […]
Nov, 25

PVR: Patch-to-Volume Reconstruction for Large Area Motion Correction of Fetal MRI

In this paper we present a novel method for the correction of motion artifacts that are present in fetal Magnetic Resonance Imaging (MRI) scans of the whole uterus. Contrary to current slice-to-volume registration (SVR) methods, requiring an inflexible anatomical enclosure of a single investigated organ, the proposed patch-to-volume reconstruction (PVR) approach is able to reconstruct […]
Nov, 25

Efficient Kernel Synthesis for Performance Portable Programming

The diversity of microarchitecture designs in heterogeneous computing systems allows programs to achieve high performance and energy efficiency, but results in substantial software re-development cost for each type or generation of hardware. To mitigate this cost, a performance portable programming system is required. One fundamental difference between architectures that makes performance portability challenging is the […]
Nov, 25

dMath: Distributed Linear Algebra for DL

The paper presents a parallel math library, dMath, that demonstrates leading scaling when using intranode, internode, and hybrid-parallelism for deep learning (DL). dMath provides easy-to-use distributed primitives and a variety of domain-specific algorithms including matrix multiplication, convolutions, and others allowing for rapid development of scalable applications like deep neural networks (DNNs). Persistent data stored in […]
Nov, 23

Performance Analysis of CUDA and OpenCL By Implementation of Cryptographic Algorithms

This paper presents a performance analysis of CUDA and OpenCL. Three different cryptographic algorithms, i.e., DES, MD5, and SHA-1, have been selected as the benchmarks for extensive analysis of the performance gaps between the two. Our results show that, in the average scenario, CUDA performs 27% better than OpenCL, while in the best case scenario […]
Nov, 23

A Metaprogramming and Autotuning Framework for Deploying Deep Learning Applications

In recent years, deep neural networks (DNNs) have yielded strong results on a wide range of applications. Graphics Processing Units (GPUs) have been one key enabling factor leading to the current popularity of DNNs. However, despite increasing hardware flexibility and software programming toolchain maturity, high efficiency GPU programming remains difficult: it suffers from high complexity, […]
Nov, 23

Optimization and Evaluation of VLPL-S Particle-in-cell Code on Knights Landing

VLPL-S code is developed based on the particle-in-cell (PIC) algorithm, which is the mainstream algorithm of plasma behavior research. In this paper, we report our early experience on porting and optimizing the VLPL-S particle-in-cell code on the Knights Landing. By applying general optimization methods such as memory access optimization, thread level parallelism and vectorization to […]
Nov, 23

Deep Tensor Convolution on Multicores

Deep convolutional neural networks (ConvNets) have become a de facto standard for image classification and segmentation problems. These networks have also had early success in the video domain, despite failing to capture motion continuity and other rich temporal correlations. Evidence has since emerged that extending ConvNets to 3-dimensions leads to state-of-the-art performance across a broad […]
Nov, 23

GA3C: GPU-based A3C for Deep Reinforcement Learning

We introduce and analyze the computational aspects of a hybrid CPU/GPU implementation of the Asynchronous Advantage Actor-Critic (A3C) algorithm, currently the state-of-the-art method in reinforcement learning for various gaming tasks. Our analysis concentrates on the critical aspects to leverage the GPU’s computational power, including the introduction of a system of queues and a dynamic scheduling […]

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors
