Posts

Dec, 3

A Real-time Single Pulse Detection Algorithm for GPUs

The detection of non-repeating events in the radio spectrum has become an important area of study in radio astronomy over the last decade due to the discovery of fast radio bursts (FRBs). We have implemented a single pulse detection algorithm for NVIDIA GPUs that uses boxcar filters of varying widths. Our code performs the calculation […]
Nov, 30

Hardware thread reordering to boost OpenCL throughput on FPGAs

Availability of OpenCL for FPGAs has raised new questions about the efficiency of massive thread-level parallelism on FPGAs. The general trend is toward creating deep pipelining and in-order execution of many OpenCL threads across a shared data-path. While this can be a very effective approach for regular kernels, its efficiency significantly diminishes for irregular kernels […]
Nov, 30

Parallelizing Word2Vec in Multi-Core and Many-Core Architectures

Word2vec is a widely used algorithm for extracting low-dimensional vector representations of words. State-of-the-art algorithms including those by Mikolov et al. have been parallelized for multi-core CPU architectures, but are based on vector-vector operations with "Hogwild" updates that are memory-bandwidth intensive and do not efficiently use computational resources. In this paper, we propose "HogBatch" by […]
Nov, 30

Optimization of Pattern Matching Algorithms for Multi- and Many-Core Platforms

Image and video compression play a major role in the world today, allowing the storage and transmission of large multimedia content volumes. However, the processing of this information requires high computational resources, hence the improvement of the computational performance of these compression algorithms is very important. The Multidimensional Multiscale Parser (MMP) is a pattern-matching-based compression […]
Nov, 30

Deep, Dense, and Low-Rank Gaussian Conditional Random Fields

In this work we introduce a fully-connected graph structure in the Deep Gaussian Conditional Random Field (G-CRF) model. For this we express the pairwise interactions between pixels as the inner-products of low-dimensional embeddings, delivered by a new subnetwork of a deep architecture. We efficiently minimize the resulting energy by solving the resulting low-rank linear system […]
Nov, 29

dCUDA: hardware supported overlap of computation and communication

Over the last decade, CUDA and the underlying GPU hardware architecture have continuously gained popularity in various high-performance computing application domains such as climate modeling, computational chemistry, or machine learning. Despite this popularity, we lack a single coherent programming model for GPU clusters. We therefore introduce the dCUDA programming model, which implements device-side remote memory […]
Nov, 27

2nd International Conference on Robotics and Automation Engineering (ICRAE), 2017

The ICRAE conference is an international forum for the presentation of technological advances and research results in the fields of Robotics and Automation Engineering. Researchers from across the world are welcome to attend and submit their best work to the ICRAE 2017 conference to exchange ideas about the latest theories, technology, data, and videos furthering the state-of-the-art […]
Nov, 27

2nd International Conference on Computational Intelligence and Applications (ICCIA), 2017

The aim of ICCIA 2017 is to present the latest research and results of scientists working on Computational Intelligence and Applications topics. This conference provides opportunities for delegates from different areas to exchange new ideas and application experiences face to face, to establish business or research relations and to find global partners for future […]
Nov, 25

A Metric for Performance Portability

The term "performance portability" has been informally used in computing to refer to a variety of notions which generally include: 1) the ability to run one application across multiple hardware platforms; and 2) achieving some notional level of performance on these platforms. However, there has been a noticeable lack of consensus on the precise meaning […]
Nov, 25

Fast and Energy-Efficient CNN Inference on IoT Devices

Convolutional Neural Networks (CNNs) exhibit remarkable performance in various machine learning tasks. As sensor-equipped Internet of Things (IoT) devices permeate every aspect of modern life, it is increasingly important to run CNN inference, a computationally intensive application, on resource-constrained devices. We present a technique for fast and energy-efficient CNN inference on mobile SoC […]
Nov, 25

PVR: Patch-to-Volume Reconstruction for Large Area Motion Correction of Fetal MRI

In this paper we present a novel method for the correction of motion artifacts that are present in fetal Magnetic Resonance Imaging (MRI) scans of the whole uterus. Contrary to current slice-to-volume registration (SVR) methods, requiring an inflexible anatomical enclosure of a single investigated organ, the proposed patch-to-volume reconstruction (PVR) approach is able to reconstruct […]
Nov, 25

Efficient Kernel Synthesis for Performance Portable Programming

The diversity of microarchitecture designs in heterogeneous computing systems allows programs to achieve high performance and energy efficiency, but results in substantial software re-development cost for each type or generation of hardware. To mitigate this cost, a performance portable programming system is required. One fundamental difference between architectures that makes performance portability challenging is the […]

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors