
Posts

Jan, 10

Software Prefetching for Indirect Memory Accesses

Many modern data processing and HPC workloads are heavily memory-latency bound. A tempting proposition to solve this is software prefetching, where special non-blocking loads are used to bring data into the cache hierarchy just before being required. However, these are difficult to insert to effectively improve performance, and techniques for automatic insertion are currently limited. […]
Jan, 10

DeepDSL: A Compilation-based Domain-Specific Language for Deep Learning

In recent years, Deep Learning (DL) has found great success in domains such as multimedia understanding. However, the complex nature of multimedia data makes it difficult to develop DL-based software. The state-of-the-art tools, such as Caffe, TensorFlow, Torch7, and CNTK, while successful in their applicable domains, are programming libraries with a fixed user interface, […]
Jan, 10

An FPGA Accelerator for Molecular Dynamics Simulation Using OpenCL

Molecular dynamics (MD) simulations are very important for studying the physical properties of atoms and molecules. However, a huge amount of processing time is required to simulate a few nanoseconds of an actual experiment. Although hardware acceleration using FPGAs provides promising results, substantial design time and hardware design skills are required to implement an […]
Jan, 10

GPU SQL Query Accelerator

The number of connected sensors and devices with geo-location capabilities that report their location is growing rapidly. The data analytics industry is looking for ways to store this data and to turn it into valuable information for business intelligence services. This growth has inadvertently produced a flood of granular data about our world. Crucially, […]
Jan, 8

Synchronization and Coordination in Heterogeneous Processors

Recent developments in internet connectivity and mobile devices have spurred massive data growth. Users demand rapid data processing from both large-scale systems and energy-constrained personal devices. Concurrently with this data growth, transistor scaling trends have slowed, diminishing processor performance and energy improvements compared to prior generations. To sustain performance trends while staying within energy budgets, […]
Jan, 8

A Framework for Dense Triangular Matrix Kernels on Various Manycore Architectures

We present a new high performance framework for dense triangular BLAS kernels, i.e., triangular matrix-matrix multiplication (TRMM) and triangular solve (TRSM), on various manycore architectures. This is an extension of a previous work on a single GPU by the same authors (Charara et al., EuroPar, 2016). In this paper, the performance of triangular BLAS kernels […]
Jan, 8

Akid: A Library for Neural Network Research and Production from a Dataism Approach

Neural networks are a revolutionary but immature technique that is fast evolving and relies heavily on data. To benefit from the newest developments and newly available data, we want the gap between research and production to be as small as possible. On the other hand, unlike traditional machine learning models, a neural network is not just yet […]
Jan, 8

Communication and Coordination Paradigms for Highly-Parallel Accelerators

As CPU performance plateaus, many communities are turning to highly-parallel accelerators such as graphics processing units (GPUs) to obtain their desired level of processing power. Unfortunately, the GPU’s massive parallelism and data-parallel execution model make it difficult to synchronize GPU threads. To resolve this, we introduce aggregation buffers, which are producer/consumer queues that act as […]
Jan, 8

Gunrock: GPU Graph Analytics

For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs, have presented two significant challenges to developing a programmable high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or […]
Jan, 4

Deep Neural Networks to Enable Real-time Multimessenger Astrophysics

We introduce a new methodology for time-domain signal processing, based on deep learning neural networks, which has the potential to revolutionize data analysis in science. To illustrate how this enables real-time multimessenger astrophysics, we designed two deep convolutional neural networks that can analyze time-series data from observatories including advanced LIGO. The first neural network recognizes […]
Jan, 4

Massively Parallel Computation of Accurate Densities for N-body Dark Matter Simulations using the Phase-Space-Element Method

In 2012, a method to analyze N-body dark matter simulations using a tetrahedral tessellation of the three-dimensional dark matter manifold in six-dimensional phase space was introduced. This paper presents an accurate density computation approach for large N-body datasets that is based on this technique and designed for massively parallel GPU clusters. The densities are obtained by […]
Jan, 4

Design and optimization of a portable LQCD Monte Carlo code using OpenACC

The present panorama of HPC architectures is extremely heterogeneous, ranging from traditional multi-core CPUs, supporting a wide class of applications but delivering moderate computing performance, to many-core GPUs, exploiting aggressive data-parallelism and delivering higher performance for streaming computing applications. In this scenario, code portability (and performance portability) becomes necessary for easy maintainability of applications; […]

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors
