Sep, 27

Efficient and portable acceleration of quantum chemical many-body methods in mixed floating point precision using OpenACC compiler directives

It is demonstrated how the non-proprietary OpenACC standard of compiler directives may be used to compactly and efficiently accelerate the rate-determining steps of two of the most routinely applied many-body methods of electronic structure theory, namely the second-order M{o}ller-Plesset (MP2) model in its resolution-of-the-identity (RI) approximated form and the (T) triples correction to the coupled […]
Sep, 27

FastCollect: Offloading Generational Garbage Collection to Integrated GPUs

Generational Mark-Sweep Garbage Collection is a widely used garbage collection technique. However, the garbage collector has poor execution efficiency for large programs. Aggressive collection causes execution pauses in the program, while reducing the collection frequency leads to memory wastage. In this work, we develop FastCollect, a parallel version of the generational mark-sweep garbage collector running […]
Sep, 27

Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations

We introduce a method to train Quantized Neural Networks (QNNs) — neural networks with extremely low precision (e.g., 1-bit) weights and activations, at run-time. At train-time the quantized weights and activations are used for computing the parameter gradients. During the forward pass, QNNs drastically reduce memory size and accesses, and replace most arithmetic operations with […]
Sep, 27

Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference. Also, most NMT systems have difficulty with rare words. These issues have hindered […]
Sep, 27

A Novel GPU-based Parallel Implementation Scheme and Performance Analysis of Robot Forward Dynamics Algorithms

We propose a novel unifying scheme for parallel implementation of articulated robot dynamics algorithms. It is based on a unified Lie group notation for deriving the equations of motion of articulated robots, where various well-known forward algorithms differ only by their joint inertia matrix inversion strategies. This new scheme leads to a unified abstraction of […]
Sep, 27

Solving Batched Linear Programs on GPU and Multicore CPU

Linear Programs (LPs) appear in a large number of applications and offloading them to the GPU is viable to gain performance. Existing work on offloading and solving an LP on GPU suggests that performance is gained from large sized LPs (typically 500 constraints, 500 variables and above). In order to gain performance from GPU for […]
Sep, 22

Bridging the Semantic Gaps of GPU Acceleration for Scaleout CNN-based Big Data Processing: Think Big, See Small

Convolutional Neural Networks (CNNs) have substantially advanced the state-of-the-art accuracies of object recognition, which is the core function of a myriad of modern multimedia processing techniques such as image/video processing, speech recognition, and natural language processing. GPU-based accelerators gained increasing attention because a large amount of highly parallel neurons in CNN naturally matches the GPU […]
Sep, 22

Tuning Stencil Codes in OpenCL for FPGAs

OpenCL is designed as a parallel programming framework to support heterogeneous computing platforms. The implicit or explicit parallelism in OpenCL kernel code enables efficient FPGA implementation from a high-level programming abstraction. However, FPGA architecture is completely different from GPU architecture, for which OpenCL is widely used. Tuning OpenCL codes to achieve high performance on FPGAs […]
Sep, 22

Characterization of Speech Recognition Systems on GPU Architectures

Automatic speech recognition is one of the most important applications in the area of cognitive computing. Mobile devices, such as smartphones, have incorporated speech recognition as one of the main interfaces for user interaction. This trend towards voice-based user interfaces is likely to continue in the next years. Effective speech recognition systems require real-time recognition, […]
Sep, 22

Efficient dictionary learning implementation on the GPU using OpenCL

The dictionary learning field offers a wide range of algorithms that are able to provide good sparse approximations and well trained dictionaries. These algorithms are very complex and this is reflected in the slow execution of their computationally intensive implementations. This article proposes efficient parallel implementations for the main algorithms in the field that significantly […]
Sep, 22

MCS 572: Introduction to Supercomputing

The goal of the course is to study parallel algorithms and their implementation on distributed and shared memory computers, using message passing, OpenMP, and threads. In the second half of the course we will consider general purpose graphics processing units. Prerequisites are a working knowledge of C (or willingness to acquire programming skills) and a […]
Sep, 20

Parallel Computational Fluid Dynamics With the Intel Xeon Phi Coprocessor

The Intel Xeon Phi coprocessor is a PCI Express form factor card designed to work in tangent with Intel Xeon processors in order to allow faster execution of highly parallelizable code. Efficient execution of highly parallel applications is achieved through the use of many smaller, lower clock speed cores; allowing for many more simultaneous execution […]
Page 1 of 88912345...102030...Last »

* * *

* * *

TwitterAPIExchange Object
    [oauth_access_token:TwitterAPIExchange:private] => 301967669-yDz6MrfyJFFsH1DVvrw5Xb9phx2d0DSOFuLehBGh
    [oauth_access_token_secret:TwitterAPIExchange:private] => o29ji3VLVmB6jASMqY8G7QZDCrdFmoTvCDNNUlb7s
    [consumer_key:TwitterAPIExchange:private] => TdQb63pho0ak9VevwMWpEgXAE
    [consumer_secret:TwitterAPIExchange:private] => Uq4rWz7nUnH1y6ab6uQ9xMk0KLcDrmckneEMdlq6G5E0jlQCFx
    [postfields:TwitterAPIExchange:private] => 
    [getfield:TwitterAPIExchange:private] => ?cursor=-1&screen_name=hgpu&skip_status=true&include_user_entities=false
    [oauth:protected] => Array
            [oauth_consumer_key] => TdQb63pho0ak9VevwMWpEgXAE
            [oauth_nonce] => 1475064082
            [oauth_signature_method] => HMAC-SHA1
            [oauth_token] => 301967669-yDz6MrfyJFFsH1DVvrw5Xb9phx2d0DSOFuLehBGh
            [oauth_timestamp] => 1475064082
            [oauth_version] => 1.0
            [cursor] => -1
            [screen_name] => hgpu
            [skip_status] => true
            [include_user_entities] => false
            [oauth_signature] => YlItK3si1YWgYO66mN9nNEcibSE=

    [url] => https://api.twitter.com/1.1/users/show.json
Follow us on Facebook
Follow us on Twitter

HGPU group

2000 peoples are following HGPU @twitter

HGPU group © 2010-2016 hgpu.org

All rights belong to the respective authors

Contact us: