16763

Posts

Nov, 25

dMath: Distributed Linear Algebra for DL

The paper presents a parallel math library, dMath, that demonstrates leading scaling when using intranode, internode, and hybrid-parallelism for deep learning (DL). dMath provides easy-to-use distributed primitives and a variety of domain-specific algorithms including matrix multiplication, convolutions, and others allowing for rapid development of scalable applications like deep neural networks (DNNs). Persistent data stored in […]
Nov, 23

Performance Analysis of CUDA and OpenCL By Implementation of Cryptographic Algorithms

This paper presents a Performance Analysis of CUDA and OpenCL. Three different cryptographic algorithms, i.e. DES, MD5, and SHA-1 have been selected as the benchmarks for extensive analysis of the performance gaps between the two. Our results show that, on the average scenario, CUDA performs 27% better than OpenCL while in the best case scenario […]
Nov, 23

A Metaprogramming and Autotuning Framework for Deploying Deep Learning Applications

In recent years, deep neural networks (DNNs), have yielded strong results on a wide range of applications. Graphics Processing Units (GPUs) have been one key enabling factor leading to the current popularity of DNNs. However, despite increasing hardware flexibility and software programming toolchain maturity, high efficiency GPU programming remains difficult: it suffers from high complexity, […]
Nov, 23

Deep Tensor Convolution on Multicores

Deep convolutional neural networks (ConvNets) have become a de facto standard for image classification and segmentation problems. These networks have also had early success in the video domain, despite failing to capture motion continuity and other rich temporal correlations. Evidence has since emerged that extending ConvNets to 3-dimensions leads to state-of-the-art performance across a broad […]
Nov, 23

GA3C: GPU-based A3C for Deep Reinforcement Learning

We introduce and analyze the computational aspects of a hybrid CPU/GPU implementation of the Asynchronous Advantage Actor-Critic (A3C) algorithm, currently the state-of-the-art method in reinforcement learning for various gaming tasks. Our analysis concentrates on the critical aspects to leverage the GPU’s computational power, including the introduction of a system of queues and a dynamic scheduling […]
Nov, 23

Optimization and Evaluation of VLPL-S Particle-in-cell Code on Knights Landing

VLPL-S code is developed based on the particlein-cell (PIC) algorithm, which is the mainstream algorithm of plasma behavior research. In this paper, we report our early experience on porting and optimizing the VLPL-S particle-in-cell code on the Knights Landing. By applying general optimization methods such as memory access optimization, thread level parallelism and vectorization to […]
Nov, 22

High performance pattern matching and data remanence on graphics processing units

Pattern matching is an important task in a plethora of different fields ranging from computer science to medical application, but is also a resource consuming problem.With the increase in network link speed, and the tremendous amounts of data generated, serial pattern matching on Central Processing Unit (CPU) is close to being rendered obsolete. The ubiquitous […]
Nov, 22

Processing OLTP Workloads on Hybrid CPU/GPU Systems

In recent times there have been a plethora of researches done on the utilization of co-processors like GPU and FPGA in database management system (DBMS). The reason for this trend is that modern processors have reached a performance threshold. Two major factors that have led to this behaviour are Memory Wall and Power Wall. This […]
Nov, 22

SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing

With recent advancing of Internet of Things (IoTs), it becomes very attractive to implement the deep convolutional neural networks (DCNNs) onto embedded/portable systems. Presently, executing the software-based DCNNs requires high-performance server clusters in practice, restricting their widespread deployment on the mobile devices. To overcome this issue, considerable research efforts have been conducted in the context […]
Nov, 22

GPU-accelerated Red Blood Cells Simulations with Transport Dissipative Particle Dynamics

Mesoscopic numerical simulations provide a unique approach for the quantification of the chemical influences on red blood cell functionalities. The transport Dissipative Particles Dynamics (tDPD) method can lead to such effective multiscale simulations due to its ability to simultaneously capture mesoscopic advection, diffusion, and reaction. In this paper, we present a GPU-accelerated red blood cell […]
Nov, 22

Celeris: A GPU-accelerated open source software with a Boussinesq-type wave solver for real-time, interactive simulation and visualization

In this paper, we introduce an interactive coastal wave simulation and visualization software, called Celeris. Celeris is an open source software which needs minimum preparation to run on a Windows machine. The software solves the extended Boussinesq equations using a hybrid finite volume – finite difference method and supports moving shoreline boundaries. The simulation and […]
Nov, 20

2nd International Workshop on Theoretical Approaches to Performance Evaluation, Modeling and Simulation (TAPEMS), 2017

Performance and an aspect of it, energy efficiency, has become a key issue in both high performance and embedded computing. The objective of the 2nd TAPEMS International Workshop on Theoretical Approaches to Performance Evaluation, Modeling and Simulation is to bring together researchers and practitioners from academia and industry to discuss current advances and trends in theoretical […]
Page 12 of 909« First...1011121314...203040...Last »

Recent source codes

* * *

* * *

TwitterAPIExchange Object
(
    [oauth_access_token:TwitterAPIExchange:private] => 301967669-yDz6MrfyJFFsH1DVvrw5Xb9phx2d0DSOFuLehBGh
    [oauth_access_token_secret:TwitterAPIExchange:private] => o29ji3VLVmB6jASMqY8G7QZDCrdFmoTvCDNNUlb7s
    [consumer_key:TwitterAPIExchange:private] => TdQb63pho0ak9VevwMWpEgXAE
    [consumer_secret:TwitterAPIExchange:private] => Uq4rWz7nUnH1y6ab6uQ9xMk0KLcDrmckneEMdlq6G5E0jlQCFx
    [postfields:TwitterAPIExchange:private] => 
    [getfield:TwitterAPIExchange:private] => ?cursor=-1&screen_name=hgpu&skip_status=true&include_user_entities=false
    [oauth:protected] => Array
        (
            [oauth_consumer_key] => TdQb63pho0ak9VevwMWpEgXAE
            [oauth_nonce] => 1487825435
            [oauth_signature_method] => HMAC-SHA1
            [oauth_token] => 301967669-yDz6MrfyJFFsH1DVvrw5Xb9phx2d0DSOFuLehBGh
            [oauth_timestamp] => 1487825435
            [oauth_version] => 1.0
            [cursor] => -1
            [screen_name] => hgpu
            [skip_status] => true
            [include_user_entities] => false
            [oauth_signature] => 3LVRfrqdnXX1k6aqe1k1grMXHu0=
        )

    [url] => https://api.twitter.com/1.1/users/show.json
)
Follow us on Facebook
Follow us on Twitter

HGPU group

2173 peoples are following HGPU @twitter

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors

Contact us: