Sep, 13

OpenMP as a High-Level Specification Language for Parallelism And its use in Evaluating Parallel Programming Systems

While OpenMP is the de facto standard for shared-memory parallel programming, a number of alternative programming models and runtime systems have emerged in recent years. Fairly evaluating these programming systems can be challenging and can require significant manual effort from researchers. However, it is important to facilitate these comparisons as […]
Sep, 13

An efficient numerical method for solving the Boltzmann equation in multidimensions

In this paper, we extend the Fast Kinetic Scheme (FKS) [J. Comput. Phys., Vol. 255, 2013, pp. 680-698], originally constructed for solving the BGK equation, to the more challenging case of the Boltzmann equation. The scheme combines a robust and fast method for treating the transport part based on an innovative […]
Sep, 13

Scheduling of Linear Algebra Kernels on Multiple Heterogeneous Resources

In this paper, we consider task-based dense linear algebra applications on a single heterogeneous node that contains regular CPU cores and a set of GPU devices. Efficient scheduling strategies are crucial in this context in order to achieve good and portable performance. HeteroPrio, a resource-centric dynamic scheduling strategy, was introduced in previous work […]
Sep, 13

A New Architecture for Optimization Modeling Frameworks

We propose a new architecture for optimization modeling frameworks in which solvers are expressed as computation graphs in a framework like TensorFlow rather than as standalone programs built on a low-level linear algebra interface. Our new architecture makes it easy for modeling frameworks to support high performance computational platforms like GPUs and distributed clusters, as […]
Sep, 10

An Implementation of Real-Time Phased Array Radar Fundamental Functions on a DSP-Focused, High-Performance, Embedded Computing Platform

This paper investigates the feasibility of a backend design for a real-time, multichannel digital phased array processing system, particularly on high-performance embedded computing platforms built from general-purpose digital signal processors. First, we obtained a lab-scale backend performance benchmark by simulating beamforming, pulse compression, and Doppler filtering based on a Micro Telecom Computing Architecture (MTCA) chassis […]
Sep, 10

GPU-ArraySort: A parallel, in-place algorithm for sorting large number of arrays

Modern-day analytics deals with big datasets from diverse fields. For many applications, the data takes the form of an array that consists of a large number of smaller arrays. Existing techniques focus on sorting a single large array and cannot efficiently sort a large number of smaller arrays. Currently […]
Sep, 10

System Design Principles for Heterogeneous Resource Management and Scheduling in Accelerator-Based Systems

Accelerator-based systems are rapidly becoming the platforms of choice both for high-end cloud services and for processing irregular applications such as real-world graph analytics, owing to their high scalability and low dollar-to-FLOPS ratios. Yet GPUs are not first-class schedulable entities, which causes substantial hardware resource underutilization, including their computational and data movement […]
Sep, 10

3D Object Recognition with Convolutional Neural Networks

In this work, we propose the implementation of a 3D object recognition system using Convolutional Neural Networks. For that purpose, we first analyzed the theoretical foundations of that kind of neural network. Next, we discussed ways of representing 3D data in a compact and structured manner to feed to the neural network. Those representations consist of […]
Sep, 10

OpenSBLI: A framework for the automated derivation and parallel execution of finite difference solvers on a range of computer architectures

Exascale computing will feature novel and potentially disruptive hardware architectures. Exploiting these to their full potential is non-trivial. Numerical modelling frameworks involving finite difference methods are currently limited by the ‘static’ nature of hand-coded discretisation schemes and may have to be rewritten repeatedly to run efficiently on new hardware. In contrast, OpenSBLI uses code […]
Sep, 8

Caffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks

With the recent advancement of multilayer convolutional neural networks (CNNs), deep learning has achieved remarkable success in many areas, especially in visual content understanding and classification. To improve the performance and energy efficiency of computation-demanding CNNs, FPGA-based acceleration has emerged as one of the most attractive alternatives. In this paper, we design and implement Caffeine, […]
Sep, 8

A Lightweight Approach to Performance Portability with targetDP

Leading HPC systems achieve their status through use of highly parallel devices such as NVIDIA GPUs or Intel Xeon Phi many-core CPUs. The concept of performance portability across such architectures, as well as traditional CPUs, is vital for the application programmer. In this paper we describe targetDP, a lightweight abstraction layer which allows grid-based applications […]
Sep, 8

QSL Squasher: A Fast Quasi-Separatrix Layer Map Calculator

Quasi-Separatrix Layers (QSLs) are a useful proxy for the locations where current sheets can develop in the solar corona, and give valuable information about the connectivity in complicated magnetic field configurations. However, calculating QSL maps even for 2-dimensional slices through 3-dimensional models of coronal magnetic fields is a non-trivial task as it usually involves tracing […]

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors