16939

Posts

Jan, 26

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

The capacity of a neural network to absorb information is limited by its number of parameters. Conditional computation, where parts of the network are active on a per-example basis, has been proposed in theory as a way of dramatically increasing model capacity without a proportional increase in computation. In practice, however, there are significant algorithmic […]
Jan, 23

Astrophysical-oriented Computational multi-Architectural Framework

This work presents the framework for simplifying software development in the astrophysical simulations branch – Astrophysical-oriented Computational multi-Architectural Framework (ACAF). The astrophysical simulation problems are usually approximated with the particle systems for computational purposes. The number of particles in such approximations reaches several millions, which enforces the usage of the computer clusters for the simulations. […]
Jan, 23

DeepBach: a Steerable Model for Bach chorales generation

The composition of polyphonic chorale music in the style of J.S Bach has represented a major challenge in automatic music composition over the last decades. The art of Bach chorales composition involves combining four-part harmony with characteristic rhythmic patterns and typical melodic movements to produce musical phrases which begin, evolve and end (cadences) in a […]
Jan, 23

GPGPU Performance Estimation with Core and Memory Frequency Scaling

Graphics Processing Units (GPUs) support dynamic voltage and frequency scaling (DVFS) in order to balance computational performance and energy consumption. However, there still lacks simple and accurate performance estimation of a given GPU kernel under different frequency settings on real hardware, which is important to decide best frequency configuration for energy saving. This paper reveals […]
Jan, 23

A task-driven implementation of a simple numerical solver for hyperbolic conservation laws

This article describes the implementation of an all-in-one numerical procedure within the runtime StarPU. In order to limit the complexity of the method, for the sake of clarity of the presentation of the non-classical task-driven programming environnement, we have limited the numerics to first order in space and time. Results show that the task distribution […]
Jan, 23

Multi-core parallelism in a column-store

The research reported in this thesis addresses several challenges of improving the efficiency and effectiveness of parallel processing of analytical database queries on modern multi- and many-core systems, using an open-source column-oriented analytical database management system, MonetDB, for validation. In contrast to the existing work we also broaden the research from focusing on individual operators […]
Jan, 19

Deep Learning for Computational Chemistry

The rise and fall of artificial neural networks is well documented in the scientific literature of both computer science and computational chemistry. Yet almost two decades later, we are now seeing a resurgence of interest in deep learning, a machine learning algorithm based on multilayer neural networks. Within the last few years, we have seen […]
Jan, 19

OpenNMT: Open-Source Toolkit for Neural Machine Translation

We describe an open-source toolkit for neural machine translation (NMT). The toolkit prioritizes efficiency, modularity, and extensibility with the goal of supporting NMT research into model architectures, feature representations, and source modalities, while maintaining competitive performance and reasonable training requirements. The toolkit consists of modeling and translation support, as well as detailed pedagogical documentation about […]
Jan, 19

Xeon Phi: A comparison between the newly introduced MIC architecture and a standard CPU through three types of problems

As Moore s law continues, processors keep getting more cores packed together on the chip. This thesis is an empirical study of the rather newly introduced Intel Many Integrated Core (IMIC) architecture found in the Intel Xeon Phi. With roughly 60 cores connected by a high performance on-die interconnect, the Intel Xeon Phi makes an […]
Jan, 19

Parallelizing Exact and Approximate String Matching via Inclusive Scan on a GPU

In this study, to substantially improve the runtimes of exact and approximate string matching algorithms, we propose a tribrid parallel method for bit-parallel algorithms such as the Shift-Or and Wu-Manber algorithms. Our underlying idea is to interpret bit-parallel algorithms as inclusive-scan operations, which allow these bit-parallel algorithms to run efficiently on a graphics processing unit […]
Jan, 19

Light Loss-Less Data Compression, with GPU Implementation

There is no doubt that data compression is very important in computer engineering. However, most lossless data compression and decompression algorithms are very hard to parallelize, because they use dictionaries updated sequentially. The main contribution of this paper is to present a new lossless data compression method that we call Light Loss-Less (LLL) compression. It […]
Jan, 16

An OpenCL(TM) Deep Learning Accelerator on Arria 10

Convolutional neural nets (CNNs) have become a practical means to perform vision tasks, particularly in the area of image classification. FPGAs are well known to be able to perform convolutions efficiently, however, most recent efforts to run CNNs on FPGAs have shown limited advantages over other devices such as GPUs. Previous approaches on FPGAs have […]
Page 20 of 924« First...10...1819202122...304050...Last »

* * *

* * *

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors

Contact us: