Posts

Feb, 2

Autotuning GPU Kernels via Static and Predictive Analysis

Optimizing the performance of GPU kernels is challenging for both human programmers and code generators. For example, CUDA programmers must set thread and block parameters for a kernel, but might not have the intuition to make a good choice. Similarly, compilers can generate working code, but may miss tuning opportunities by not targeting GPU models […]
Jan, 31

CFP: Fifth International Workshop on OpenCL (IWOCL 2017) – EXTENDED

Now in its fifth year, the International Workshop on OpenCL (IWOCL) will be hosted by The University of Toronto, Canada, at the Bahen Centre on May 16th-18th 2017. May 16th sees two activities: an Advanced Hands On OpenCL tutorial and a SYCL workshop, while May 17th and 18th will include a mix of keynotes, […]
Jan, 26

Parallel Implementations of the Cholesky Decomposition on CPUs and GPUs

As Central Processing Units (CPUs) and Graphical Processing Units (GPUs) get progressively better, different approaches and designs for implementing algorithms with high data load must be studied and compared. This work compares several different algorithm designs and parallelization APIs (such as OpenMP, OpenCL and CUDA) for both CPU and GPU platforms. We used the Cholesky […]
Jan, 26

Accelerating Workloads on FPGAs via OpenCL: A Case Study with OpenDwarfs

For decades, the streaming architecture of FPGAs has delivered accelerated performance across many application domains, such as option pricing solvers in finance, computational fluid dynamics in oil and gas, and packet processing in network routers and firewalls. However, this performance comes at the expense of programmability. FPGA developers use hardware design languages (HDLs) to implement […]
Jan, 26

Deep Convolutional Network evaluation on the Intel Xeon Phi: Where Subword Parallelism meets Many-Core

With a sharp decline in camera cost and size along with superior computing power available at increasingly low prices, computer vision applications are becoming ever present in our daily lives. Research shows that Convolutional Neural Networks (ConvNet) can outperform all other methods for computer vision tasks (such as object detection) in terms of accuracy and […]
Jan, 26

A GPU-Based Solution to Fast Calculation of Betweenness Centrality on Large Weighted Networks

Recent decades have witnessed the tremendous development of network science, which brings a new and insightful language for modeling real systems in different domains. Betweenness, a widely employed centrality measure in network science, is a decent proxy for investigating network loads and rankings. However, its extremely high computational cost greatly hinders its application to large […]
Jan, 26

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

The capacity of a neural network to absorb information is limited by its number of parameters. Conditional computation, where parts of the network are active on a per-example basis, has been proposed in theory as a way of dramatically increasing model capacity without a proportional increase in computation. In practice, however, there are significant algorithmic […]
Jan, 23

Astrophysical-oriented Computational multi-Architectural Framework

This work presents a framework for simplifying software development for astrophysical simulations: the Astrophysical-oriented Computational multi-Architectural Framework (ACAF). Astrophysical simulation problems are usually approximated by particle systems for computational purposes. The number of particles in such approximations reaches several million, which necessitates the use of computer clusters for the simulations. […]
Jan, 23

DeepBach: a Steerable Model for Bach chorales generation

The composition of polyphonic chorale music in the style of J.S. Bach has represented a major challenge in automatic music composition over the last decades. The art of Bach chorale composition involves combining four-part harmony with characteristic rhythmic patterns and typical melodic movements to produce musical phrases which begin, evolve and end (cadences) in a […]
Jan, 23

A task-driven implementation of a simple numerical solver for hyperbolic conservation laws

This article describes the implementation of an all-in-one numerical procedure within the StarPU runtime. To limit the complexity of the method and keep the presentation of the non-classical task-driven programming environment clear, we have limited the numerics to first order in space and time. Results show that the task distribution […]
Jan, 23

GPGPU Performance Estimation with Core and Memory Frequency Scaling

Graphics Processing Units (GPUs) support dynamic voltage and frequency scaling (DVFS) in order to balance computational performance and energy consumption. However, a simple and accurate method for estimating the performance of a given GPU kernel under different frequency settings on real hardware is still lacking, although such estimates are important for choosing the best frequency configuration for energy saving. This paper reveals […]
Jan, 23

Multi-core parallelism in a column-store

The research reported in this thesis addresses several challenges of improving the efficiency and effectiveness of parallel processing of analytical database queries on modern multi- and many-core systems, using an open-source column-oriented analytical database management system, MonetDB, for validation. In contrast to the existing work we also broaden the research from focusing on individual operators […]

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors
