17036

Posts

Mar, 9

Achieving high-performance with a sparse direct solver on Intel KNL

The need for energy-efficient high-end systems has led hardware vendors to design new types of chips for general purpose computing. However, designing or porting a code tailored for these new types of processing units is often considered as a major hurdle for their broad adoption. In this paper, we consider a modern Intel Xeon Phi […]
Mar, 9

Optimizing Deep CNN-Based Queries over Video Streams at Scale

Video is one of the fastest-growing sources of data and is rich with interesting semantic information. Furthermore, recent advances in computer vision, in the form of deep convolutional neural networks (CNNs), have made it possible to query this semantic information with near-human accuracy (in the form of image tagging). However, performing inference with state-of-the-art CNNs […]
Mar, 9

A Machine-Learning Framework for Design for Manufacturability

Computer-aided Design for Manufacturing (DFM) systems play an important role in reducing the time taken for product development by providing manufacturability feedback to the designer while a component is being designed. Traditionally, DFM rules are hand-crafted and used to accelerate the engineering product design process by integrating manufacturability analysis during design. Such a practice relies […]
Mar, 9

Decoupled Block-Wise ILU(k) Preconditioner on GPU

This research investigates the implementation mechanism of block-wise ILU(k) preconditioner on GPU. The block-wise ILU(k) algorithm requires both the level k and the block size to be designed as variables. A decoupled ILU(k) algorithm consists of a symbolic phase and a factorization phase. In the symbolic phase, a ILU(k) nonzero pattern is established from the […]
Mar, 5

Multi-kernel Data Partitioning with Channel on OpenCL-based FPGAs

FPGAs have been widely used to accelerate relational database applications, due to their high throughput and high energy efficiency. However, hardware programmer needs to leverage hardware description languages (HDLs) to program FPGAs. Since HDL is cycle-sensitive and error-prone, deep knowledge about hardware design and hands-on experiences are required to guarantee a successful design on FPGA, […]
Mar, 5

Wireless Interference Identification with Convolutional Neural Networks

The steadily growing use of license-free frequency bands requires reliable coexistence management for deterministic medium utilization. For interference mitigation, proper wireless interference identification (WII) is essential. In this work we propose the first WII approach based upon deep convolutional neural networks (CNNs). The CNN naively learns its features through self-optimization during an extensive data-driven GPU-based […]
Mar, 5

Improving the Neural GPU Architecture for Algorithm Learning

Algorithm learning is a core problem in artificial intelligence with significant implications on automation level that can be achieved by machines. Recently deep learning methods are emerging for synthesizing an algorithm from its input-output examples, the most successful being the Neural GPU, capable of learning multiplication. We present several improvements to the Neural GPU that […]
Mar, 5

Performance and Portability of Accelerated Lattice Boltzmann Applications with OpenACC

An increasingly large number of HPC systems rely on heterogeneous architectures combining traditional multi-core CPUs with power efficient accelerators. Designing efficient applications for these systems has been troublesome in the past as accelerators could usually be programmed using specific programming languages threatening maintainability, portability and correctness. Several new programming environments try to tackle this problem. […]
Mar, 5

Billion-scale similarity search with GPUs

Similarity search finds application in specialized database systems handling complex data such as images or videos, which are typically represented by high-dimensional features and require specific indexing structures. This paper tackles the problem of better utilizing GPUs for this task. While GPUs excel at data-parallel tasks, prior approaches are bottlenecked by algorithms that expose less […]
Feb, 28

Speckle Reduction with Trained Nonlinear Diffusion Filtering

Speckle reduction is a prerequisite for many image processing tasks in synthetic aperture radar (SAR) images, as well as all coherent images. In recent years, predominant state-of-the-art approaches for despeckling are usually based on nonlocal methods which mainly concentrate on achieving utmost image restoration quality, with relatively low computational efficiency. Therefore, in this study we […]
Feb, 28

An Efficient Multiway Mergesort for GPU Architectures

Sorting is a primitive operation that is a building block for countless algorithms. As such, it is important to design sorting algorithms that approach peak performance on a range of hardware architectures. Graphics Processing Units (GPUs) are particularly attractive architectures as they provides massive parallelism and computing power. However, the intricacies of their compute and […]
Feb, 28

Deep Voice: Real-time Neural Text-to-Speech

We present Deep Voice, a production-quality text-to-speech system constructed entirely from deep neural networks. Deep Voice lays the groundwork for truly end-to-end neural speech synthesis. The system comprises five major building blocks: a segmentation model for locating phoneme boundaries, a grapheme-to-phoneme conversion model, a phoneme duration prediction model, a fundamental frequency prediction model, and an […]
Page 12 of 922« First...1011121314...203040...Last »

* * *

* * *

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors

Contact us: