Mar, 10

A Survey of Cache Partitioning Techniques for Multicore Processors

As the number of on-chip cores and the memory demands of applications increase, judicious management of cache resources has become not merely attractive but imperative. Cache partitioning, i.e., dividing cache space between applications based on their memory demands, is a promising approach to providing the capacity benefits of a shared cache together with the performance isolation of private caches. […]
Mar, 9

Architectural Principles and Experimentation of Distributed High Performance Virtual Clusters

With the advent of virtualization and Infrastructure-as-a-Service (IaaS), the broader scientific computing community is considering the use of clouds for their scientific computing needs. This is due to the relative scalability, ease of use, advanced user environment customization abilities, and the many novel computing paradigms available for data-intensive applications. However, a notable performance gap exists […]
Mar, 9

Achieving high-performance with a sparse direct solver on Intel KNL

The need for energy-efficient high-end systems has led hardware vendors to design new types of chips for general-purpose computing. However, designing or porting code tailored to these new types of processing units is often considered a major hurdle to their broad adoption. In this paper, we consider a modern Intel Xeon Phi […]
Mar, 9

Optimizing Deep CNN-Based Queries over Video Streams at Scale

Video is one of the fastest-growing sources of data and is rich with interesting semantic information. Furthermore, recent advances in computer vision, in the form of deep convolutional neural networks (CNNs), have made it possible to query this semantic information with near-human accuracy (in the form of image tagging). However, performing inference with state-of-the-art CNNs […]
Mar, 9

A Machine-Learning Framework for Design for Manufacturability

Computer-aided Design for Manufacturing (DFM) systems play an important role in reducing the time taken for product development by providing manufacturability feedback to the designer while a component is being designed. Traditionally, DFM rules are hand-crafted and used to accelerate the engineering product design process by integrating manufacturability analysis during design. Such a practice relies […]
Mar, 9

Decoupled Block-Wise ILU(k) Preconditioner on GPU

This research investigates the implementation of a block-wise ILU(k) preconditioner on GPU. The block-wise ILU(k) algorithm requires both the level k and the block size to be designed as variables. A decoupled ILU(k) algorithm consists of a symbolic phase and a factorization phase. In the symbolic phase, an ILU(k) nonzero pattern is established from the […]
Mar, 5

Wireless Interference Identification with Convolutional Neural Networks

The steadily growing use of license-free frequency bands requires reliable coexistence management for deterministic medium utilization. For interference mitigation, proper wireless interference identification (WII) is essential. In this work we propose the first WII approach based upon deep convolutional neural networks (CNNs). The CNN naively learns its features through self-optimization during an extensive data-driven GPU-based […]
Mar, 5

Multi-kernel Data Partitioning with Channel on OpenCL-based FPGAs

FPGAs have been widely used to accelerate relational database applications, due to their high throughput and high energy efficiency. However, hardware programmers need to use hardware description languages (HDLs) to program FPGAs. Since HDLs are cycle-sensitive and error-prone, deep knowledge of hardware design and hands-on experience are required to guarantee a successful design on FPGA, […]
Mar, 5

Performance and Portability of Accelerated Lattice Boltzmann Applications with OpenACC

An increasingly large number of HPC systems rely on heterogeneous architectures combining traditional multi-core CPUs with power-efficient accelerators. Designing efficient applications for these systems has been troublesome in the past, as accelerators could usually be programmed only with specific programming languages, threatening maintainability, portability and correctness. Several new programming environments try to tackle this problem. […]
Mar, 5

Improving the Neural GPU Architecture for Algorithm Learning

Algorithm learning is a core problem in artificial intelligence, with significant implications for the level of automation that machines can achieve. Recently, deep learning methods have emerged for synthesizing an algorithm from its input-output examples, the most successful being the Neural GPU, capable of learning multiplication. We present several improvements to the Neural GPU that […]
Mar, 5

Billion-scale similarity search with GPUs

Similarity search finds application in specialized database systems handling complex data such as images or videos, which are typically represented by high-dimensional features and require specific indexing structures. This paper tackles the problem of better utilizing GPUs for this task. While GPUs excel at data-parallel tasks, prior approaches are bottlenecked by algorithms that expose less […]
Feb, 28

Speckle Reduction with Trained Nonlinear Diffusion Filtering

Speckle reduction is a prerequisite for many image processing tasks in synthetic aperture radar (SAR) images, as well as all coherent images. In recent years, predominant state-of-the-art approaches for despeckling are usually based on nonlocal methods which mainly concentrate on achieving utmost image restoration quality, with relatively low computational efficiency. Therefore, in this study we […]

* * *

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors