Posts

Mar, 14

Compiling Parallel Functional Code with Data Parallel Idealised Algol

Graphics Processing Units (GPUs) and other parallel devices are widely available and have the potential for accelerating a wide class of algorithms. However, expert programming skills are required to achieve maximum performance. These devices expose low-level hardware details through imperative programming interfaces, which inevitably results in non-performance-portable programs highly tuned for a specific device. Functional […]
Mar, 14

Large-scale image analysis using docker sandboxing

With the advent of specialized hardware such as Graphics Processing Units (GPUs), large-scale image localization, classification and retrieval have seen increased prevalence. Designing a scalable software architecture that co-evolves with such specialized hardware is a challenge in the commercial setting. In this paper, we describe one such architecture (Cortexica) that leverages the scalability of GPUs and […]
Mar, 14

Massive Exploration of Neural Machine Translation Architectures

Neural Machine Translation (NMT) has shown remarkable progress over the past few years with production systems now being deployed to end-users. One major drawback of current architectures is that they are expensive to train, typically requiring days to weeks of GPU time to converge. This makes exhaustive hyperparameter search, as is commonly done with other […]
Mar, 14

Model-independent partial wave analysis using a massively-parallel fitting framework

The functionality of GooFit, a GPU-friendly framework for doing maximum-likelihood fits, has been extended to extract model-independent S-wave amplitudes in three-body decays such as $D^+ \to h^+h^+h^-$. A full amplitude analysis is done where the magnitudes and phases of the S-wave amplitudes are anchored at a finite number of $m^2(h^+h^-)$ control points, and a cubic […]
Mar, 10

A Survey of Cache Partitioning Techniques for Multicore Processors

As the number of on-chip cores and the memory demands of applications increase, judicious management of cache resources has become not merely attractive but imperative. Cache partitioning, i.e., dividing cache space between applications based on their memory demands, is a promising approach to provide the capacity benefits of a shared cache with the performance isolation of private caches. […]
Mar, 9

Architectural Principles and Experimentation of Distributed High Performance Virtual Clusters

With the advent of virtualization and Infrastructure-as-a-Service (IaaS), the broader scientific computing community is considering the use of clouds for their scientific computing needs. This is due to the relative scalability, ease of use, advanced user environment customization abilities, and the many novel computing paradigms available for data-intensive applications. However, a notable performance gap exists […]
Mar, 9

Achieving high-performance with a sparse direct solver on Intel KNL

The need for energy-efficient high-end systems has led hardware vendors to design new types of chips for general-purpose computing. However, designing or porting a code tailored for these new types of processing units is often considered a major hurdle for their broad adoption. In this paper, we consider a modern Intel Xeon Phi […]
Mar, 9

Optimizing Deep CNN-Based Queries over Video Streams at Scale

Video is one of the fastest-growing sources of data and is rich with interesting semantic information. Furthermore, recent advances in computer vision, in the form of deep convolutional neural networks (CNNs), have made it possible to query this semantic information with near-human accuracy (in the form of image tagging). However, performing inference with state-of-the-art CNNs […]
Mar, 9

A Machine-Learning Framework for Design for Manufacturability

Computer-aided Design for Manufacturing (DFM) systems play an important role in reducing the time taken for product development by providing manufacturability feedback to the designer while a component is being designed. Traditionally, DFM rules are hand-crafted and used to accelerate the engineering product design process by integrating manufacturability analysis during design. Such a practice relies […]
Mar, 9

Decoupled Block-Wise ILU(k) Preconditioner on GPU

This research investigates the implementation mechanism of a block-wise ILU(k) preconditioner on GPU. The block-wise ILU(k) algorithm requires both the level k and the block size to be designed as variables. A decoupled ILU(k) algorithm consists of a symbolic phase and a factorization phase. In the symbolic phase, an ILU(k) nonzero pattern is established from the […]
Mar, 5

Wireless Interference Identification with Convolutional Neural Networks

The steadily growing use of license-free frequency bands requires reliable coexistence management for deterministic medium utilization. For interference mitigation, proper wireless interference identification (WII) is essential. In this work we propose the first WII approach based upon deep convolutional neural networks (CNNs). The CNN naively learns its features through self-optimization during an extensive data-driven GPU-based […]
Mar, 5

Multi-kernel Data Partitioning with Channel on OpenCL-based FPGAs

FPGAs have been widely used to accelerate relational database applications, due to their high throughput and high energy efficiency. However, hardware programmers need to use hardware description languages (HDLs) to program FPGAs. Since HDLs are cycle-sensitive and error-prone, deep knowledge of hardware design and hands-on experience are required to guarantee a successful design on FPGA, […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hgpu.org