
Posts

Sep, 17

The CUDA LATCH Binary Descriptor: Because Sometimes Faster Means Better

Accuracy, descriptor size, and the time required for extraction and matching are all important factors when selecting local image descriptors. To optimize over all these requirements, this paper presents a CUDA port for the recent Learned Arrangement of Three Patches (LATCH) binary descriptors to the GPU platform. The design of LATCH makes it well suited […]
Sep, 17

Parallel Dynamics Computation using Prefix Sum Operations

We propose a new parallel framework for fast computation of inverse and forward dynamics of articulated robots based on prefix sums (scans). We re-investigate the well-known recursive Newton-Euler formulation of robot dynamics and show that the forward-backward propagation process for robot inverse dynamics is equivalent to two scan operations on certain semigroups. We show that […]
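As a rough illustration of what a scan over a semigroup means (this is not the paper's formulation), the sketch below treats a linear recurrence x_i = a_i * x_{i-1} + b_i as a scan over affine maps under composition. Because composition is associative, the prefixes can be computed in logarithmically many rounds. The names `compose` and `inclusive_scan` are hypothetical, and the maps are scalars here rather than the spatial transforms used in robot dynamics.

```python
from itertools import accumulate


def compose(first, second):
    """Compose two affine maps (a, b), each meaning x -> a*x + b.

    The result applies `first`, then `second`. The operation is
    associative but not commutative, so argument order matters.
    """
    a1, b1 = first
    a2, b2 = second
    return (a2 * a1, a2 * b1 + b2)


def inclusive_scan(items, op):
    """Hillis-Steele style inclusive scan.

    Runs about log2(n) rounds; within a round every element is updated
    independently, which is what makes the pattern parallelizable.
    Here the rounds are simply executed sequentially.
    """
    out = list(items)
    step = 1
    while step < len(out):
        out = [
            out[i] if i < step else op(out[i - step], out[i])
            for i in range(len(out))
        ]
        step *= 2
    return out


# Three steps of the recurrence x_i = a_i * x_{i-1} + b_i.
maps = [(2, 1), (3, 0), (1, 5)]
prefixes = inclusive_scan(maps, compose)

# Applying each prefix map to the start value gives every x_i at once.
x0 = 1
states = [a * x0 + b for a, b in prefixes]
```

The sequential `itertools.accumulate(maps, compose)` produces the same prefixes; the scan only changes the order in which the associative operator is applied.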
Sep, 17

A parallel pattern for iterative stencil + reduce

We advocate the Loop-of-stencil-reduce pattern as a means of simplifying the implementation of data-parallel programs on heterogeneous multi-core platforms. Loop-of-stencil-reduce is general enough to subsume map, reduce, map-reduce, stencil, stencil-reduce, and, crucially, their usage in a loop in both data-parallel and streaming applications, or a combination of both. The pattern makes it possible to deploy […]
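To make the shape of the pattern concrete, here is a minimal sequential sketch, assuming a 1D grid with fixed boundary values: a stencil step averages each interior cell's neighbours, a reduce computes the largest change, and the loop repeats until that reduced value falls below a tolerance. The function names and the tolerance are hypothetical and do not come from the paper's framework.

```python
def stencil_step(grid):
    """Replace each interior cell by the average of its two neighbours."""
    new = grid[:]
    for i in range(1, len(grid) - 1):
        new[i] = 0.5 * (grid[i - 1] + grid[i + 1])
    return new


def loop_of_stencil_reduce(grid, tol=1e-6, max_iters=10_000):
    """Iterate stencil + reduce until the reduced value signals convergence.

    Returns the final grid and the number of iterations performed.
    """
    for iteration in range(1, max_iters + 1):
        new = stencil_step(grid)                              # stencil
        delta = max(abs(a - b) for a, b in zip(new, grid))    # reduce
        grid = new
        if delta < tol:                                       # loop condition
            return grid, iteration
    return grid, max_iters


# Boundary cells stay fixed at 0.0 and 1.0; the interior relaxes
# towards a straight line between them.
result, iterations = loop_of_stencil_reduce([0.0, 0.0, 0.0, 0.0, 1.0])
```

In a data-parallel setting the stencil and the reduce would each run across all cells at once; only the outer loop is inherently sequential.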
Sep, 17

Efficient softmax approximation for GPUs

We propose an approximate strategy to efficiently train neural network based language models over very large vocabularies. Our approach, called adaptive softmax, circumvents the linear dependency on the vocabulary size by exploiting the unbalanced word distribution to form clusters that explicitly minimize the expectation of computational complexity. Our approach further reduces the computational cost by […]
Sep, 17

Finding faint HI structure in and around galaxies: scraping the barrel

Soon to be operational HI survey instruments such as APERTIF and ASKAP will produce large datasets. These surveys will provide information about the HI in and around hundreds of galaxies with a typical signal-to-noise ratio of ~10 in the inner regions and ~1 in the outer regions. In addition, such surveys will make it possible […]
Sep, 16

Agent-Based Modeling on High Performance Computing Architectures

In spatial agent-based models (SABMs) each entity of the system being modeled is uniquely represented as an independent agent. Large scale emergent behavior in SABMs is population sensitive. Thus, the number of agents should reflect the system being modeled, which can be in the order of billions. Models can be decomposed such that each component […]
Sep, 13

Data Analysis of Minimally-Structured Heterogeneous Logs: An experimental study of log template extraction and anomaly detection based on Recurrent Neural Network and Naive Bayes

Nowadays, the ideas of continuous integration and continuous delivery are under heavy usage in order to achieve rapid software development speed and quick product delivery to the customers with good quality. During the process of modern software development, the testing stage has always been with great significance so that the delivered software is meeting all the […]
Sep, 13

OpenMP as a High-Level Specification Language for Parallelism And its use in Evaluating Parallel Programming Systems

While OpenMP is the de facto standard of shared memory parallel programming models, a number of alternative programming models and runtime systems have arisen in recent years. Fairly evaluating these programming systems can be challenging and can require significant manual effort on the part of researchers. However, it is important to facilitate these comparisons as […]
Sep, 13

An efficient numerical method for solving the Boltzmann equation in multidimensions

In this paper we deal with the extension of the Fast Kinetic Scheme (FKS) [J. Comput. Phys., Vol. 255, 2013, pp 680-698] originally constructed for solving the BGK equation, to the more challenging case of the Boltzmann equation. The scheme combines a robust and fast method for treating the transport part based on an innovative […]
Sep, 13

Scheduling of Linear Algebra Kernels on Multiple Heterogeneous Resources

In this paper, we consider task-based dense linear algebra applications on a single heterogeneous node which contains regular CPU cores and a set of GPU devices. Efficient scheduling strategies are crucial in this context in order to achieve good and portable performance. HeteroPrio, a resource-centric dynamic scheduling strategy has been introduced in a previous work […]
Sep, 13

A New Architecture for Optimization Modeling Frameworks

We propose a new architecture for optimization modeling frameworks in which solvers are expressed as computation graphs in a framework like TensorFlow rather than as standalone programs built on a low-level linear algebra interface. Our new architecture makes it easy for modeling frameworks to support high performance computational platforms like GPUs and distributed clusters, as […]
Sep, 10

An Implementation of Real-Time Phased Array Radar Fundamental Functions on a DSP-Focused, High-Performance, Embedded Computing Platform

This paper investigates the feasibility of a backend design for real-time, multiple-channel processing digital phased array system, particularly for high-performance embedded computing platforms constructed of general purpose digital signal processors. First, we obtained the lab-scale backend performance benchmark from simulating beamforming, pulse compression, and Doppler filtering based on a Micro Telecom Computing Architecture (MTCA) chassis […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
