Posts

Sep, 30

Data-parallel Acceleration of PARSEC Black-Scholes Benchmark

The way programmers have relied on processor improvements to speed up their applications no longer applies in the same fashion. Programmers usually have to parallelize their code to utilize the CPU cores in the system to gain a significant speedup. To accelerate parallel applications further, there are a couple of techniques available. […]
Sep, 30

Approximate dynamic programming with post-decision states as a solution method for dynamic economic models

I introduce and evaluate a new stochastic simulation method for dynamic economic models. It is based on recent work in the operations research and engineering literatures (Van Roy et al., 1997; Powell, 2007; Bertsekas, 2011). The baseline method involves rewriting the household’s dynamic program in terms of post-decision states. This makes it possible to choose […]
Sep, 30

A GPU cluster optimized multigrid scheme for computing unsteady incompressible fluid flow

A multigrid scheme has been proposed that allows efficient implementation on modern CPUs, many integrated core devices (MICs), and graphics processing units (GPUs). It is shown that wide single instruction multiple data (SIMD) processing engines are used efficiently when a deep, 2h grid hierarchy is replaced with a two level scheme using 16h-32h restriction. The […]
Sep, 29

Adapting data processing methods to modern GPU architecture

Wavelet transforms have a wide range of applications in many scientific areas, for example signal processing, image compression [6] or data mining [4] [5]. Present requirements demand performing a large number of calculations in minimal time. For that reason, the goal of this paper is to present an approach that will fulfill these requirements, by […]
Sep, 29

Separate Compilation in a Language-Integrated Heterogeneous Environment

Heterogeneous computing platforms are becoming more common in recent years. Effective programming languages and tools will play a key role in unlocking the performance potential of these systems. In this paper, we present the design and implementation of separate compilation and linking support for the CUDA programming platform. CUDA provides a language-integrated environment for writing […]
Sep, 29

Evaluation of disconnected quark loops for hadron structure using GPUs

A number of stochastic methods developed for the calculation of fermion loops are investigated and compared, in particular with respect to their efficiency when implemented on Graphics Processing Units (GPUs). We assess the performance of the various methods by studying the convergence and statistical accuracy obtained for observables that require a large number of stochastic […]
Sep, 29

The Complete Rank Transform: A Tool for Accurate and Morphologically Invariant Matching of Structures

Most researchers agree that invariances are desirable in computer vision systems. However, one always has to keep in mind that this is at the expense of accuracy: By construction, all invariances inevitably discard information. The concept of morphological invariance is a good example of this trade-off and will be the focus of this paper. […]
Sep, 29

Optimizing Urban Environmental Simulations using Boinc

Cities are usually densely populated and have massive infrastructure. They consume a lot of energy and generate pollution. Urban form and structure interact with the environment in a complex way. There is a transfer of energy between buildings and the ground layer. Winds flow through the urban street canyons, affecting evaporation, temperature and pollution dispersion. […]
Sep, 29

Re-Introduction of Communication-Avoiding FMM-Accelerated FFTs with GPU Acceleration

As distributed memory systems grow larger, communication demands have increased. Unfortunately, while the costs of arithmetic operations continue to decrease rapidly, communication costs have not. As a result, there has been a growing interest in communication-avoiding algorithms for some of the classic problems in numerical computing, including communication-avoiding Fast Fourier Transforms (FFTs). A previously-developed low-communication […]
Sep, 28

Toward a GPU-Accelerated Immersed Boundary Method for Wind Forecasting Over Complex Terrain

A short-term wind power forecasting capability can be a valuable tool in the renewable energy industry to address load-balancing issues that arise from intermittent wind fields. Although numerical weather prediction models have been used to forecast winds, their applicability to micro-scale atmospheric boundary layer flows and ability to predict wind speeds at turbine hub height […]
Sep, 28

APOGEE: adaptive prefetching on GPUs for energy efficiency

Modern graphics processing units (GPUs) combine large amounts of parallel hardware with fast context switching among thousands of active threads to achieve high performance. However, such designs do not translate well to mobile environments where power constraints often limit the amount of hardware. In this work, we investigate the use of prefetching as a means […]
Sep, 28

A GPU Implementation of a Jacobi Method for Lattice Basis Reduction

This paper describes a parallel Jacobi method for lattice basis reduction and a GPU implementation using CUDA. Our experiments have shown that the parallel implementation is more than fifty times as fast as the serial counterpart, which is about twice as fast as the well-known LLL lattice reduction algorithm.

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
