7977

Posts

Jul, 12

High-Performance Symmetric Block Ciphers on Multicore CPU and GPUs

As data protection through encryption grows more important by the day, encryption using general-purpose computation on graphics processing units (GPGPU) has drawn attention as one way to achieve high-speed data protection. GPUs have evolved in recent years into powerful parallel computing devices with a high cost-performance ratio. However, […]
Jul, 12

A Note on Particle Filters Applied to DSGE Models

This paper compares the properties of two particle filters – the Bootstrap Filter and the Auxiliary Particle Filter – applied to the computation of the likelihood of artificial data simulated from a basic DSGE model with nominal and real rigidities. Particle filters are compared in terms of speed, quality of the approximation of the probability […]
Jul, 12

Data Partitioning on Heterogeneous Multicore and Multi-GPU Systems Using Functional Performance Models of Data-Parallel Applications

The transition to hybrid CPU/GPU platforms in high performance computing is challenging with respect to efficient utilisation of heterogeneous hardware and existing optimised software. In recent years, scientific software has been ported to multicore and GPU architectures and should now be reused on hybrid platforms. In this paper, we model the performance of such […]
Jul, 11

Fast Algorithms for the Solution of Stochastic Partial Differential Equations

We explore the performance of several algorithms for the solution of stochastic partial differential equations including the stochastic Galerkin method and the stochastic sparse grid collocation method. We also introduce a new method called the adaptive kernel density estimation (KDE) collocation method, which addresses some of the deficiencies present in other stochastic PDE solution methods. […]
Jul, 11

Stencil-Aware GPU Optimization of Iterative Solvers

Numerical solutions of nonlinear partial differential equations frequently rely on iterative Newton-Krylov methods, which linearize a finite-difference stencil-based discretization of a problem, producing a sparse matrix with regular structure. Knowledge of this structure can be used to exploit parallelism and locality of reference on modern cache-based multi- and many-core architectures, achieving high performance for computations […]
Jul, 11

Invitation to a Standard Programming Interface for Massively Parallel Computing Environment: OpenCL

Multicore/manycore architectures are accelerating demand for a new programming environment that can utilize the massive number of processors integrated in an LSI. The GPU (Graphics Processing Unit) is one typical hardware environment. Programming environments for GPUs have traditionally been vendor- and hardware-specific, which complicates the management of uniform programs that access the computing resources of a massively parallel platform. The recently […]
Jul, 11

Geometric Algebra enhanced Precompiler for C++ and OpenCL

The focus of this work is a simplified integration of algorithms expressed in Geometric Algebra (GA) into modern high-level computer languages, namely C++, OpenCL and CUDA. High runtime performance for GA is achieved using symbolic simplification and code generation by a Precompiler that is directly integrated into CMake-based build toolchains.
Jul, 11

A fully parallel, high precision, N-body code running on hybrid computing platforms

We present a new implementation of the numerical integration of the classical, gravitational, N-body problem based on a high order Hermite’s integration scheme with block time steps, with a direct evaluation of the particle-particle forces. The main innovation of this code (called HiGPUs) is its full parallelization, exploiting both OpenMP and MPI in the use […]
Jul, 11

Hybrid Monte Carlo with Wilson Dirac operator on the Fermi GPU

In this article we present our implementation of a Hybrid Monte Carlo algorithm for Lattice Gauge Theory using two degenerate flavours of Wilson-Dirac fermions on a Fermi GPU. We find that using registers instead of global memory speeds up the code by almost an order of magnitude. To map the array variables to scalars, so […]
Jul, 10

Exposure Render: An Interactive Photo-Realistic Volume Rendering Framework

The field of volume visualization has undergone rapid development during the past years, both due to advances in suitable computing hardware and due to the increasing availability of large volume datasets. Recent work has focused on increasing the visual realism in Direct Volume Rendering (DVR) by integrating a number of visually plausible but often effect-specific […]
Jul, 10

Multi-level Parallelization of Advanced Video Coding on Hybrid CPU/GPU Platform

In this paper we propose a dynamic model for parallel H.264/AVC video encoding on hybrid GPU/CPU systems. The entire inter-loop is parallelized on both the CPU and the GPU, and a computationally light and efficient model is proposed to dynamically distribute the computational load among simultaneously processing devices. This model includes both dependency-aware task scheduling and load balancing algorithm […]
Jul, 10

Runtime Systems and Scheduling Support for High-End CPU-GPU Architectures

In recent years, multi-core CPUs and many-core GPUs have emerged as mainstream and cost-effective means for scaling. Consequently, a trend receiving wide attention is that of heterogeneous computing platforms consisting of both CPUs and GPUs. Such heterogeneous architectures are pervasive across notebooks, desktops, clusters, supercomputers and cloud environments. While they expose huge potential for […]

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
