Posts
Jun, 4
Platform 2012, a Many-Core Computing Accelerator for Embedded SoCs: Performance Evaluation of Visual Analytics Applications
P2012 is an area- and power-efficient many-core computing accelerator based on multiple globally asynchronous, locally synchronous processor clusters. Each cluster features up to 16 processors with independent instruction streams sharing a multi-banked one-cycle access L1 data memory, a multi-channel DMA engine and specialized hardware for synchronization and aggressive power management. P2012 is 3D stacking ready […]
Jun, 4
A Compiler and Runtime for Heterogeneous Computing
Heterogeneous systems show a lot of promise for extracting high-performance by combining the benefits of conventional architectures with specialized accelerators in the form of graphics processors (GPUs) and reconfigurable hardware (FPGAs). Extracting this performance often entails programming in disparate languages and models, making it hard for a programmer to work equally well on all aspects […]
Jun, 4
Finite Element Matrix Generation on a GPU
This paper presents an efficient technique for fast generation of sparse systems of linear equations arising in computational electromagnetics in a finite element method using higher order elements. The proposed approach employs a graphics processing unit (GPU) for both numerical integration and matrix assembly. The performance results obtained on a test platform consisting of a […]
Jun, 4
Pipelining the Fast Multipole Method over a Runtime System
Fast Multipole Methods (FMM) are a fundamental operation for the simulation of many physical problems. The high performance design of such methods usually requires to carefully tune the algorithm for both the targeted physics and the hardware. In this paper, we propose a new approach that achieves high performance across architectures. Our method consists of […]
Jun, 4
High Accuracy Gravitational Waveforms from Black Hole Binary Inspirals Using OpenCL
There is a strong need for high-accuracy and efficient modeling of extreme-mass-ratio binary black hole systems because these are strong sources of gravitational waves that would be detected by future observatories. In this article, we present sample results from our Teukolsky EMRI code: a time-domain Teukolsky equation solver (a linear, hyperbolic, partial differential equation solver […]
Jun, 3
GPU Join Processing Revisited
Until recently, the use of graphics processing units (GPUs) for query processing was limited by the amount of memory on the graphics card, a few gigabytes at best. Moreover, input tables had to be copied to GPU memory before they could be processed, and after computation was completed, query results had to be copied back […]
Jun, 3
Parallel Triangular Solvers on GPU
In this paper, we investigate GPU based parallel triangular solvers systematically. The parallel triangular solvers are fundamental to incomplete LU factorization family preconditioners and algebraic multigrid solvers. We develop a new matrix format suitable for GPU devices. Parallel lower triangular solver and upper triangular solver are developed for this new data structure. With these solvers, […]
Jun, 3
Parallel Agent systems on a GPU for use with Simulations and Games
In this paper we describe a parallel agent based computing system. The agents are placed on GPU memory and executed in parallel on the GPU. We discuss the difficulties in creating this system and provide solutions to each of the problems encountered. We then go on to describe a test bed application for the implementation […]
Jun, 3
Reservoir Simulation on NVIDIA Tesla GPUs
In this paper, we introduce our work on accelerating a black oil simulator using GPU-based parallel iterative linear solvers. We develop iterative linear solvers and several commonly used preconditioners on NVIDIA Tesla GPUs. These solvers and preconditioners are coupled with our in-house reservoir simulator. Numerical experiments show that our GPU-based black oil simulator is sped […]
Jun, 3
Ultra-Fast Displaying Spectral Domain Optical Doppler Tomography System Using a Graphics Processing Unit
We demonstrate an ultrafast displaying Spectral Domain Optical Doppler Tomography system using Graphics Processing Unit (GPU) computing. The calculation of FFT and the Doppler frequency shift is accelerated by the GPU. Our system can display processed OCT and ODT images simultaneously in real time at 120 fps for 1,024 pixels x 512 lateral A-scans. The […]
Jun, 1
Towards Distributed Heterogenous High-Performance Computing with ViennaCL
One of the major drawbacks of computing with graphics adapters is the limited available memory for relevant problem sizes. To overcome this limitation for the ViennaCL library, we investigate a partitioning approach for one of the standard benchmark problems in High-Performance Computing (HPC), namely the dense matrix-matrix product. We apply this partitioning approach to problems […]
Jun, 1
MPI-ACC: An Integrated and Extensible Approach to Data Movement in Accelerator-Based Systems
Data movement in high-performance computing systems accelerated by graphics processing units (GPUs) remains a challenging problem. Data communication in popular parallel programming models, such as the Message Passing Interface (MPI), is currently limited to the data stored in the CPU memory space. Auxiliary memory systems, such as GPU memory, are not integrated into such data […]

