7789

Posts

Jun, 5

Using visualization to reveal weak cryptosystems

My thesis explains how techniques borrowed from the field of visualization can be applied to reveal weaknesses in applications and cryptosystems. The first half of the thesis presents how graphics processing units can be used for general-purpose computing. The second half provides an overview of basic techniques and applies these […]
Jun, 5

Hierarchical Partitioning Algorithm for Scientific Computing on Highly Heterogeneous CPU + GPU Clusters

A hierarchy of heterogeneity exists in many modern high-performance clusters: between computing nodes, and within a node through the addition of specialized accelerators such as GPUs. To achieve high performance for scientific applications on these platforms, it is necessary to perform load balancing. In this paper we present a […]
Jun, 5

Shortening design time through multiplatform simulations with a portable OpenCL golden-model: the LDPC decoder case

Hardware designers and engineers typically need to explore a multi-parametric design space in order to find the best configuration for their designs using simulations that can take weeks to months to complete. For example, designers of special purpose chips need to explore parameters such as the optimal bit width and data representation. This is the […]
Jun, 5

Landau Gauge Fixing on GPUs

In this paper we present and explore the performance of Landau gauge fixing on GPUs using CUDA. We consider the steepest descent algorithm with Fourier acceleration, and compare the GPU performance with a parallel CPU implementation. Using $32^4$ lattice volumes, we find that the computational power of a single Tesla C2070 GPU is equivalent to […]
Jun, 4

Platform 2012, a Many-Core Computing Accelerator for Embedded SoCs: Performance Evaluation of Visual Analytics Applications

P2012 is an area- and power-efficient many-core computing accelerator based on multiple globally asynchronous, locally synchronous processor clusters. Each cluster features up to 16 processors with independent instruction streams sharing a multi-banked one-cycle access L1 data memory, a multi-channel DMA engine and specialized hardware for synchronization and aggressive power management. P2012 is 3D stacking ready […]
Jun, 4

A Compiler and Runtime for Heterogeneous Computing

Heterogeneous systems show a lot of promise for extracting high performance by combining the benefits of conventional architectures with specialized accelerators in the form of graphics processors (GPUs) and reconfigurable hardware (FPGAs). Extracting this performance often entails programming in disparate languages and models, making it hard for a programmer to work equally well on all aspects […]
Jun, 4

Finite Element Matrix Generation on a GPU

This paper presents an efficient technique for fast generation of sparse systems of linear equations arising in computational electromagnetics in a finite element method using higher order elements. The proposed approach employs a graphics processing unit (GPU) for both numerical integration and matrix assembly. The performance results obtained on a test platform consisting of a […]
Jun, 4

Pipelining the Fast Multipole Method over a Runtime System

Fast Multipole Methods (FMM) are fundamental to the simulation of many physical problems. The high-performance design of such methods usually requires carefully tuning the algorithm for both the targeted physics and the hardware. In this paper, we propose a new approach that achieves high performance across architectures. Our method consists of […]
Jun, 4

High Accuracy Gravitational Waveforms from Black Hole Binary Inspirals Using OpenCL

There is a strong need for high-accuracy and efficient modeling of extreme-mass-ratio binary black hole systems because these are strong sources of gravitational waves that would be detected by future observatories. In this article, we present sample results from our Teukolsky EMRI code: a time-domain Teukolsky equation solver (a linear, hyperbolic, partial differential equation solver […]
Jun, 3

GPU Join Processing Revisited

Until recently, the use of graphics processing units (GPUs) for query processing was limited by the amount of memory on the graphics card, a few gigabytes at best. Moreover, input tables had to be copied to GPU memory before they could be processed, and after computation was completed, query results had to be copied back […]
Jun, 3

Parallel Triangular Solvers on GPU

In this paper, we systematically investigate GPU-based parallel triangular solvers. Parallel triangular solvers are fundamental to the incomplete LU factorization family of preconditioners and to algebraic multigrid solvers. We develop a new matrix format suitable for GPU devices. Parallel lower and upper triangular solvers are developed for this new data structure. With these solvers, […]
Jun, 3

Parallel Agent systems on a GPU for use with Simulations and Games

In this paper we describe a parallel agent-based computing system. The agents are placed in GPU memory and executed in parallel on the GPU. We discuss the difficulties in creating this system and provide solutions to each of the problems encountered. We then go on to describe a test bed application for the implementation […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
