Posts
May 11
Theano: A Python framework for fast computation of mathematical expressions
Theano is a Python library that lets users define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most widely used CPU and GPU mathematical compilers, especially in the machine learning community, and has shown steady performance improvements. Theano is being actively and continuously developed […]
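As a quick illustration of the define/optimize/evaluate workflow the abstract describes, here is a minimal example using the standard Theano API (the expression and variable names are ours):

```python
# Define a symbolic expression, let Theano optimize and compile it
# (for CPU or GPU), then evaluate it on concrete numpy arrays.
import numpy as np
import theano
import theano.tensor as T

x = T.dmatrix('x')              # symbolic double-precision matrices
y = T.dmatrix('y')
z = T.dot(x, y) + T.exp(x)      # symbolic expression graph

f = theano.function([x, y], z)  # optimized, compiled function

a = np.random.randn(2, 2)
b = np.random.randn(2, 2)
print(f(a, b))                  # numerical evaluation
```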
May 11
The GPU-based Parallel Ant Colony System
The Ant Colony System (ACS) is, together with Ant Colony Optimization (ACO) and the MAX-MIN Ant System (MMAS), one of the most efficient metaheuristic algorithms inspired by the behavior of ants. In this article we present three novel parallel versions of the ACS for graphics processing units (GPUs). To the best of our knowledge, […]
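For readers unfamiliar with the ACS itself, a toy sequential sketch of its rules follows (our illustration with textbook parameters, not the authors' GPU code; the paper parallelizes tour construction and pheromone updates like these):

```python
import numpy as np

# Toy sequential Ant Colony System on a random TSP-like instance.
rng = np.random.default_rng(0)
n = 20
dist = rng.random((n, n)) + np.eye(n)      # illustrative distance matrix
eta = 1.0 / dist                           # heuristic desirability
tau0 = 1e-3
tau = np.full((n, n), tau0)                # pheromone matrix
beta, q0, rho, xi = 2.0, 0.9, 0.1, 0.1     # textbook ACS parameters

def construct_tour():
    tour, unvisited = [0], set(range(1, n))
    while unvisited:
        i = tour[-1]
        cand = np.array(sorted(unvisited))
        attract = tau[i, cand] * eta[i, cand] ** beta
        if rng.random() < q0:              # exploitation
            j = int(cand[np.argmax(attract)])
        else:                              # biased exploration
            j = int(rng.choice(cand, p=attract / attract.sum()))
        tau[i, j] = (1 - xi) * tau[i, j] + xi * tau0   # local update
        tour.append(j)
        unvisited.remove(j)
    return tour

best, best_len = None, np.inf
for _ in range(100):
    tour = construct_tour()
    length = sum(dist[tour[k], tour[(k + 1) % n]] for k in range(n))
    if length < best_len:
        best, best_len = tour, length
    for k in range(n):                     # global update: best tour only
        i, j = best[k], best[(k + 1) % n]
        tau[i, j] = (1 - rho) * tau[i, j] + rho / best_len
print(best_len)
```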
May 9
Microbenchmarks for GPU characteristics: the occupancy roofline and the pipeline model
In this paper we present microbenchmarks in OpenCL that measure the most important performance characteristics of GPUs. Microbenchmarks aim to measure the individual characteristics that influence performance. First, performance, in operations or bytes per second, is measured as a function of occupancy, yielding an occupancy roofline curve. The curve shows at which […]
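The shape of such a curve can be conveyed with a deliberately simplified model (ours, not the paper's benchmark code): throughput grows roughly linearly with occupancy until enough threads are resident to hide latency, then saturates at the device's peak.

```python
# Simplified occupancy-roofline model (illustrative only): linear rise
# until the latency-hiding "knee", then saturation at the peak rate.
def attainable_gflops(occupancy, peak_gflops, knee_occupancy):
    """occupancy and knee_occupancy in [0, 1]; returns modeled GFLOP/s."""
    return min(peak_gflops, peak_gflops * occupancy / knee_occupancy)

for occ in (0.125, 0.25, 0.5, 1.0):        # hypothetical device numbers
    print(occ, attainable_gflops(occ, peak_gflops=4000.0, knee_occupancy=0.5))
```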
May 9
SecureMed: Secure Medical Computation using GPU-Accelerated Homomorphic Encryption Scheme
Sharing the medical records of individuals among healthcare providers and researchers around the world can accelerate advances in medical research. While the idea seems increasingly practical due to cloud data services, maintaining patient privacy is of paramount importance. Standard encryption algorithms help protect sensitive data from outside attackers, but they cannot be used to compute […]
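To make that limitation concrete, here is a toy additively homomorphic scheme (textbook Paillier with tiny, insecure parameters; the GPU-accelerated scheme in the paper is a different one), where adding plaintexts corresponds to multiplying ciphertexts:

```python
from math import gcd

# Textbook Paillier with tiny, insecure parameters, shown only to
# illustrate computing on encrypted data. Requires Python 3.8+ for
# pow(x, -1, n) modular inverses.
p, q = 47, 59
n, n2 = p * q, (p * q) ** 2
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)
g = n + 1
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)    # decryption constant

def encrypt(m, r):                              # r: random, coprime to n
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n * mu) % n

c1, c2 = encrypt(41, 123), encrypt(17, 456)
assert decrypt((c1 * c2) % n2) == 41 + 17       # addition under encryption
```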
May 9
A Graph-based Model for GPU Caching Problems
Modeling data sharing in GPU programs is a challenging task because of the massive parallelism and complex data sharing patterns provided by GPU architectures. Better GPU caching efficiency can be achieved through careful task scheduling among different threads. Traditionally, in the field of parallel computing, graph partition models are used to model data communication and […]
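In simplified form, such a graph model can look like the following sketch (our illustration, not the paper's formulation): threads are vertices, edge weights count shared data elements, and a partitioner groups heavy sharers so they hit the same cache.

```python
from collections import defaultdict

# Toy sharing graph: vertices are threads, edge weights count the data
# elements two threads both touch; a greedy pass pairs heavy sharers
# so that sharers can be co-scheduled onto the same cache.
accesses = {                 # thread id -> data addresses it touches
    0: {10, 11, 12}, 1: {11, 12, 13},
    2: {40, 41},     3: {41, 42, 12},
}

weight = defaultdict(int)
threads = sorted(accesses)
for i in threads:
    for j in threads:
        if i < j:
            weight[(i, j)] = len(accesses[i] & accesses[j])

paired, groups = set(), []
for (i, j), w in sorted(weight.items(), key=lambda kv: -kv[1]):
    if w and i not in paired and j not in paired:
        groups.append((i, j))
        paired.update((i, j))
groups += [(t,) for t in threads if t not in paired]
print(groups)                # [(0, 1), (2, 3)] -> co-schedule each pair
```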
May 9
Training Neural Networks Without Gradients: A Scalable ADMM Approach
With the growing importance of large network models and enormous training datasets, GPUs have become increasingly necessary to train neural networks. This is largely because conventional optimization algorithms rely on stochastic gradient methods that do not scale well to large numbers of cores in a cluster setting. Furthermore, the convergence of all gradient methods, including batch […]
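For orientation, the alternating structure of ADMM is easiest to see on a simpler problem than network training; the sketch below applies it to lasso regression (a standard textbook use, not the paper's per-layer formulation):

```python
import numpy as np

# ADMM for lasso: minimize 0.5*||Ax - b||^2 + lam*||x||_1.
# Each step is cheap and local, which is what makes ADMM attractive
# to distribute; the paper splits network layers in a similar spirit.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
x_true = np.zeros(10); x_true[:3] = (1.0, -2.0, 0.5)
b = A @ x_true + 0.01 * rng.standard_normal(50)

lam, rho = 0.1, 1.0
x = z = u = np.zeros(10)
Atb = A.T @ b
M = np.linalg.inv(A.T @ A + rho * np.eye(10))  # cached for the x-update

def soft(v, k):                                # proximal map of ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)

for _ in range(200):
    x = M @ (Atb + rho * (z - u))              # quadratic subproblem
    z = soft(x + u, lam / rho)                 # separable shrinkage
    u = u + x - z                              # dual update
print(np.round(z, 2))                          # recovers the sparse x_true
```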
May 9
Parallelizing Word2Vec in Shared and Distributed Memory
Word2Vec is a widely used algorithm for extracting low-dimensional vector representations of words. It has recently generated considerable excitement in the machine learning and natural language processing (NLP) communities due to its exceptional performance in many NLP applications, such as named entity recognition, sentiment analysis, machine translation, and question answering. State-of-the-art algorithms, including those by Mikolov […]
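As context for what has to be parallelized, here is a single skip-gram negative-sampling update in plain numpy (a textbook sketch with made-up sizes; the paper's contribution lies in batching many such updates for parallel hardware):

```python
import numpy as np

# One skip-gram negative-sampling (SGNS) update for a (center, context)
# pair plus sampled negative words.
rng = np.random.default_rng(0)
V, D, lr = 1000, 100, 0.025                 # vocab, dimension, step size
Win = 0.01 * rng.standard_normal((V, D))    # center-word vectors
Wout = np.zeros((V, D))                     # context-word vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_update(center, context, negatives):
    v = Win[center]
    v_grad = np.zeros_like(v)
    for word, label in [(context, 1.0)] + [(w, 0.0) for w in negatives]:
        u = Wout[word]
        g = lr * (label - sigmoid(u @ v))   # scalar error term
        v_grad += g * u
        Wout[word] += g * v
    Win[center] += v_grad                   # apply accumulated gradient

sgns_update(center=5, context=42, negatives=[7, 99, 300])
```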
May 7
Parallel Wavelet Schemes for Images
In this paper, we introduce several new schemes for the calculation of discrete wavelet transforms of images. These schemes reduce the number of steps and, as a consequence, the number of synchronizations required on parallel architectures. As an additional useful property, the proposed schemes can also reduce the number of arithmetic operations. The schemes […]
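For orientation, a single level of a 2-D discrete wavelet transform looks like the following numpy sketch, using the simple Haar filter rather than the longer filters whose lifting steps (and synchronizations) the paper's schemes reduce:

```python
import numpy as np

# Single-level 2-D Haar DWT: filter and downsample along columns, then
# along rows, producing four half-resolution subbands.
def haar2d(img):
    def split(a):                     # one Haar level along the last axis
        lo = (a[..., 0::2] + a[..., 1::2]) / np.sqrt(2)
        hi = (a[..., 0::2] - a[..., 1::2]) / np.sqrt(2)
        return lo, hi
    lo, hi = split(img)
    ll, lh = split(lo.swapaxes(-1, -2))
    hl, hh = split(hi.swapaxes(-1, -2))
    return [s.swapaxes(-1, -2) for s in (ll, lh, hl, hh)]

img = np.arange(64.0).reshape(8, 8)
ll, lh, hl, hh = haar2d(img)
print(ll.shape)                       # (4, 4): coarse approximation band
```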
May 7
Going Deeper with Embedded FPGA Platform for Convolutional Neural Network
In recent years, Convolutional Neural Network (CNN) based methods have achieved great success in a large number of applications and have been among the most powerful and widely used techniques in computer vision. However, CNN-based methods are computationally intensive and resource-consuming, and thus are hard to integrate into embedded systems such as smartphones, smart […]
May 7
JIT-Compilation for Interactive Scientific Visualization
Due to the proliferation of mobile devices and cloud computing, remote simulation and visualization have become increasingly important. In order to reduce bandwidth and (de)serialization costs, and to improve mobile battery life, we examine the performance and bandwidth benefits of using an optimizing query compiler for remote postprocessing of interactive and in-situ simulations. We conduct […]
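The core idea can be conveyed with a toy in plain Python (ours, not the paper's compiler, which targets native code): compile a user-supplied postprocessing query once into a callable, then reuse it for every frame instead of re-interpreting it.

```python
import numpy as np

# Toy "query compiler": build a Python function from a postprocessing
# expression once, then apply it per frame. A real system would emit
# optimized native code instead.
def compile_query(expr):
    src = "def _q(data):\n    return " + expr + "\n"
    env = {"np": np}
    exec(compile(src, "<query>", "exec"), env)
    return env["_q"]

velocity_magnitude = compile_query(
    "np.sqrt(data['u']**2 + data['v']**2 + data['w']**2)")

frame = {k: np.random.rand(16, 16, 16) for k in ("u", "v", "w")}
print(velocity_magnitude(frame).max())
```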
May 7
TheanoLM – An Extensible Toolkit for Neural Network Language Modeling
We present a new tool for training neural network language models (NNLMs), scoring sentences, and generating text. The tool is written using the Python library Theano, which allows researchers to easily extend it and tune any aspect of the training process. Despite this flexibility, Theano is able to generate extremely fast native code that […]
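To give a flavor of the kind of computation graph such a toolkit builds (our sketch, not TheanoLM's actual internals), here is a one-layer recurrent language model in Theano that scores a sentence as a sum of log-probabilities:

```python
import numpy as np
import theano
import theano.tensor as T

V, H = 1000, 64                                   # made-up sizes
E  = theano.shared(0.01 * np.random.randn(V, H))  # word embeddings
Wh = theano.shared(0.01 * np.random.randn(H, H))  # recurrent weights
Wo = theano.shared(0.01 * np.random.randn(H, V))  # output projection

words = T.ivector('words')                        # sentence as word ids

def step(w, h_prev):
    h = T.tanh(E[w] + T.dot(h_prev, Wh))
    p = T.nnet.softmax(T.dot(h, Wo).reshape((1, V)))[0]
    return h, p

(_, ps), _ = theano.scan(step, sequences=words[:-1],
                         outputs_info=[T.zeros(H), None])
m = words.shape[0] - 1
logprob = T.sum(T.log(ps[T.arange(m), words[1:]]))
score = theano.function([words], logprob)         # compiled native code

print(score(np.array([1, 5, 9, 2], dtype=np.int32)))
```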
May 7
Parallel Pairwise Correlation Computation On Intel Xeon Phi Clusters
Co-expression network analysis is a critical technique for the identification of inter-gene interactions, and it usually relies on all-pairs correlation (or a similar measure) computed between gene expression profiles across multiple samples. Pearson's correlation coefficient (PCC) is one widely used measure for gene co-expression network construction. However, all-pairs PCC computation is computationally demanding for large numbers of gene […]
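Once each expression profile is standardized, all-pairs PCC reduces to one dense matrix product, which is what makes it a natural fit for many-core hardware; a plain numpy sketch:

```python
import numpy as np

# All-pairs Pearson correlation: center and L2-normalize each gene's
# profile, then a single matrix product yields every pairwise PCC.
def allpairs_pcc(X):
    """X: genes x samples; returns the genes x genes PCC matrix."""
    Z = X - X.mean(axis=1, keepdims=True)
    Z /= np.linalg.norm(Z, axis=1, keepdims=True)
    return Z @ Z.T                        # C[i, j] = PCC(gene i, gene j)

X = np.random.rand(1000, 50)              # 1000 genes, 50 samples
C = allpairs_pcc(X)
print(C.shape, np.allclose(np.diag(C), 1.0))
```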