
Posts

Jan, 19

Implicit Parallel Time Integrators

In this work, we discuss a family of parallel implicit time integrators for multi-core and potentially multi-node or multi-GPGPU systems. The method is an extension of Revisionist Integral Deferred Correction (RIDC) by Christlieb, Macdonald and Ong (SISC-2010), which constructed parallel explicit time integrators. The key idea is to rewrite the defect correction framework so that, […]
Jan, 19

Energy efficient biomolecular simulations with FPGA-based reconfigurable computing

Reconfigurable computing (RC) is being investigated as a hardware solution for improving time-to-solution for biomolecular simulations. A number of popular molecular dynamics (MD) codes are used to study various aspects of biomolecules. These codes are now capable of simulating nanosecond time-scale trajectories per day on conventional microprocessor-based hardware, but biomolecular processes often occur at the […]
Jan, 19

Towards microsecond biological molecular dynamics simulations on hybrid processors

Biomolecular simulations are becoming an increasingly important component of molecular biochemistry and biophysics investigations. Performance improvements in the simulations based on molecular dynamics (MD) codes are widely desired. This is particularly driven by the rapid growth of biological data due to improvements in experimental techniques. Unfortunately, the factors that allowed past performance improvements of […]
Jan, 19

Optimal Utilization of Heterogeneous Resources for Biomolecular Simulations

Biomolecular simulations have traditionally benefited from increases in the processor clock speed and coarse-grain inter-node parallelism on large-scale clusters. With stagnating clock frequencies, the evolutionary path for performance of microprocessors is maintained by virtue of core multiplication. Graphical processing units (GPUs) offer revolutionary performance potential at the cost of increased programming complexity. Furthermore, it has […]
Jan, 19

Accelerating 3D Fourier migration with graphics processing units

Computational cost is a major factor that inhibits the practical application of 3D depth migration. We have developed a fast parallel scheme to speed up 3D wave-equation depth migration on a parallel computing device, i.e., on graphics processing units (GPUs). The third-order optimized generalized-screen propagator is used to take advantage of the built-in software implementation […]
Jan, 19

Introduction to the Report “Interlanguages and Synchronic Models of Computation.”

A novel language system has given rise to promising alternatives to standard formal and processor network models of computation. An interstring linked with an abstract machine environment shares sub-expressions, transfers data, and spatially allocates resources for the parallel evaluation of dataflow. Formal models called the a-Ram family are introduced, designed to support interstring programming languages […]
Jan, 19

Space and the Synchronic A-Ram

Space is a circuit-oriented, spatial programming language designed to exploit the massive parallelism available in a novel formal model of computation called the Synchronic A-Ram, and physically related FPGA and reconfigurable architectures. Space expresses variable-grained MIMD parallelism, is modular, strictly typed, and deterministic. Barring operations associated with memory allocation and compilation, modules cannot […]
Jan, 19

Interlanguages and synchronic models of computation

A novel language system has given rise to promising alternatives to standard formal and processor network models of computation. An interstring linked with an abstract machine environment shares sub-expressions, transfers data, and spatially allocates resources for the parallel evaluation of dataflow. Formal models called the a-Ram family are introduced, designed to support interstring programming languages […]
Jan, 19

Relational query coprocessing on graphics processors

Graphics processors (GPUs) have recently emerged as powerful coprocessors for general purpose computation. Compared with commodity CPUs, GPUs have an order of magnitude higher computation power as well as memory bandwidth. Moreover, new-generation GPUs allow writes to random memory locations, provide efficient interprocessor communication through on-chip local memory, and support a general purpose parallel programming […]
Jan, 18

Secure 3D graphics for virtual machines

In this paper, a new approach to API remoting for GPU virtualisation is described which aims to reduce the amount of trusted code involved in 3D rendering for guest VMs. To achieve this, it uses a modular driver framework to export large proportions of complex 3D graphics drivers into the guest’s domain. It further provides […]
Jan, 18

Message passing for GPGPU clusters: CudaMPI

We present and analyze two new communication libraries, cudaMPI and glMPI, that provide an MPI-like message passing interface to communicate data stored on the graphics cards of a distributed-memory parallel computer. These libraries can help applications that perform general purpose computations on these networked GPU clusters. We explore how to efficiently support both point-to-point and […]
Jan, 18

Blasting through lattice calculations using CUDA

Modern graphics hardware is designed for highly parallel numerical tasks and provides significant cost and performance benefits. Graphics hardware vendors are now making available development tools to support general purpose high performance computing. Nvidia’s CUDA platform, in particular, offers direct access to graphics hardware through a programming language similar to C. Using the CUDA platform […]

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
