Posts

Dec, 17

The Accelerated Universe

The advent of powerful cosmological surveys demands a new generation of high-precision, large-volume, and high dynamic range simulations of structure formation in the Universe. Key aims of these simulations are understanding why the expansion of the Universe is accelerating and what dark matter is made of. The availability of Roadrunner, the world’s first petaflop platform, […]
Dec, 16

Accelerating Monte Carlo simulations with an NVIDIA graphics processor

Modern graphics cards, commonly used in desktop computers, have evolved beyond a simple interface between processor and display to incorporate sophisticated calculation engines that can be applied to general purpose computing. The Monte Carlo algorithm for modelling photon transport in turbid media has been implemented on an NVIDIA 8800GT graphics card using the CUDA toolkit. […]
Dec, 16

OpenMM: A Hardware-Independent Framework for Molecular Simulations

The wide diversity of computer architectures today requires a new approach to software development. OpenMM is an abstraction layer for molecular mechanics simulations, allowing a single program to run efficiently on a variety of hardware platforms.
Dec, 16

A massively multicore parallelization of the Kohn-Sham energy gradients

In a previous article [Brown et al., J Chem Theory Comput 2009, 4, 1620], we described a quadrature-based formulation of the Kohn-Sham Coulomb problem that allows for efficient parallelization over thousands of small processor cores. Here, we present the analytic gradients of this modified Kohn-Sham scheme, and describe the parallel implementation of the gradients on […]
Dec, 16

Real-time optical micro-manipulation using optimized holograms generated on the GPU

Holographic optical tweezers allow the three-dimensional, dynamic, multipoint manipulation of micron-sized objects using laser light. Exploiting the massively parallel architecture of modern GPUs, we can generate highly optimized holograms at video frame rate, allowing the precise interactive micro-manipulation of complex structures.
Dec, 16

Revolutionary technologies for acceleration of emerging petascale applications

As we enter the era of billion transistor chips, computer architects face significant challenges in effectively harnessing the large amount of computational potential available in modern CMOS technology. Chip designers have been moving away from maximizing single-thread performance via exponential scaling of clock frequencies toward chip multiprocessors (CMPs) in order to better manage trade-offs among […]
Dec, 16

GPU Acceleration of an Unmodified Parallel Finite Element Navier-Stokes Solver

We have previously suggested a minimally invasive approach to integrating hardware accelerators into an existing large-scale parallel finite element PDE solver toolkit, and implemented it in our software FEAST. Our concept has the important advantage that applications built on top of FEAST benefit from the acceleration immediately, without changes to application code. In this paper […]
Dec, 16

Fast and Accurate Finite-Element Multigrid Solvers for PDE Simulations on GPU Clusters

The main contribution of this thesis is to demonstrate that graphics processors (GPUs) as representatives of emerging many-core architectures are very well-suited for the fast and accurate solution of large sparse linear systems of equations, using parallel multigrid methods on heterogeneous compute clusters. Such systems arise for instance in the discretisation of (elliptic) partial differential […]
Dec, 16

Exploring weak scalability for FEM calculations on a GPU-enhanced cluster

The first part of this paper surveys co-processor approaches for commodity-based clusters in general, not only with respect to raw performance, but also in view of their system integration and power consumption. We then extend previous work on a small GPU cluster by exploring the heterogeneous hardware approach for a large-scale system with up […]
Dec, 15

Scientific Programming for Heterogeneous Systems – Bridging the Gap between Algorithms and Applications

High-performance computing in heterogeneous environments is a dynamically developing area. A number of highly efficient heterogeneous parallel algorithms have been designed over the last decade. At the same time, scientific software based on these algorithms is very much under par. The paper analyses the main issues encountered by scientific programmers during the implementation of heterogeneous parallel algorithms […]
Dec, 15

An effective GPU implementation of breadth-first search

Breadth-first search (BFS) has wide applications in electronic design automation (EDA) as well as in other fields. Researchers have tried to accelerate BFS on the GPU, but the two published works are both asymptotically slower than the fastest CPU implementation. In this paper, we present a new GPU implementation of BFS that uses a hierarchical […]
Dec, 15

High-throughput Bayesian network learning using heterogeneous multicore computers

Aberrant intracellular signaling plays an important role in many diseases. The causal structure of signal transduction networks can be modeled as Bayesian Networks (BNs), and computationally learned from experimental data. However, learning the structure of BNs is an NP-hard problem that, even with fast heuristics, is too time-consuming for large, clinically important […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
