high performance computing on graphics processing units: hgpu.org

Posts

Dec, 16

A massively multicore parallelization of the Kohn-Sham energy gradients

In a previous article [Brown et al., J Chem Theory Comput 2008, 4, 1620], we described a quadrature-based formulation of the Kohn-Sham Coulomb problem that allows for efficient parallelization over thousands of small processor cores. Here, we present the analytic gradients of this modified Kohn-Sham scheme, and describe the parallel implementation of the gradients on […]

Dec, 16

Real-time optical micro-manipulation using optimized holograms generated on the GPU

Holographic optical tweezers allow the three-dimensional, dynamic, multipoint manipulation of micron-sized objects using laser light. Exploiting the massively parallel architecture of modern GPUs, we can generate highly optimized holograms at video frame rate, allowing the precise interactive micro-manipulation of complex structures.

CUDA

Dec, 16

Revolutionary technologies for acceleration of emerging petascale applications

As we enter the era of billion transistor chips, computer architects face significant challenges in effectively harnessing the large amount of computational potential available in modern CMOS technology. Chip designers have been moving away from maximizing single-thread performance via exponential scaling of clock frequencies toward chip multiprocessors (CMPs) in order to better manage trade-offs among […]

Dec, 16

GPU Acceleration of an Unmodified Parallel Finite Element Navier-Stokes Solver

We have previously suggested a minimally invasive approach to include hardware accelerators into an existing large-scale parallel finite element PDE solver toolkit, and implemented it into our software FEAST. Our concept has the important advantage that applications built on top of FEAST benefit from the acceleration immediately, without changes to application code. In this paper […]

CUDA

Dec, 16

Fast and Accurate Finite-Element Multigrid Solvers for PDE Simulations on GPU Clusters

The main contribution of this thesis is to demonstrate that graphics processors (GPUs) as representatives of emerging many-core architectures are very well-suited for the fast and accurate solution of large sparse linear systems of equations, using parallel multigrid methods on heterogeneous compute clusters. Such systems arise for instance in the discretisation of (elliptic) partial differential […]

CUDA

OpenGL

Dec, 16

Exploring weak scalability for FEM calculations on a GPU-enhanced cluster

The first part of this paper surveys co-processor approaches for commodity-based clusters in general, not only with respect to raw performance, but also in view of their system integration and power consumption. We then extend previous work on a small GPU cluster by exploring the heterogeneous hardware approach for a large-scale system with up […]

Dec, 15

Scientific Programming for Heterogeneous Systems – Bridging the Gap between Algorithms and Applications

High performance computing in heterogeneous environments is a dynamically developing area. A number of highly efficient heterogeneous parallel algorithms have been designed over the last decade. At the same time, scientific software based on these algorithms is very much under par. The paper analyses the main issues encountered by scientific programmers during the implementation of heterogeneous parallel algorithms […]

Dec, 15

An effective GPU implementation of breadth-first search

Breadth-first search (BFS) has wide applications in electronic design automation (EDA) as well as in other fields. Researchers have tried to accelerate BFS on the GPU, but the two published works are both asymptotically slower than the fastest CPU implementation. In this paper, we present a new GPU implementation of BFS that uses a hierarchical […]

Dec, 15

High-throughput Bayesian network learning using heterogeneous multicore computers

Aberrant intracellular signaling plays an important role in many diseases. The causal structure of signal transduction networks can be modeled as Bayesian Networks (BNs), and computationally learned from experimental data. However, learning the structure of BNs is an NP-hard problem that, even with fast heuristics, is too time-consuming for large, clinically important […]

CUDA

Dec, 15

Integrative multicellular biological modeling: a case study of 3D epidermal development using GPU algorithms

BACKGROUND: Simulation of sophisticated biological models requires considerable computational power. These models typically integrate numerous biological phenomena such as spatially-explicit heterogeneous cells, cell-cell interactions, cell-environment interactions and intracellular gene networks. The recent advent of programming for graphics processing units (GPUs) opens up the possibility of developing more integrative, detailed and predictive biological models while […]

CUDA

Dec, 15

Visual simulation of thermal fluid dynamics in a pressurized water reactor

We present a simulation and visualization system for a critical application: analysis of the thermal fluid dynamics inside a pressurized water reactor of a nuclear power plant when cold water is injected into the reactor vessel. We employ a hybrid thermal lattice Boltzmann method (HTLBM), which has the advantages of ease of parallelization and ease of […]

OpenGL

Dec, 15

Scalable and Interactive Segmentation and Visualization of Neural Processes in EM Datasets

Recent advances in scanning technology provide high-resolution EM (electron microscopy) datasets that allow neuroscientists to reconstruct complex neural connections in a nervous system. However, due to the enormous size and complexity of the resulting data, segmentation and visualization of neural processes in EM data is usually a difficult and very time-consuming task. In this […]
