Posts

Jul, 11

Development of a Restricted Additive Schwarz Preconditioner for Sparse Linear Systems on NVIDIA GPU

In this paper, we develop, study and implement a restricted additive Schwarz (RAS) preconditioner to speed up the solution of sparse linear systems on NVIDIA Tesla GPUs. A novel algorithm for constructing this preconditioner is proposed. This algorithm involves two phases. In the first phase, the construction of the RAS preconditioner is transformed to an […]
Jul, 11

Accelerating Preconditioned Iterative Linear Solvers on GPU

Linear systems must be solved in many scientific applications, and the solution of these systems often dominates the total running time. In this paper, we introduce our work on developing parallel linear solvers and preconditioners for solving large sparse linear systems using NVIDIA GPUs. We develop a new sparse matrix-vector multiplication kernel and a […]
Jul, 11

A Hybrid Parallel Implementation of the Aho-Corasick and Wu-Manber Algorithms Using NVIDIA CUDA and MPI Evaluated on a Biological Sequence Database

Multiple matching algorithms are used to locate the occurrences of patterns from a finite pattern set in a large input string. Aho-Corasick and Wu-Manber, two of the most well-known algorithms for multiple matching, require increased computing power, particularly in cases where large datasets must be processed, as is common in computational biology applications. […]
Jul, 11

Parallelization of BFS Graph Algorithm using CUDA

Graphs play a very important role in the field of science and technology for finding the shortest distance between any two places. This paper demonstrates a recent technology, CUDA (Compute Unified Device Architecture), applied to the BFS graph algorithm. Graph algorithms are fundamental to many disciplines and application areas. Large graphs […]
Jul, 11

Algorithms and Data Structures for Interactive Ray Tracing on Commodity Hardware

Rendering methods based on ray tracing provide high image realism, but have been historically regarded as offline only. This has changed in the past decade, due to significant advances in the construction and traversal performance of acceleration structures and the efficient use of data-parallel processing. Today, all major graphics companies offer real-time ray tracing solutions. […]
Jul, 11

Hybrid Particle Lattice Boltzmann Shallow Water for interactive fluid simulations

We introduce a hybrid approach for the simulation of fluids based on the Lattice Boltzmann Method for Shallow Waters and particle systems. Our modified LBM Shallow Waters can handle arbitrary underlying terrain and arbitrary fluid depth. It also introduces a novel and simplified method of tracking dry-wet regions. Dynamic rigid bodies are also included in […]
Jul, 11

Visualization and Correction of Automated Segmentation, Tracking and Lineaging from 5-D Stem Cell Image Sequences

RESULTS: We present an application that enables the quantitative analysis of multichannel 5-D (x, y, z, t, channel) and large montage confocal fluorescence microscopy images. The image sequences show stem cells together with blood vessels, enabling quantification of the dynamic behaviors of stem cells in relation to their vascular niche, with applications in developmental and […]
Jul, 11

Visualization of Large Volumetric Multi-Channel Microscopy Data Streams on Standard PCs

BACKGROUND: Visualization of multi-channel microscopy data plays a vital role in biological research. With the ever-increasing resolution of modern microscopes the data set size of the scanned specimen grows steadily. On commodity hardware this size easily exceeds the available main memory and the even more limited GPU memory. Common volume rendering techniques require the entire […]
Jul, 10

Improving Performance and Energy Consumption of Runtime Schedulers for Dense Linear Algebra

The road towards Exascale Computing requires a holistic effort to address three different challenges simultaneously: high performance, energy efficiency, and programmability. The use of runtime task schedulers to orchestrate parallel executions with minimal developer intervention has been introduced in recent years to tackle the programmability issue while maintaining, or even improving, performance. In this paper, […]
Jul, 10

COFFEE: an Optimizing Compiler for Finite Element Local Assembly

The numerical solution of partial differential equations using the finite element method is one of the key applications of high performance computing. Local assembly is its characteristic operation. This entails the execution of a problem-specific kernel to numerically evaluate an integral for each element in the discretized problem domain. Since the domain size can be […]
Jul, 10

Random Fields Generation on the GPU with the Spectral Turning Bands Method

Random Field (RF) generation algorithms are of paramount importance for many scientific domains, such as astrophysics, geostatistics, computer graphics and many others. Some examples are the generation of initial conditions for cosmological simulations or hydrodynamical turbulence driving. In the latter a new random field is needed every time-step. Current approaches commonly make use of 3D […]
Jul, 10

Understanding the SIMD Efficiency of Graph Traversal on GPU

The graph is a widely used data structure, and graph algorithms, such as breadth-first search (BFS), are regarded as key components in a great number of applications. Recent studies have attempted to accelerate graph algorithms on highly parallel graphics processing units (GPUs). Although many graph algorithms based on large graphs exhibit abundant parallelism, their performance on […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
