Posts

Nov, 11

Exact Sparse Matrix-Vector Multiplication on GPU’s and Multicore Architectures

We propose different implementations of the sparse matrix–dense vector multiplication (SpMV) for finite fields and rings $\mathbb{Z}/m\mathbb{Z}$. We take advantage of graphics card processors (GPUs) and multi-core architectures. Our aim is to improve the speed of SpMV in the LinBox library, and hence the speed of its black box algorithms. Besides, we use this and […]
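
For readers unfamiliar with the operation named in the abstract above, here is a minimal CPU sketch of an exact sparse matrix-vector product over the ring of integers modulo m. It assumes the matrix is stored in compressed sparse row (CSR) form; the function name `spmv_mod` and the example data are hypothetical and are not taken from the paper or from the LinBox API. Python integers are arbitrary precision, so the sketch sidesteps the overflow handling that a GPU or multicore implementation would need.

```python
def spmv_mod(row_ptr, col_idx, values, x, m):
    """Return y = A * x with every entry reduced modulo m.

    A is given in CSR form: the nonzeros of row i are
    values[row_ptr[i]:row_ptr[i + 1]], located in the columns
    col_idx[row_ptr[i]:row_ptr[i + 1]].
    """
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # Reduce after each multiply-add so the result stays exact in Z/mZ.
            acc = (acc + values[k] * x[col_idx[k]]) % m
        y.append(acc)
    return y


# Hypothetical example: A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]] over Z/7Z.
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1, 2, 3, 4, 5]
x = [1, 2, 3]

y = spmv_mod(row_ptr, col_idx, values, x, 7)
print(y)  # [0, 6, 5]
```

Each row is independent of the others, which is what makes the operation a natural candidate for the parallel hardware the abstract mentions.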
Nov, 11

Ultra-fast treatment plan optimization for volumetric modulated arc therapy (VMAT)

Purpose: To develop a novel aperture-based algorithm for volumetric modulated arc therapy (VMAT) treatment plan optimization with high quality and high efficiency. Methods: The VMAT optimization problem is formulated as a large-scale convex programming problem solved by a column generation approach. We consider a cost function consisting of two terms, the first of which enforces a desired […]
Nov, 11

MYRIAD: A new N-body code for simulations of Star Clusters

We present a new C++ code for collisional N-body simulations of star clusters. The code uses the Hermite fourth-order scheme with block time steps, for advancing the particles in time, while the forces and neighboring particles are computed using the GRAPE-6 board. Special treatment is used for close encounters, binary and multiple sub-systems that either […]
Nov, 11

A Flexible Patch-Based Lattice Boltzmann Parallelization Approach for Heterogeneous GPU-CPU Clusters

Sustaining a large fraction of single GPU performance in parallel computations is considered to be the major problem of GPU-based clusters. In this article, this topic is addressed in the context of a lattice Boltzmann flow solver that is integrated in the WaLBerla software framework. We propose a multi-GPU implementation using a block-structured MPI parallelization, […]
Nov, 11

Toward large-scale Hybrid Monte Carlo simulations of the Hubbard model on graphics processing units

The performance of the Hybrid Monte Carlo algorithm is determined by the speed of sparse matrix-vector multiplication within the context of preconditioned conjugate gradient iteration. We study these operations as implemented for the fermion matrix of the Hubbard model in d+1 space-time dimensions, and report a performance comparison between a 2.66 GHz Intel Xeon E5430 […]
Nov, 10

QYMSYM: A GPU-Accelerated Hybrid Symplectic Integrator That Permits Close Encounters

We describe a parallel hybrid symplectic integrator for planetary system integration that runs on a graphics processing unit (GPU). The integrator identifies close approaches between particles and switches from symplectic to Hermite algorithms for particles that require higher resolution integrations. The integrator is approximately as accurate as other hybrid symplectic integrators but is GPU accelerated.
Nov, 10

Interactive Visualization of the Largest Radioastronomy Cubes

3D visualization is an important data analysis and knowledge discovery tool; however, interactive visualization of large 3D astronomical datasets poses a challenge for many existing data visualization packages. We present a solution to interactively visualize larger-than-memory 3D astronomical data cubes by utilizing a heterogeneous cluster of CPUs and GPUs. The system partitions the data volume […]
Nov, 10

Fully automatic extraction of salient objects from videos in near real-time

Automatic video segmentation plays an important role in a wide range of computer vision and image processing applications. Recently, various methods have been proposed for this purpose. The problem is that most of these methods are far from real-time processing even for low-resolution videos due to the complex procedures. To this end, we propose a […]
Nov, 10

A GPU-based hyperbolic SVD algorithm

The one-sided Jacobi hyperbolic singular value decomposition (HSVD) algorithm, targeting the massively parallel graphics processing units (GPUs), is developed. The algorithm also serves as the final stage of solving the symmetric indefinite eigenvalue problem. Numerical testing demonstrates the gains in speed and accuracy over the sequential and MPI-parallelized variants of the same Jacobi-type HSVD algorithms. […]
Nov, 10

A GPU implementation of a track-repeating algorithm for proton radiotherapy dose calculations

An essential component in proton radiotherapy is the algorithm to calculate the radiation dose to be delivered to the patient. The most common dose algorithms are fast, but they are approximate analytical approaches. However, their level of accuracy is not always satisfactory, especially for heterogeneous anatomic areas, like the thorax. Monte Carlo techniques provide superior […]
Nov, 10

GPU-based Iterative Cone Beam CT Reconstruction Using Tight Frame Regularization

X-ray imaging dose from serial cone-beam CT (CBCT) scans raises a clinical concern in most image guided radiation therapy procedures. It is the goal of this paper to develop a fast GPU-based algorithm to reconstruct high quality CBCT images from undersampled and noisy projection data so as to lower the imaging dose. For this purpose, […]
Nov, 10

Faster Radix Sort via Virtual Memory and Write-Combining

Sorting algorithms are the deciding factor for the performance of common operations such as removal of duplicates or database sort-merge joins. This work focuses on 32-bit integer keys, optionally paired with a 32-bit value. We present a fast radix sorting algorithm that builds upon a microarchitecture-aware variant of counting sort. Taking advantage of virtual memory […]
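
As background for the abstract above, the following is a minimal sketch of a textbook least-significant-digit radix sort on 32-bit unsigned keys, built from repeated counting-sort passes. It does not reproduce the paper's microarchitecture-aware variant, nor its use of virtual memory or write-combining; the function names and the 8-bit digit width are assumptions chosen for illustration.

```python
def counting_sort_pass(keys, shift, bits=8):
    """Stable counting sort of keys by the digit at bit offset `shift`."""
    radix = 1 << bits
    mask = radix - 1

    # Histogram of digit values.
    counts = [0] * radix
    for key in keys:
        counts[(key >> shift) & mask] += 1

    # Exclusive prefix sum gives the first output slot of each digit.
    offsets = [0] * radix
    total = 0
    for digit in range(radix):
        offsets[digit] = total
        total += counts[digit]

    # Scatter keys in input order, which keeps the pass stable.
    out = [0] * len(keys)
    for key in keys:
        digit = (key >> shift) & mask
        out[offsets[digit]] = key
        offsets[digit] += 1
    return out


def radix_sort_u32(keys, bits=8):
    """Sort 32-bit unsigned integers, least significant digit first."""
    for shift in range(0, 32, bits):
        keys = counting_sort_pass(keys, shift, bits)
    return keys


data = [170, 45, 75, 2**32 - 1, 0, 802, 24, 2, 66]
print(radix_sort_u32(data))
```

The sort is correct only because each pass is stable: keys that share a digit keep the order established by the earlier, less significant passes.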

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org