Posts
Jun, 1
An open source MATLAB program for fast numerical Feynman integral calculations for open quantum system dynamics on GPUs
This MATLAB program calculates the dynamics of the reduced density matrix of an open quantum system modeled by the Feynman-Vernon model. The user gives the program a vector describing the coordinate of an open quantum system, a Hamiltonian matrix describing its energy, and a spectral distribution function and temperature describing the environment’s influence on it, […]
May, 30
clSpMV: A Cross-Platform OpenCL SpMV Framework on GPUs
The sparse matrix-vector multiplication (SpMV) kernel is a key computation in linear algebra. Most iterative methods are composed of SpMV operations with BLAS1 updates. Therefore, researchers have made extensive efforts to optimize the SpMV kernel in sparse linear algebra. With the appearance of OpenCL, a programming language that standardizes parallel programming across a wide variety of […]
May, 30
Large-scale Nanostructure Simulations from X-ray Scattering Data On Graphics Processor Clusters
X-ray scattering is a valuable tool for measuring the structural properties of materials used in the design and fabrication of energy-relevant nanodevices (e.g., photovoltaic, energy storage, battery, fuel, and carbon capture and sequestration devices) that are key to the reduction of carbon emissions. Although today’s ultra-fast X-ray scattering detectors can provide tremendous information on the […]
May, 30
Accelerating an imaging spectroscopy algorithm for submerged marine environments using heterogeneous computing
Graphics Processing Units (GPUs) have proven to be highly effective at accelerating processing speed for a large range of scientific and general purpose applications. As data needs increase, and more complex data analysis methods are used, the processing requirements for solving scientific problems also correspondingly increase. The massive parallel processing power of GPUs can be […]
May, 30
X-Device Query Processing by Bitwise Distribution
The diversity of hardware components within a single system calls for strategies for efficient cross-device data processing. For example, existing approaches to CPU/GPU co-processing distribute individual relational operators to the "most appropriate" device. While pleasantly simple, this strategy has a number of problems: it may leave the "inappropriate" devices idle while overloading the "appropriate" device […]
May, 30
GPU-accelerated simulation of colloidal suspensions with direct hydrodynamic interactions
Solvent-mediated hydrodynamic interactions between colloidal particles can significantly alter their dynamics. We discuss the implementation of Stokesian dynamics in leading approximation for streaming processors as provided by the compute unified device architecture (CUDA) of recent graphics processors (GPUs). Thereby, the simulation of explicit solvent particles is avoided and hydrodynamic interactions can easily be accounted for […]
May, 29
Performance-Analysis-Based Acceleration of Image Quality Assessment
Two stages are commonly employed in modern algorithms of image/video quality assessment (QA): (1) a local frequency-based decomposition, and (2) block-based statistical comparisons between the frequency coefficients of the reference and distorted images. This paper presents a performance analysis of and techniques for accelerating these stages. We specifically analyze and accelerate one representative QA algorithm […]
May, 29
COVRA: A compression-domain output-sensitive volume rendering architecture based on a sparse representation of voxel blocks
We present a novel multiresolution compression-domain GPU volume rendering architecture designed for interactive local and networked exploration of rectilinear scalar volumes on commodity platforms. In our approach, the volume is decomposed into a multiresolution hierarchy of bricks. Each brick is further subdivided into smaller blocks, which are compactly described by sparse linear combinations of prototype […]
May, 29
A GPU-Based Track-Repeating Algorithm for Dose Calculation for Photon Radiotherapy
An essential ingredient in radiotherapy is the calculation of the dose to be delivered to the patient. Analytical algorithms are commonly used for such a task; however, their accuracy is not always satisfactory. Monte Carlo techniques provide higher accuracy, but they often require large computational times. Track-repeating algorithms, for example the Fast Dose Calculator, have […]
May, 29
Hybrid Update Algorithms for Regular Lattice and Small-World Ising Models on Graphical Processing Units
Local and cluster Monte Carlo update algorithms offer a complex tradeoff space for optimising the performance of simulations of the Ising model. We systematically explore tradeoffs between hybrid Metropolis and Wolff cluster updates for the 3D Ising model using data-parallelism and graphical processing units. We investigate performance for both regular lattices as well as for […]
May, 29
CUDA Implementation of Parallel Algorithms for Animal Noseprint Identification
Concerns about the threats posed by the natural proliferation of animal-borne human diseases like BSE ("mad cow disease") and by the possible use of animals as disease vectors in bioterrorism have spurred heightened interest in the development of methods for rapid automated identification of individual animals of various societally and commercially important mammalian species. Just as […]
May, 29
A fight for performance and accuracy of the matrix multiplication routines: CUBLAS on Nvidia Tesla versus MKL and ATLAS on Intel Nehalem
Scientific computation relies heavily on 64-bit arithmetic. The evolution of the Graphical Processing Units to the status of massively micro-parallel vector units and the improvement of their programmability make them stand as powerful algebraic coprocessors for many classes of matrix calculus. But on these processors inheriting from architectures dedicated to video processing in the […]