Posts
May, 30
Accelerating an imaging spectroscopy algorithm for submerged marine environments using heterogeneous computing
Graphics Processing Units (GPUs) have proven to be highly effective at accelerating processing speed for a large range of scientific and general purpose applications. As data needs increase, and more complex data analysis methods are used, the processing requirements for solving scientific problems also correspondingly increase. The massive parallel processing power of GPUs can be […]
May, 30
X-Device Query Processing by Bitwise Distribution
The diversity of hardware components within a single system calls for strategies for efficient cross-device data processing. For example, existing approaches to CPU/GPU co-processing distribute individual relational operators to the "most appropriate" device. While pleasantly simple, this strategy has a number of problems: it may leave the "inappropriate" devices idle while overloading the "appropriate" device […]
May, 30
GPU-accelerated simulation of colloidal suspensions with direct hydrodynamic interactions
Solvent-mediated hydrodynamic interactions between colloidal particles can significantly alter their dynamics. We discuss the implementation of Stokesian dynamics in leading approximation for streaming processors as provided by the compute unified device architecture (CUDA) of recent graphics processors (GPUs). Thereby, the simulation of explicit solvent particles is avoided and hydrodynamic interactions can easily be accounted for […]
May, 29
Performance-Analysis-Based Acceleration of Image Quality Assessment
Two stages are commonly employed in modern algorithms of image/video quality assessment (QA): (1) a local frequency-based decomposition, and (2) block-based statistical comparisons between the frequency coefficients of the reference and distorted images. This paper presents a performance analysis of and techniques for accelerating these stages. We specifically analyze and accelerate one representative QA algorithm […]
May, 29
COVRA: A compression-domain output-sensitive volume rendering architecture based on a sparse representation of voxel blocks
We present a novel multiresolution compression-domain GPU volume rendering architecture designed for interactive local and networked exploration of rectilinear scalar volumes on commodity platforms. In our approach, the volume is decomposed into a multiresolution hierarchy of bricks. Each brick is further subdivided into smaller blocks, which are compactly described by sparse linear combinations of prototype […]
May, 29
A GPU-Based Track-Repeating Algorithm for Dose Calculation for Photon Radiotherapy
An essential ingredient in radiotherapy is the calculation of the dose to be delivered to the patient. Analytical algorithms are commonly used for such a task; however, their accuracy is not always satisfactory. Monte Carlo techniques provide higher accuracy, but they often require large computational times. Track-repeating algorithms, for example the Fast Dose Calculator, have […]
May, 29
Hybrid Update Algorithms for Regular Lattice and Small-World Ising Models on Graphical Processing Units
Local and cluster Monte Carlo update algorithms offer a complex tradeoff space for optimising the performance of simulations of the Ising model. We systematically explore tradeoffs between hybrid Metropolis and Wolff cluster updates for the 3D Ising model using data-parallelism and graphical processing units. We investigate performance for both regular lattices as well as for […]
May, 29
CUDA Implementation of Parallel Algorithms for Animal Noseprint Identification
Concern about the threats posed by natural proliferation of animal-borne human diseases like BSE ("mad cow disease"), and by the possible use of animals as disease vectors in bioterrorism, has spurred heightened interest in the development of methods for rapid automated identification of individual animals of various societally and commercially important mammalian species. Just as […]
May, 29
A fight for performance and accuracy of the matrix multiplication routines: CUBLAS on Nvidia Tesla versus MKL and ATLAS on Intel Nehalem
Scientific computation relies heavily on 64-bit arithmetic. The evolution of the Graphical Processing Units to the status of massively micro-parallel vector units and the improvement of their programmability make them stand as powerful algebraic coprocessors for many classes of matrix calculus. But on these processors inheriting from architectures dedicated to video processing in the […]
May, 29
Using OpenCL to Calculate a Pressure Field
This report details a project to convert a CUDA program into an OpenCL program that would be adaptable to many platforms. Originally the CUDA program could only be run on an NVIDIA graphics card, which limited its usefulness to users. Throughout this project the authors learned how to program […]
May, 29
Massively Parallel Neural Encoding and Decoding of Visual Stimuli
The massively parallel nature of video Time Encoding Machines (TEMs) calls for scalable, massively parallel decoders that are implemented with neural components. The current generation of decoding algorithms is based on computing the pseudo-inverse of a matrix and does not satisfy these requirements. Here we consider video TEMs with an architecture built using Gabor receptive […]
May, 29
Time-dependent density-functional theory in massively parallel computer architectures: the OCTOPUS project
Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great […]