Posts
May, 29
CUDA Implementation of Parallel Algorithms for Animal Noseprint Identification
Concern about the threats posed by natural proliferation of animal-borne human diseases like BSE ("mad cow disease") and by the possible use of animals as disease vectors in bioterrorism has spurred heightened interest in the development of methods for rapid automated identification of individual animals of various societally and commercially important mammalian species. Just as […]
May, 29
A fight for performance and accuracy of the matrix multiplication routines: CUBLAS on Nvidia Tesla versus MKL and ATLAS on Intel Nehalem
Scientific computation relies heavily on 64-bit arithmetic. The evolution of Graphics Processing Units to the status of massively micro-parallel vector units and the improvement of their programmability make them stand as powerful algebraic coprocessors for many classes of matrix calculus. But on these processors inheriting from architectures dedicated to video processing in the […]
May, 29
Using OpenCL to Calculate a Pressure Field
This report details a project to convert a CUDA program into an OpenCL program that would be adaptable to many platforms. Originally the CUDA program could only be run on an NVIDIA graphics card, which limited the program's applicability for users. Throughout this project the above authors learned how to program […]
May, 29
Massively Parallel Neural Encoding and Decoding of Visual Stimuli
The massively parallel nature of video Time Encoding Machines (TEMs) calls for scalable, massively parallel decoders that are implemented with neural components. The current generation of decoding algorithms is based on computing the pseudo-inverse of a matrix and does not satisfy these requirements. Here we consider video TEMs with an architecture built using Gabor receptive […]
May, 29
Time-dependent density-functional theory in massively parallel computer architectures: the OCTOPUS project
Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of Octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great […]
May, 29
Explicit Cache Management for Volume Ray-Casting on Parallel Architectures
A major challenge when designing general purpose graphics hardware is to allow efficient access to texture data. Although different rendering paradigms vary with respect to their data access patterns, there is no flexibility when it comes to data caching provided by the graphics architecture. In this paper we focus on volume ray-casting, and show the […]
May, 27
The Third International Workshop on Frontier of GPU Computing, FGC 2012
To be held in conjunction with HPCC 2012. The goal of this workshop is to provide a forum for researchers and practitioners to discuss and share their research and development experiences and outputs on the massively parallel GPU platforms, software development tools, optimization techniques, parallel algorithm design, and all kinds of successful applications. We solicit […]
May, 26
Parameterized Verification of GPU Kernel Programs
We present an automated symbolic verifier for checking the functional correctness of GPGPU kernels parametrically, for an arbitrary number of threads. Our tool PUGpara checks the functional equivalence of a kernel and its optimized versions, helping debug errors introduced during memory coalescing and bank conflict elimination related optimizations. Key features of our work include: (1) […]
May, 26
Parallel Parametric Optimisation with Firefly Algorithms on Graphical Processing Units
Parametric optimisation techniques such as Particle Swarm Optimisation (PSO), Firefly algorithms (FAs), and genetic algorithms (GAs) are at the centre of attention in a range of optimisation problems where local minima plague the parameter space. Variants of these algorithms deal with the problems presented by local minima in a variety of ways. A salient feature in […]
May, 26
Routine Microsecond Molecular Dynamics Simulations with AMBER on GPUs. 1. Generalized Born
We present an implementation of generalized Born implicit solvent all-atom classical molecular dynamics (MD) within the AMBER program package that runs entirely on CUDA-enabled NVIDIA graphics processing units (GPUs). We discuss the algorithms that are used to exploit the processing power of the GPUs and show the performance that can be achieved in comparison […]
May, 26
Fast and accurate digital signal processing realized with GPGPU technology
This paper presents the idea of so-called quasi-maximum accuracy computations for improving the precision of floating-point digital signal processing on graphics processing units (GPUs). In the presented approach, increasing the precision of computations does not require increasing the length of the data words. Special attention has been […]
May, 26
Parallelization of the Local Threshold and Boolean Function Based Edge Detection Algorithm Using CUDA
In this paper we present a parallelized algorithm for edge detection in grayscale images. The chosen method is the local threshold and Boolean function based edge detection. This method differs from common edge detectors in that it uses bitmap patterns instead of analyzing gradient changes in the image for edge recognition. The parallelization […]