Posts
Jun, 1
Efficient Implementation of Hyperspectral Anomaly Detection Techniques on GPUs and Multicore Processors
Anomaly detection is an important task for hyperspectral data exploitation. Although many algorithms have been developed for this purpose in recent years, due to the large dimensionality of hyperspectral image data, fast anomaly detection remains a challenging task. In this work, we exploit the computational power of commodity graphics processing units (GPUs) and multicore processors […]
Jun, 1
A Performance Model for the Communication in Fast Multipole Methods on HPC Platforms
Exascale systems are predicted to have approximately one billion cores, assuming Gigahertz cores. Limitations on affordable network topologies for distributed memory systems of such massive scale bring new challenges to the current parallel programming model. Currently, there are many efforts to evaluate the hardware and software bottlenecks of exascale designs. There is therefore an urgent […]
Jun, 1
An implementation of a reordering approach for increasing the product of diagonal entries in a sparse matrix
We present implementation details of a reordering strategy for permuting elements whose absolute value is large to the diagonal of a sparse matrix. This algorithm, based on work by Duff and Koster [9], is a critical component of the SPIKE-based preconditioner provided by the Spike::GPU library [2]. We discuss the four stages required to implement […]
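To make the objective in this excerpt concrete, here is a minimal sketch that finds the row permutation maximizing the product of absolute diagonal entries. It is a brute-force search over all permutations, not the Duff and Koster algorithm and not the Spike::GPU implementation; the function name `best_row_permutation` and the example matrix are hypothetical, chosen only for illustration.

```python
from itertools import permutations
from math import prod


def best_row_permutation(matrix):
    """Brute-force search for the row order that maximizes the product
    of absolute diagonal entries. Illustrative only: it costs O(n!) and
    is not the algorithm described in the paper."""
    n = len(matrix)
    best_perm, best_product = None, -1.0
    for perm in permutations(range(n)):
        # perm[j] is the original row moved to position j, so the new
        # diagonal entry in column j is matrix[perm[j]][j].
        product = prod(abs(matrix[perm[j]][j]) for j in range(n))
        if product > best_product:
            best_perm, best_product = perm, product
    return best_perm, best_product


# Hypothetical 3x3 example with zeros on the original diagonal.
A = [
    [0, 4, 1],
    [3, 0, 0],
    [0, 2, 5],
]
perm, product = best_row_permutation(A)
```

For this matrix, swapping the first two rows puts 3, 4 and 5 on the diagonal. A practical implementation would replace the exhaustive search with a weighted bipartite matching, which is the general approach the excerpt attributes to Duff and Koster.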
Jun, 1
Evaluating GPU Passthrough in Xen for High Performance Cloud Computing
With the advent of virtualization and Infrastructure-as-a-Service (IaaS), the broader scientific computing community is considering the use of clouds for their technical computing needs. This is due to the relative scalability, ease of use, and advanced user environment customization abilities that clouds provide, as well as the many novel computing paradigms available for data-intensive applications. However, there is […]
Jun, 1
A CUDA-enabled Parallel Implementation of Collaborative Filtering
Collaborative filtering (CF) is one of the essential algorithms in recommendation systems. Based on a performance analysis, two computational kernels are identified. To accelerate CF on large-scale data, a CUDA-enabled parallel CF approach is proposed, together with an efficient data partition scheme. Various optimization techniques are also applied to maximize […]
May, 31
GPU Ray Tracing with CUDA
Ray tracing is a technique for rendering images in computer graphics by simulating how light rays interact with the virtual environment. By tracing the path of a light ray through a scene and emulating the effect of the ray as it intersects with virtual objects, the ray tracing algorithm can accurately portray reflections, refractions, shadows, […]
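The intersection step mentioned in this excerpt can be sketched with the simplest case, a ray hitting a sphere. This is a plain-Python illustration under assumed conventions (rays as origin plus direction tuples, a hypothetical function name `ray_sphere_hit`), not code from the post and not a CUDA kernel.

```python
import math


def ray_sphere_hit(origin, direction, center, radius):
    """Return the distance t along the ray to the nearest intersection
    with the sphere in front of the origin, or None if the ray misses."""
    # Vector from the sphere center to the ray origin.
    oc = [o - c for o, c in zip(origin, center)]
    # Coefficients of |origin + t * direction - center|^2 = radius^2,
    # which is a quadratic a*t^2 + b*t + c = 0 in t.
    a = sum(d * d for d in direction)
    b = 2.0 * sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    discriminant = b * b - 4.0 * a * c
    if discriminant < 0:
        return None  # no real roots: the ray misses the sphere
    root = math.sqrt(discriminant)
    # Try the nearer root first; skip hits at or behind the origin.
    for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)):
        if t > 1e-9:
            return t
    return None


# A ray along +z toward a unit sphere centered five units away.
hit = ray_sphere_hit((0, 0, 0), (0, 0, 1), (0, 0, 5), 1.0)
miss = ray_sphere_hit((0, 0, 0), (0, 1, 0), (0, 0, 5), 1.0)
```

A full ray tracer would run a test like this for every pixel's ray against every object, which is the independent per-ray work that makes the method a natural fit for a GPU.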
May, 31
Parallel SAT solvers and their application in automatic parallelization
Since the slowdown in processor frequency improvements, a new trend has arisen to let software take advantage of faster hardware: parallelization. However, unlike increasing processor frequency, parallelization requires a different kind of programming, parallel programming, which is usually harder than ordinary sequential programming. In this context, automatic […]
May, 31
Fast parallel volume visualization on CUDA technology
In medical diagnosis and treatment planning, radiologists and surgeons rely heavily on the slices produced by medical imaging scanners. Unfortunately, most of these scanners can only produce two-dimensional images because the machines that can produce three-dimensional images are very expensive. The two-dimensional images from these devices are difficult to interpret because they […]
May, 31
Cytochrome P450 site of metabolism prediction from 2D topological fingerprints using GPU accelerated probabilistic classifiers
BACKGROUND: The prediction of sites and products of metabolism in xenobiotic compounds is key to the development of new chemical entities, where screening potential metabolites for toxicity or unwanted side-effects is of crucial importance. In this work 2D topological fingerprints are used to encode atomic sites and three probabilistic machine learning methods are applied: Parzen-Rosenblatt […]
May, 31
Bulk Execution of Oblivious Algorithms on the Unified Memory Machine, with GPU Implementation
The Unified Memory Machine (UMM) is a theoretical parallel computing model that captures the essence of the global memory access of GPUs. A sequential algorithm is oblivious if the address accessed at each time step does not depend on the input data. Many important tasks including matrix computation, signal processing, sorting, dynamic programming, and encryption/decryption can be […]
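The definition of an oblivious algorithm can be checked on a small example by recording which array indices an algorithm touches. In this sketch, which uses hypothetical helper names and is not taken from the paper, in-place prefix sums visit the same addresses for any input of a given length, while binary search does not.

```python
def prefix_sums_trace(values):
    """Compute prefix sums in place and record every index accessed."""
    a = list(values)
    trace = []
    for i in range(1, len(a)):
        trace.append(i - 1)  # read a[i - 1]
        trace.append(i)      # read and write a[i]
        a[i] += a[i - 1]
    return a, trace


def binary_search_trace(sorted_values, target):
    """Binary search that records every index it probes."""
    lo, hi = 0, len(sorted_values) - 1
    trace = []
    while lo <= hi:
        mid = (lo + hi) // 2
        trace.append(mid)  # which index is probed depends on the data
        if sorted_values[mid] == target:
            return mid, trace
        if sorted_values[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, trace


# Oblivious: two different inputs produce identical access traces.
sums_a, trace_a = prefix_sums_trace([1, 2, 3, 4])
sums_b, trace_b = prefix_sums_trace([9, 0, -5, 7])

# Not oblivious: the probed indices change with the target value.
_, probe_low = binary_search_trace([1, 3, 5, 7], 1)
_, probe_high = binary_search_trace([1, 3, 5, 7], 7)
```

Because every run of the oblivious algorithm follows the same address sequence, many inputs can be processed in lockstep, which is presumably what makes such algorithms suited to the bulk execution named in the title.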
May, 30
CPU, GPU and FPGA Implementations of MALD: Ceramic Tile Surface Defects Detection Algorithm
This paper addresses the adjustment, implementation and performance comparison of the Moving Average with Local Difference (MALD) method for ceramic tile surface defect detection. The ceramic tile production process is completely autonomous, except for the final stage, where the human eye is required for defect detection. Recent computational platform development and advances in machine vision provide us with several […]
May, 30
Unified Particle Physics for Real-Time Applications
We present a unified dynamics framework for real-time visual effects. Using particles connected by constraints as our fundamental building block allows us to treat contact and collisions in a unified manner, and we show how this representation is flexible enough to model gases, liquids, deformable solids, rigid bodies and clothing with two-way interactions. We address […]