Posts
Feb, 1
Buffer k-d Trees: Processing Massive Nearest Neighbor Queries on GPUs
We present a new approach for combining k-d trees and graphics processing units for nearest neighbor search. It is well known that a direct combination of these tools leads to a non-satisfying performance due to conditional computations and suboptimal memory accesses. To alleviate these problems, we propose a variant of the classical k-d tree data […]
Feb, 1
Speeding Up Object Detection: Fast Resizing in the Integral Image Domain
In this paper, we present an approach to resize integral images directly in the integral image domain. For the special case of resizing by a power of two, we propose a highly parallelizable variant of our approach, which is identical to bilinear resizing in the image domain in terms of results, but requires fewer operations […]
Feb, 1
High Performance Computing of Dynamic Structural Response Analysis for the Integrated Earthquake Simulation
This paper proposes an application of high performance computing (HPC) to dynamic structural response analysis (DSRA) in order to enhance the capability and increase the efficiency of integrated earthquake simulation (IES). Object Based Structural Analysis (OBASAN) is a candidate DSRA program for IES. With OBASAN, the reliability of structural damage prediction can be increased by […]
Feb, 1
Survey on Efficient Linear Solvers for Porous Media Flow Models on Recent Hardware Architectures
In the pastfew years, High Performance Computing (HPC) technologies led to General Purpose Processing on Graphics Processing Units (GPGPU) and many-core architectures. These emerging technologies offer massive processing units and are interesting for porous media flow simulators may used for CO2 geological sequestration or Enhanced Oil Recovery (EOR) simulation. However the crucial point is "are […]
Jan, 30
Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes
The ongoing hardware evolution exhibits an escalation in the number, as well as in the heterogeneity, of the computing resources. The pressure to maintain reasonable levels of performance and portability, forces the application developers to leave the traditional programming paradigms and explore alternative solutions. PaStiX is a parallel sparse direct solver, based on a dynamic […]
Jan, 30
Towards Efficient Risk Quantification-Using GPUs and Variance Reduction Technique
Value-at-Risk (VaR) provides information about global risk in trading. The request for high speed calculation about VaR is rising because financial institutions need to measure the risk in real time. Researchers in HPC also recently turned their attention on this kind of demanding applications. In this master thesis, we introduce two complementary and different strategies […]
Jan, 30
A Novel Graphical Processing Unit Method for Power Systems Security Analysis
There is an increasing need for computational power to drive software tools used in power systems planning and operations, since the emergence of modern energy markets and recent renewable generation technology fundamentally alters how energy flows through the existing power grid. While special-purpose hardware, including supercomputers, has been explored for this purpose, inexpensive commodity hardware […]
Jan, 30
Comparing the Performance of Different x86 SIMD Instruction Sets for a Medical Imaging Application on Modern Multi- and Manycore Chips
Single Instruction, Multiple Data (SIMD) vectorization is a major driver of performance in current architectures, and is mandatory for achieving good performance with codes that are limited by instruction throughput. We investigate the efficiency of different SIMD-vectorized implementations of the RabbitCT benchmark. RabbitCT performs 3D image reconstruction by back projection, a vital operation in computed […]
Jan, 30
GPU-Accelerated BWT Construction for Large Collection of Short Reads
Advances in DNA sequencing technology have stimulated the development of algorithms and tools for processing very large collections of short strings (reads). Short-read alignment and assembly are among the most well-studied problems. Many state-of-the-art aligners, at their core, have used the Burrows-Wheeler transform (BWT) as a main-memory index of a reference genome (typical example, NCBI […]
Jan, 30
A GPU accelerated algorithm for 3D Delaunay triangulation
We propose the first algorithm to compute the 3D Delaunay triangulation (DT) on the GPU. Our algorithm uses massively parallel point insertion followed by bilateral flipping, a powerful local operation in computational geometry. Although a flipping algorithm is very amenable to parallel processing and has been employed to construct the 2D DT and the 3D […]
Jan, 30
A CUDA Monte Carlo simulator for radiation therapy dosimetry based on Geant4
Geant4 is a large-scale particle physics package that facilitates every aspect of particle transport simulation. This includes, but is not limited to, geometry description, material definition, tracking of particles passing through and interacting with matter, storage of event data, and visualization. As more detailed and complex simulations are required in different application domains, there is […]
Jan, 30
A QUDA-branch to compute disconnected diagrams in GPUs
Although QUDA allows for an efficient computation of many QCD quantities, it is surprinsingly lacking tools to evaluate disconnected diagrams, for which GPUs are specially well suited. We aim to fill this gap by creating our own branch of QUDA, which includes new kernels and functions required to calculate fermion loops using several methods and […]