Posts
Nov, 25
Characterization and Performance Analysis for 3D Benchmarks
Changes in processor architectures and 3D benchmarks make performance characterization important for every generation of processors and 3D applications. Recent 3D applications require large amounts of data to be processed by the GPU and the CPU. This makes it important to analyze processor performance for different architectures and benchmarks so that benchmarks and processors […]
Nov, 25
Location-based Matching in Publish/Subscribe Revisited
Event processing is gaining rising interest in industry and in academia. The common application pattern is that event processing agents publish events while other agents subscribe to events of interest. Extensive research has been devoted to developing efficient and scalable algorithms to match events with subscribers’ interests. The predominant abstraction used in this context is […]
Nov, 24
Multidimensional Costas Arrays and Their Enumeration Using GPUs and FPGAs
The enumeration of two-dimensional Costas arrays is a problem with factorial time complexity and has been solved for sizes up to 29 using computer clusters. Costas arrays of higher dimensionality have recently been proposed and their properties are beginning to be understood. This paper presents, to the best of our knowledge, the first proposed implementations […]
Nov, 24
The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System
The development of a new unified, multi-threaded runtime system for the execution of asynchronous tasks on heterogeneous systems is described in this work. These asynchronous tasks arise from the Uintah framework, which was developed to provide an environment for solving a broad class of fluid-structure interaction problems on structured adaptive grids. Uintah has a clear […]
Nov, 24
Improving the Performance of the Linear Systems Solvers Using CUDA
Parallel computing can offer an enormous performance advantage for very large applications in almost any field: scientific computing, computer vision, databases, data mining, and economics. GPUs are high-performance many-core processors that can obtain very high FLOP rates. Since the first idea of using GPUs for general-purpose computing, things have evolved and […]
Nov, 24
Enhancing and Porting the HPC-Lab Snow Simulator to OpenCL on Mobile Platforms
Porting a computationally demanding CUDA application to a GPU designed for mobile phones and tablets, which supports OpenCL, is the subject of this thesis. Significant effort is made to prepare the snow simulator of the HPC-LAB at IDI, NTNU, for porting to an OpenCL capable GPU for mobile phones, with a reasonably limited effort, when […]
Nov, 24
GPU Isosurface Raycasting of FCC Datasets
This paper presents an efficient and accurate isosurface rendering algorithm for the natural C^1 splines on the face-centered cubic (FCC) lattice. Leveraging fast and accurate evaluation of a spline field and its gradient, accompanied by efficient empty-space skipping, the approach generates high-quality isosurfaces of FCC datasets at interactive speed (20-70 fps). The pre-processing computation (quasi-interpolation […]
Nov, 23
Auto-tuning on the macro scale: high level algorithmic auto-tuning for scientific applications
In this thesis, we describe a new classification of auto-tuning methodologies spanning from low-level optimizations to high-level algorithmic tuning. This classification spectrum of auto-tuning methods encompasses the space of tuning parameters from low-level optimizations (such as block sizes, iteration ordering, vectorization, etc.) to high-level algorithmic choices (such as whether to use an iterative solver or […]
Nov, 23
Evaluation of Two Parallel Finite Element Implementations of the Time-Dependent Advection Diffusion Problem: GPU versus Cluster Considering Time and Energy Consumption
We analyze two parallel finite element implementations of the 2D time-dependent advection diffusion problem, one for multi-core clusters and one for CUDA-enabled GPUs, and compare their performances in terms of time and energy consumption. The parallel CUDA-enabled GPU implementation was derived from the multi-core cluster version. Our experimental results show that a desktop machine with […]
Nov, 23
GPU Acceleration of Transmural Electrophysiological Imaging
Transmural electrophysiological imaging (TEPI) is becoming a possibility with the aid of 3D in silico cardiac EP models and statistical estimation theory. By quasi Monte-Carlo (MC) simulation of the 3D EP models on the subject-specific anatomical model, complex and physiologically meaningful spatiotemporal priors are produced to achieve the 2D-to-3D transition of EP data, an […]
Nov, 23
Scalable Multi-GPU 3-D FFT for TSUBAME 2.0 Supercomputer
For scalable 3-D FFT computation using multiple GPUs, efficient all-to-all communication between GPUs is the most important factor in good performance. Implementations with point-to-point MPI library functions and CUDA memory copy APIs typically exhibit very large overheads especially for small message sizes in all-to-all communications between many nodes. We propose several schemes to minimize the […]
Nov, 23
Efficient reconstruction of biological networks via transitive reduction on general purpose graphics processors
BACKGROUND: Techniques for reconstruction of biological networks which are based on perturbation experiments often predict direct interactions between nodes that do not exist. Transitive reduction removes such relations if they can be explained by an indirect path of influences. The existing algorithms for transitive reduction are sequential and might suffer from too long run times for large […]
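
To make the idea of transitive reduction in the last teaser concrete, here is a minimal sequential sketch in Python. It is not the paper's GPU algorithm, and it assumes a plain unweighted, unsigned directed acyclic graph given as a list of `(source, target)` pairs; the function name `transitive_reduction` and the helper `reachable_without` are hypothetical names chosen for illustration. An edge is dropped when its target can still be reached from its source through some indirect path.

```python
from collections import defaultdict


def transitive_reduction(edges):
    """Return the edges of a DAG that are not explained by an indirect path.

    Assumption: `edges` describes an unweighted directed acyclic graph.
    """
    # Build the adjacency sets: succ[u] holds every direct successor of u.
    succ = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)

    def reachable_without(u, v):
        # Depth-first search from u to v that ignores the direct edge u -> v.
        stack = [w for w in succ[u] if w != v]
        seen = set(stack)
        while stack:
            node = stack.pop()
            if node == v:
                return True
            for nxt in succ.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    # Keep only the edges that have no alternative indirect path.
    return sorted((u, v) for u, v in set(edges) if not reachable_without(u, v))
```

For example, `transitive_reduction([("A", "B"), ("B", "C"), ("A", "C")])` keeps `("A", "B")` and `("B", "C")` and drops `("A", "C")`, because A already influences C through B. Each edge triggers its own graph search, which is the kind of sequential cost the teaser refers to for large networks.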