Posts
Feb, 21
Scheduling a Parallel Sparse Direct Solver to Multiple GPUs
We present a sparse direct solver using multilevel task scheduling on a modern heterogeneous compute node consisting of a multi-core host processor and multiple GPU accelerators. Our direct solver is based on the multifrontal method, which is characterized by exploiting dense subproblems (fronts) related in an assembly tree. Critical to high performance of the solver […]
Feb, 21
Chrono: a parallel multi-physics library for rigid-body, flexible-body, and fluid dynamics
The last decade witnessed a manifest shift in the microprocessor industry towards chip designs that promote parallel computing. Until recently the privilege of a select group of large research centers, Teraflop computing is becoming a commodity owing to inexpensive GPU cards and multi- to many-core x86 processors. This paradigm shift towards large scale parallel computing […]
Feb, 21
Computation of Air-Vortices Based on GPU Technology: Optimizing and Parallelizing a Model for Wake-Vortex Prediction Using OpenCL
This thesis details the refinement and numerical solution of a preexisting model for predicting the strengths and positions of so-called wake-vortices that are generated from the lift of heavy aircraft. The ultimate objective is to implement a numerical scheme for the model that is fast enough to allow for probabilistic methods, such as Monte Carlo simulations, […]
Feb, 21
GamePipe: A Virtualized Cloud Platform Design and Performance Evaluation
Cloud gaming provides game-on-demand (GoD) services over the Internet cloud. The goal is to achieve faster response time and higher QoS. The video game is rendered remotely on the game cloud and decoded on thin client devices such as tablet computers or smartphones. We design a game cloud with a virtualized cluster of CPU/GPU servers […]
Feb, 21
Ray Tracing on GPUs
The ray tracing method aims to produce realistic, high-quality images of a scene described by geometric primitives such as triangles, spheres, etc. The basic idea is quite simple and allows for straightforward implementations of this technique on the computer. At its core is a set of rays, each corresponding to one […]
Feb, 20
Complexity Analysis and Algorithm Design for Reorganizing Data to Minimize Non-Coalesced Memory Accesses on GPU
The performance of Graphics Processing Units (GPUs) is sensitive to irregular memory references. Some recent work shows the promise of data reorganization for eliminating non-coalesced memory accesses that are caused by irregular references. However, all previous studies have employed simple, heuristic methods to determine the new data layouts to create. As a result, they either […]
Feb, 20
An abstract object oriented runtime system for heterogeneous parallel architecture
In our paper we present an abstract object-oriented runtime system that helps to develop scientific applications for new heterogeneous architectures based on multiple nodes of multi-core processors enhanced with accelerator boards. Its architecture, based on abstract concepts, makes it possible to follow hardware technology by extending them with new implementations modeling new hardware components, while limiting the […]
Feb, 20
Streaming Data from HDD to GPUs for Sustained Peak Performance
In the context of genome-wide association studies (GWAS), one has to solve long sequences of generalized least-squares problems; such a task has two limiting factors: execution time (often in the range of days or weeks) and data management (data sets on the order of Terabytes). We present an algorithm that obviates both issues. By […]
Feb, 20
ClusCo: clustering and comparison of protein models
BACKGROUND: The development, optimization and validation of protein modeling methods require efficient tools for structural comparison. Frequently, a large number of models need to be compared with the target native structure. The main reason for the development of the ClusCo software was to create a high-throughput tool for all-versus-all comparison, because calculating the similarity matrix is the […]
Feb, 20
Implementation and performance evaluation of a GPU particle-in-cell code
In this thesis, I designed and implemented a particle-in-cell (PIC) code on a graphics processing unit (GPU) using NVIDIA's Compute Unified Device Architecture (CUDA). The massively parallel nature of computing on a GPU necessitated the development of new methods for various steps of the PIC method. I investigated different algorithms and data structures used in the […]
Feb, 18
Accelerating encryption using commodity hardware
Dedicated hardware encryption offers both low latency and high throughput at the expense of higher cost. A system that would encompass several architectures (SISD/SIMD) with a high number of memory hierarchies might be able to perform close to a dedicated encryption unit at a fraction of the cost. This report establishes the possibility of building […]
Feb, 18
Using High Performance Computing to Improve Image Guided Cancer Treatment
Radiotherapy is one of the main cancer treatments used today. It is a complex process that relies on finding the cancer in the images of the patients with the most accuracy possible in order to minimize the radiation that the surrounding organs receive. Given that a typical radiotherapy treatment process lasts for 6 weeks, ideally, […]