Posts
Dec, 9
Load Balancing Utilizing Data Redundancy in Distributed Volume Rendering
In interactive volume rendering, the cost of rendering a given block of the volume varies strongly with dynamically changing parameters (most notably the camera position and orientation). In distributed environments, where each compute device renders one block, this can cause severe load imbalance. Balancing the load usually induces costly data transfers causing critical rendering […]
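The excerpt stops before describing the balancing scheme, so the following is only a minimal sketch of the general idea in the title, under stated assumptions: each block is assumed to be replicated on a known set of devices, and per-frame rendering costs are assumed to be estimated in advance. A greedy "largest cost first" rule then assigns each block to the least-loaded device that already holds a copy, so no block data has to move. The function name `assign_blocks` and its inputs are hypothetical, not the paper's method.

```python
from typing import Dict, List, Tuple


def assign_blocks(costs: Dict[str, float],
                  replicas: Dict[str, List[int]],
                  num_devices: int) -> Tuple[Dict[str, int], List[float]]:
    """Hypothetical greedy balancer using data redundancy.

    costs[b]    -- estimated rendering cost of block b for the current frame
    replicas[b] -- devices that already store a copy of block b
    Returns the chosen device per block and the resulting load per device.
    """
    load = [0.0] * num_devices
    assignment: Dict[str, int] = {}
    # Place expensive blocks first so they cannot be left for an overloaded device.
    for block in sorted(costs, key=lambda b: costs[b], reverse=True):
        # Only devices holding a replica are candidates, so no transfer is needed.
        device = min(replicas[block], key=lambda d: load[d])
        assignment[block] = device
        load[device] += costs[block]
    return assignment, load
```

For costs `{"A": 5, "B": 4, "C": 3, "D": 2}` with replicas `A: [0, 1]`, `B: [0, 1]`, `C: [1, 2]`, `D: [2, 0]` on three devices, the sketch assigns A to 0, B to 1, and C and D to 2, giving per-device loads of 5, 4 and 5. More replicas widen the choice per block at the price of memory.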
Dec, 9
A design tool for efficient mapping of multimedia applications onto heterogeneous platforms
Development of multimedia systems on heterogeneous platforms is a challenging task with existing design tools due to a lack of rigorous integration between high-level abstract modeling and low-level synthesis and analysis. In this paper, we present a new dataflow-based design tool, called the targeted dataflow interchange format (TDIF), for design, analysis, and implementation […]
Dec, 9
DRAM Scheduling Policy for GPGPU Architectures Based on a Potential Function
GPGPU architectures and applications differ from traditional CPU architectures and applications in several ways; highly multithreaded architectures and SIMD execution behavior are the two most important characteristics of GPGPU computing. In this paper, we propose a potential function that models the DRAM behavior in GPGPU architectures and a DRAM scheduling policy, Alpha-SJF, to minimize the potential function. […]
Dec, 9
GPU implementations of scheduling heuristics for heterogeneous computing environments
This work presents the application of parallel computing techniques using Graphics Processing Units (GPUs) to improve the efficiency of scheduling heuristics for heterogeneous computing systems. The experimental evaluation of the proposed methods demonstrates that computing times can be reduced significantly, making it possible to tackle large scheduling scenarios in reasonable execution times.
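The abstract does not name the heuristics that were ported to the GPU. As an illustration only, here is Min-Min, a classic list-scheduling heuristic for heterogeneous systems, written serially. It assumes an ETC ("expected time to compute") matrix where `etc[t][m]` is the run time of task `t` on machine `m`. The inner evaluation of every (task, machine) completion time is the data-parallel step that a GPU version could spread across threads; the function name is hypothetical.

```python
from typing import List, Tuple


def min_min(etc: List[List[float]]) -> Tuple[List[int], float]:
    """Min-Min scheduling (illustrative sketch).

    etc[t][m] is the expected time to compute task t on machine m.
    Returns the machine chosen for each task and the resulting makespan.
    """
    num_tasks, num_machines = len(etc), len(etc[0])
    ready = [0.0] * num_machines   # time at which each machine becomes free
    mapping = [-1] * num_tasks
    unassigned = set(range(num_tasks))
    while unassigned:
        # Evaluate the completion time of every unassigned task on every machine
        # and keep the overall minimum; this double loop is the parallelizable part.
        best_time, best_task, best_machine = float("inf"), -1, -1
        for t in sorted(unassigned):
            for m in range(num_machines):
                finish = ready[m] + etc[t][m]
                if finish < best_time:
                    best_time, best_task, best_machine = finish, t, m
        mapping[best_task] = best_machine
        ready[best_machine] = best_time
        unassigned.remove(best_task)
    return mapping, max(ready)
```

For `etc = [[3, 5], [2, 4], [6, 1]]`, task 2 goes to machine 1 first (finishing at 1), then task 1 and task 0 go to machine 0, for a makespan of 5. Each outer iteration costs O(tasks × machines), which is why large scenarios benefit from parallel evaluation.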
Dec, 9
Computer Finite-Difference Time-Domain Simulation of Electromagnetic Wave Propagation using GPUs
The Finite-Difference Time-Domain (FDTD) solution of Maxwell’s equations, a grid-based differential time-domain numerical modeling method, is an approach for the direct modelling of the penetration of structures by continuous plane waves. Although FDTD techniques are considered to be relatively easy to understand and to implement in software, such modelling methods require a high level of […]
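To make "grid-based, time-domain" concrete, here is the textbook one-dimensional Yee scheme in normalized units. It is an assumption-laden sketch, not the paper's GPU code: free space only, an impedance constant of 377 ohms, a Courant number of 1 (the so-called magic time step), and a hard Gaussian source at the first node. The electric and magnetic fields are staggered in space and leapfrogged in time, and each inner loop over `m` is the kind of per-cell update a GPU kernel would map to threads.

```python
import math
from typing import List

IMP0 = 377.0  # approximate impedance of free space in ohms (normalized units)


def fdtd_1d(size: int = 200, max_time: int = 100) -> List[float]:
    """1D FDTD (Yee) simulation of a Gaussian pulse in free space.

    Ez and Hy are staggered by half a cell in space and half a step in time.
    With a Courant number of 1 the pulse advances exactly one cell per step.
    """
    ez = [0.0] * size
    hy = [0.0] * size
    for q in range(max_time):
        # Update the magnetic field from the spatial difference of Ez.
        for m in range(size - 1):
            hy[m] += (ez[m + 1] - ez[m]) / IMP0
        # Update the electric field from the spatial difference of Hy.
        for m in range(1, size):
            ez[m] += (hy[m] - hy[m - 1]) * IMP0
        # Hard source: overwrite the first node with a pulse peaking at q = 30.
        ez[0] = math.exp(-((q - 30.0) ** 2) / 100.0)
    return ez
```

With the defaults, the source peaks at step 30 and the pulse then travels one cell per step, so after 100 steps the peak of `ez` sits at cell 69 and cells beyond 99 are still untouched.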
Dec, 9
Objective-Driven Workload Allocation in Heterogeneous Computing Systems
In this work, we explore heterogeneous computing hardware, including CPUs, GPUs and FPGAs, for scientific computing. We study system metrics such as throughput, energy efficiency and temperature, and formulate the problem of workload allocation among computing hardware in mathematical models with regard to the three metrics. The workload allocation approach is evaluated using Linpack on […]
Dec, 9
Enhanced Parallel ILU(p)-based Preconditioners for Multi-core CPUs and GPUs: The Power(q)-pattern Method
Application demands and grand challenges in numerical simulation require both highly capable computing platforms and efficient numerical solution schemes. Power constraints and further miniaturization of modern and future hardware pave the way for multi- and manycore processors with increasingly fine-grained parallelism and deeply nested hierarchical memory systems, as already exemplified by recent graphics processing […]
Dec, 9
Contributions to Parallel Simulation of Equation-Based Models on Graphics Processing Units
In this thesis we investigate techniques and methods for parallel simulation of equation-based, object-oriented (EOO) Modelica models on graphics processing units (GPUs). Modelica is being developed through an international effort via the Modelica Association. With Modelica it is possible to build computationally heavy models; simulating such models, however, might take a considerable amount of time. […]
Dec, 9
Parallel Computation of 2D Morse-Smale Complexes
The Morse-Smale complex is a useful topological data structure for the analysis and visualization of scalar data. This paper describes an algorithm that processes all mesh elements of the domain in parallel to compute the Morse-Smale complex of large two-dimensional data sets at interactive speeds. We employ a reformulation of the Morse-Smale complex using Forman’s […]
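As a flavor of per-element parallelism on 2D scalar data, the sketch below classifies the interior vertices of a grid as minima, maxima, saddles or regular points. Note that this is a swapped-in technique: it uses the piecewise-linear lower-link test on a triangulated grid, not Forman's discrete Morse theory that the paper builds on. It assumes each grid cell is split along the (x, y) to (x+1, y+1) diagonal, so every interior vertex has a six-vertex link, and breaks ties by linear index. Function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# Link of a vertex when each square cell is split along its (x, y)-(x+1, y+1)
# diagonal, listed in cyclic (counterclockwise) order.
LINK_OFFSETS = [(1, 0), (1, 1), (0, 1), (-1, 0), (-1, -1), (0, -1)]


def classify_vertex(grid: Sequence[Sequence[float]], x: int, y: int) -> str:
    """Classify interior vertex (x, y) of grid[y][x] by its lower/upper link."""
    width = len(grid[0])

    def key(px: int, py: int) -> Tuple[float, int]:
        # Break ties in value by linear index (simulation of simplicity).
        return (grid[py][px], py * width + px)

    center = key(x, y)
    upper = [key(x + dx, y + dy) > center for dx, dy in LINK_OFFSETS]
    # Count upper/lower switches around the cycle (index -1 wraps around).
    changes = sum(upper[i] != upper[i - 1] for i in range(len(upper)))
    if changes == 0:
        return "minimum" if upper[0] else "maximum"
    return "regular" if changes == 2 else "saddle"


def critical_points(grid: Sequence[Sequence[float]]) -> Dict[Tuple[int, int], str]:
    """Classify every interior vertex; each call is independent of the others,
    so this loop is what a parallel implementation would distribute."""
    result: Dict[Tuple[int, int], str] = {}
    for y in range(1, len(grid) - 1):
        for x in range(1, len(grid[0]) - 1):
            kind = classify_vertex(grid, x, y)
            if kind != "regular":
                result[(x, y)] = kind
    return result
```

For the 3×3 grid `[[6, 2, 3], [7, 5, 8], [4, 1, 9]]`, the link of the center alternates upper, upper, lower, upper, upper, lower, giving four switches, so the center is reported as a saddle. Building the full complex additionally requires tracing gradient paths between these critical points.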
Dec, 9
Exploring the Optimization Space of Multi-Core Architectures with OpenCL Benchmarks
Open Computing Language (OpenCL) is an open standard for writing portable software for heterogeneous architectures such as Central Processing Units (CPUs) and Graphics Processing Units (GPUs). Programs written in OpenCL are functionally portable across architectures. However, due to architectural differences, OpenCL does not guarantee performance portability. As previous research shows, different architectures are sensitive […]
Dec, 8
High Performance Multi-agent System based Simulations
Real-life city-traffic simulation is a good example of a multi-agent simulation involving a large number of agents (each human modelled as an individual agent). Analysis of emergent behaviors in social simulations largely depends on the number of agents involved (at least 100,000 agents). Due to the large number of agents involved, it takes several seconds […]
Dec, 8
Performance of PETSc GPU Implementation with Sparse Matrix Storage Schemes
PETSc is a scalable solver library developed at Argonne National Laboratory (ANL). It is widely used for solving systems of equations arising from the discretisation of partial differential equations (PDEs). GPU support has recently been added to PETSc to exploit the performance of GPUs. This support is quite new and currently only available in the PETSc […]
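Storage schemes matter because sparse matrix-vector multiplication (SpMV) dominates iterative solvers. PETSc's default AIJ format is a compressed sparse row (CSR) layout; ELLPACK (ELL) is a common GPU-oriented alternative. The plain-Python sketch below (illustrative only, not PETSc code) shows both. CSR stores only nonzeros plus row offsets, so rows have uneven lengths; ELL pads every row to the same width, which gives regular, coalescing-friendly access on a GPU at the cost of storing padding.

```python
from typing import List, Sequence, Tuple


def dense_to_csr(a: Sequence[Sequence[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Build CSR arrays: nonzero values, their column indices, and row offsets."""
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr = [0]
    for row in a:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # row i occupies values[row_ptr[i]:row_ptr[i+1]]
    return values, col_idx, row_ptr


def csr_spmv(values, col_idx, row_ptr, x) -> List[float]:
    """y = A x in CSR; on a GPU one thread (or warp) per row is typical."""
    y = []
    for i in range(len(row_ptr) - 1):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y.append(s)
    return y


def csr_to_ell(values, col_idx, row_ptr) -> Tuple[List[List[float]], List[List[int]]]:
    """Pad every row to the longest row length; padding uses value 0, column 0."""
    n = len(row_ptr) - 1
    width = max((row_ptr[i + 1] - row_ptr[i] for i in range(n)), default=0)
    ell_vals = [[0.0] * width for _ in range(n)]
    ell_cols = [[0] * width for _ in range(n)]
    for i in range(n):
        for slot, k in enumerate(range(row_ptr[i], row_ptr[i + 1])):
            ell_vals[i][slot] = values[k]
            ell_cols[i][slot] = col_idx[k]
    return ell_vals, ell_cols


def ell_spmv(ell_vals, ell_cols, x) -> List[float]:
    """y = A x in ELL; padded entries contribute 0 * x[0] = 0."""
    return [sum(v * x[c] for v, c in zip(vals, cols))
            for vals, cols in zip(ell_vals, ell_cols)]
```

For `A = [[4, 0, 1], [0, 3, 0], [2, 0, 5]]` and `x = [1, 2, 3]`, both routines return `[7, 6, 17]`; the ELL form pads the middle row with one zero entry.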