## Posts

Jan, 8

### Performance and Power Comparisons Between Nvidia and ATI GPUs

In recent years, modern graphics processing units have been widely adopted in high-performance computing to solve large-scale computation problems. The leading GPU manufacturers Nvidia and ATI have each introduced a series of products to the market. While sharing many similar design concepts, GPUs from these two manufacturers differ in several aspects on processor cores […]

Jan, 7

### A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems

Recent technological advances have greatly improved the performance and features of embedded systems. With the number of mobile devices alone now nearly equal to the population of Earth, embedded systems have truly become ubiquitous. These trends, however, have also made the task of managing their power consumption extremely challenging. In recent years, several techniques […]

Jan, 5

### A New Sparse Matrix Vector Multiplication GPU Algorithm Designed for Finite Element Problems

Recently, graphics processors (GPUs) have been increasingly leveraged in a variety of scientific computing applications. However, architectural differences between CPUs and GPUs necessitate the development of algorithms that take advantage of GPU hardware. As sparse matrix vector multiplication (SPMV) operations are commonly used in finite element analysis, a new SPMV algorithm and several variations are […]

Jan, 5

### Subdivision Surface Evaluation as Sparse Matrix-Vector Multiplication

We present an interpretation of subdivision surface evaluation in the language of linear algebra. Specifically, the vector of surface points can be computed by left-multiplying the vector of control points by a sparse subdivision matrix. This "matrix-driven" interpretation applies to any level of subdivision, holds for many common subdivision schemes (including Catmull-Clark and Loop), supports […]

Jan, 5

### Hierarchical DAG Scheduling for Hybrid Distributed Systems

Accelerator-enhanced computing platforms have drawn a lot of attention due to their massive peak computational capacity. Despite significant advances in the programming interfaces to such hybrid architectures, traditional programming paradigms struggle mapping the resulting multi-dimensional heterogeneity and the expression of algorithm parallelism, resulting in sub-optimal effective performance. Task-based programming paradigms have the capability to alleviate […]

Jan, 5

### Methods and Metrics for Fair Server Assessment under Real-Time Financial Workloads

Energy efficiency has been a daunting challenge for datacenters. The financial industry operates some of the largest datacenters in the world. With rising energy costs and growth in the financial services sector, emerging financial analytics workloads may incur extremely high operational costs to meet their latency targets. Microservers have recently emerged as an alternative to high-end […]

Jan, 5

### GPU: Power vs Performance

GPUs are widely used to meet the ever-increasing demands of high-performance computing. High-end GPUs are among the highest consumers of power in a computer. Power dissipation has always been a major concern for computer architects. Due to power efficiency demands, modern CPUs have moved towards multicore architectures. GPUs are already […]

Jan, 2

### Real-Time Incompressible Fluid Simulation on the GPU

We present a parallel framework for simulating incompressible fluids with predictive-corrective incompressible Smoothed Particle Hydrodynamics (PCISPH) on the GPU in real time. To this end, we propose an efficient GPU streaming pipeline to map the entire computational task onto the GPU, fully exploiting the massive computational power of state-of-the-art GPUs. In PCISPH-based simulations, neighbor search […]

Jan, 2

### Customization of OpenCL Applications for Efficient Task Mapping under Heterogeneous Platform Constraints

When targeting an OpenCL application to platforms with multiple heterogeneous accelerators, task tuning and mapping have to cope with device-specific constraints. To address this problem, we present an innovative design flow for the customization and performance optimization of OpenCL applications on heterogeneous parallel platforms. It consists of two phases: 1) a tuning phase that optimizes […]

Jan, 2

### Performance comparison of Lattice Boltzmann fluid flow simulation using OpenCL and CUDA frameworks

This paper presents a performance comparison between the CUDA and OpenCL parallel programming frameworks, using a lid-driven cavity flow simulation with the Lattice Boltzmann method as the example. CUDA is a parallel programming model developed by NVIDIA for leveraging the computing capabilities of their products. OpenCL is an open, royalty-free standard developed by the Khronos Group for parallel programming of heterogeneous devices […]

Jan, 2

### GPU-based acceleration of free energy calculations in solid state physics

Obtaining a thermodynamically accurate phase diagram through numerical calculations is a computationally expensive problem that is crucially important to understanding the complex phenomena of solid state physics, such as superconductivity. In this work we show how this type of analysis can be significantly accelerated through the use of modern GPUs. We illustrate this with a […]

Jan, 2

### Disjunctive Normal Networks

Artificial neural networks are powerful pattern classifiers; however, they have been surpassed in accuracy by methods such as support vector machines and random forests, which are also easier to use and faster to train. Backpropagation, which is used to train artificial neural networks, suffers from the herd effect problem, which leads to long training times […]