## Posts

Jan, 4

### Batched Linear Algebra Problems on GPU Accelerators

The emergence of multicore and heterogeneous architectures requires many linear algebra algorithms to be redesigned to take advantage of accelerators such as GPUs. A particularly challenging class of problems, arising in numerous applications, involves the use of linear algebra operations on many small matrices. The size of these matrices is usually the same, up […]

Jan, 4

### Programming Models and Scheduling Techniques for Heterogeneous Architectures

There is a clear trend nowadays to use heterogeneous high-performance computers, as they offer considerably greater computing power than homogeneous CPU systems. Extending traditional CPU systems with specialized units (accelerators such as GPGPUs) has become a revolution in the HPC world. Both the traditional performance-per-Watt and the performance-per-Euro ratios have been increased with the use […]

Jan, 4

### Automatic Performance Tuning of Stencil Computations on Graphics Processing Units

The focus of this work is the automatic performance tuning of stencil computations on Graphics Processing Units (GPUs). A strategy is presented that uses machine learning to determine the best way to use the GPU memory, followed by a heuristic that divides the remaining optimizations into groups and exhaustively explores one group at a time. […]

Jan, 4

### CUDA Parallel Algorithms for Forward and Inverse Structural Gravity Problems

This paper describes the use of a CUDA parallelization scheme for forward and inverse gravity problems for structural boundaries. The forward problem is calculated using the finite element approach: the whole calculation volume is split into parallelepipeds, and the gravity effect of each is then calculated using a known formula. The inverse problem solution is found using […]

Jan, 4

### Accelerating Binary Genetic Algorithm Driven Missile Design Optimization Routine with a CUDA Coded Six Degrees-Of-Freedom Simulator

Science and engineering have benefited enormously from the advent of modern (digital) computing. As technology continues to advance, computation becomes exponentially faster, more reliable, and more efficient. While modeling and simulation have let analysis leap past many years of trial and error, they are still restricted by resources, even with modern computing. Whether running Monte […]

Dec, 31

### A Comparison of the performance of HPC Accelerators

This project aims to port the scientific application GADGET-3 to multiple accelerators, investigate the performance achieved, and compare the porting and optimisation work across accelerators with different architectures. The most time-consuming functions of GADGET-3 were identified by profiling. Some functions in GADGET-3 were ported to the NVIDIA K40 accelerator card […]

Dec, 31

### Accelerator weather forecasting

Advection is the transport of a quantity due to fluid flow, and is an important, computationally intensive part of any fluid simulation. OpenACC GPU acceleration of the advection components of MONC, an atmospheric LES, was pursued. Although this yielded no speedup, the reasons for this are examined, and the conditions under which it may become […]

Dec, 31

### Study of basic vector operations on Intel Xeon Phi and NVIDIA Tesla using OpenCL

The present work is an analysis of the performance of the basic vector operations AXPY, DOT and SpMV using OpenCL. The code was tested on the NVIDIA Tesla S2050 GPU and the Intel Xeon Phi 3120A coprocessor. Due to the nature of the AXPY function, only two versions were implemented: the routine to be executed by […]

Dec, 31

### A Deep Generative Deconvolutional Image Model

A deep generative model is developed for representation and analysis of images, based on a hierarchical convolutional dictionary-learning framework. Stochastic unpooling is employed to link consecutive layers in the model, yielding top-down image generation. A Bayesian support vector machine is linked to the top-layer features, yielding max-margin discrimination. Deep deconvolutional inference is employed when testing, […]

Dec, 31

### Parallel 3D Fast Wavelet Transform comparison on CPUs and GPUs

We present in this paper several implementations of the 3D Fast Wavelet Transform (3D-FWT) on multicore CPUs and manycore GPUs. On the GPU side, we focus on CUDA and OpenCL programming to develop methods for an efficient mapping on manycores. On multicore CPUs, OpenMP and Pthreads are used as counterparts to maximize parallelism, and renowned […]

Dec, 31

### Accelerating Fluids Simulation Using SPH and Implementation on GPU

Fluid simulation is usually done with CFD methods, which offer high precision but need days, weeks, or months to compute on desktop CPUs, limiting their practical use in industrial control systems. To reduce the computation time, the Smoothed Particle Hydrodynamics (SPH) method is used. SPH is commonly used to simulate fluids in the computer graphics field, especially […]

Dec, 23

### Poseidon: A System Architecture for Efficient GPU-based Deep Learning on Multiple Machines

Deep learning (DL) has achieved notable successes in many machine learning tasks. A number of frameworks have been developed to expedite the process of designing and training deep neural networks (DNNs), such as Caffe, Torch and Theano. Currently they can harness multiple GPUs on a single machine, but are unable to use GPUs that are […]