Posts
Dec, 16
Productive High Performance Parallel Programming with Auto-tuned Domain-Specific Embedded Languages
As the complexity of machines and architectures has increased, performance tuning has become more challenging, leading to the failure of general compilers to generate the best possible optimized code. Expert performance programmers can often hand-write code that outperforms compiler-optimized low-level code by an order of magnitude. At the same time, the complexity of programs has […]
Dec, 16
Communication-Avoiding Optimization of Geometric Multigrid on GPUs
Multigrid methods are widely used to accelerate the convergence of iterative solvers for linear systems in a number of different application areas. In this report, we explore communication-avoiding implementations of Geometric Multigrid on Nvidia GPUs. We achieved an overall gain of 1.2x for the whole multigrid algorithm over the baseline implementation. We also provide an insight […]
Dec, 16
Circular Hough Transform in OpenCL
In this paper, the details of the circular Hough transform are explained, and the performance of three different implementations (CPU, OpenCL, and CUDA) is shown. The goal of this project is to contribute to the computer vision literature by porting the circular Hough transform written in CUDA to OpenCL.
Dec, 15
Performance study of using the Direct Compute API for implementing Support vector machines on GPUs
Today, graphics processing units (GPUs) are able not only to generate graphical imaging but also to expose their multicore architecture to accelerate computationally heavy general-purpose algorithms that can be adapted to the multicore architecture of the GPU. The study conducted in this thesis explores the efficiency of using the general purpose graphics processing […]
Dec, 15
Advanced Techniques for the Rendering and Visualization of Volumetric Seismic Data
An important part of today’s search for hydrocarbon reservoirs such as oil and gas is the use of seismic methods which measure changes in acoustic impedance to explore the interior of the earth. Similar to medical imaging techniques such as MRI or CT, seismic methods generate image slices (survey lines) through the subsurface geology. By […]
Dec, 15
Image Processing using Parallel Computing
In the 1980s, people believed that computers would help to create faster and more efficient processors. But parallel processing challenged that idea: it joined two or more computers together to solve a problem jointly. In 1990, the trend was to move away from expensive supercomputers towards networked computers such as PCs or workstations. It […]
Dec, 14
Ferrofluid Simulations with the Barnes-Hut Algorithm on Graphics Processing Units
We present an approach to molecular-dynamics simulations of dilute ferrofluids on graphics processing units (GPUs). Our numerical scheme is based on a GPU-oriented modification of the Barnes-Hut (BH) algorithm designed to increase the parallelism of computations. For an ensemble consisting of one million ferromagnetic particles, the performance of the proposed algorithm on a Tesla […]
Dec, 14
GPU-Accelerated Direct Volume Rendering of Finite Element Data Sets
Direct Volume Rendering of Finite Element models is challenging since the visualisation process is performed in world coordinates, whereas data fields are usually defined over the elements’ material coordinate system. In this paper we present a framework for Direct Volume Rendering of Finite Element models. We present several novel implementations visualising Finite Element data directly […]
Dec, 14
Real-time adaptive algorithms using a Graphics Processing Unit
Graphics Processing Units (GPUs) have recently been used as coprocessors capable of performing tasks that are not necessarily related to graphics processing in order to optimize computing resources. The use of GPUs has been extended to a wide variety of computation-intensive applications, among which audio processing is included. However, data transactions between the CPU and […]
Dec, 14
Visualizing Complex Functions Using GPUs
This document explains some common methods of visualizing complex functions and how to implement them on the GPU. Using the fragment shader, we visualize complex functions in the complex plane with the domain coloring method. Then using the vertex shader, we visualize complex functions defined on a unit sphere like spherical harmonics. Finally, we redesign […]
Dec, 14
A Static Load Balancing Scheme for Parallel Volume Rendering on Multi-GPU Clusters
GPU-based clusters are an attractive option for parallel volume rendering. One of the key issues in parallel volume rendering is load balancing: keeping a balanced workload per node is essential for improving performance. A good number of dynamic load balancing schemes have been proposed throughout the years. However, most of these approaches require runtime dynamic […]
Dec, 12
Towards Domain-specific Computing for Stencil Codes in HPC
High Performance Computing (HPC) systems are nowadays more and more heterogeneous. Different processor types can be found on a single node including accelerators such as Graphics Processing Units (GPUs). To cope with the challenge of programming such complex systems, this work presents a domain-specific approach to automatically generate code tailored to different processor types. Low-level […]