Platform-independent parallelization of the Lattice Boltzmann method with OpenCL
Department Informatik, Lehrstuhl für Informatik 2, Programmiersysteme, Friedrich-Alexander-Universität Erlangen-Nürnberg
Friedrich-Alexander-Universität Erlangen-Nürnberg, 2012
@article{wolf2012platform,
title={Platform-independent parallelization of the Lattice Boltzmann method with OpenCL},
author={Wolf, C.},
year={2012}
}
Simulations such as those in fluid dynamics are computationally very intensive. Because the Lattice Boltzmann method simulates the flow on a discrete grid of cells, there are no dependencies between individual cells during the computation of one time step, so the computation can easily be done in parallel. In recent years, multi-CPU computers have been developed, which caused many algorithms to be re-implemented as multithreaded applications. As a consequence, results in computational fluid dynamics could be obtained much faster. While the multi-CPU approach has already been implemented, there is now another way to achieve fast results: the Open Computing Language (OpenCL) has been released, which makes the data-parallel computing capacity of GPUs, so far mainly limited to rendering graphics, available for computationally intensive problems as well. In addition, OpenCL allows multiple devices to be used for computation, which yields a higher level of parallelism. This thesis examines how well OpenCL is suited to fluid dynamics calculations. It is therefore important to find out whether the code has to be changed for performance reasons when it is run on different hardware components or OpenCL platforms (such as those currently provided by NVIDIA, AMD or IBM), and whether implementing the Lattice Boltzmann method in OpenCL brings any further advantages for fast computing in general. The result is that OpenCL is indeed capable: high computation speeds can be achieved with it, to some extent.
Furthermore, a programming strategy for efficient OpenCL programs was developed during implementation, testing and measurement: short kernel functions, which promise little synchronization delay and can be translated quickly by the OpenCL just-in-time compiler, combined with many work-items that execute the kernel code simultaneously, yield efficient OpenCL programs that can use the device's compute units to capacity.
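The per-cell independence and the "short kernel, many work-items" strategy described in the abstract can be illustrated with a small sketch. This is not the thesis code: it is a plain Python emulation, assuming a D2Q9 lattice with the standard BGK collision step, and it uses a thread pool to stand in for an OpenCL NDRange launch. The names `collide_kernel`, `run_collision`, the relaxation time `tau=0.8` and the test data are all hypothetical. The streaming step, which reads neighbouring cells, is left out to keep the kernel short.

```python
from concurrent.futures import ThreadPoolExecutor

# D2Q9 lattice (assumed here): 9 discrete velocities and their standard weights.
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Standard second-order BGK equilibrium distribution for one cell."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(C, W):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def collide_kernel(gid, f_in, f_out, tau):
    """Short 'kernel': work-item gid reads and writes only its own cell."""
    f = f_in[gid]
    rho = sum(f)
    ux = sum(fi * cx for fi, (cx, _) in zip(f, C)) / rho
    uy = sum(fi * cy for fi, (_, cy) in zip(f, C)) / rho
    feq = equilibrium(rho, ux, uy)
    # BGK relaxation of each population towards its equilibrium value.
    f_out[gid] = [fi - (fi - fe) / tau for fi, fe in zip(f, feq)]


def run_collision(f_in, tau=0.8, workers=4):
    """Emulate an NDRange launch: one work-item per cell, in any order."""
    f_out = [None] * len(f_in)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda gid: collide_kernel(gid, f_in, f_out, tau),
                      range(len(f_in))))
    return f_out


# Hypothetical test data: 16 cells, each slightly perturbed from rest.
cells = [[w * (1.0 + 0.01 * ((i + k) % 3)) for k, w in enumerate(W)]
         for i in range(16)]
result = run_collision(cells)
```

Because each work-item touches only the cell at its own index, the order in which the pool schedules them does not affect the result, which is the property that makes the method easy to parallelize. In an actual OpenCL program the body of `collide_kernel` would be a `__kernel` function and `gid` would come from `get_global_id(0)`.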
October 17, 2012 by hgpu