Posts
Sep, 29
25th International Conference on Parallel Computational Fluid Dynamics, ParCFD 2013
As in past years, ParCFD 2013 will include contributed and invited papers. The conference program will consist mainly of contributed lectures in all scientific and technical areas of the conference. ParCFD 2013 topics include, but are not limited to: Complex 3D Flow; Flows with Moving Interfaces; Fluid-Structure Interaction; Aerodynamics; Hydrodynamics; Turbulence; Multi-Disciplinary Design Optimization; Acoustics; Atmospheric & […]
Sep, 28
Optimising Unstructured Mesh Computational Fluid Dynamics Applications on Multicores via Machine Learning and Code Transformation
We show that case-based reasoning (CBR) and deterministic code analysis can be successfully used in optimizing compilers of unstructured mesh applications to obtain better execution times. With the recent shift of CPU architectures towards SIMD capabilities, and of GPU architectures towards general purpose computing, it is no longer clear what optimizations are optimal given a […]
Sep, 28
A Hybrid Parallel Algorithm for Computing and Tracking Level Set Topology
The contour tree is a topological abstraction of a scalar field that captures evolution in level set connectivity. It is an effective representation for visual exploration and analysis of scientific data. We describe a work-efficient, output sensitive, and scalable parallel algorithm for computing the contour tree of a scalar field defined on a domain that […]
Sep, 28
Exploiting Concurrent GPU Operations for Efficient Work Stealing on Multi-GPUs
The race for Exascale computing has naturally led the current technologies to converge to multi-CPU/multi-GPU computers, based on thousands of CPUs and GPUs interconnected by PCI-Express buses or interconnection networks. To exploit this high computing power, programmers have to solve the issue of scheduling parallel programs on hybrid architectures. And, since the performance of a […]
Sep, 28
A fast Texture-by-numbers synthesis method based on texture optimization
The framework of Texture-by-numbers (TBN) synthesizes images of global-varying patterns with intuitive user control. Previous TBN synthesis methods have difficulties in achieving high-quality synthesis results and efficiency simultaneously. This paper proposes a fast TBN synthesis method based on texture optimization, which uses global optimization to solve the controllable non-homogeneous texture synthesis problem. Our algorithm produces […]
Sep, 28
Parallel Execution of Constraint Handling Rules on a Graphical Processing Unit
Graphical Processing Units (GPUs) consist of hundreds of small cores, collectively operating to provide massive computation capabilities. The aim of this work is to utilize this technology to execute Constraint Handling Rules (CHR), which are inherently parallel. A translation scheme is defined to transform a subset of CHR rules to C/C++, then to use a […]
Sep, 27
Increasing the performance of AllToAll variant of self-organizing migration algorithm using CUDA
Modern graphics processing units offer general-purpose parallel computing capabilities. Thus they have become a relatively low-cost alternative for applications requiring extensive parallel computations. Evolutionary algorithms are especially well suited to parallel SIMD architectures. This paper deals with the modification of the AllToAll variant of the self-organizing migration algorithm, which has high computational demand for one […]
Sep, 27
Deterministic Parallelism
A program is deterministic if it always produces the same output for a given input. Although sequential programs are often deterministic by default, parallel programs are more susceptible to behaving nondeterministically because instructions from different threads can be interleaved unpredictably. Non-determinism complicates the task of developing and maintaining software because it makes reasoning about program […]
Sep, 27
GPU-based tuning of quantum-inspired genetic algorithm for a combinatorial optimization problem
This paper concerns efficient parameter tuning (meta-optimization) of a state-of-the-art metaheuristic, the Quantum-Inspired Genetic Algorithm (QIGA), in a GPU-based massively parallel computing environment (NVIDIA CUDA technology). A novel approach to parallel implementation of the algorithm is presented. In a block of threads, each thread transforms a separate quantum individual or a different quantum gene; in each […]
Sep, 27
Lattice QCD based on OpenCL
We present an OpenCL-based Lattice QCD application using a heatbath algorithm for the pure gauge case and Wilson fermions in the twisted mass formulation. The implementation is platform independent and can be used on AMD or NVIDIA GPUs, as well as on classical CPUs. On the AMD Radeon HD 5870 our double precision dslash implementation […]
Sep, 27
GPU Acceleration of Image Convolution using Spatially-varying Kernel
Image subtraction in astronomy is a tool for discovering transient objects such as asteroids, extra-solar planets and supernovae. To match point spread functions (PSFs) between images of the same field taken at different times, a convolution technique is used. Particularly suitable for large-scale images is a computationally intensive spatially-varying kernel. The underlying algorithm is inherently […]
Sep, 26
Improved Row-Grouped CSR Format for Storing of Sparse Matrices on GPU
We present a new format for storing sparse matrices on the GPU. We compare it with several other formats, including CUSPARSE, which is today probably the best choice for processing sparse matrices on the GPU in CUDA. Contrary to CUSPARSE, which works with the common CSR format, our new format requires conversion. However, multiplication of sparse-matrix and vector […]