Posts
Dec, 12
Hybrid Parallel Streamline Extraction Combining MPI and OpenCL
Scientific simulation applications increasingly take advantage of modern accelerator technology. For in-situ visualization techniques in particular, scalability then becomes an issue. In this work we present a scalability evaluation of a hybrid parallelized streamline extraction algorithm.
Dec, 12
ACM International Conference on Computing Frontiers, CF2013
The Computing Frontiers conference focuses on a wide spectrum of advanced methodologies, technologies and radically new solutions relevant to the development of the whole spectrum of computing over the next few decades. The goals of the meeting range from embedded to high-performance computing. We seek contributions on novel algorithms, computing paradigms, computational models, application […]
Dec, 12
CUDACLAW: a Data Parallel Solution Framework for Hyperbolic PDEs
We present CUDACLAW, a data-parallel solution framework for 2D and 3D hyperbolic partial differential equation (PDE) systems. CUDACLAW is a finite volume method based on time adaptive point-wise Riemann problem solvers, and can handle linear and nonlinear problems. The framework is tailored for the GPU architecture, optimized to take advantage of the powerful computational potential, […]
Dec, 12
FFT Parallel Implementation for MRI Image Reconstruction
This paper describes an implementation of the Cooley-Tukey FFT algorithm used in MRI image reconstruction on a revolutionary parallel computing machine, the Connex Array. By taking advantage of its vectorial structure and processing manner, MRI image reconstruction was much faster than on most commercial MRI scanners. Results are remarkable. Our proposal in this paper is the use of […]
Dec, 12
Gaussian Mixture Model Based Volume Visualization
Representing uncertainty when creating visualizations is becoming increasingly indispensable for understanding and analyzing scientific data. Uncertainty may come from different sources, such as ensembles of experiments or unavoidable information loss when performing data reduction. One natural model to represent uncertainty is to assume that each position in space, instead of a single value, may take […]
Dec, 12
Multi-level Debugging for Multi-stage, Parallelizing Compilers
A multi-stage compilation framework transforms portions of programs written in a productivity-level language into an efficiency-level language, such as C, with explicit hardware-specific optimizations. It is challenging for compiler programmers to debug errors in the compilation because they must perform complicated end-to-end reasoning, relating the programs across the multiple stages of compilation. To simplify this […]
Dec, 12
Matrix-Matrix Multiplications on GPUs for Accelerating a Parallel Fluid Dynamics Code
A few approaches to matrix-matrix multiplication on graphics processing units (GPUs) are investigated. Aspects of memory management and GPU saturation are described and discussed. The focus of this paper is to offload matrix-matrix multiplications to a GPU in an HPC setting for the purpose of accelerating a parallel fluid dynamics code.
Dec, 11
19th International European Conference on Parallel and Distributed Computing, Euro-Par 2013
Euro-Par is an annual series of international conferences dedicated to the promotion and advancement of all aspects of parallel and distributed computing. It covers a wide spectrum of topics from algorithms and theory to software technology and hardware-related issues, with application areas ranging from scientific to mobile and cloud computing. The objective of Euro-Par is […]
Dec, 10
GPU Computing with Applications in Digital Logic
After the opening of the graphics processing unit (GPU) for general purpose computations, an entirely new computing model has emerged providing a temporary break in the endless race for even faster and more powerful computing methods and devices. Since it originated in hardware primarily intended to implement highly demanding computations in computer graphics essentially based […]
Dec, 10
A GPU-Based Parallel Algorithm for Design Structure Matrix (DSM) Partition
In the design and manufacture of complex systems, the DSM has proven to be powerful and effective for analyzing and optimizing the execution order of tasks. Many algorithms have been proposed to optimize the DSM; however, as system complexity increases, the number of tasks involved grows, which results in the rapid growth of time cost […]
Dec, 10
Scaling High Performance Domain-Specific Language Implementation with Delite
This thesis covers how to easily implement performance-oriented embedded domain-specific languages. Exploiting heterogeneous parallel hardware currently requires mapping application code to multiple disparate programming models. Unfortunately, general-purpose programming models available today can yield high performance but are too low-level to be accessible to the average programmer. We propose leveraging domain-specific languages (DSLs) to map […]
Dec, 10
Vectorized Higher Order Finite Difference Kernels
Several highly optimized implementations of finite difference schemes are discussed. The combination of vectorization and an interleaved data layout, spatial and temporal loop tiling algorithms, loop unrolling, and parameter tuning leads to efficient computational kernels in one to three spatial dimensions, truncation errors of order two to twelve, and isotropic and compact anisotropic stencils. The […]