Posts
Dec, 12
Multi-level Debugging for Multi-stage, Parallelizing Compilers
A multi-stage compilation framework transforms portions of programs written in a productivity-level language into an efficiency-level language, such as C, with explicit hardware-specific optimizations. It is challenging for compiler programmers to debug errors in the compilation because they must perform complicated end-to-end reasoning, relating the programs across the multiple stages of compilation. To simplify this […]
Dec, 12
Matrix-Matrix Multiplications on GPUs for Accelerating a Parallel Fluid Dynamics Code
A few approaches to matrix-matrix multiplication on graphics processing units (GPUs) are investigated. Aspects of memory management and GPU saturation are described and discussed. The focus of this paper is offloading matrix-matrix multiplications to a GPU in an HPC setting in order to accelerate a parallel fluid dynamics code.
Dec, 11
19th International European Conference on Parallel and Distributed Computing, Euro-Par 2013
Euro-Par is an annual series of international conferences dedicated to the promotion and advancement of all aspects of parallel and distributed computing. It covers a wide spectrum of topics from algorithms and theory to software technology and hardware-related issues, with application areas ranging from scientific to mobile and cloud computing. The objective of Euro-Par is […]
Dec, 10
GPU Computing with Applications in Digital Logic
After the opening of the graphics processing unit (GPU) for general purpose computations, an entirely new computing model has emerged providing a temporary break in the endless race for even faster and more powerful computing methods and devices. Since it originated in hardware primarily intended to implement highly demanding computations in computer graphics essentially based […]
Dec, 10
A GPU-Based Parallel Algorithm for Design Structure Matrix (DSM) Partition
In the design and manufacture of complicated systems, the DSM has proved to be powerful and effective for analyzing and optimizing the execution order of tasks. Many algorithms have been proposed to optimize the DSM; however, as system complexity increases, the number of tasks involved grows, which results in the rapid growth of time cost […]
Dec, 10
Scaling High Performance Domain-Specific Language Implementation with Delite
This thesis covers how to easily implement performance-oriented embedded domain-specific languages. Exploiting heterogeneous parallel hardware currently requires mapping application code to multiple disparate programming models. Unfortunately, general-purpose programming models available today can yield high performance but are too low-level to be accessible to the average programmer. We propose leveraging domain-specific languages (DSLs) to map […]
Dec, 10
Vectorized Higher Order Finite Difference Kernels
Several highly optimized implementations of Finite Difference schemes are discussed. The combination of vectorization and an interleaved data layout, spatial and temporal loop tiling algorithms, loop unrolling, and parameter tuning lead to efficient computational kernels in one to three spatial dimensions, truncation errors of order two to twelve, and isotropic and compact anisotropic stencils. The […]
Dec, 10
GPU Architecture and the Programming Environment
Initially, computers were invented as devices to speed up computations and facilitate the performance of repetitive mathematical operations. Their wider application in different areas upgraded this basic role and converted computers from calculating machines into devices for processing large amounts of data, with processing understood in a very general sense. The wonderfully large and ever increasing […]
Dec, 10
Fully Parallel Particle Learning for GPGPUs and Other Parallel Devices
We developed a novel parallel algorithm for particle filtering (and learning) which is specifically designed for GPUs (graphics processing units) or similar parallel computing devices. In our new algorithm, a full cycle of particle filtering (computing the value of the likelihood for each particle, constructing the cumulative distribution function (CDF) for resampling, resampling the particles […]
Dec, 10
Implementing QR Factorization Updating Algorithms on GPUs
Linear least squares problems are commonly solved by QR factorization. When multiple solutions have to be computed with only minor changes in the underlying data, knowledge of the difference between the old data set and the new one can be used to update an existing factorization at reduced computational cost. This paper investigates the viability […]
Dec, 10
A Practical, Targeted, and Stealthy Attack Against WPA Enterprise Authentication
Wireless networking technologies have fundamentally changed the way we compute, allowing ubiquitous, anytime, anywhere access to information. At the same time, wireless technologies come with the security cost that adversaries may receive signals and engage in unauthorized communication even when not physically close to a network. Because of the utmost importance of wireless security, many […]
Dec, 10
GPU implementation of epidemiological behaviour in large social networks
In a social network, epidemic spread may refer to the spread of infections, opinions, trends, fads, or diseases, or to worm propagation in the network. Computing epidemic spread on such huge and ever-growing social networks is incredibly challenging. High-performance computing using GPUs has become an important tool to solve computationally intensive problems. This paper presents a GPU […]