Posts
Jul, 22
An algebraic parallel treecode in arbitrary dimensions
We present a parallel treecode for fast kernel summation in high dimensions – a common problem in data analysis and computational statistics. Fast kernel summations can be viewed as approximation schemes for dense kernel matrices. Treecode algorithms (or simply treecodes) construct low-rank approximations of certain off-diagonal blocks of the kernel matrix. These blocks are identified […]
Jul, 22
Generating Binary Optimal Codes Using Heterogeneous Parallel Computing
Generation of optimal codes is a well-known problem in coding theory. Many computational approaches exist in the literature for finding record-breaking codes. However, generating codes with long lengths n using serial algorithms is computationally very expensive; for example, the worst-case time complexity of a greedy algorithm is O(n4^n). In order to improve […]
Jul, 20
Performance Analysis of GPU-Accelerated Filter-Based Source Finding for HI Spectral Line Image Data
Searching for sources of electromagnetic emission in spectral-line radio astronomy interferometric data is a computationally intensive process. Parallel programming techniques and High Performance Computing hardware may be used to improve the computational performance of a source finding program. However, it is desirable to further reduce the processing time of source finding in order to decrease […]
Jul, 20
Accelerating a Movie Recommender System Using VirtualCL on a Heterogeneous GPU Cluster
The present-day market offers a large number of movies, which overwhelms people with choices. To quickly navigate through all the possible movies and find the interesting ones, users can take advantage of movie recommender systems. This thesis studies a movie recommender system which uses image processing and computer vision algorithms. The […]
Jul, 20
Parallel Programming in Actor-Based Applications via OpenCL
GPU and multicore hardware architectures are commonly used in many different application areas to accelerate problem solutions relative to single CPU architectures. The typical approach to accessing these hardware architectures requires embedding logic into the programming language used to construct the application; the two primary forms of embedding are: calls to API routines to access […]
Jul, 20
Improving performance portability for GPU-specific OpenCL kernels on multi-core/many-core CPUs by analysis-based transformations
OpenCL is an open heterogeneous programming framework. Although OpenCL programs are functionally portable, OpenCL does not provide performance portability, so code transformation often plays an irreplaceable role. When adapting GPU-specific OpenCL kernels to run on multi-core/many-core CPUs, coarsening the thread granularity is necessary and thus extensively used. However, locality concerns exposed in GPU-specific OpenCL code […]
Jul, 20
GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers
GROMACS is one of the most widely used open-source and free software codes in chemistry, used primarily for dynamical simulations of biomolecules. It provides a rich set of calculation types, preparation and analysis tools. Several advanced techniques for free-energy calculations are supported. In version 5, it reaches new performance heights, through several new and enhanced […]
Jul, 17
GPU-based visualization of domain-coloured algebraic Riemann surfaces
We examine an algorithm for the visualization of domain-coloured Riemann surfaces of plane algebraic curves. The approach faithfully reproduces the topology of the surface and also preserves some of its geometry. We discuss how the algorithm can be implemented efficiently in OpenGL with geometry shaders, and (less efficiently) even in WebGL with multiple render targets […]
Jul, 17
Scaling Monte Carlo Tree Search on Intel Xeon Phi
Many algorithms have been parallelized successfully on the Intel Xeon Phi coprocessor, especially those with regular, balanced, and predictable data access patterns and instruction flows. Irregular and unbalanced algorithms are harder to parallelize efficiently. They are, for instance, present in artificial intelligence search algorithms such as Monte Carlo Tree Search (MCTS). In this paper we […]
Jul, 17
Many-Core Architectures: Hardware-Software Optimization and Modeling Techniques
During the last few decades, unprecedented technological growth has been at the center of the embedded systems design paradigm, with Moore’s Law being the leading factor of this trend. Today, in fact, an ever-increasing number of cores can be integrated on the same die, marking the transition from state-of-the-art multi-core chips to the […]
Jul, 17
DeepFont: Identify Your Font from An Image
As font is one of the core design concepts, automatic font identification and similar font suggestion from an image or photo have been on the wish list of many designers. We study the Visual Font Recognition (VFR) problem, and advance the state of the art remarkably by developing the DeepFont system. First of all, we build up the […]
Jul, 17
Overhauling SC atomics in C11 and OpenCL
Despite the conceptual simplicity of sequential consistency (SC), the semantics of SC atomic operations and fences in the C11 and OpenCL memory models is subtle, leading to convoluted prose descriptions that translate to complex axiomatic formalisations. We conduct an overhaul of SC atomics in C11, reducing the associated axioms in both number and complexity. A […]