Posts
Jun, 10
Performance of a code migration for the simulation of supersonic ejector flow to SMP, MIC and GPU using OpenMP, OpenMP+LEO, and OpenACC directives
In this work, a serial source code for simulating a supersonic ejector flow is accelerated using parallelization based on OpenMP and OpenACC directives. The purpose is to reduce the development costs and to simplify the maintenance of the application due to the complexity of the FORTRAN source code. OpenMP has become the programming standard for […]
Jun, 10
A Pattern Specification and Optimizations Framework for Accelerating Scientific Computations on Heterogeneous Clusters
Clusters with accelerators at each node have emerged as the dominant high-end architecture in recent years. Such systems can be extremely hard to program because of the underlying heterogeneity and the need for exploiting parallelism at multiple levels. Thus, easing parallel programming today requires not only high-level programming models, but ones from which hybrid parallelism […]
Jun, 10
Sequential Monte Carlo Optimisation for Air Traffic Management
This report shows that a significant reduction in fuel use could be achieved by adopting 'free flight' type trajectories in the Terminal Manoeuvring Area (TMA) of an airport, under the control of an algorithm which optimises the trajectories of all the aircraft within the TMA simultaneously while maintaining safe separation. We propose the […]
Jun, 10
Design and optimization of DBSCAN Algorithm based on CUDA
DBSCAN is a classic algorithm for data clustering that is widely used in many fields. However, with data scales growing much larger than before, the traditional serial algorithm cannot meet the performance requirement. Recently, parallel computing based on CUDA has developed very quickly and offers a great advantage for big data. […]
Jun, 8
Improving OpenCL Programmability with the Heterogeneous Programming Library
The use of heterogeneous devices is becoming increasingly widespread. Their main drawback is their low programmability due to the large number of details that must be handled. Another important problem is the reduced code portability, as most of the tools to program them are vendor- or device-specific. The exception to this observation is OpenCL, which […]
Jun, 8
CGO: G: Intelligent Heuristic Construction with Active Learning
Building effective optimization heuristics is a challenging task which often takes developers several months if not years to complete. Predictive modelling has recently emerged as a promising solution, automatically constructing heuristics from training data; however, obtaining this data can take months per platform. This is becoming an ever more critical problem as the pace of […]
Jun, 8
Exploring CPU-GPU Coherence
AMD, ARM and other members of the Heterogeneous Systems Architecture Foundation are focusing on integrated CPU-GPU systems with shared memory, to improve the programmability of heterogeneous systems. Such integration is also necessary to eliminate the energy and latency costs associated with conventional heterogeneous computation. This work investigates the relevance of CPU-GPU coherence for current heterogeneous […]
Jun, 8
Cryptanalysis of the McEliece Cryptosystem on GPGPUs
The linear code based McEliece cryptosystem is potentially promising as a so-called "post-quantum" public key cryptosystem because thus far it has resisted quantum cryptanalysis, but to be considered secure, the cryptosystem must resist other attacks as well. In 2011, Bernstein et al. introduced the "Ball Collision Decoding" (BCD) attack on McEliece which is a significant […]
Jun, 8
Bi-directional Path Tracing on GPU
Computer graphics renderers for creating photo-realistic images mainly use unidirectional path tracing, which gives good results for scenes without caustics or hard cases. There are also a few renderers that implement bi-directional path tracing; however, due to the complexity of the algorithm, they almost exclusively target sequential CPUs. The thesis proposes a way of implementation of […]
Jun, 7
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus […]
Jun, 7
Implementation of K-shortest Path Algorithm in GPU Using CUDA
The K-shortest path algorithm is a generalization of the shortest path algorithm. It is used in various fields, such as sequence alignment in molecular bioinformatics, robot motion planning, and path finding in gene networks, where the speed of path calculation plays a vital role. Parallel implementation is one of the best ways to fulfill the requirement of these […]
Jun, 7
Meta-Programming and Auto-Tuning in the Search for High Performance GPU Code
Writing high-performance GPGPU code is often difficult and time-consuming, potentially requiring laborious manual tuning of low-level details. Despite these challenges, the cost of ignoring GPUs in high-performance computing is increasingly large. Auto-tuning is a potential solution to the problem of tedious manual tuning. We present a framework for auto-tuning GPU kernels which are […]