Posts
Jul, 5
Exploration of Low Numeric Precision Deep Learning Inference Using Intel FPGAs
CNNs have been shown to maintain reasonable classification accuracy when quantized to lower precisions. Quantizing to sub-8-bit activations and weights, however, can cause accuracy to fall below an acceptable threshold. Techniques exist for closing the accuracy gap of limited numeric precision, typically by increasing computation. This results in a trade-off between throughput and accuracy and […]
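The abstract does not say which quantization scheme is used. As a generic illustration only, the sketch below assumes uniform symmetric per-tensor quantization of weights to signed `bits`-bit integers and measures how reconstruction error grows as the bit width shrinks. The function names (`quantize_symmetric`, `dequantize`, `mean_abs_error`) are hypothetical.

```python
def quantize_symmetric(values, bits):
    """Uniform symmetric per-tensor quantization (an assumed scheme).

    Maps each float to a signed integer code in [-qmax, qmax] and
    returns the codes together with the shared scale factor.
    """
    if bits < 2:
        raise ValueError("symmetric quantization needs at least 2 bits")
    qmax = 2 ** (bits - 1) - 1           # 127 for 8 bits, 7 for 4 bits, 1 for 2 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest code, then clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


def mean_abs_error(values, bits):
    """Average absolute error introduced by a quantize/dequantize round trip."""
    codes, scale = quantize_symmetric(values, bits)
    approx = dequantize(codes, scale)
    return sum(abs(a - b) for a, b in zip(values, approx)) / len(values)


if __name__ == "__main__":
    weights = [-1.0, -0.3, 0.1, 0.45, 1.0]  # made-up example weights
    for b in (8, 4, 2):
        print(b, "bits:", mean_abs_error(weights, b))
```

With fewer bits the step size (`scale`) grows, so more weights collapse onto the same code. This is the accuracy loss that the techniques mentioned above try to recover.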
Jul, 5
Evaluating the Efficiency of CPUs, GPUs and FPGAs on a Near-Duplicate Document Detection Via OpenCL
Discovering identical or near-identical items is critically important in many applications, such as Web crawling, since it drastically reduces text-processing costs. Simhash is a widely used technique that assigns a bit-string identity to a text such that similar texts have similar identities. In this study, a real-time solution for a simhash calculation […]
Jul, 5
A Survey on Agent-based Simulation using Hardware Accelerators
Due to decelerating gains in single-core CPU performance, computationally expensive simulations are increasingly executed on highly parallel hardware platforms. Agent-based simulations, where simulated entities act with a certain degree of autonomy, frequently provide ample opportunities for parallelisation. Thus, a wide variety of approaches proposed in the literature have demonstrated considerable performance gains using hardware platforms such […]
Jul, 5
XGBoost: Scalable GPU Accelerated Learning
We describe the multi-GPU gradient boosting algorithm implemented in the XGBoost library. Our algorithm allows fast, scalable training on multi-GPU systems with all of the features of the XGBoost library. We employ data compression techniques to minimise the usage of scarce GPU memory while still allowing highly efficient implementation. Using our algorithm we show that […]
Jul, 1
Directive-Based, High-Level Programming and Optimizations for High-Performance Computing with FPGAs
Reconfigurable architectures like Field Programmable Gate Arrays (FPGAs) have been used for accelerating computations from several domains because of their unique combination of flexibility, performance, and power efficiency. However, FPGAs have not been widely used for high-performance computing, primarily because of their programming complexity and difficulties in optimizing performance. In this paper, we present a […]
Jul, 1
Analysis-driven Engineering of Comparison-based Sorting Algorithms on GPUs
We study the relationship between memory accesses, bank conflicts, thread multiplicity (also known as over-subscription) and instruction-level parallelism in comparison-based sorting algorithms for Graphics Processing Units (GPUs). We experimentally validate a proposed formula that relates these parameters with asymptotic analysis of the number of memory accesses by an algorithm. Using this formula we analyze and […]
Jul, 1
Reducing the Cost of Heuristic Generation with Machine Learning
The space of compile-time transformations and/or run-time options that can improve the performance of a given code is usually so large that it is virtually impossible to search in any practical time frame. Thus, heuristics are used that can suggest good, but not necessarily the best, configurations. Unfortunately, since such heuristics are tightly coupled to processor […]
Jul, 1
Ray-traced Radiative Transfer on Massively Threaded Architectures
In this thesis, I apply techniques from the field of computer graphics to ray tracing in astrophysical simulations, and introduce the GRACE software library. This is combined with an extant radiative transfer solver to produce a new package, TARANIS. It allows for fully-parallel particle updates via per-particle accumulation of rates, followed by a forward Euler […]
Jul, 1
Compiler Fuzzing through Deep Learning
Random program generation (fuzzing) is an effective technique for discovering bugs in compilers, but successful fuzzers require extensive development effort for every language supported by the compiler and often leave parts of the language space untested. We introduce DeepSmith, a novel machine learning approach to accelerating compiler validation through the inference of generative […]
Jun, 28
Introducing Parallelism to the Ranges TS
The current interface provided by the C++17 parallel algorithms poses some limitations with respect to parallel data access and heterogeneous systems, such as personal computers and server nodes with GPUs, smartphones, and embedded System-on-a-Chip (SoC) chipsets. In this paper, we present a summary of why we believe the Ranges TS solves these problems, […]
Jun, 28
Computing dynamics of thin films via large scale GPU-based simulations
We present the results of large-scale simulations of fourth-order nonlinear partial differential equations of diffusion type that are typically encountered when modeling the dynamics of thin fluid films on substrates. The simulations are based on the alternating direction implicit (ADI) method, with the main part of the computational work carried out in […]
Jun, 28
Analyzing Memory Accesses for Performance and Correctness of Parallel Programs
The demand for large compute capabilities in scientific computing has led to the wide use and acceptance of highly parallel computer architectures during the last decade. This trend is manifested in the TOP500, which lists the fastest supercomputers in the world and in which about 40% of the performance share comes from accelerator-based systems. Programming for these architectures in […]