18363

Posts

Jul, 5

Evaluating the Efficiency of CPUs, GPUs and FPGAs on a Near-Duplicate Document Detection Via OpenCL

Discovering identical or near-identical items is urgently important in many applications such as Web crawling since it drastically reduces the text processing costs. Simhash is a widely used technique, able to attribute a bit-string identity to a text, such that similar texts have similar identities. In this study, a real-time solution for a simhash calculation […]
Jul, 5

A Survey on Agent-based Simulation using Hardware Accelerators

Due to decelerating gains in single-core CPU performance, computationally expensive simulations are increasingly executed on highly parallel hardware platforms. Agent-based simulations, where simulated entities act with a certain degree of autonomy, frequently provide ample opportunities for parallelisation. Thus, a vast variety of approaches proposed in the literature demonstrated considerable performance gains using hardware platforms such […]
Jul, 5

XGBoost: Scalable GPU Accelerated Learning

We describe the multi-GPU gradient boosting algorithm implemented in the XGBoost library. Our algorithm allows fast, scalable training on multi-GPU systems with all of the features of the XGBoost library. We employ data compression techniques to minimise the usage of scarce GPU memory while still allowing highly efficient implementation. Using our algorithm we show that […]
Jul, 1

Directive-Based, High-Level Programming and Optimizations for High-Performance Computing with FPGAs

Reconfigurable architectures like Field Programmable Gate Arrays (FPGAs) have been used for accelerating computations from several domains because of their unique combination of flexibility, performance, and power efficiency. However, FPGAs have not been widely used for high-performance computing, primarily because of their programming complexity and difficulties in optimizing performance. In this paper, we present a […]
Jul, 1

Analysis-driven Engineering of Comparison-based Sorting Algorithms on GPUs

We study the relationship between memory accesses, bank conflicts, thread multiplicity (also known as over-subscription) and instruction-level parallelism in comparison-based sorting algorithms for Graphics Processing Units (GPUs). We experimentally validate a proposed formula that relates these parameters with asymptotic analysis of the number of memory accesses by an algorithm. Using this formula we analyze and […]
Jul, 1

Reducing the Cost of Heuristic Generation with Machine Learning

The space of compile-time transformations and or run-time options which can improve the performance of a given code is usually so large as to be virtually impossible to search in any practical time-frame. Thus, heuristics are leveraged which can suggest good but not necessarily best configurations. Unfortunately, since such heuristics are tightly coupled to processor […]
Jul, 1

Ray-traced Radiative Transfer on Massively Threaded Architectures

In this thesis, I apply techniques from the field of computer graphics to ray tracing in astrophysical simulations, and introduce the GRACE software library. This is combined with an extant radiative transfer solver to produce a new package, TARANIS. It allows for fully-parallel particle updates via per-particle accumulation of rates, followed by a forward Euler […]
Jul, 1

Compiler Fuzzing through Deep Learning

Random program generation – fuzzing – is an effective technique for discovering bugs in compilers but successful fuzzers require extensive development effort for every language supported by the compiler, and often leave parts of the language space untested. We introduce DeepSmith, a novel machine learning approach to accelerating compiler validation through the inference of generative […]
Jun, 28

Introducing Parallelism to the Ranges TS

The current interface provided by the C++17 parallel algorithms poses some limitations with respect to parallel data access and heterogeneous systems, such as personal computers and server nodes with GPUs, smartphones, and embedded System on a Chip chipsets. In this paper, we present a summary of why we believe the Ranges TS solves these problems, […]
Jun, 28

Computing dynamics of thin films via large scale GPU-based simulations

We present the results of large scale simulations of 4th order nonlinear partial differential equations of dif- fusion type that are typically encountered when modeling dynamics of thin fluid films on substrates. The simulations are based on the alternate direction implicit (ADI) method, with the main part of the compu- tational work carried out in […]
Jun, 28

Analyzing Memory Accesses for Performance and Correctness of Parallel Programs

The demand for large compute capabilities in scientific computing led to wide use and acceptance of highly-parallel computer architectures during the last decade. This trend is manifested in the TOP500, listing the fastest supercomputer of the world, in which about 40 % of the performance share results from accelerator-based systems. Programming for these architectures in […]
Jun, 28

Improving tasks throughput on accelerators using OpenCL command concurrency

A heterogeneous architecture composed by a host and an accelerator must frequently deal with situations where several independent tasks are available to be offloaded onto the accelerator. These tasks can be generated by concurrent applications executing in the host or, in case the host is a node of a computer cluster, by applications running on […]
Page 2 of 95512345...102030...Last »

* * *

* * *

HGPU group © 2010-2018 hgpu.org

All rights belong to the respective authors

Contact us: