18356

Posts

Jul, 1

Analysis-driven Engineering of Comparison-based Sorting Algorithms on GPUs

We study the relationship between memory accesses, bank conflicts, thread multiplicity (also known as over-subscription) and instruction-level parallelism in comparison-based sorting algorithms for Graphics Processing Units (GPUs). We experimentally validate a proposed formula that relates these parameters with asymptotic analysis of the number of memory accesses by an algorithm. Using this formula we analyze and […]
Jul, 1

Reducing the Cost of Heuristic Generation with Machine Learning

The space of compile-time transformations and or run-time options which can improve the performance of a given code is usually so large as to be virtually impossible to search in any practical time-frame. Thus, heuristics are leveraged which can suggest good but not necessarily best configurations. Unfortunately, since such heuristics are tightly coupled to processor […]
Jul, 1

Ray-traced Radiative Transfer on Massively Threaded Architectures

In this thesis, I apply techniques from the field of computer graphics to ray tracing in astrophysical simulations, and introduce the GRACE software library. This is combined with an extant radiative transfer solver to produce a new package, TARANIS. It allows for fully-parallel particle updates via per-particle accumulation of rates, followed by a forward Euler […]
Jul, 1

Compiler Fuzzing through Deep Learning

Random program generation – fuzzing – is an effective technique for discovering bugs in compilers but successful fuzzers require extensive development effort for every language supported by the compiler, and often leave parts of the language space untested. We introduce DeepSmith, a novel machine learning approach to accelerating compiler validation through the inference of generative […]
Jun, 28

Introducing Parallelism to the Ranges TS

The current interface provided by the C++17 parallel algorithms poses some limitations with respect to parallel data access and heterogeneous systems, such as personal computers and server nodes with GPUs, smartphones, and embedded System on a Chip chipsets. In this paper, we present a summary of why we believe the Ranges TS solves these problems, […]
Jun, 28

Computing dynamics of thin films via large scale GPU-based simulations

We present the results of large scale simulations of 4th order nonlinear partial differential equations of dif- fusion type that are typically encountered when modeling dynamics of thin fluid films on substrates. The simulations are based on the alternate direction implicit (ADI) method, with the main part of the compu- tational work carried out in […]
Jun, 28

Analyzing Memory Accesses for Performance and Correctness of Parallel Programs

The demand for large compute capabilities in scientific computing led to wide use and acceptance of highly-parallel computer architectures during the last decade. This trend is manifested in the TOP500, listing the fastest supercomputer of the world, in which about 40 % of the performance share results from accelerator-based systems. Programming for these architectures in […]
Jun, 28

Migrating from OpenGL ES to Vulkan

This document outlines the key differences between OpenGL ES and the new Vulkan, and why a developer would want to migrate to Vulkan. Vulkan is a new low level graphics API that allows the developer to get very low level with an almost console-like API. This allows for greater control, performance and transparency. This is […]
Jun, 28

Improving tasks throughput on accelerators using OpenCL command concurrency

A heterogeneous architecture composed by a host and an accelerator must frequently deal with situations where several independent tasks are available to be offloaded onto the accelerator. These tasks can be generated by concurrent applications executing in the host or, in case the host is a node of a computer cluster, by applications running on […]
Jun, 24

Research on the simulation of PF-LBM model based on MPI+CUDA mixed granularity parallel

A microstructure numerical model is an intensive computational problem, for which the simulation time is too long and the simulation scale is too small. To solve these two problems, in this article, we use MPI+CUDA hybrid particle heterogeneous parallel computing to implement the dendrite growth simulation of a PF-LBM phase-field 3D model. Message Passing Interface […]
Jun, 24

Go game move prediction using convolutional neural network

The purpose of this paper is to introduce the use of convolutional neural network for prediction of the next appropriate move in the Go game. The paper contains description of the crucial Go game rules, neural networks theory, description of implemented programs and final evaluation of the trained neural networks. The programs were implemented with […]
Jun, 24

Synthesis of GPU Programs from High-Level Models

Modern graphics processing units (GPUs) provide high-performance general purpose computation abilities. They have massive parallel architectures that are suitable for executing parallel algorithms and operations. They are also throughput-oriented devices that are optimized to achieve high throughput for stream processing. Designing efficient GPU programs is a notoriously difficult task. The ForSyDe methodology is suitable to […]
Page 4 of 957« First...23456...102030...Last »

Recent source codes

* * *

* * *

HGPU group © 2010-2018 hgpu.org

All rights belong to the respective authors

Contact us: