Posts

Jun, 28

Analyzing Memory Accesses for Performance and Correctness of Parallel Programs

The demand for large compute capabilities in scientific computing has led to the wide use and acceptance of highly parallel computer architectures during the last decade. This trend is reflected in the TOP500, which lists the fastest supercomputers in the world and in which about 40 % of the performance share comes from accelerator-based systems. Programming for these architectures in […]
Jun, 28

Migrating from OpenGL ES to Vulkan

This document outlines the key differences between OpenGL ES and Vulkan, and why a developer would want to migrate to Vulkan. Vulkan is a new low-level graphics API with an almost console-like interface, which gives the developer greater control, performance and transparency. This is […]
Jun, 28

Improving tasks throughput on accelerators using OpenCL command concurrency

A heterogeneous architecture composed of a host and an accelerator must frequently deal with situations where several independent tasks are available to be offloaded onto the accelerator. These tasks can be generated by concurrent applications executing on the host or, if the host is a node of a computer cluster, by applications running on […]
Jun, 24

Research on the simulation of PF-LBM model based on MPI+CUDA mixed granularity parallel

Microstructure numerical modelling is a computationally intensive problem: simulation times are too long and the achievable simulation scale is too small. To address both problems, in this article we use MPI+CUDA mixed-granularity heterogeneous parallel computing to implement the dendrite growth simulation of a 3D PF-LBM phase-field model. Message Passing Interface […]
Jun, 24

Go game move prediction using convolutional neural network

The purpose of this paper is to introduce the use of convolutional neural networks for predicting the next appropriate move in the game of Go. The paper contains a description of the crucial rules of Go, neural network theory, a description of the implemented programs and a final evaluation of the trained neural networks. The programs were implemented with […]
Jun, 24

Synthesis of GPU Programs from High-Level Models

Modern graphics processing units (GPUs) provide high-performance general-purpose computation abilities. They have massively parallel architectures that are suitable for executing parallel algorithms and operations. They are also throughput-oriented devices that are optimized for stream processing. Designing efficient GPU programs is a notoriously difficult task. The ForSyDe methodology is suitable to […]
Jun, 24

Strategies for the Heterogeneous Execution of Large-Scale Simulations on Hybrid Supercomputers

Massively parallel devices of various architectures are being adopted by the newest supercomputers to overcome the current power constraint in the context of the exascale challenge. This progress leads to an increasing hybridisation of HPC systems and makes the design of computing applications a rather complex problem. Therefore, software efficiency and portability are of crucial […]
Jun, 24

DeepSmith: Compiler Fuzzing through Deep Learning

Random program generation – fuzzing – is an effective technique for discovering bugs in compilers, but successful fuzzers require extensive development effort for every language supported by the compiler and often leave parts of the language space untested. We introduce DeepSmith, a novel machine learning approach to accelerating compiler validation through the inference of generative […]
Jun, 22

The 3rd International Conference on Computer Systems and Communication Technology (ICCSCT), 2018

It will bring together researchers to exchange their research results and address open issues in: Computer Systems and Engineering; Artificial Intelligence; Computer Modeling; Internet of Things (IoT); Decision Support System and Models; Software Engineering; Computer Graphics and Multimedia; Image Processing; System software support for multimedia; Communication Technology; Network and Wireless Communication; VLSI Circuits and […]
Jun, 22

The 4th International Conference on Communication and Information Processing (ICCIP), 2018

Keynote speakers: Prof. Jalel Ben-Othman – University of Paris 13, France. Prof. Herwig Unger – University of Hagen, Germany. Published by: Accepted papers of ICCIP 2018 will be published in the ICCIP 2018 Conference Proceedings, which will appear in the International Conference Proceedings Series by ACM and be archived in the ACM Digital Library. The […]
Jun, 20

Energy Efficient Computing on Multi-core Processors: Vectorization and Compression Techniques

Over the past few years, energy consumption has become the main limiting factor for computing in general. This has led CPU vendors to aggressively promote parallel computing using multiple cores without significantly increasing the thermal design power of the processor. However, achieving maximum performance and energy efficiency from the available resources on the multi-core and […]
Jun, 20

RAPIDNN: In-Memory Deep Neural Network Acceleration Framework

Deep neural networks (DNNs) have demonstrated effectiveness for various applications such as image processing, video segmentation, and speech recognition. Running state-of-the-art DNNs on current systems mostly relies on general-purpose processors, ASIC designs, or FPGA accelerators, all of which suffer from data movement due to the limited on-chip memory and data transfer bandwidth. […]

HGPU group © 2010-2018 hgpu.org

All rights belong to the respective authors
