Dec, 15

Task Scheduling for Heterogeneous Multicore Systems

In recent years, as the demand for low-energy, high-performance computing has steadily increased, heterogeneous computing has emerged as an important and promising solution. Because most workloads typically run most efficiently on certain types of cores, mapping tasks to the best available resources can not only save energy but also deliver high […]
Dec, 14

A Survey of Techniques for Architecting SLC/MLC/TLC Hybrid Flash Memory based SSDs

Flash memory based SSDs offer several attractive features and benefits compared to hard disk drives (HDDs), such as shock resistance and better performance, especially for random data access. Depending on the number of bits in each cell, Flash memory can be designed as single/multi/triple level cell (SLC/MLC/TLC), which have different performance, density, cost and write […]
Dec, 10

Acceleration of Cellular Automata through Parallel Computing with OpenCL

Cellular Automata (CA) have their origins in the work of Von Neumann and, since then, have become an important research topic with a wide range of applications, from DNA sequencing to ecological dynamics. One aspect that may be of interest during a CA simulation is the evolution in the number of individuals of each […]
Dec, 10

On algorithmic reductions in task-parallel programming models

Wide adoption of parallel processing hardware in mainstream computing, as well as the interest in efficient parallel programming in developer communities, increases the demand for programming models that offer support for common algorithmic patterns. An algorithmic pattern of particular interest is the reduction. Reductions are iterative memory updates of a program variable and appear in many […]
Dec, 10

Investigating Half Precision Arithmetic to Accelerate Dense Linear System Solvers

The use of low-precision arithmetic in mixed-precision computing methods has been a powerful tool to accelerate numerous scientific computing applications. Artificial intelligence (AI) in particular has pushed this to current extremes, making use of half-precision floating-point arithmetic (FP16) in approaches based on neural networks. The appeal of FP16 is in the high performance that can […]
Dec, 10

FPGA-Accelerated Image Processing Using High Level Synthesis with OpenCL

High Level Synthesis (HLS) is a new method for developing applications for use on FPGAs. Instead of the classic approach using a Hardware Description Language (HDL), a high level programming language can be used. HLS has many perks, including high level debugging and simulation of the system being developed. This shortens the development time which […]
Dec, 10

Distributed learning of CNNs on heterogeneous CPU/GPU architectures

Convolutional Neural Networks (CNNs) have been shown to be powerful classification tools in tasks that range from check reading to medical diagnosis, reaching close to human perception, and in some cases surpassing it. However, the problems to solve are becoming larger and more complex, which translates to larger CNNs, leading to longer training times that not […]
Dec, 7

Practical Implementation of Lattice QCD Simulation on Intel Xeon Phi Knights Landing

We investigate the implementation of lattice Quantum Chromodynamics (QCD) code on the Intel Xeon Phi Knights Landing (KNL). The most time-consuming part of the numerical simulations of lattice QCD is the solver of a linear equation for a large sparse matrix that represents the strong interaction among quarks. To establish widely applicable prescriptions, we examine rather […]
Dec, 7

A tutorial on the implementations of linear image filters in CPU and GPU

This article presents an overview of the implementation of linear image filters in CPU and GPU. The main goal is to present a self-contained discussion of different implementations and their background using tools from digital signal processing. First, using signal processing tools, we discuss different algorithms and estimate their computational cost. Then, we discuss […]
Dec, 7

A programming framework for data streaming on the Xeon Phi

ALICE (A Large Ion Collider Experiment) is the dedicated heavy-ion detector studying the physics of strongly interacting matter and the quark-gluon plasma at the CERN LHC (Large Hadron Collider). After the second long shut-down of the LHC, the ALICE detector will be upgraded to cope with an interaction rate of 50 kHz in Pb-Pb collisions, […]
Dec, 7

MILC Code Performance on High End CPU and GPU Supercomputer Clusters

With recent developments in parallel supercomputing architecture, many-core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism, memory hierarchy, and programming complexity. It has been necessary to adapt the MILC code to these new processors, starting with NVIDIA GPUs and, more recently, the Intel Xeon Phi processors. We report on […]
Dec, 7

Study of Bandwidth Partitioning for Co-executing GPU Kernels

Co-executing GPU kernels on a partitioned GPU has been shown to improve utilization efficiency of poorly scaling tasks. While kernels can be executed in parallel, data transfers to the GPU are serial, which can negatively impact parallelism and predictability of the kernels. In this work we implement a fairness-based approach to memory transfers by chunking data […]
HGPU group © 2010-2018 hgpu.org

All rights belong to the respective authors