
Posts

Dec, 10

Performance Evaluation and Optimization of HPCG benchmark on CPU + MIC platform

High-performance conjugate gradient (HPCG) is the latest benchmark adopted by the TOP500 organization, so optimizing the HPCG source code for different heterogeneous computing platforms to achieve a higher floating-point computation rate has become a hot topic in the HPC field. In this paper, we used the CPU + MIC heterogeneous computing […]
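
For context, the computational core of HPCG is a conjugate-gradient solve dominated by sparse matrix-vector products, dot products and vector updates. The sketch below is a minimal, unpreconditioned CG in C++ over a CSR matrix, meant only to illustrate the kernels being tuned; it is not the HPCG reference code, which adds a multigrid preconditioner, halo exchanges and MPI/OpenMP parallelism.

    // Minimal unpreconditioned conjugate gradient on a CSR sparse matrix.
    // Illustrative only: HPCG adds preconditioning, distributed halo exchange, etc.
    #include <cmath>
    #include <cstddef>
    #include <vector>

    struct Csr {
        std::vector<int> rowPtr, col;   // CSR index arrays
        std::vector<double> val;        // nonzero values
        int n;                          // matrix dimension
    };

    static std::vector<double> spmv(const Csr& A, const std::vector<double>& x) {
        std::vector<double> y(A.n, 0.0);
        for (int i = 0; i < A.n; ++i)
            for (int k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k)
                y[i] += A.val[k] * x[A.col[k]];
        return y;
    }

    static double dot(const std::vector<double>& a, const std::vector<double>& b) {
        double s = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
        return s;
    }

    // Solve A x = b for symmetric positive definite A, returning the iteration count.
    int cg(const Csr& A, const std::vector<double>& b, std::vector<double>& x,
           int maxIter = 50, double tol = 1e-6) {
        x.assign(A.n, 0.0);                      // start from the zero vector
        std::vector<double> r = b, p = b;        // residual and search direction
        double rr = dot(r, r);
        for (int it = 0; it < maxIter; ++it) {
            if (std::sqrt(rr) < tol) return it;
            std::vector<double> Ap = spmv(A, p);
            double alpha = rr / dot(p, Ap);
            for (int i = 0; i < A.n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
            double rrNew = dot(r, r);
            double beta = rrNew / rr;
            for (int i = 0; i < A.n; ++i) p[i] = r[i] + beta * p[i];
            rr = rrNew;
        }
        return maxIter;
    }
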
Dec, 10

GPGPU Accelerated Deep Object Classification on a Heterogeneous Mobile Platform

Deep convolutional neural networks achieve state-of-the-art performance in image classification. However, the computational and memory requirements of such networks are huge, which is a problem for embedded devices given their resource constraints. Most of this complexity derives from the convolutional layers and in particular from the matrix multiplications they entail. This paper proposes a […]
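
To make the matrix-multiplication claim concrete, the sketch below shows the classic im2col lowering for a single-channel, unit-stride, unpadded convolution; production frameworks generalize this over channels and batches and hand the result to an optimized GEMM. The names and simplified layout are illustrative, not taken from the paper.

    // im2col lowering: turn a convolution into a matrix multiplication.
    // Single input channel, stride 1, no padding.
    #include <vector>

    // H x W image -> matrix with K*K rows and outH*outW columns (row-major),
    // one column per output position.
    std::vector<float> im2col(const std::vector<float>& img, int H, int W, int K) {
        int outH = H - K + 1, outW = W - K + 1;
        std::vector<float> cols(static_cast<std::size_t>(K) * K * outH * outW);
        for (int oy = 0; oy < outH; ++oy)
            for (int ox = 0; ox < outW; ++ox)
                for (int ky = 0; ky < K; ++ky)
                    for (int kx = 0; kx < K; ++kx)
                        cols[(ky * K + kx) * outH * outW + oy * outW + ox] =
                            img[(oy + ky) * W + (ox + kx)];
        return cols;
    }

    // The convolution is now out[f][p] = sum_r filters[f][r] * cols[r][p], i.e. a
    // (numFilters x K*K) by (K*K x numPix) matrix multiplication (naive loop here).
    std::vector<float> convAsGemm(const std::vector<float>& filters, int numFilters,
                                  const std::vector<float>& cols, int K, int numPix) {
        std::vector<float> out(static_cast<std::size_t>(numFilters) * numPix, 0.0f);
        for (int f = 0; f < numFilters; ++f)
            for (int r = 0; r < K * K; ++r)
                for (int p = 0; p < numPix; ++p)
                    out[f * numPix + p] += filters[f * K * K + r] * cols[r * numPix + p];
        return out;
    }
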
Dec, 10

Adaptive Work-Efficient Connected Components on the GPU

This report presents an adaptive work-efficient approach for implementing the Connected Components algorithm on GPUs. The results show a considerable increase in performance (up to 6.8x) over current state-of-the-art solutions.
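
As a reference point for what the algorithm computes, the following is a simple CPU implementation of connected components via iterative label propagation; the report's adaptive, work-efficient GPU formulation is far more sophisticated, but converges to the same labelling.

    // CPU reference for connected components: every vertex repeatedly adopts the
    // smallest label seen across its incident edges until nothing changes.
    // GPU formulations parallelize this per edge and add hooking/pointer jumping.
    #include <algorithm>
    #include <utility>
    #include <vector>

    std::vector<int> connectedComponents(int numVertices,
                                         const std::vector<std::pair<int, int>>& edges) {
        std::vector<int> label(numVertices);
        for (int v = 0; v < numVertices; ++v) label[v] = v;  // each vertex starts alone

        bool changed = true;
        while (changed) {
            changed = false;
            for (const auto& e : edges) {
                int lo = std::min(label[e.first], label[e.second]);
                if (label[e.first]  != lo) { label[e.first]  = lo; changed = true; }
                if (label[e.second] != lo) { label[e.second] = lo; changed = true; }
            }
        }
        return label;  // vertices in the same component share a label
    }
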
Dec, 10

BrainFrame: A heterogeneous accelerator platform for neuron simulations

OBJECTIVE: The advent of High-Performance Computing (HPC) in recent years has led to its increasing use in brain study through computational models. The scale and complexity of such models are constantly increasing, leading to challenging computational requirements. Even though modern HPC platforms can often deal with such challenges, the vast diversity of the modeling field […]
Dec, 6

Brownian Dynamics of Active Sphere Suspensions Confined Near a No-Slip Boundary

We develop numerical methods for performing efficient Brownian dynamics of colloidal suspensions confined to remain in the vicinity of a no-slip wall by gravity or active flows. We present a stochastic Adams-Bashforth integrator for the Brownian dynamics equations, which is second-order accurate deterministically and uses a random finite difference to capture the stochastic drift proportional […]
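
One common form of such a scheme (written here under generic assumptions about the mobility M, forces F and Gaussian increments W; not necessarily the paper's exact formulation) combines a two-step Adams-Bashforth treatment of the deterministic term with an Euler-Maruyama noise term and a random-finite-difference estimate of the stochastic drift:

\[
\mathbf{x}_{n+1} = \mathbf{x}_n
+ \Delta t \left[ \tfrac{3}{2}\,(\mathbf{M}\mathbf{F})_n - \tfrac{1}{2}\,(\mathbf{M}\mathbf{F})_{n-1} \right]
+ \sqrt{2 k_B T \,\Delta t}\; \mathbf{M}_n^{1/2} \mathbf{W}_n
+ \Delta t\, k_B T\, (\nabla \cdot \mathbf{M})_n ,
\]

\[
(\nabla \cdot \mathbf{M})_n \approx \frac{1}{\delta}\,
\mathbb{E}\!\left[ \bigl( \mathbf{M}(\mathbf{x}_n + \delta \widetilde{\mathbf{W}}) - \mathbf{M}(\mathbf{x}_n) \bigr)\, \widetilde{\mathbf{W}} \right],
\qquad \widetilde{\mathbf{W}} \sim \mathcal{N}(0,\mathbf{I}),
\]

so the divergence of the mobility (the stochastic drift) is sampled with matrix-vector products only, without forming derivatives of M explicitly.
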
Dec, 6

Parallelization and Performance of the NIM Weather Model for CPU, GPU and MIC Processors

The design and performance of the NIM global weather prediction model is described. NIM was designed to run on GPU and MIC processors. It demonstrates efficient parallel performance and scalability to tens of thousands of compute nodes, and has been an effective way to make comparisons between traditional CPU and emerging fine-grain processors. Design of […]
Dec, 6

A Comparison of GPU Execution Time Prediction using Machine Learning and Analytical Modeling

Today, most high-performance computing (HPC) platforms have heterogeneous hardware resources (CPUs, GPUs, storage, etc.). A Graphics Processing Unit (GPU) is a parallel computing coprocessor specialized in accelerating vector operations. Predicting application execution times on these devices is a major challenge and is essential for efficient job scheduling. There are different approaches to do […]
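
As a minimal example of the analytical-modeling side of such a comparison, a roofline-style estimate bounds a kernel's time by whichever resource it saturates first; the peak numbers in the comment below are placeholders, not figures from the paper.

    // Toy roofline-style analytical estimate of kernel execution time: the kernel
    // is assumed to be limited either by arithmetic throughput or by memory bandwidth.
    #include <algorithm>

    double predictKernelTimeSeconds(double flops, double bytesMoved,
                                    double peakFlopsPerSec, double peakBytesPerSec) {
        double computeTime = flops / peakFlopsPerSec;       // time if compute-bound
        double memoryTime  = bytesMoved / peakBytesPerSec;  // time if bandwidth-bound
        return std::max(computeTime, memoryTime);           // the slower resource dominates
    }

    // Example (placeholder peaks): 2e9 FLOPs and 4e8 bytes on a 5 TFLOP/s, 300 GB/s GPU
    // -> max(4e-4 s, 1.33e-3 s), i.e. memory-bound at roughly 1.3 ms.
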
Dec, 6

ESE: Efficient Speech Recognition Engine with Compressed LSTM on FPGA

Long Short-Term Memory (LSTM) is widely used in speech recognition. In order to achieve higher prediction accuracy, machine learning scientists have built larger and larger models. Such large models are both computation-intensive and memory-intensive. Deploying such a bulky model results in high power consumption and leads to a high total cost of ownership (TCO) of […]
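
For reference, the standard LSTM cell (notation varies across papers; peephole variants differ) makes the cost structure explicit: each time step is dominated by the eight matrix-vector products with W and U, which is precisely what weight pruning and compression target.

\[
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i), &
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f), \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o), &
g_t &= \tanh(W_g x_t + U_g h_{t-1} + b_g), \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t, &
h_t &= o_t \odot \tanh(c_t).
\end{aligned}
\]
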
Dec, 6

GPU-accelerated algorithms for many-particle continuous-time quantum walks

Many-particle continuous-time quantum walks (CTQWs) represent a resource for several tasks in quantum technology, including quantum search algorithms and universal quantum computation. In order to design and implement CTQWs in a realistic scenario, one needs effective simulation tools for Hamiltonians that take into account static noise and fluctuations in the lattice, i.e. Hamiltonians containing stochastic […]
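
As a single-particle illustration (the paper treats the many-particle case), a CTQW on a 1-D lattice is generated by a tight-binding Hamiltonian whose on-site energies and couplings become random variables under static noise:

\[
H = \sum_j \epsilon_j \,|j\rangle\langle j|
- \sum_j J_j \left( |j\rangle\langle j{+}1| + |j{+}1\rangle\langle j| \right),
\qquad |\psi(t)\rangle = e^{-iHt/\hbar}\,|\psi(0)\rangle,
\]

and observables are averaged over many disorder realizations of the \(\epsilon_j\) and \(J_j\), which is the embarrassingly parallel workload that GPU acceleration targets.
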
Dec, 3

OpenACC cache Directive: Opportunities and Optimizations

OpenACC’s programming model presents a simple interface to programmers, offering a trade-off between performance and development effort. OpenACC relies on compiler technologies to generate efficient code and optimize for performance. Among the directives that are difficult to implement is the cache directive, which allows the programmer to utilize the accelerator’s hardware- or software-managed caches by passing […]
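
A minimal illustration of the directive on a hypothetical 1-D smoothing kernel (the function, array names, bounds and data clauses are illustrative, not taken from the paper): the cache directive at the top of the loop body asks the compiler to stage the re-used window of the input in fast on-chip memory.

    // Hypothetical 1-D smoothing kernel illustrating the OpenACC cache directive.
    // Only interior elements out[1..n-2] are written and copied back.
    void smooth(const float *in, float *out, int n) {
        #pragma acc parallel loop copyin(in[0:n]) copyout(out[1:n-2])
        for (int i = 1; i < n - 1; ++i) {
            #pragma acc cache(in[i-1:3])   // stage the 3-element window around i
            out[i] = 0.25f * in[i-1] + 0.5f * in[i] + 0.25f * in[i+1];
        }
    }
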
Dec, 3

Accelerating string tokenization with FPGAs for IoT data handling equipment

This paper reports on the results of a study to accelerate string tokenization using FPGAs suitable for both IoT gateways and data center servers. The prototype developed with Xilinx High-Level Synthesis software runs at 200 MHz and processes up to 32 ASCII characters per clock cycle. It incorporates either OpenCL or our own framework (Volvox) […]
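
As a scalar software reference for what the hardware pipeline does 32 characters at a time, a tokenizer simply scans the input and splits it on delimiter characters; the delimiter set below is an arbitrary choice for illustration, not the paper's.

    // Scalar reference tokenizer: split an ASCII buffer on delimiter characters,
    // one character per loop iteration (the FPGA design handles up to 32 per cycle).
    #include <string>
    #include <vector>

    std::vector<std::string> tokenize(const std::string& input,
                                      const std::string& delims = " ,\t\n") {
        std::vector<std::string> tokens;
        std::string current;
        for (char c : input) {
            if (delims.find(c) != std::string::npos) {   // delimiter ends the token
                if (!current.empty()) tokens.push_back(current);
                current.clear();
            } else {
                current.push_back(c);
            }
        }
        if (!current.empty()) tokens.push_back(current); // flush the final token
        return tokens;
    }
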
Dec, 3

Should I use TensorFlow?

Google’s machine learning framework TensorFlow was open-sourced in November 2015 [1] and has since built a growing community around it. TensorFlow is intended to be flexible enough for research while also allowing its models to be deployed in production. This work is aimed at people with experience in machine learning who are considering whether to use TensorFlow […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hgpu.org