Posts
Oct, 17
Accelerating AutoDock VINA with GPUs
AutoDock VINA is one of the most-used docking tools in the early stage of modern drug discovery. It uses a Monte-Carlo based iterated search method and multithreading parallelism scheme on multicore machines to improve docking accuracy and speed. However, virtual screening from huge compound databases is common for modern drug discovery, which puts forward a […]
Oct, 17
Artificial Intelligence in Electric Machine Drives: Advances and Trends
This review paper systematically summarizes the existing literature on applying classical AI techniques and advanced deep learning algorithms to electric machine drives. It is anticipated that with the rapid progress in deep learning models and embedded hardware platforms, AI-based data-driven approaches will become increasingly popular for the automated high-performance control of electric machines. Additionally, this […]
Oct, 17
Beyond Desktop Computation: Challenges in Scaling a GPU Infrastructure
Enterprises and labs performing computationally expensive data science applications sooner or later face the problem of scale but unconnected infrastructure. For this up-scaling process, an IT service provider can be hired or in-house personnel can attempt to implement a software stack. The first option can be quite expensive if it is just about connecting several […]
Oct, 10
Parallel Actors and Learners: A Framework for Generating Scalable RL Implementations
Reinforcement Learning (RL) has achieved significant success in application domains such as robotics, games, health care and others. However, training RL agents is very time consuming. Current implementations exhibit poor performance due to challenges such as irregular memory accesses and synchronization overheads. In this work, we propose a framework for generating scalable reinforcement learning implementations […]
Oct, 10
AsymML: An Asymmetric Decomposition Framework for Privacy-Preserving DNN Training and Inference
Leveraging parallel hardware (e.g. GPUs) to conduct deep neural network (DNN) training/inference, though significantly speeds up the computations, raises several data privacy concerns. Trusted execution environments (TEEs) have emerged as a promising solution to enable privacy-preserving inference and training. TEEs, however, have limited memory and computation resources which renders it not comparable to untrusted parallel […]
Oct, 10
GCN Inference Acceleration using High-Level Synthesis
GCN (Graph Convolutional Network) has become a promising solution for many applications, such as recommendation systems, social data mining, etc. Many of these applications requires low latency GCN inference. In this paper, we provide a case study of a GCN inference acceleration on FPGA. We explore high-level synthesis programming model to achieve low-latency inference. First, […]
Oct, 10
Large-eddy simulations with ClimateMachine: a new open-source code for atmospheric simulations on GPUs and CPUs
We introduce ClimateMachine, a new open-source atmosphere modeling framework using the Julia language to be performance portable on central processing units (CPUs) and graphics processing units (GPUs). ClimateMachine uses a common framework both for coarser-resolution global simulations and for high-resolution, limited-area large-eddy simulations (LES). Here, we demonstrate the LES configuration of the atmosphere model in […]
Oct, 10
Implementation of Parallel Simplified Swarm Optimization in CUDA
As the acquisition cost of the graphics processing unit (GPU) has decreased, personal computers (PC) can handle optimization problems nowadays. In optimization computing, intelligent swarm algorithms (SIAs) method is suitable for parallelization. However, a GPU-based Simplified Swarm Optimization Algorithm has never been proposed. Accordingly, this paper proposed Parallel Simplified Swarm Optimization (PSSO) based on the […]
Oct, 3
HLS Portability from Intel to Xilinx: A Case Study
Field-programmable gate arrays (FPGAs) are a hardware accelerator option that is growing in popularity. However, FPGAs are notoriously hard to program. To this end, high-level synthesis (HLS) tools have been developed to allow programmers to design hardware accelerators with FPGAs using familiar software languages. The two largest FPGA vendors, Intel and Xilinx, support both C/C++ […]
Oct, 3
Unified Shader Programming in C++
In real-time graphics, the strict separation of programming languages and environments for host (CPU) code and GPU code results in code duplication, subtle compatibility bugs, and additional development and maintenance costs. In contrast, popular general-purpose GPU (GPGPU) programming models like CUDA and C++ AMP avoid many of these issues by presenting unified programming environments where […]
Oct, 3
Intel oneAPI DPC++ FPGA Optimization Guide
The Intel® oneAPI FPGA Optimization Guide provides guidance on leveraging the functionalities of Data Parallel C++ (DPC++) to optimize your design. This document assumes that you are familiar with SYCL* concepts and application programming interfaces (APIs), as described in the SYCL* Specification version 1.2.1 by the Khronos* Group. It also assumes that you have experience […]
Oct, 3
Embedded Software Synthesis using Heterogeneous Dataflow Models
Dataflow process networks (DPNs) consist of statically defined process nodes with First-In-First-Out (FIFO) buffered point-to-point connections. DPNs are intrinsically data-driven, i.e., node actions are not synchronized among each other and may fire whenever sufficient input operands arrived at a node. In this original form, DPNs are data-driven and therefore a suitable model of computation (MoC) […]