
Posts

Aug, 18

SODECL: An Open Source Library for Calculating Multiple Orbits of a System of Stochastic Differential Equations in Parallel

Stochastic differential equations (SDEs) are widely used to model systems affected by random processes. In general, the analysis of an SDE model requires numerical solutions to be generated many times over multiple parameter combinations. However, this process often requires considerable computational resources to be practicable. Due to the embarrassingly parallel nature of the task, devices […]
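The "many orbits over many parameter combinations" workload described above can be illustrated with a minimal sketch. This is not SODECL's API (which targets OpenCL devices). It uses NumPy vectorization as a stand-in for device parallelism, the Euler–Maruyama scheme as the integrator, and a hypothetical Ornstein–Uhlenbeck model dX = -θX dt + σ dW. The function name `euler_maruyama_orbits` is illustrative only.

```python
import numpy as np

def euler_maruyama_orbits(x0, theta, sigma, dt, n_steps, seed=0):
    """Integrate dX = -theta*X dt + sigma dW for many independent orbits at once.

    x0, theta and sigma are 1-D arrays of equal length, one entry per orbit.
    Orbits never interact, so each Euler-Maruyama step updates all of them in
    one vectorized operation (the embarrassingly parallel part).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    theta = np.asarray(theta, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=x.shape)  # Brownian increments, variance dt
        x = x + (-theta * x) * dt + sigma * dw           # drift + diffusion update
    return x

# Usage: 3 theta values x 2 sigma values, 500 orbits per parameter combination.
thetas, sigmas = np.meshgrid([0.5, 1.0, 2.0], [0.1, 0.3], indexing="ij")
theta = np.repeat(thetas.ravel(), 500)
sigma = np.repeat(sigmas.ravel(), 500)
final = euler_maruyama_orbits(np.ones_like(theta), theta, sigma, dt=0.01, n_steps=200)
means = final.reshape(6, 500).mean(axis=1)  # sample mean of X(T) per combination
```

Because every orbit is independent, the same loop body could be mapped one orbit per GPU work-item without any synchronization between orbits.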
Aug, 11

Simple Iterative Incompressible Smoothed Particle Hydrodynamics

In this paper, a simple, robust, and general-purpose approach to implementing the Incompressible Smoothed Particle Hydrodynamics (ISPH) method is proposed. The new approach is well suited for implementation on CPUs and GPUs. The method is matrix-free and uses an iterative formulation to set up and solve the pressure Poisson equation. A novel approach is used […]
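To illustrate what "matrix-free" means for a pressure Poisson solve, here is a minimal sketch under simplifying assumptions. It uses a regular 2-D grid with a 5-point finite-difference Laplacian as a stand-in for the SPH particle approximation, plain Jacobi iteration, and p = 0 on the boundary. The operator is applied as a stencil and no system matrix is ever assembled. This is not the authors' formulation, and `jacobi_poisson` is a hypothetical name.

```python
import numpy as np

def jacobi_poisson(rhs, h, n_iter=2000):
    """Solve lap(p) = rhs on a square grid with p = 0 on the boundary.

    Matrix-free: the 5-point Laplacian (pE + pW + pN + pS - 4p) / h^2 is
    applied directly to the array each sweep; no sparse matrix is built.
    """
    p = np.zeros_like(rhs, dtype=float)
    for _ in range(n_iter):
        p_new = p.copy()
        # Jacobi update: p = (sum of four neighbours - h^2 * rhs) / 4
        p_new[1:-1, 1:-1] = 0.25 * (
            p[2:, 1:-1] + p[:-2, 1:-1] + p[1:-1, 2:] + p[1:-1, :-2]
            - h * h * rhs[1:-1, 1:-1]
        )
        p = p_new
    return p
```

In a particle method, the stencil sum would be replaced by a kernel-weighted sum over neighbouring particles, but the structure stays the same: each iteration only needs neighbour values, which maps naturally onto GPU threads.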
Aug, 11

A Deep Learning Approach for Automatic Code Optimization in the Tiramisu Compiler

Modern compilers offer more and more code optimization possibilities. This enables better use of sophisticated hardware architectures and available resources in order to accelerate programs. It is difficult to predict which optimizations will be beneficial for a given program, as it depends on the program, the execution environment, interaction with other optimizations, and other factors. […]
Aug, 11

Live Migration of FPGA Applications

With the recent and growing trend of Field Programmable Gate Arrays (FPGAs) being deployed into the data centers, cloud computing service providers are finding it difficult to manage these devices efficiently because traditional server management concepts and techniques are not yet available for FPGAs. In this thesis, we explore how to bring one of these […]
Aug, 11

Performance Comparison for Neuroscience Application Benchmarks

Researchers within the Human Brain Project and related projects have in the last couple of years expanded their needs for high-performance computing infrastructures. The needs arise from a diverse set of science challenges that range from large-scale simulations of brain models to processing of extreme-scale experimental data sets. The ICEI project, which is in the […]
Aug, 11

GraphBLAST: A High-Performance Linear Algebra-based Graph Framework on the GPU

High-performance graph algorithms are difficult to implement on new parallel hardware such as GPUs because of three challenges: (1) the difficulty of identifying suitable graph building blocks, (2) load imbalance on parallel hardware, and (3) the low arithmetic intensity of graph problems. To address these challenges, GraphBLAS is an innovative, ongoing effort by the […]
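The core idea behind linear-algebra-based graph frameworks can be shown with breadth-first search written as repeated matrix-vector products over the Boolean (OR, AND) semiring, with a mask that excludes visited vertices. This is a minimal sketch, not GraphBLAST's C++ API. It uses a dense NumPy matrix for brevity, whereas real frameworks use sparse storage and true semiring kernels. The function name `bfs_levels` is hypothetical.

```python
import numpy as np

def bfs_levels(adj, source):
    """Breadth-first search as masked Boolean matrix-vector products.

    adj[i, j] is True for an edge i -> j. Each step computes
    reached = adj^T * frontier over (OR, AND), then masks out vertices that
    already have a depth. Returns each vertex's BFS depth, or -1 if unreachable.
    """
    adj = np.asarray(adj, dtype=bool)
    n = adj.shape[0]
    depth = np.full(n, -1)
    frontier = np.zeros(n, dtype=bool)
    frontier[source] = True
    level = 0
    while frontier.any():
        depth[frontier] = level
        # Boolean mat-vec emulated with an integer product: j is reached if any
        # frontier vertex i has an edge i -> j.
        reached = (adj.T.astype(np.int64) @ frontier.astype(np.int64)) > 0
        frontier = reached & (depth == -1)  # mask: keep only unvisited vertices
        level += 1
    return depth

# Usage: edges 0->1, 0->2, 1->3, 2->3; vertex 4 is isolated.
A = np.zeros((5, 5), dtype=bool)
for i, j in [(0, 1), (0, 2), (1, 3), (2, 3)]:
    A[i, j] = True
levels = bfs_levels(A, 0)  # -> [0, 1, 1, 2, -1]
```

Expressing the traversal as a mat-vec is what lets a framework reuse one well-tuned, load-balanced kernel across many graph algorithms.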
Aug, 5

Parallelization of Coherent Point Drift for patient registration

Point set registration is a central part of any application where the correspondence between two point sets is of interest, for instance patient data from medical examinations. Numerous algorithms aim to solve the registration problem, one of which is the Coherent Point Drift (CPD) algorithm. In this thesis a […]
Aug, 5

Mapping a Guided Image Filter on the HARP Reconfigurable Architecture Using OpenCL

Intel recently introduced the Heterogeneous Architecture Research Platform (HARP). In this platform, the Central Processing Unit and a Field-Programmable Gate Array are connected through a high-bandwidth, low-latency interconnect, and the two share DRAM. For this platform, Open Computing Language (OpenCL), a High-Level Synthesis (HLS) language, is made available. By making use of HLS, a faster […]
Aug, 5

Incremental Bounded Model Checking of Artificial Neural Networks in CUDA

Artificial neural networks (ANNs) are powerful computing systems employed for various applications because of their ability to generalize and to respond to unexpected inputs/patterns. However, implementations of ANNs in safety-critical systems might lead to failures that are hard to predict in the design phase, since ANNs are highly parallel and their parameters are hard to interpret. Here […]
Aug, 5

A Survey of Convolutional Neural Networks on Edge with Reconfigurable Computing

The convolutional neural network (CNN) is one of the most widely used deep learning models for image detection and classification, due to its high accuracy compared with other machine learning algorithms. CNNs achieve better results at the cost of higher computing and memory requirements. Inference of convolutional neural networks is therefore usually done in centralized […]
Aug, 5

GLU3.0: Fast GPU-based Parallel Sparse LU Factorization for Circuit Simulation

In this article, we propose a new GPU-based sparse LU factorization method, called GLU3.0, which solves the aforementioned problems. First, it introduces a much more efficient double-U dependency detection algorithm to make the detection much simpler. Second, we observe that the potential parallelism changes as the matrix factorization progresses. We then develop three different […]
Jul, 28

FBLAS: Streaming Linear Algebra on FPGA

Energy efficiency is one of the primary concerns when designing large-scale computing systems. This makes reconfigurable hardware an attractive alternative to load-store architectures, as it allows eliminating expensive control and data movement overheads in computations. In practice, these devices are often not considered in the high-performance computing community, due to the steep learning curve […]

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
