Posts

Jan, 6

GPU Acceleration of a High-Order Discontinuous Galerkin Incompressible Flow Solver

We present a GPU-accelerated version of a high-order discontinuous Galerkin discretization of the unsteady incompressible Navier-Stokes equations. The equations are discretized in time using a semi-implicit scheme with explicit treatment of the nonlinear term and implicit treatment of the split Stokes operators. The pressure system is solved with a conjugate gradient method together with a […]
Jan, 6

Rubus: A compiler for seamless and extensible parallelism

Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single […]
Jan, 6

ThunderSVM: A Fast SVM Library on GPUs and CPUs

Support Vector Machines (SVMs) are classic supervised learning models for classification, regression and distribution estimation. A survey conducted by Kaggle in 2017 shows that 26% of the data mining and machine learning practitioners are users of SVMs. However, SVM training and prediction are very expensive computationally for large and complex problems. This paper presents an […]
Jan, 6

Scaling GRPC Tensorflow on 512 nodes of Cori Supercomputer

We explore scaling of the standard distributed Tensorflow with GRPC primitives on up to 512 Intel Xeon Phi (KNL) nodes of Cori supercomputer with synchronous stochastic gradient descent (SGD), and identify causes of scaling inefficiency at higher node counts. To our knowledge, this is the first exploration of distributed GRPC Tensorflow scalability on an HPC […]
Jan, 6

Analysing the Performance of GPU Hash Tables for State Space Exploration

In the past few years, General Purpose Graphics Processors (GPUs) have been used to significantly speed up numerous applications. One of the areas in which GPUs have recently led to a significant speed-up is model checking. In model checking, state spaces, i.e., large directed graphs, are explored to verify whether models satisfy desirable properties. GPUexplore […]
Dec, 28

Using reconfigurable computing technology to accelerate matrix decomposition and applications

Matrix decomposition plays an increasingly significant role in many scientific and engineering applications. Among numerous techniques, Singular Value Decomposition (SVD) and Eigenvalue Decomposition (EVD) are widely used as factorization tools to perform Principal Component Analysis for dimensionality reduction and pattern recognition in image processing, text mining and wireless communications, while QR Decomposition (QRD) and sparse […]
Dec, 28

Design and Implementation of the Futhark Programming Language

In this thesis we describe the design and implementation of Futhark, a small data-parallel purely functional array language that offers a machine-neutral programming model, and an optimising compiler that generates efficient OpenCL code for GPUs. The overall philosophy is based on seeking a middle ground between functional and imperative approaches. The specific contributions are as […]
Dec, 28

A Generic Inverted Index Framework for Similarity Search on the GPU

We propose a novel generic inverted index framework on the GPU (called GENIE), aiming to reduce the programming complexity of the GPU for parallel similarity search of different data types. Not every data type and similarity measure are supported by GENIE, but many popular ones are. We present the system design of GENIE, and demonstrate […]
Dec, 28

A Survey of FPGA Based Neural Network Accelerator

Recent research on neural networks has shown great advantages in computer vision over traditional algorithms based on handcrafted features and models. Neural networks are now widely adopted in areas like image, speech and video recognition. But the great computation and storage complexity of neural network based algorithms poses great difficulty for their application. CPU platforms […]
Dec, 28

Protecting Real-Time GPU Applications on Integrated CPU-GPU SoC Platforms

Integrated CPU-GPU architecture provides excellent acceleration capabilities for data parallel applications on embedded platforms while meeting the size, weight and power (SWaP) requirements. However, sharing of main memory between CPU applications and GPU kernels can severely affect the execution of GPU kernels and diminish the performance gain provided by GPU. For example, in the NVIDIA […]
Dec, 24

Pass a Pointer: Exploring Shared Virtual Memory Abstractions in OpenCL Tools for FPGAs

Heterogeneous CPU-FPGA systems are gaining momentum in the embedded systems sector and in the data center market. While the programming abstractions for implementing the data transfer between CPU and FPGA (and vice versa) that are available in today’s commercial programming tools are well-suited for certain types of applications, the CPU-FPGA communication for applications that share […]
Dec, 24

Extending OmpSs for OpenCL kernel co-execution in heterogeneous systems

Heterogeneous systems have a very high potential performance but present difficulties in their programming. OmpSs is a well known framework for task based parallel applications, which is an interesting tool to simplify the programming of these systems. However, it does not support the co-execution of a single OpenCL kernel instance on several compute devices. To […]

HGPU group © 2010-2018 hgpu.org

All rights belong to the respective authors