
Posts

Jun, 25

ART vs. NDK vs. GPU acceleration: A study of performance of image processing algorithms on Android

The Android ecosystem contains three major execution platforms, each suited to different purposes. Android applications are normally written in the Java programming language, but computationally intensive parts of Android applications can be sped up by choosing to use a native language or by utilising the parallel architecture found in graphics processing units (GPUs). The experiments […]
Jun, 25

An Analysis of Variation Between Cores For Intel Xeon Phi Knights Corner And Xeon Phi Knights Landing

As we move towards exascale computing, the efficiency of application performance and energy utilization must be optimized by redefining architectural features and application performance analysis. This research analyzes the performance per core of 8 applications on Intel Xeon Phi Knights Corner (KNC) and Knights Landing (KNL) to determine if performance variation within cores can lead […]
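One simple way to put a number on per-core performance variation is the coefficient of variation (standard deviation divided by mean) of a per-core metric such as runtime. The sketch below assumes one runtime sample per core; the function name and the runtimes are hypothetical, and the metric the study itself uses may differ.

```python
from statistics import mean, pstdev

def core_variation(per_core_runtimes):
    """Return (mean, population standard deviation, coefficient of variation)."""
    mu = mean(per_core_runtimes)
    sigma = pstdev(per_core_runtimes)
    return mu, sigma, sigma / mu

# Hypothetical runtimes (seconds) of the same kernel pinned to four different cores.
runtimes = [10.0, 10.0, 12.0, 8.0]
mu, sigma, cv = core_variation(runtimes)
print(f"mean={mu:.2f}s std={sigma:.2f}s cv={cv:.1%}")
```

A dimensionless measure like this lets cores, applications, and the two processor generations be compared on one scale even when their absolute runtimes differ.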
Jun, 25

High-Performance Out-of-core Block Randomized Singular Value Decomposition on GPU

Fast computation of singular value decomposition (SVD) is of great interest in various machine learning tasks. Recently, SVD methods based on randomized linear algebra have shown significant speedup in this regime. This paper attempts to further accelerate the computation by harnessing a modern computing architecture, namely the graphics processing unit (GPU), with the goal of processing […]
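Randomized SVD methods generally begin by multiplying the input matrix A by a random test matrix and orthonormalizing the product, so that only a much smaller matrix B = QᵀA needs a conventional SVD. Below is a pure-Python sketch of that range-finding stage, assuming a Gaussian test matrix and a hypothetical, exactly rank-2 toy matrix; it shows neither the paper's out-of-core blocking nor its GPU implementation.

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def orthonormal_columns(Y, rel_tol=1e-8):
    """Modified Gram-Schmidt on the columns of Y.

    Columns that are (numerically) combinations of earlier ones are dropped,
    so the result has as many columns as the numerical rank of Y.
    """
    basis = []
    for col in transpose(Y):
        v = list(col)
        start_norm = sum(a * a for a in v) ** 0.5
        for q in basis:
            proj = sum(a * b for a, b in zip(q, v))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > rel_tol * start_norm:
            basis.append([a / norm for a in v])
    return transpose(basis)  # m x r matrix Q with orthonormal columns

def randomized_range_finder(A, sketch_cols, seed=0):
    """First stage of a randomized SVD: Q (approximately) spans the range of A.

    A small SVD of B = Q^T A would then give approximate singular triplets of A.
    """
    rng = random.Random(seed)
    n = len(A[0])
    omega = [[rng.gauss(0.0, 1.0) for _ in range(sketch_cols)] for _ in range(n)]  # n x l test matrix
    return orthonormal_columns(matmul(A, omega))

# Hypothetical exactly rank-2 matrix (6 x 5) built from two outer products.
u1, v1 = [1, 2, 0, 1, 3, 1], [1, 0, 2, 1, 1]
u2, v2 = [0, 1, 1, -1, 2, 0], [2, 1, 0, 0, -1]
A = [[u1[i] * v1[j] + u2[i] * v2[j] for j in range(5)] for i in range(6)]

Q = randomized_range_finder(A, sketch_cols=4)  # 2 extra columns of oversampling
residual = [[a - b for a, b in zip(ra, rb)]
            for ra, rb in zip(A, matmul(Q, matmul(transpose(Q), A)))]
print(len(Q[0]), max(abs(x) for row in residual for x in row))
```

The residual A − QQᵀA is at rounding level because the random sketch captures the full range of a rank-2 matrix; for real data, oversampling and power iterations are the usual knobs for trading accuracy against cost.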
Jun, 21

Multi-level Parallelism with MPI and OpenACC for CFD Applications

High-level parallel programming approaches, such as OpenACC, have recently become popular in computational fluid dynamics research since they are cross-platform and easy to implement. OpenACC is a directive-based programming model that, unlike low-level programming models, abstracts away the implementation details on the GPU. Although OpenACC generally limits the performance of the GPU, this model significantly […]
Jun, 21

Panda: A Compiler Framework for Concurrent CPU-GPU Execution of 3D Stencil Computations on GPU-accelerated Supercomputers

This paper describes a new compiler framework for heterogeneous 3D stencil computation on GPU clusters. Our framework consists of a simple directive-based programming model and a tightly integrated source-to-source compiler. Annotated with a small number of directives, sequential stencil codes originally written in C can be automatically parallelized for large-scale GPU clusters. The most distinctive […]
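As a framework-neutral illustration of the kind of kernel involved, here is a hypothetical 7-point Jacobi sweep over a 3D grid in Python. It is not Panda's directive syntax or generated code. Because each interior point is computed only from the previous grid, the iteration space can be partitioned across devices, which is the sort of parallelism a CPU-GPU stencil framework exploits.

```python
def jacobi_7pt(grid):
    """One Jacobi sweep of a 7-point 3D stencil; boundary cells are copied unchanged."""
    nz, ny, nx = len(grid), len(grid[0]), len(grid[0][0])
    out = [[row[:] for row in plane] for plane in grid]  # copy so boundary values persist
    for k in range(1, nz - 1):
        for j in range(1, ny - 1):
            for i in range(1, nx - 1):
                # Average of the cell and its six face neighbours, read from the old grid.
                out[k][j][i] = (
                    grid[k][j][i]
                    + grid[k - 1][j][i] + grid[k + 1][j][i]
                    + grid[k][j - 1][i] + grid[k][j + 1][i]
                    + grid[k][j][i - 1] + grid[k][j][i + 1]
                ) / 7.0
    return out

# Hypothetical 5x5x5 grid: zero everywhere except one hot cell in the middle.
n = 5
g = [[[0.0] * n for _ in range(n)] for _ in range(n)]
g[2][2][2] = 7.0
g1 = jacobi_7pt(g)
print(g1[2][2][2], g1[2][2][1])
```

Writing into a separate output array (rather than updating in place) is what makes every point of a sweep independent.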
Jun, 21

On the Use of a GPU-Accelerated Mobile Device Processor for Sound Source Localization

The growing interest in incorporating new features into mobile devices has increased the number of signal processing applications running over processors designed for mobile computing. A challenging signal processing field is acoustic source localization, which is attractive for applications such as automatic camera steering systems, human-machine interfaces, video gaming or audio surveillance. In this context, […]
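The abstract does not say which localization method the paper maps to the mobile GPU. A common building block is the time difference of arrival (TDOA) between two microphones, taken from the peak of their cross-correlation; with microphone spacing d and speed of sound c, a far-field direction then follows from sin θ = c·τ/d. A plain-Python sketch with a hypothetical pulse and sampling rate:

```python
import math

def tdoa_lag(x, y, max_lag):
    """Return the lag in [-max_lag, max_lag] (samples) maximising sum_n x[n] * y[n + lag]."""
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = sum(x[n] * y[n + lag] for n in range(len(x)) if 0 <= n + lag < len(y))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Hypothetical two-microphone setup: mic B hears the same pulse 3 samples after mic A.
fs = 16000  # assumed sampling rate in Hz
pulse = [0.0] * 10 + [1.0, 0.5, -0.3] + [0.0] * 20
mic_a = pulse
mic_b = [0.0] * 3 + pulse[:-3]
lag = tdoa_lag(mic_a, mic_b, max_lag=8)
print(lag, lag / fs)  # delay in samples and in seconds
```

Each candidate lag is scored independently, so this search (or its frequency-domain equivalent) parallelizes naturally across lags and microphone pairs.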
Jun, 21

Rgtsvm: Support Vector Machines on a GPU in R

Rgtsvm provides a fast and flexible support vector machine (SVM) implementation for the R language. The distinguishing feature of Rgtsvm is that support vector classification and support vector regression tasks are implemented on a graphical processing unit (GPU), allowing the libraries to scale to millions of examples with >100-fold improvement in performance over existing implementations. […]
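To show what an SVM optimizes, here is a minimal CPU sketch of a linear SVM trained by Pegasos-style stochastic sub-gradient descent on the hinge loss. This is a swapped-in, much simpler technique than Rgtsvm's GPU implementation; the data, regularization strength, and epoch count are hypothetical.

```python
import random

def train_linear_svm(data, lam=0.01, epochs=200, seed=0):
    """Pegasos-style SGD on the L2-regularised hinge loss.

    data: list of (features, label) with label in {-1, +1}.
    Returns weights with the bias as the last entry (the bias is regularised too,
    a common simplification).
    """
    rng = random.Random(seed)
    w = [0.0] * (len(data[0][0]) + 1)       # +1 for a bias feature fixed at 1.0
    t = 0
    for _ in range(epochs):
        for x, y in rng.sample(data, len(data)):   # shuffled pass over the data
            t += 1
            eta = 1.0 / (lam * t)                  # Pegasos step size
            xb = list(x) + [1.0]
            margin = y * sum(wi * xi for wi, xi in zip(w, xb))
            w = [(1 - eta * lam) * wi for wi in w]  # shrink: gradient of the regulariser
            if margin < 1:                          # hinge loss active: step toward y * x
                w = [wi + eta * y * xi for wi, xi in zip(w, xb)]
    return w

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])) >= 0 else -1

# Hypothetical, linearly separable toy data.
data = [((2.0, 2.0), 1), ((3.0, 1.0), 1), ((2.5, 3.0), 1),
        ((-2.0, -1.0), -1), ((-1.0, -3.0), -1), ((-3.0, -2.0), -1)]
w = train_linear_svm(data)
print([predict(w, x) for x, _ in data])
```

Each step touches one example, which is cheap but sequential; GPU implementations instead gain from evaluating many kernel or dot-product terms at once.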
Jun, 21

Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras

We introduce Kapre, Keras layers for audio and music signal preprocessing. Music research using deep neural networks requires a heavy and tedious preprocessing stage, for which audio processing parameters are often ignored in parameter optimisation. To solve this problem, Kapre implements time-frequency conversions, normalisation, and data augmentation as Keras layers. We report simple benchmark results, […]
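The abstract does not show Kapre's API, so the following is a framework-free sketch of the kind of time-frequency conversion involved: a magnitude short-time Fourier transform (STFT). The frame size, hop length, and Hann window are hypothetical choices, and a naive O(N²) DFT stands in for the FFT a real implementation would use.

```python
import cmath
import math

def stft_magnitude(signal, n_fft=16, hop=8):
    """Magnitude STFT using a naive DFT and a periodic Hann window.

    Returns one list per frame, each with n_fft // 2 + 1 magnitudes (bin 0 up to Nyquist).
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(n_fft)]
        frame = []
        for k in range(n_fft // 2 + 1):
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
            frame.append(abs(acc))
        frames.append(frame)
    return frames

# A cosine completing exactly 4 cycles per 16-sample frame should peak in bin 4.
x = [math.cos(2 * math.pi * 4 * n / 16) for n in range(64)]
spec = stft_magnitude(x)
print(len(spec), max(range(len(spec[0])), key=lambda k: spec[0][k]))
```

Expressing this step as a network layer, as Kapre does, means parameters such as n_fft and hop can be changed and tuned together with the model instead of in a separate preprocessing pass.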
Jun, 17

Efficient OpenCL-based concurrent tasks offloading on accelerators

Current heterogeneous platforms with CPUs and accelerators have the ability to launch several independent tasks simultaneously, in order to exploit concurrency among them. These tasks typically consist of data transfer commands and kernel computation commands. In this paper we develop a runtime approach to optimize the concurrency between data transfers and kernel computation commands in […]
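The gain from overlapping data transfers with kernel execution can be estimated with a simple pipeline model. This is a hypothetical timing model, not the paper's runtime and not OpenCL code: it assumes each task is an upload, a kernel, and a download, and that the device can run one copy in each direction and one kernel at the same time.

```python
def serial_time(tasks):
    """Total time when every transfer and kernel runs back to back with no overlap."""
    return sum(h2d + kern + d2h for h2d, kern, d2h in tasks)

def pipelined_time(tasks):
    """Makespan when tasks flow in order through three stages (upload, kernel, download).

    Each stage handles one task at a time, but different stages work on
    different tasks concurrently.
    """
    end = [0.0, 0.0, 0.0]  # finish time of the latest task in each stage
    for task in tasks:
        ready = 0.0
        for stage, duration in enumerate(task):
            # Wait for this task's previous stage and for the stage itself to be free.
            start = max(ready, end[stage])
            end[stage] = start + duration
            ready = end[stage]
    return end[-1]

# Four hypothetical tasks: (upload, kernel, download) times in milliseconds.
tasks = [(2.0, 3.0, 1.0)] * 4
print(serial_time(tasks), pipelined_time(tasks))
```

Here the kernel stage is the bottleneck, so the overlapped makespan is roughly one upload, four kernels, and one download; a runtime that reorders or groups commands is trying to approach that bound.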
Jun, 17

Parallel Monte Carlo on Intel MIC Architecture

The trade-off between the cost-efficiency of powerful computational accelerators and the increasing energy needed to perform numerical tasks can be tackled by implementing algorithms on the Intel Many Integrated Core (MIC) architecture. The best performance of the algorithms requires the use of appropriate optimization and parallelization approaches throughout the entire design process. Monte Carlo […]
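The abstract does not say which Monte Carlo method is parallelized. The textbook pattern is to split the samples into independent chunks, give each chunk its own random-number stream, and combine the partial results at the end. A hypothetical π estimate illustrates that decomposition; threads stand in for MIC cores only to keep the sketch self-contained, and in CPython they will not speed up this CPU-bound loop because of the global interpreter lock.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def count_hits(seed, samples):
    """Count random points in the unit square that land inside the quarter circle.

    Each worker owns its own generator, so chunks never share random-number state.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

def estimate_pi(total_samples=200_000, workers=4, base_seed=2017):
    per_worker = total_samples // workers
    # Hypothetical seeding scheme; production codes prefer generators built for parallel streams.
    seeds = [base_seed + w for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = sum(pool.map(count_hits, seeds, [per_worker] * workers))
    return 4.0 * hits / (per_worker * workers)

estimate = estimate_pi()
print(estimate, abs(estimate - math.pi))
```

Fixing a seed per chunk also makes the result reproducible regardless of how the chunks are scheduled.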
Jun, 17

Device Placement Optimization with Reinforcement Learning

The past few years have witnessed a growth in size and computational requirements for training and inference with neural networks. Currently, a common approach to address these requirements is to use a heterogeneous distributed environment with a mixture of hardware devices such as CPUs and GPUs. Importantly, the decision of placing parts of the neural […]
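To make the placement objective concrete, here is a hypothetical cost model for a chain of operations and a brute-force search over CPU/GPU assignments. This is not the paper's reinforcement-learning method; it only illustrates the trade-off between per-device compute time and cross-device transfer cost, and why exhaustive search stops scaling (2ⁿ placements for n operations on two devices).

```python
from itertools import product

def placement_time(placement, compute, transfer):
    """Estimated runtime of a chain of ops under one device placement.

    compute[op][dev] is the time to run op on dev; transfer is added whenever two
    consecutive ops sit on different devices (a crude stand-in for data movement).
    """
    total = compute[0][placement[0]]
    for op in range(1, len(placement)):
        total += compute[op][placement[op]]
        if placement[op] != placement[op - 1]:
            total += transfer
    return total

def best_placement(compute, transfer, devices=("cpu", "gpu")):
    """Exhaustive search: only feasible for tiny graphs."""
    return min(product(devices, repeat=len(compute)),
               key=lambda p: placement_time(p, compute, transfer))

# Hypothetical timings (ms) for a 4-op chain: ops 0 and 3 are slightly faster on the
# CPU, ops 1 and 2 are much faster on the GPU.
compute = [{"cpu": 1.0, "gpu": 1.5}, {"cpu": 8.0, "gpu": 2.0},
           {"cpu": 6.0, "gpu": 1.0}, {"cpu": 1.0, "gpu": 1.2}]
p = best_placement(compute, transfer=0.8)
print(p, placement_time(p, compute, 0.8))
```

With this transfer cost, keeping everything on the GPU wins; with free transfers, the cheap ops move back to the CPU. A learned policy aims to make such decisions for graphs far too large to enumerate.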
Jun, 17

Non-Hydrostatic Pressure Shallow Flows: GPU Implementation Using Finite-Volume and Finite-Difference Scheme

We consider the depth-integrated non-hydrostatic system derived by Yamazaki et al. An efficient formally second-order well-balanced hybrid finite volume/difference numerical scheme is proposed. The scheme consists of a two-step algorithm. First, the hyperbolic part of the system is discretized using a PVM path-conservative finite-volume method. Second, the dispersive terms are solved by means of compact […]
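The paper's scheme (path-conservative PVM finite volumes plus compact finite differences) is well beyond a short example, but the finite-volume idea underneath can be shown on the simplest case: first-order upwind for linear advection. Each cell average changes by the difference of the fluxes through its faces, which makes the scheme conservative. This is a swapped-in toy problem with hypothetical grid parameters, not the paper's non-hydrostatic system.

```python
def upwind_step(u, a, dt, dx):
    """One first-order upwind finite-volume step for u_t + a*u_x = 0 on a periodic grid.

    u holds cell averages. Mass leaving a cell through its right face enters the
    neighbour through that neighbour's left face, so the total is conserved.
    """
    nu = a * dt / dx  # Courant number
    if not (a > 0 and nu <= 1.0):
        raise ValueError("this sketch needs a > 0 and the CFL condition a*dt/dx <= 1")
    flux = [a * ui for ui in u]  # upwind flux through the right face of each cell
    # flux[i - 1] is the flux through cell i's left face; for i = 0 it wraps (periodic).
    return [u[i] - dt / dx * (flux[i] - flux[i - 1]) for i in range(len(u))]

u = [0.0] * 10
u[2] = 1.0  # hypothetical initial condition: unit mass in cell 2
for _ in range(5):
    u = upwind_step(u, a=1.0, dt=0.05, dx=0.1)  # Courant number 0.5
mass = sum(u)
centre = sum(i * v for i, v in enumerate(u))
print(round(mass, 12), round(centre, 12))  # mass stays 1; centre moves 5 * 0.5 cells
```

The pulse also spreads out (numerical diffusion of a first-order scheme), which is one reason practical shallow-water solvers use higher-order, well-balanced reconstructions like the one the abstract describes.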

* * *

Featured events

2018
November
27-30
Hida Takayama, Japan

The Third International Workshop on GPU Computing and AI (GCA), 2018

2018
September
19-21
Nagoya University, Japan

The 5th International Conference on Power and Energy Systems Engineering (CPESE), 2018

2018
September
22-24
MediaCityUK, Salford Quays, Greater Manchester, England

The 10th International Conference on Information Management and Engineering (ICIME), 2018

2018
August
21-23
No. 1037, Luoyu Road, Hongshan District, Wuhan, China

The 4th International Conference on Control Science and Systems Engineering (ICCSSE), 2018

2018
October
29-31
Nanyang Executive Centre in Nanyang Technological University, Singapore

The 2018 International Conference on Cloud Computing and Internet of Things (CCIOT’18), 2018

HGPU group © 2010-2018 hgpu.org

All rights belong to the respective authors
