Gregorio Bernabe
We present in this paper several implementations of the 3D Fast Wavelet Transform (3D-FWT) on multicore CPUs and manycore GPUs. On the GPU side, we focus on CUDA and OpenCL programming to develop methods for an efficient mapping on manycores. On multicore CPUs, OpenMP and Pthreads are used as counterparts to maximize parallelism, and renowned […]
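The abstract does not describe the transform itself. As a hedged illustration of what a 3D-FWT computes, here is one decomposition level of a separable 3D wavelet transform: a 1D transform applied along x, then y, then z. The abstract does not say which wavelet filter the paper uses, so this sketch swaps in the Haar wavelet for brevity. The function names and the cubic n×n×n nested-list layout are our own assumptions.

```python
import math


def haar_1d(v):
    """One level of the orthonormal Haar transform (Haar is our stand-in filter).

    Returns the approximation coefficients followed by the detail coefficients.
    """
    half = len(v) // 2
    s = 1 / math.sqrt(2)
    approx = [(v[2 * i] + v[2 * i + 1]) * s for i in range(half)]
    detail = [(v[2 * i] - v[2 * i + 1]) * s for i in range(half)]
    return approx + detail


def haar_3d_level(cube):
    """Apply haar_1d along x, then y, then z of an n*n*n nested list (n even).

    Indexing is cube[z][y][x]. Every row, column, and pillar is transformed
    independently of the others, so each pass is data-parallel.
    """
    n = len(cube)
    c = [[list(row) for row in plane] for plane in cube]  # leave the input untouched
    # Pass 1: along x (each contiguous row).
    for z in range(n):
        for y in range(n):
            c[z][y] = haar_1d(c[z][y])
    # Pass 2: along y (gather a column, transform, scatter back).
    for z in range(n):
        for x in range(n):
            col = haar_1d([c[z][y][x] for y in range(n)])
            for y in range(n):
                c[z][y][x] = col[y]
    # Pass 3: along z.
    for y in range(n):
        for x in range(n):
            pillar = haar_1d([c[z][y][x] for z in range(n)])
            for z in range(n):
                c[z][y][x] = pillar[z]
    return c
```

Because the filter is orthonormal, the transform preserves the sum of squares. For a constant cube, only the low-pass (x, y, z all in the first half) subband is nonzero. The three passes touch memory with different strides, and that difference is typically what makes mapping a 3D-FWT onto CPUs and GPUs non-trivial.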
Olav Emil Eiksund
Particle-In-Cell (PIC) is a common particle simulation method, often used to simulate the behaviour of plasma. In this work, a parallel PIC code is developed in CUDA, with a focus on how to adapt the method for multiple GPUs. An electrostatic three-dimensional PIC code is developed, with an FFT-based solver using the cuFFT […]
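In an electrostatic PIC code, the field solver computes the potential from the charge density by solving Poisson's equation, $\nabla^2 \phi = -\rho/\varepsilon_0$. In Fourier space the Laplacian becomes multiplication by $-k^2$, so $\hat\phi_k = \hat\rho_k / (\varepsilon_0 k^2)$. The sketch below is a hedged 1D illustration of that spectral solve. It assumes periodic boundaries and uses a naive O(n²) DFT in place of cuFFT. The function names are hypothetical and are not taken from the paper.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) DFT (unscaled); a stand-in for a library FFT such as cuFFT."""
    n = len(x)
    sign = 1 if inverse else -1
    return [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def solve_poisson_periodic_1d(rho, length, eps0=1.0):
    """Solve d^2(phi)/dx^2 = -rho/eps0 on a periodic 1D grid (assumed boundaries)."""
    n = len(rho)
    rho_k = dft(rho)
    phi_k = [0j] * n
    for k in range(n):
        # Map DFT bin k to a signed integer frequency: 0, 1, ..., n/2, then negatives.
        m = k if k <= n // 2 else k - n
        if m == 0:
            continue  # the mean of phi is arbitrary with periodic boundaries; fix it to 0
        kx = 2 * math.pi * m / length
        phi_k[k] = rho_k[k] / (eps0 * kx * kx)
    # Inverse transform with 1/n scaling; the result is real up to rounding.
    return [v.real / n for v in dft(phi_k, inverse=True)]
```

With a domain length of $2\pi$ and $\rho = \sin x$, the only nonzero mode has $k = \pm 1$, so the returned potential equals $\rho$ itself. In three dimensions the same division is applied per mode using $k^2 = k_x^2 + k_y^2 + k_z^2$.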
Aleksandar Zlateski, Kisuk Lee, H. Sebastian Seung
Convolutional networks (ConvNets) have become a popular approach to computer vision. It is important to accelerate ConvNet training, which is computationally costly. We propose a novel parallel algorithm based on decomposition into a set of tasks, most of which are convolutions or FFTs. Applying Brent’s theorem to the task dependency graph implies that linear speedup […]
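For reference, here is the textbook greedy-scheduling form of Brent's theorem, which is the kind of bound the truncated sentence appeals to. Let $T_1$ be the total work of the task dependency graph and $T_\infty$ its critical-path length (span), both in our own notation. The abstract does not give the exact conditions the authors derive. Then on $p$ processors

$$
T_p \le \frac{T_1}{p} + T_\infty
\quad\Longrightarrow\quad
S_p = \frac{T_1}{T_p} \ge \frac{p}{1 + p\,T_\infty / T_1},
$$

so the speedup $S_p$ is close to linear, $S_p \approx p$, whenever $p \ll T_1 / T_\infty$.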
Lu-Hung Chen, Ci-Ren Jiang
Functional principal component analysis is one of the most commonly employed approaches in functional/longitudinal data analysis, and we extend it to conduct $d$-dimensional functional/longitudinal data analysis. The computational issues that emerge in this extension are fully addressed by our proposed solutions. The local linear smoothing technique is employed to perform estimation because of its capabilities of performing […]
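As a hedged sketch of the local linear smoothing step only, the code below fits a weighted least-squares line $a + b(x - x_0)$ around each evaluation point $x_0$ and returns the intercept $a$ as the estimate. It is one-dimensional, although the abstract concerns $d$-dimensional data. The Gaussian kernel, the bandwidth `h`, and the function name are our assumptions, not the authors' choices.

```python
import math


def local_linear(xs, ys, x0, h):
    """Local linear estimate of E[Y | X = x0] (Gaussian kernel, bandwidth h: assumptions)."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        d = x - x0
        w = math.exp(-0.5 * (d / h) ** 2)  # kernel weight of this observation
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    # Solve the 2x2 normal equations [[s0, s1], [s1, s2]] [a, b] = [t0, t1] for a.
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)
```

A known property of this estimator is that it reproduces linear trends exactly, including at the boundary of the data. For example, with data from $y = 3x + 1$ it returns $3x_0 + 1$ for any $x_0$.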
Andrew Lavin
We derive a new class of fast algorithms for convolutional neural networks using Winograd’s minimal filtering algorithms. Specifically, we derive algorithms for network layers with 3×3 kernels, which are the preferred kernel size for image recognition tasks. The best of our algorithms reduces arithmetic complexity by up to 4X compared with direct convolution, while using small […]
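The building block of Winograd minimal filtering is $F(2,3)$: two outputs of a 3-tap filter computed with 4 multiplications instead of 6. The terms that depend only on the filter can be precomputed once per filter. The minimal 1D sketch below uses the correlation form used by CNN layers. It is not the paper's implementation, which nests such transforms in 2D. For example, $F(2\times2, 3\times3)$ uses 16 multiplications instead of 36. The function names are hypothetical.

```python
def winograd_f23(d, g):
    """Two outputs of a 3-tap correlation with 4 multiplications (Winograd F(2,3))."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform: depends only on g, so it can be precomputed per filter.
    u0 = g0
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    u3 = g2
    # Input transform plus the 4 multiplications.
    m1 = (d0 - d2) * u0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * u3
    # Output transform: additions and subtractions only.
    return [m1 + m2 + m3, m2 - m3 - m4]


def direct_f23(d, g):
    """Reference: the same two outputs computed directly (6 multiplications)."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


print(winograd_f23([1, 2, 3, 4], [0.5, -1, 2]))  # [4.5, 6.0]
print(direct_f23([1, 2, 3, 4], [0.5, -1, 2]))    # [4.5, 6.0]
```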
Andreas Adelmann, Uldis Locans, Andreas Suter
Emerging processor architectures such as GPUs and Intel MICs provide a huge performance potential for high performance computing. However, developing software for these hardware accelerators introduces additional challenges for the developer, such as exposing additional parallelism, dealing with different hardware designs, and using multiple development frameworks in order to use devices from different vendors. The […]
Amir Gholami, Judith Hill, Dhairya Malhotra, George Biros
We present a new library for parallel distributed Fast Fourier Transforms (FFT). Despite the large amount of work on FFTs, we show that significant speedups can be achieved for distributed transforms. The importance of FFT in science and engineering and the advances in high performance computing necessitate further improvements. AccFFT extends existing FFT libraries for […]
Fredrik Andersson, Marcus Carlsson, Viktor V. Nikitin
The Radon transform and its adjoint, the back-projection operator, can both be expressed as convolutions in log-polar coordinates. Hence, fast algorithms for the application of the operators can be constructed by using FFT, if data is resampled at log-polar coordinates. Radon data is typically measured on an equally spaced grid in polar coordinates, and reconstructions […]
Feifei Shen, Zhenjian Song, Congrui Wu, Jiaqi Geng, Qingyun Wang
The study of general-purpose computation on the GPU (Graphics Processing Unit) can improve the image processing capability of micro-computer systems. This paper studies the parallelism of the different stages of the decimation-in-time radix-2 FFT algorithm, designs the butterfly and scramble kernels, and implements a 2D FFT on the GPU. The experimental results demonstrate the validity and […]
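For orientation, the two stages named above can be sketched sequentially. The first is a "scramble" that reorders the input into bit-reversed index order. The second is $\log_2 n$ butterfly stages. All butterflies within one stage are independent, which is what makes a one-butterfly-per-thread GPU kernel natural. That mapping is our reading, and the paper's actual kernel design is not shown in the abstract. This is a plain CPU sketch with hypothetical function names.

```python
import cmath


def reverse_bits(i, bits):
    """Reverse the lowest `bits` bits of i (e.g. 0b001 -> 0b100 for bits=3)."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_radix2_dit(x):
    """Iterative decimation-in-time radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    # Scramble stage: load the input in bit-reversed order.
    a = [complex(x[reverse_bits(i, bits)]) for i in range(n)]
    # Butterfly stages: the sub-transform size doubles each stage.
    size = 2
    while size <= n:
        half = size // 2
        for start in range(0, n, size):          # independent groups
            for k in range(half):                # independent butterflies in a group
                w = cmath.exp(-2j * cmath.pi * k / size)  # twiddle factor
                u = a[start + k]
                t = w * a[start + k + half]
                a[start + k] = u + t
                a[start + k + half] = u - t
        size *= 2
    return a


def dft(x):
    """O(n^2) reference DFT, used only to check the FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]
```

A 2D FFT is obtained by applying this 1D transform to every row and then to every column of the image.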
Mohamed Amine Bergach, Emilien Kofman, Robert de Simone, Serge Tissot, Michel Syska
General-purpose multiprocessors (in our case, Intel IvyBridge and Intel Haswell) increasingly add GPU computing power to the former multicore architectures. When used for embedded applications (for us, synthetic aperture radar) with intensive signal processing requirements, they must constantly compute convolution algorithms, such as the famous Fast Fourier Transform. Due to its "fractal" nature (the […]
Amir Gholami, Judith Hill, Dhairya Malhotra, George Biros
We present a new library for scalable 3-D Fast Fourier Transforms (FFT). Despite the large amount of work on 3-D FFTs, we show that significant speedups can be achieved for large problem sizes and core counts. The importance of FFT in science and engineering and the advances in high performance computing necessitate further improvements in […]
Nicolas Vasilache, Jeff Johnson, Michael Mathieu, Soumith Chintala, Serkan Piantino, Yann LeCun
We examine the performance profile of Convolutional Neural Network training on the current generation of NVIDIA Graphics Processing Units. We introduce two new Fast Fourier Transform convolution implementations: one based on NVIDIA’s cuFFT library, and another based on a Facebook authored FFT implementation, fbfft, that provides significant speedups over cuFFT (over 1.5x) for whole CNNs. […]
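FFT-based convolution rests on the convolution theorem: after zero-padding both inputs to at least $N + K - 1$ samples, linear convolution equals $\mathrm{IFFT}(\mathrm{FFT}(a) \cdot \mathrm{FFT}(b))$. The hedged 1D sketch below shows that identity with a hand-written recursive FFT standing in for cuFFT or fbfft. It does not reproduce either library. CNN layers compute cross-correlation, which is the same operation with the kernel reversed. The function names are hypothetical.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. The inverse is unscaled."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_convolve(signal, kernel):
    """Linear convolution via the convolution theorem."""
    out_len = len(signal) + len(kernel) - 1
    n = 1
    while n < out_len:
        n *= 2
    # Zero-pad so that the circular convolution computed by the FFT equals the linear one.
    fa = fft(list(signal) + [0] * (n - len(signal)))
    fb = fft(list(kernel) + [0] * (n - len(kernel)))
    prod = [p * q for p, q in zip(fa, fb)]  # pointwise product in the frequency domain
    return [v.real / n for v in fft(prod, inverse=True)[:out_len]]


def direct_convolve(signal, kernel):
    """Reference O(N*K) linear convolution."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


print(fft_convolve([1, 2, 3], [0, 1, 0.5]))     # agrees with direct_convolve up to rounding
print(direct_convolve([1, 2, 3], [0, 1, 0.5]))
```

The direct method costs $O(NK)$ operations, while the FFT route costs $O(n \log n)$ for the padded length $n$. The FFT route therefore tends to pay off for larger kernels or when transformed inputs are reused across many filters.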

HGPU group © 2010-2016 hgpu.org

All rights belong to the respective authors