14185
Amir Gholami, Judith Hill, Dhairya Malhotra, George Biros
We present a new library for parallel distributed Fast Fourier Transforms (FFT). Despite the large amount of work on FFTs, we show that significant speedups can be achieved for distributed transforms. The importance of FFT in science and engineering and the advances in high performance computing necessitate further improvements. AccFFT extends existing FFT libraries for […]
Imran Ashraf, Vlad-Mihai Sima, Koen Bertels
The growing demand of processing power is being satisfied mainly by an increase in the number of computing cores in a system. One of the main challenges to be addressed is efficient utilization of these architectures. This demands data-communication aware mapping of applications on these architectures. Appropriate tools are required to provide the detailed intra-application […]
View View   Download Download (PDF)   
Gaurav Chaurasia, Jonathan Ragan-Kelley, Sylvain Paris, George Drettakis, Fredo Durand
Infinite impulse response (IIR) or recursive filters, are essential for image processing because they turn expensive large-footprint convolutions into operations that have a constant cost per pixel regardless of kernel size. However, their recursive nature constrains the order in which pixels can be computed, severely limiting both parallelism within a filter and memory locality across […]
View View   Download Download (PDF)   
Liang-Tsung Huang, Chao-Chin Wu, Lien-Fu Lai, Yun-Ju Li
Sequence alignment lies at heart of the bioinformatics.The Smith-Waterman algorithm is one of the key sequence search algorithms and has gained popularity due to improved implementations and rapidly increasing compute power. Recently, the Smith-Waterman algorithm has been successfully mapped onto the emerging general-purpose graphics processing units (GPUs). In this paper, we focused on how to […]
View View   Download Download (PDF)   
Marijn F. Stollenga, Wonmin Byeon, Marcus Liwicki, Juergen Schmidhuber
Convolutional Neural Networks (CNNs) can be shifted across 2D images or 3D videos to segment them. They have a fixed input size and typically perceive only small local contexts of the pixels to be classified as foreground or background. In contrast, Multi-Dimensional Recurrent NNs (MD-RNNs) can perceive the entire spatio-temporal context of each pixel in […]
View View   Download Download (PDF)   
Oluwapelumi Adenikinju, Julian Gilyard, Joshua Massey, Thomas Stitt
We investigate the parallel solutions to linear systems with the application focus as the global illumination problem in computer graphics. An existing CPU serial implementation using the radiosity method is given as the performance baseline where a scene and corresponding form-factor coefficients are provided. The initial computational radiosity solver uses the basic Jacobi method with […]
View View   Download Download (PDF)   
Andra-Ecaterina Hugo
To face the ever demanding requirements in term of accuracy and speed of scientific simulations, the High Performance community is constantly increasing the demands in term of parallelism, adding thus tremendous value to parallel libraries strongly optimized for highly complex architectures.Enabling HPC applications to perform efficiently when invoking multiple parallel libraries simultaneously is a great […]
View View   Download Download (PDF)   
Maxwell Xu Cai, Yohai Meiron, M.B.N. Kouwenhoven, Paulina Assmann, Rainer Spurzem
Astrophysical research in recent decades has made significant progress thanks to the availability of various N-body simulation techniques. With the rapid development of high-performance computing technologies, modern simulations have been able to take the computing power of massively parallel clusters with more than 10^5 GPU cores. While unprecedented accuracy and dynamical scales have been achieved, […]
View View   Download Download (PDF)   
Gilbert Louis Bernstein, Chinmayee Shah, Crystal Lemire, Zachary DeVito, Matthew Fisher, Philip Levis, Pat Hanrahan
Designing programming environments for physical simulation is challenging because simulations rely on diverse algorithms and geometric domains. These challenges are compounded when we try to run efficiently on heterogeneous parallel architectures. We present Ebb, a domain-specific language (DSL) for simulation, that runs efficiently on both CPUs and GPUs. Unlike previous DSLs, Ebb uses a three-layer […]
View View   Download Download (PDF)   
S. Cuomo, A. Galletti, G. Giunta, L. Marcellino
In this work we present a multi-level parallel framework for the Optical Flow computation on a GPUs cluster, equipped with a scientific computing middleware (the PetSc library). Starting from a flow-driven isotropic method, which models the optical flow problem through a parabolic partial differential equation (PDE), we have designed a parallel algorithm and its software […]
View View   Download Download (PDF)   
Karthik Narayan, Ali Punjani, Pieter Abbeel
Although recent work in non-linear dimensionality reduction investigates multiple choices of divergence measure during optimization (Yang et al., 2013; Bunte et al., 2012), little work discusses the direct effects that divergence measures have on visualization. We study this relationship, theoretically and through an empirical analysis over 10 datasets. Our works shows how the alpha and […]
View View   Download Download (PDF)   
Taylor Berg-Kirkpatrick, Dan Klein
Voice conversion is the task of transforming a source speaker’s voice so that it sounds like a target speaker’s voice. We present a GPUfriendly local regression model for voice conversion that is capable of converting speech in real-time and achieves state-of-the-art accuracy on this task. Our model uses a new approximation for computing local regression […]
View View   Download Download (PDF)   
Page 1 of 50612345...102030...Last »

* * *

* * *

Follow us on Twitter

HGPU group

1493 peoples are following HGPU @twitter

Like us on Facebook

HGPU group

252 people like HGPU on Facebook

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL application at hgpu.org. We provide 1 minute of computer time per each run on two nodes with two AMD and one nVidia graphics processing units, correspondingly. There are no restrictions on the number of starts.

The platforms are

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 @ 2.8GHz 1055T
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.3
  • SDK: AMD APP SDK 3.0

Completed OpenCL project should be uploaded via User dashboard (see instructions and example there), compilation and execution terminal output logs will be provided to the user.

The information send to hgpu.org will be treated according to our Privacy Policy

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors

Contact us: