Lars Hunger, Biagio Cosenza, Stefan Kimeswenger, Thomas Fahringer
Random Field (RF) generation algorithms are of paramount importance for many scientific domains, such as astrophysics, geostatistics, computer graphics and many others. Some examples are the generation of initial conditions for cosmological simulations or hydrodynamical turbulence driving. In the latter, a new random field is needed at every time step. Current approaches commonly make use of 3D […]
Ru Zhu
A finite-difference micromagnetic solver is presented that uses C++ Accelerated Massive Parallelism (C++ AMP). The high performance of a single Graphics Processing Unit (GPU) is demonstrated in comparison with a typical CPU-based solver. The speed-up of the GPU over the CPU is shown to be greater than 100 for larger problem sizes. This solver is based […]
Q. Lu, J. Amundson
Synergia is a parallel, 3-dimensional space-charge particle-in-cell accelerator modeling code. We present our work porting the purely MPI-based version of the code to a hybrid of CPU and GPU computing kernels. The hybrid code uses the CUDA platform in the same framework as the pure MPI solution. We have implemented a lock-free collaborative charge-deposition algorithm […]
Sangeeta Bhattacharjee, Satyendra Singh Yadav, Sarat Kumar Patra
In recent years, the Graphics Processing Unit (GPU) has evolved into a high-performance data processing technology that allows users to process large blocks of data in parallel using an array of low-complexity processors. This paper proposes implementing the compute-intensive portions of the 3rd Generation Partnership Project (3GPP) Long Term Evolution (LTE) physical layer on a GPU. […]
B. Pandya, N. Gajjar
The increased demand for higher-resolution and more detailed SAR imaging puts pressure on the processing power of existing systems for real-time or near-real-time processing. Exploiting GPU processing power could meet these increasing demands. The processing of initial SAR systems was based on the principles of Fourier […]
L. L. Pilla, P. Rech, F. Silvestri, C. Frost, P. O. A. Navaux, M. Sonza Reorda, L. Carro
In this paper we assess the neutron sensitivity of Graphics Processing Units (GPUs) when executing a Fast Fourier Transform (FFT) algorithm, and propose specific software-based hardening strategies to reduce its failure rate. Our research is motivated by experimental results with an unhardened FFT that demonstrate a majority of multiple errors in the output in the […]
Mohamed Amine Bergach, Serge Tissot, Michael Syska, Robert De Simone
Recent Intel processors (IvyBridge, Haswell) contain an embedded on-chip GPU in addition to the main CPU. In this work we consider the issue of efficiently mapping Fast Fourier Transform computation onto such coprocessor units. To achieve this we pursue three goals: First, we want to study half-systematic ways to adjust the actual variant […]
Wei Wang
In the past decade, one of the major breakthroughs in computer science theory has been the first construction of a fully homomorphic encryption (FHE) scheme, introduced by Gentry. Using an FHE scheme, one may perform an arbitrary number of computations directly on encrypted data without revealing the secret key. Therefore, a practical FHE provides an invaluable […]
Anders Eklund, Paul Dufort
We have presented solutions for fast non-separable floating point convolution in 2, 3 and 4 dimensions, using the CUDA programming language. We believe that these implementations will serve as a complement to the NPP library, which currently only supports 2D filters and images stored as integers. The shared memory implementation with loop unrolling is approximately […]
Benjamin Humphries, Hansen Zhang, Jiayi Sheng, Raphael Landaverde, Martin C. Herbordt
The 3D FFT is critical in many physical simulations and image processing applications. On FPGAs, however, the 3D FFT was thought to be inefficient relative to other methods such as convolution-based implementations of multigrid. We find the opposite: a simple design, operating at a conservative frequency, takes 4ms for 16^3, 21ms for 32^3, and 215ms […]
Lihui Zhang, Cuiying Guo, Yong Tang, Mengya Lv, Ying Li, Jing Zhao
Constructing a deep-ocean surface from small tiled patches, which greatly reduces the amount of calculation, is a very common approach in ocean simulation. However, it produces obviously repetitive artifacts in the foam on the surface. We propose a dynamic threshold condition based on global coordinates that controls and reduces the periodic repetition. Using FFT method generate a small patch […]
Carlo del Mundo, Wu-chun Feng
The fast Fourier transform (FFT), a spectral method that computes the discrete Fourier transform and its inverse, pervades many applications in digital signal processing, such as imaging, tomography, and software-defined radio. Its importance has caused the research community to expend significant resources to accelerate the FFT, of which FFTW is the most prominent example. With […]
* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes: one with two AMD graphics processing units, and one with one AMD and one nVidia graphics processing unit. There are no restrictions on the number of runs.

The platforms are:

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: AMD APP SDK 2.9
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 6.0.1, AMD APP SDK 2.9

A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there). The terminal output logs from compilation and execution will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2014 hgpu.org

All rights belong to the respective authors
