12956

Posts

Oct, 20

Fast-Fourier-Transform-Based Electrical Noise Measurements

We have shown how the Fourier spectrum and the power spectral density can be estimated in concrete measurements. Moreover, we have derived spectral leakage, which is a systematic error in spectrum computation. The Nyquist-Shannon sampling theorem and aliasing have been discussed. Furthermore, we have implemented a spectrum analyzer using a combination of LabView, GPU computing […]
Oct, 20

High-Dimensional Adaptive Particle Swarm Optimization on Heterogeneous Systems

Much work has recently been reported in parallel GPU-based particle swarm optimization (PSO). Motivated by the encouraging results of these investigations, while also recognizing the limitations of GPU-based methods for big problems using a large amount of data, this paper explores the efficacy of employing other types of parallel hardware for PSO. Most commodity systems […]
Oct, 18

A Review of CUDA, MapReduce, and Pthreads Parallel Computing Models

The advent of high performance computing (HPC) and graphics processing units (GPU), present an enormous computation resource for Large data transactions (big data) that require parallel processing for robust and prompt data analysis. While a number of HPC frameworks have been proposed, parallel programming models present a number of challenges, for instance, how to fully […]
Oct, 18

StreamWorks: An Energy-efficient Embedded Co-processor for Stream Computing

Stream processing has emerged as an important model of computation especially in the context of multimedia and communication sub-systems of embedded System-on-Chip (SoC) architectures. The dataflow nature of streaming applications allows them to be most naturally expressed as a set of kernels iteratively operating on continuous streams of data. The kernels are computationally intensive and […]
Oct, 18

Hybrid CPU-GPU Implementation of Tracking-Learning-Detection Algorithm

Tracking objects in a video stream is an important problem in robot learning (learning an object’s visual features from different perspectives as it moves, rotates, scales, and is subjected to some morphological changes such as erosion), defense, public security and many other various domains. In this thesis, we focus on a recently proposed tracking framework […]
Oct, 18

Cholla : A New Massively-Parallel Hydrodynamics Code For Astrophysical Simulation

We present Cholla (Computational Hydrodynamics On ParaLLel Architectures), a new three-dimensional hydrodynamics code that harnesses the power of graphics processing units (GPUs) to accelerate astrophysical simulations. Cholla models the Euler equations on a static mesh using state-of-the-art techniques, including the unsplit Corner Transport Upwind (CTU) algorithm, a variety of exact and approximate Riemann solvers, and […]
Oct, 18

Constructing Long Short-Term Memory based Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition

Long short-term memory (LSTM) based acoustic modeling methods have recently been shown to give state-of-the-art performance on some speech recognition tasks. To achieve a further performance improvement, in this research, deep extensions on LSTM are investigated considering that deep hierarchical model has turned out to be more efficient than a shallow one. Motivated by previous […]
Oct, 16

The Distribution of OpenCL Kernel Execution Across Multiple Devices

Many computer systems now include both CPUs and programmable GPUs. OpenCL, a new programming framework, can program individual CPUs or GPUs; however, distributing a problem across multiple devices is more difficult. This thesis contributes three OpenCL runtimes that automatically distribute a problem across multiple devices: DualCL and m2sOpenCL, which distribute tasks across a single system’s […]
Oct, 16

OpenCL Implementation of Montgomery Multiplication on FPGA

Galois Field arithmetic has been used very frequently in popular security and error-correction applications. Montgomery multiplication is among the suitable methods used for accelerating modular multiplication, which is the most time consuming basic arithmetic operation. Montgomery multiplication is also suitable to be implemented in parallel. OpenCL, which is a portable, heterogeneous and parallel programming framework, […]
Oct, 16

Parallel Programming and Compressed Material Data for an Eulerian Code

We describe the problem of iterating over mesh zones and iterating over material data within a zone, in the context of relatively new compute architectures. We present an example for how this can be done in a way that is portable across parallel programming environments and can be made to perform well. We offer a […]
Oct, 16

Multi-GPU Based Lattice Boltzmann Method for Hemodynamic Simulation in Patient-Specific Cerebral Aneurysm

Conducting lattice Boltzmann method on GPU has been proved to be an effective manner to gain a significant performance benefit, thus the GPU or multi-GPU based lattice Boltzmann method is considered as a promising and competent candidate in the study of large-scale complex fluid flows. In this work, a multi-GPU based lattice Boltzmann algorithm coupled […]
Oct, 16

Pipelined Iterative Solvers with Kernel Fusion for Graphics Processing Units

We revisit the implementation of iterative solvers on discrete graphics processing units and demonstrate the benefit of implementations using extensive kernel fusion for pipelined formulations over conventional implementations of classical formulations. The proposed implementations with both CUDA and OpenCL are freely available in ViennaCL and achieve up to three-fold performance gains when compared to other […]
Page 4 of 764« First...23456...102030...Last »

* * *

* * *

Like us on Facebook

HGPU group

172 people like HGPU on Facebook

Follow us on Twitter

HGPU group

1283 peoples are following HGPU @twitter

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL application at hgpu.org. We provide 1 minute of computer time per each run on two nodes with two AMD and one nVidia graphics processing units, correspondingly. There are no restrictions on the number of starts.

The platforms are

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 @ 2.8GHz 1055T
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: AMD APP SDK 2.9
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 6.0.1, AMD APP SDK 2.9

Completed OpenCL project should be uploaded via User dashboard (see instructions and example there), compilation and execution terminal output logs will be provided to the user.

The information send to hgpu.org will be treated according to our Privacy Policy

HGPU group © 2010-2014 hgpu.org

All rights belong to the respective authors

Contact us: