Posts

Oct, 24

cufftShift: High Performance CUDA-accelerated FFT-shift Library

For embarrassingly parallel algorithms, a Graphics Processing Unit (GPU) outperforms a traditional CPU on price-per-flop and price-per-watt by at least one order of magnitude. This has led to the mapping of signal and image processing algorithms, and consequently their applications, to run entirely on GPUs. This paper presents CUFFTSHIFT, a ready-to-use GPU-accelerated library that implements […]
Oct, 24

Query Optimization in Heterogeneous CPU/GPU Environment for Time Series Databases

In recent years, the processing and exploration of time series have attracted noticeable interest. Growing volumes of data and the need for efficient processing have pushed research in new directions, including hardware-based solutions. Graphics Processing Units (GPUs) have significantly more applications than just rendering images. They are also used in general purpose computing to solve […]
Oct, 24

Gaussian Process Models with Parallelization and GPU acceleration

In this work, we present an extension of Gaussian process (GP) models with sophisticated parallelization and GPU acceleration. The parallelization scheme arises naturally from the modular computational structure w.r.t. datapoints in the sparse Gaussian process formulation. Additionally, the computational bottleneck is implemented with GPU acceleration for further speed up. Combining both techniques allows applying Gaussian […]
Oct, 24

Monitoring Large-scale Microblog on GPUs

To monitor the spread of harmful information in microblog systems, large-scale microblog data must be processed in real time, which requires highly cost-effective parallel schemes. A parallel processing method on GPUs is put forward to monitor massive microblog data. The proposed scheme can fully exploit the GPU's ability to schedule massive numbers of threads for data-intensive tasks. The detailed […]
Oct, 24

Improved Integral Histogram Algorithm for Big Sized Images in CUDA Environment

Although the integral histogram enables histogram computation of a sub-area in constant time, constructing the integral histogram requires O(nm) steps for an n x m image. This construction time can be reduced using a parallel prefix sum algorithm. Mark Harris proposed an efficient parallel prefix sum and implemented it using CUDA GPGPU. Mark Harris’ algorithm has […]
Oct, 22

Introducing CURRENNT – the Munich open-source CUDA RecurREnt Neural Network Toolkit

In this article, we introduce CURRENNT, an open-source parallel implementation of deep recurrent neural networks (RNNs) supporting graphics processing units (GPUs) through NVIDIA’s Compute Unified Device Architecture (CUDA). CURRENNT supports uni- and bidirectional RNNs with Long Short-Term Memory (LSTM) memory cells which overcome the vanishing gradient problem. To our knowledge, CURRENNT is the first publicly […]
Oct, 22

Optimization Techniques for Mapping Algorithms and Applications onto CUDA GPU Platforms and CPU-GPU Heterogeneous Platforms

An emerging trend in processor architecture seems to indicate a doubling of the number of cores per chip every two years, with the same or decreased clock speed. Of particular interest to this thesis is the class of many-core processors, which are becoming more attractive due to their high performance, low cost, and low power consumption. […]
Oct, 22

Fast Parallel Algorithm for Enumerating All Chordless Cycles in Graphs

Finding chordless cycles is an important theoretical problem in graph theory. It can also be applied to practical problems, such as discovering which predators compete for the same food in ecological networks. Motivated by the problem's theoretical interest and by its significant practical importance, we present in this paper a parallel […]
Oct, 22

3D simulation of complex shading affecting PV systems taking benefit from the power of graphics cards developed for the video game industry

Shading reduces the power output of a photovoltaic (PV) system. The design engineering of PV systems requires modeling and evaluating shading losses. Some PV systems are affected by complex shading scenes whose resulting PV energy losses are very difficult to evaluate with current modeling tools. Several specialized PV design and simulation software packages include the possibility […]
Oct, 22

Performance Engineering of the Kernel Polynomial Method on Large-Scale CPU-GPU Systems

The Kernel Polynomial Method (KPM) is a well-established scheme in quantum physics and quantum chemistry to determine the eigenvalue density and spectral properties of large sparse matrices. In this work we demonstrate the high optimization potential and feasibility of peta-scale heterogeneous CPU-GPU implementations of the KPM. At the node level we show that it is […]
Oct, 20

A Performance Comparison of Sort and Scan Libraries for GPUs

Sorting and scanning are two fundamental primitives for constructing highly parallel algorithms. A number of libraries now provide implementations of these primitives for GPUs, but there is relatively little information about the performance of these implementations. We benchmark seven libraries for 32-bit integer scan and sort, and sorting 32-bit values by 32-bit integer keys. We show […]
Oct, 20

Massively parallel read mapping on GPUs with the q-group index and PEANUT

We present the q-group index, a novel data structure for read mapping tailored towards graphics processing units (GPUs) with a small memory footprint and efficient parallel algorithms for querying and building. On top of the q-group index we introduce PEANUT, a highly parallel GPU-based read mapper. PEANUT provides the possibility to output both the best […]

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes equipped with AMD and nVidia graphics processing units, as listed below. There is no limit on the number of runs.

The platforms are:

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: AMD APP SDK 2.9
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 6.0.1, AMD APP SDK 2.9

A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there). Terminal output logs from compilation and execution will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2014 hgpu.org

All rights belong to the respective authors
