Feb, 19

Faster File Matching using GPGPUs

We address the problem of file matching by modifying the MD6 algorithm that is best suited to take advantage of GPU computing. MD6 is a cryptographic hash function that is tree-based and highly parallelizable. When the message M is available initially, the hashing operations can be initiated at different starting points within the message and […]
Feb, 19

Efficiency Considerations of Cauchy Reed-Solomon Implementations on Accelerator and Multi-Core Platforms

The Cauchy variant of the Reed-Solomon algorithm is implemented on accelerator platforms including GPGPU, FPGA, CellBE and ClearSpeed as well as on an x86 multi-core system. The sustained throughput performance and kernel rates are measured for a 5+3 Reed-Solomon schema. To compare the different technology platforms, an efficiency measure is introduced and the platforms are categorized […]
Feb, 19

Using GPU VSIPL & CUDA to Accelerate RF Clutter Simulation

This paper describes a flexible simulator for background Radio Frequency clutter developed at the Georgia Tech Research Institute, and how this simulation was accelerated with the use of nVidia GPUs using GPU VSIPL. The paper describes the mathematical basis for the simulation and how it can be used to simulate RF environments and scenarios; introduces […]
Feb, 18

Accelerating Image Feature Comparisons using CUDA on Commodity Hardware

Given multiple images of the same scene, image registration is the process of determining the correct transformation to bring the images into a common coordinate system, i.e., how the images fit together. Feature-based registration applies a transformation function to the input images before performing the correlation step. The result of that transformation, also called feature extraction, […]
Feb, 18

Tetrahedral Interpolation for Deformable Image Registration on GPUs

We speed up the tetrahedral interpolation step of a deformable image registration code called MORFEUS. We implement several versions of the interpolation code on a Fermi GPU (GTX480). Despite the irregularity of the code, we obtained kernel speedups of up to 24.6x, 33.7x and 62.4x on three real-life benchmarks. These numbers do not include the […]
Feb, 18

Optimization of HEP codes on GPUs

Graphics processor units (GPUs) have evolved into high-performance co-processors that can be easily programmed with common high-level languages such as C, Fortran and C++. Today’s GPUs greatly outpace CPUs in arithmetic performance and memory bandwidth, making them the ideal coprocessor to accelerate a variety of data parallel applications. Here, we shall describe the application […]
Feb, 18

Power-aware Performance of Mixed Precision Linear Solvers for FPGAs and GPGPUs

Power has emerged as a significant constraint to high performance systems. We propose modeling power-based performance (performance/watt) and clock-based performance for GPGPUs and FPGAs. Based on the modeling, we perform a case-study with mixed precision linear solvers for a Xilinx XC5VLX330T FPGA and NVIDIA Tesla C1060 GPU. In the case-study, the FPGA shows power- and […]
Feb, 18

Accelerating Double Precision Floating-point Hessenberg Reduction on FPGA and Multicore Architectures

Double precision floating-point performance is critical for hardware acceleration technologies to be adopted by domain scientists. In this work we use the Hessenberg reduction to demonstrate the potential of FPGAs and GPUs for obtaining satisfactory double precision floating-point performance. Currently a Xeon (Nehalem) 2.26 GHz CPU can outperform Xilinx Virtex4LX200 by a factor of 3.6. However, given […]
Feb, 18

GPU Acceleration of Near-Minimal Logic Minimization

In this paper, we describe a GPU-accelerated implementation of a logic minimization heuristic based on the near minimal approach. This algorithm has three key kernel computations, and in the current version of our implementation, we adapted one of these kernels for GPU execution. In this paper we report our results gained from using NVIDIA’s CUDA development […]
Feb, 18

Fully accelerating quantum Monte Carlo simulations of real materials on GPU clusters

Continuum quantum Monte Carlo (QMC) has proved to be an invaluable tool for predicting the properties of matter from fundamental principles. By solving the many-body Schrödinger equation through a stochastic projection, it achieves greater accuracy than mean-field methods and better scalability than quantum chemical methods, enabling scientific discovery across a broad spectrum of disciplines. The […]
Feb, 18

Sparse systems solving on GPUs with GMRES

Scientific applications very often rely on solving one or more linear systems. When matrices are sparse, iterative methods are preferred to direct ones. Nevertheless, the value of nonzero elements and their distribution (i.e., the sketch of the matrix) greatly influence the efficiency of those methods (in terms of computation time, number of iterations, result precision) […]
Feb, 18

Accelerating Power Flow studies on Graphics Processing Unit

This paper presents the design of a Power Flow algorithm that has enhanced performance on the Graphics Processing Unit (GPU) using Compute Unified Device Architecture (CUDA). This work investigates the performance of optimized CPU versions of Newton-Raphson (Polar form) and Gauss-Jacobi power flow algorithms, highlights the approach used to reduce the computation time by performing these […]

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes: one with two AMD graphics processing units, the other with one AMD and one nVidia graphics processing unit. There is no limit on the number of runs.

The platforms are

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 @ 2.8GHz 1055T
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: AMD APP SDK 2.9
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 6.0.1, AMD APP SDK 2.9

Completed OpenCL projects should be uploaded via the User dashboard (see instructions and example there); compilation and execution terminal output logs will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2014 hgpu.org

All rights belong to the respective authors
