Jun, 13

A GPU-Accelerated Two Stage Visual Matching

We propose a two stage visual matching pipeline including a first step using VLAD signatures for filtering results, and a second step which reranks the top results using raw matching of SIFT descriptors. This enables adjusting the tradeoff between high computational cost of matching local descriptors and the insufficient accuracy of compact signatures in many […]
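
The filter-then-rerank idea in this abstract can be illustrated with a minimal sketch. It is not the authors' GPU implementation: plain Python lists stand in for VLAD signatures and SIFT descriptors, and the names `two_stage_search`, `count_matches`, `shortlist_size`, and `max_dist` are hypothetical, as is the simple distance-threshold match rule used in the second stage.

```python
import math


def cosine(a, b):
    """Cosine similarity between two compact signatures (stand-ins for VLAD)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def count_matches(query_desc, cand_desc, max_dist=0.5):
    """Count query descriptors whose nearest candidate descriptor is within max_dist.

    Assumption: a plain distance threshold, chosen only for illustration.
    """
    matches = 0
    for q in query_desc:
        nearest = min((math.dist(q, c) for c in cand_desc), default=math.inf)
        if nearest <= max_dist:
            matches += 1
    return matches


def two_stage_search(query, database, shortlist_size=2):
    """Stage 1 filters by signature similarity; stage 2 reranks by raw matching."""
    # Stage 1: cheap comparison of compact signatures keeps a short list.
    shortlist = sorted(
        database,
        key=lambda item: cosine(query["signature"], item["signature"]),
        reverse=True,
    )[:shortlist_size]
    # Stage 2: expensive local-descriptor matching, run only on the short list.
    return sorted(
        shortlist,
        key=lambda item: count_matches(query["descriptors"], item["descriptors"]),
        reverse=True,
    )


# Toy data, invented for this sketch.
database = [
    {"name": "a", "signature": [1.0, 0.0], "descriptors": [[0, 0], [5, 5]]},
    {"name": "b", "signature": [0.9, 0.1], "descriptors": [[1, 1], [2, 2]]},
    {"name": "c", "signature": [0.0, 1.0], "descriptors": [[1, 1], [2, 2]]},
]
query = {"signature": [1.0, 0.0], "descriptors": [[1, 1], [2, 2]]}

ranked = [item["name"] for item in two_stage_search(query, database)]
print(ranked)
```

In this toy data, `shortlist_size` is the knob for the tradeoff the abstract mentions: a larger short list costs more descriptor matching but is less likely to discard a good candidate (such as item `c`) on the strength of its signature alone.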
Jun, 10

Practical Symbolic Execution Analysis and Methodology for GPU Programs

Graphics processing units (GPUs) are highly parallel processors that are now commonly used in the acceleration of a wide range of computationally intensive tasks. GPU programs often suffer from data races and deadlocks, necessitating systematic testing. Conventional GPU debuggers are ineffective at finding and root-causing races since they detect errors with respect to the specific […]
Jun, 10

Performance of a code migration for the simulation of supersonic ejector flow to SMP, MIC and GPU using OpenMP, OpenMP+LEO, and OpenACC directives

In this work, a serial source code for simulating a supersonic ejector flow is accelerated using parallelization based on OpenMP and OpenACC directives. The purpose is to reduce the development costs and to simplify the maintenance of the application due to the complexity of the FORTRAN source code. OpenMP has become the programming standard for […]
Jun, 10

A Pattern Specification and Optimizations Framework for Accelerating Scientific Computations on Heterogeneous Clusters

Clusters with accelerators at each node have emerged as the dominant high-end architecture in recent years. Such systems can be extremely hard to program because of the underlying heterogeneity and the need for exploiting parallelism at multiple levels. Thus, easing parallel programming today requires not only high-level programming models, but ones from which hybrid parallelism […]
Jun, 10

Sequential Monte Carlo Optimisation for Air Traffic Management

This report shows that significant reduction in fuel use could be achieved by the adoption of 'free flight' type of trajectories in the Terminal Manoeuvring Area (TMA) of an airport, under the control of an algorithm which optimises the trajectories of all the aircraft within the TMA simultaneously while maintaining safe separation. We propose the […]
Jun, 10

Design and optimization of DBSCAN Algorithm based on CUDA

DBSCAN is a very classic algorithm for data clustering, which is widely used in many fields. However, with the data scale growing much more bigger than before, the traditional serial algorithm can not meet the performance requirement. Recently, parallel computing based on CUDA has developed very fast and has great advantage on big data. […]
Jun, 8

Improving OpenCL Programmability with the Heterogeneous Programming Library

The use of heterogeneous devices is becoming increasingly widespread. Their main drawback is their low programmability due to the large amount of details that must be handled. Another important problem is the reduced code portability, as most of the tools to program them are vendor or device-specific. The exception to this observation is OpenCL, which […]
Jun, 8

CGO: G: Intelligent Heuristic Construction with Active Learning

Building effective optimization heuristics is a challenging task which often takes developers several months if not years to complete. Predictive modelling has recently emerged as a promising solution, automatically constructing heuristics from training data, however, obtaining this data can take months per platform. This is becoming an ever more critical problem as the pace of […]
Jun, 8

Exploring CPU-GPU Coherence

AMD, ARM and other members of the Heterogeneous Systems Architecture Foundation are focusing on integrated CPU-GPU systems with shared memory, to improve the programmability of heterogeneous systems. Such integration is also necessary to eliminate the energy and latency costs associated with conventional heterogeneous computation. This work investigates the relevance of CPU-GPU coherence for current heterogeneous […]
Jun, 8

Cryptanalysis of the McEliece Cryptosystem on GPGPUs

The linear code based McEliece cryptosystem is potentially promising as a so-called "post-quantum" public key cryptosystem because thus far it has resisted quantum cryptanalysis, but to be considered secure, the cryptosystem must resist other attacks as well. In 2011, Bernstein et al. introduced the "Ball Collision Decoding" (BCD) attack on McEliece which is a significant […]
Jun, 8

Bi-directional Path Tracing on GPU

Computer graphics renderers for creating photo-realistic images use mainly unidirectional path tracing, having good results for scenes without caustics or hard cases. There are also few renderers with bi-directional path tracing implementation, however due to the complexity of the algorithm implementation, they almost exclusively target sequential CPUs. The thesis proposes a way of implementation of […]
Jun, 7

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus […]

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes equipped with AMD and nVidia graphics processing units. There is no limit on the number of runs.

The platforms are:

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.3
  • SDK: AMD APP SDK 3.0

A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there). The terminal output logs from compilation and execution will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors
