Quentin Avril, Valerie Gouranton, Bruno Arnaldi
We have presented several contributions to collision detection optimization centered on hardware performance. We focus on the first step (Broad-phase) and propose three new ways of parallelizing the well-known Sweep and Prune algorithm. We first developed a multi-core model that takes into account the number of available cores. Multi-core architecture enables us to distribute […]
Pritam Prakash Shete, Venkat P. P. K., Dinesh M. Sarode, Mohini Laghate, S. K. Bose
In this paper, we present a Compute Unified Device Architecture (CUDA) based pyramidal image blending algorithm using object-oriented design patterns. This algorithm is an essential part of an image stitching process for a seamless panoramic mosaic. The CUDA framework is a novel GPU programming framework from NVIDIA. We introduce an object-oriented framework […]
Cedric Augonnet
Multicore machines equipped with accelerators are becoming increasingly popular in the High Performance Computing ecosystem. Hybrid architectures provide significantly improved energy efficiency, so that they are likely to generalize in the Manycore era. However, the complexity introduced by these architectures has a direct impact on programmability, so that it is crucial to provide portable abstractions […]
Pritam Prakash Shete, Venkat P. P. K., Dinesh M. Sarode, Mohini Laghate, S. K. Bose, A. G. Apte
In this paper, we propose and implement an object-oriented framework for CUDA-based pyramidal image blending. This algorithm is an essential part of an image stitching process for a seamless panoramic mosaic. The CUDA framework is a novel GPU programming framework from NVIDIA. It offers a complex integration framework and requires more than […]
Pritam Prakash Shete, Venkat P. P. K., S. K. Bose
We propose and implement a pyramidal image blending algorithm using modern programmable graphics processing units. This algorithm is an essential part of an image stitching process for a seamless panoramic mosaic. The CUDA framework is a novel GPU programming framework from NVIDIA. We realize significant acceleration in computations of the pyramidal image blending algorithm by […]
Vaibhav Saxena, Yogish Sabharwal, Pramod Bhatotia
The slow progress in memory access latencies in comparison to CPU speeds has resulted in memory accesses dominating code performance. While architectural enhancements have benefited applications with data locality and sequential access, random memory access still remains a cause for concern. Several benchmarks have been proposed to evaluate the random memory access performance on multicore […]
Quentin Avril, Valerie Gouranton, Bruno Arnaldi
In this paper we present a new technique to dynamically adapt the first step (broad phase) of the collision detection process to the hardware architecture during simulation. Our approach makes it possible to cope with the unpredictable evolution of the simulation scenario (this includes addition of complex objects, deletion, split into several objects, …). Our technique of dynamic adaptation […]
Ke-yan Liu, Tong Zhang, Lei Wang
In this paper, a hybrid parallel computing framework is proposed for video understanding and retrieval. It is a unified computing architecture based on the Map-Reduce programming model, which supports multi-core and GPU architectures. A key task scheduler is designed for the parallelization of computation tasks. The SVM method is used to train models for video […]
W.C. Barker, S. Thada
The Siemens ECAT HRRT PET scanner has the potential to produce images of the human brain with spatial resolution better than 3 mm. MOLAR (a motion-compensation OSEM List-mode Algorithm for Resolution-recovery) was developed to provide reconstructions of HRRT data with the best possible accuracy and precision. However, a computer cluster is required to generate reconstructions […]
Naoyuki Ichimura
Local invariant features have been widely used as fundamental elements for image matching and object recognition. Although dense sampling of local features is useful in achieving an improved performance in image matching and object recognition, it results in increased computational costs for feature extraction. The purpose of this paper is to develop fast computational techniques […]
Cedric Augonnet, Samuel Thibault, Raymond Namyst
Multicore machines equipped with accelerators are becoming increasingly popular. The TOP500-leading RoadRunner machine is probably the most famous example of a parallel computer mixing IBM Cell Broadband Engines and AMD Opteron processors. Other architectures, featuring GPU accelerators, are expected to appear in the near future. To fully tap into the potential of these hybrid machines, […]
Cedric Augonnet, Raymond Namyst
Approaching the theoretical performance of heterogeneous multicore architectures, equipped with specialized accelerators, is a challenging issue. Unlike regular CPUs that can transparently access the whole global memory address range, accelerators usually embed local memory on which they perform all their computations using a specific instruction set. While many research efforts have been devoted to offloading […]

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes: Node 1 has two AMD graphics processing units, and Node 2 has one AMD and one nVidia graphics processing unit. There are no restrictions on the number of runs.

The platforms are

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 @ 2.8GHz 1055T
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: AMD APP SDK 2.9
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 6.0.1, AMD APP SDK 2.9

Completed OpenCL projects should be uploaded via the User dashboard (see instructions and example there). Terminal output logs from compilation and execution will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2014 hgpu.org

All rights belong to the respective authors
