George Papandreou, Liang-Chieh Chen, Alan L. Yuille
The goal of this paper is to question the necessity of features like SIFT in categorical visual recognition tasks. As an alternative, we develop a generative model for the raw intensity of image patches and show that it can support image classification performance on par with optimized SIFT-based techniques in a bag-of-visual-words setting. Key ingredient […]
Heqi Lu
Realistic images can be rendered by simulating light transport with Monte Carlo methods. The ability to use realistic light sources when synthesizing images greatly contributes to their physical realism. Among existing models, the ones based on environment maps and light fields are attractive due to their ability to faithfully capture the far-field and near-field effects […]
Usman Dastgeer
This thesis addresses issues associated with efficiently programming modern heterogeneous GPU-based systems, containing multicore CPUs and one or more programmable Graphics Processing Units (GPUs). We use ideas from component-based programming to address programming, performance and portability issues of these heterogeneous systems. Specifically, we present three approaches that all use the idea of having multiple implementations […]
Luis-Pedro Garcia, Javier Cuenca, Domingo Gimenez
The use of auto-tuning techniques in a matrix multiplication routine for hybrid CPU+GPU platforms is analyzed. Basic models of the execution time of the hybrid routine and information obtained during its installation are used to optimize the execution time with a balanced assignment of the computation to the computing components in the heterogeneous system. Satisfactory […]
Naila Farooqui, Christopher J. Rossbach, Yuan Yu
Parallel architectures like GPUs are a tantalizing compute fabric for performance-hungry developers. While GPUs enable order-of-magnitude performance increases in many data-parallel application domains, writing efficient codes that can actually manifest those increases is a non-trivial endeavor, typically requiring developers to exercise specialized architectural features exposed directly in the programming model. Achieving good performance on GPUs […]
Aditya Deshpande, P J Narayanan
In this paper, we present an all-core implementation of the Burrows-Wheeler compression algorithm that exploits all computing resources on a system. Our focus is to provide significant benefit to everyday users on common end-to-end applications by exploiting the parallelism of the multiple CPU cores and the many-core GPU on their machines. The all-core framework is suitable for […]
Jon Currey, Adam Eversole, Christopher J. Rossbach
Dataflow execution engines such as MapReduce, DryadLINQ and PTask have enjoyed success because they simplify development for a class of important parallel applications. Expressing the computation as a dataflow graph allows the runtime, and not the programmer, to own problems such as synchronization, data movement and scheduling – leveraging dynamic information to inform strategy and […]
Bjorn Nutti, Dragan Marinkovic
The paper presents a highly efficient way of simulating the dynamic behavior of deformable objects by means of the finite element method (FEM) with computations performed on Graphics Processing Units (GPU). The presented implementation reduces bottlenecks related to memory accesses by grouping the necessary data per node pairs, in contrast to the classical way done […]
Mark Joselli, Cristina Nader Vasconcelos, Esteban Clua
Multi-threaded architectures are the current trend for both PCs (multi-core CPUs and GPUs) and game consoles such as the Microsoft Xbox 360 and Sony PlayStation 3. GPUs (Graphics Processing Units) have evolved into extremely powerful and flexible processors, allowing their use for processing different kinds of data. This advantage can be used in game development to optimize […]
Sangho Lee, Youngsok Kim, Jangwoo Kim, Jong Kim
Graphics processing units (GPUs) are important components of modern computing devices, not only for graphics rendering but also for efficient parallel computation. However, their security problems are ignored despite their importance and popularity. In this paper, we first perform an in-depth security analysis on GPUs to detect security vulnerabilities. We observe that contemporary, widely-used GPUs, both […]
Youngsok Kim, Jaewon Lee, Jae-Eon Jo, Jangwoo Kim
GPU programmers suffer from programmer-managed GPU memory because both performance and programmability heavily depend on GPU memory allocation and CPU-GPU data transfer mechanisms. To improve performance and programmability, programmers should be able to place only the data frequently accessed by the GPU in GPU memory while overlapping CPU-GPU data transfers and GPU executions as much as […]
Jianbin Fang, Henk Sips, Lilun Zhang, Chuanfu Xu, Yonggang Che, Ana Lucia Varbanescu
Based on Intel’s Many Integrated Core (MIC) architecture, Intel Xeon Phi is one of the few truly many-core CPUs – featuring around 60 fairly powerful cores, two levels of caches, and graphic memory, all interconnected by a very fast ring. Given its promised ease-of-use and high performance, we took Xeon Phi out for a test […]

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes equipped with AMD and nVidia graphics processing units. There is no limit on the number of runs.

The platforms are:

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 11.4
  • SDK: AMD APP SDK 2.8
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 5.0.35, AMD APP SDK 2.8

A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there); terminal output logs from compilation and execution will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2014 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hgpu.org