Mar, 29

Literature review: Build and Travel KD-Tree with CUDA

Ray tracing is an important and widely used tool in computer graphics. The entertainment and game industries have already benefited greatly from it. However, designers and end users have long been forced to use offline ray tracing tools because of the high computational load. In ray tracing, most of the computation is concentrated […]
Mar, 29

Hardware/Software Vectorization for Closeness Centrality on Multi-/Many-Core Architectures

Centrality metrics have been shown to be highly correlated with the importance and loads of the nodes in a network. Given the scale of today’s social networks, it is essential to use efficient algorithms and high performance computing techniques for their fast computation. In this work, we exploit hardware and software vectorization in combination with fine-grain […]
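For readers unfamiliar with the metric named in the title, closeness centrality can be sketched in a few lines. This is a plain sequential version for small unweighted graphs, not the vectorized method of the paper. The function name `closeness_centrality`, the adjacency-dictionary input, and the normalization (reachable nodes divided by the sum of distances) are assumptions made for illustration.

```python
from collections import deque


def closeness_centrality(adj):
    """Return a dict of closeness scores for an unweighted graph.

    adj maps each node to an iterable of its neighbours (hypothetical input
    format). Assumed convention: score = (reachable nodes) / (sum of
    shortest-path distances), and 0.0 for a node that reaches nothing.
    """
    scores = {}
    for source in adj:
        # Breadth-first search gives shortest hop counts from the source.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores


# Path graph a - b - c - d: the inner nodes are closer to everything else.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = closeness_centrality(graph)
```

One breadth-first search is run per node, which is why the cost grows quickly with network size.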
Mar, 28

Improving Cache Locality for GPU-based Volume Rendering

We present a cache-aware method for accelerating texture-based volume rendering on a graphics processing unit (GPU). Because a GPU has a hierarchical architecture in terms of processing and memory units, cache optimization is important to maximize performance for memory-intensive applications. Our method localizes texture memory references according to the location of the viewpoint and dynamically selects […]
Mar, 28

GPU-accelerated automatic identification of robust beam setups for proton and carbon-ion radiotherapy

We demonstrate acceleration on graphics processing units (GPUs) of automatic identification of robust particle therapy beam setups, minimizing negative dosimetric effects of Bragg peak displacement caused by treatment-time patient positioning errors. Our particle therapy research toolkit, RobuR, was extended with OpenCL support and used to implement calculation on GPU of the Port Homogeneity Index, a […]
Mar, 28

Implementation of Just In Time Value Specialization for the Optimization of Data Parallel Kernels

This dissertation explores just-in-time (JIT) specialization as an optimization for OpenCL data-parallel compute kernels. It describes the implementation and performance of two extensions to OpenCL, Bacon and Specialization Annotated OpenCL (SOCL). Bacon is a replacement interface for OpenCL that provides improved usability and has JIT specialization built in. SOCL is a simple extension to OpenCL […]
Mar, 28

Pulse-coupled neural network performance for real-time identification of vegetation during forced landing

Safety concerns in the operation of autonomous aerial systems require safe-landing protocols be followed during situations where the mission should be aborted due to mechanical or other failure. This article presents a pulse-coupled neural network (PCNN) to assist in the vegetation classification in a vision-based landing site detection system for an unmanned aircraft. We propose […]
Mar, 27

Jacobian-free Newton-Krylov methods with GPU acceleration for computing nonlinear ship wave patterns

The nonlinear problem of steady free-surface flow past a submerged source is considered as a case study for three-dimensional ship wave problems. Of particular interest is the distinctive wedge-shaped wave pattern that forms on the surface of the fluid. By reformulating the governing equations with a standard boundary-integral method, we derive a system of nonlinear […]
Mar, 27

2014 Workshop on Heterogeneous and Unconventional Cluster Architectures and Applications, HUCAA 2014, in conjunction with ICPP2014

The workshop on Heterogeneous and Unconventional Cluster Architectures and Applications aims to gather recent work on heterogeneous and unconventional cluster architectures and applications, which might have a big impact on future cluster architectures. This includes any cluster architecture that is not based on the usual commodity components and therefore makes use of some special hard- […]
Mar, 26

Accelerating GPU Implementation of Contourlet Transform

The widespread use of the contourlet transform (CT) and today’s real-time needs demand faster execution of CT. Solutions are available, but their lack of portability or their computational intensity makes them disadvantageous in real-time applications. In this paper we take advantage of modern GPUs to accelerate CT. GPUs are well suited to address data-parallel computation applications […]
Mar, 26

A New Parallel Implementation of DSI Based Disparity Computation Using CUDA

Stereo matching techniques are used to extract 3D information from a 2D stereo pair of images. They can be classified into feature-based, window (area) based, and optimization-based approaches. The feature-based approach generally generates a sparse disparity map with high accuracy and low execution time. The window-based approach produces a dense disparity map with low […]
Mar, 25

BigKernel — High Performance CPU-GPU Communication Pipelining for Big Data-style Applications

GPUs offer an order of magnitude higher compute power and memory bandwidth than CPUs. GPUs therefore might appear to be well suited to accelerate computations that operate on voluminous data sets in independent ways; e.g., for transformations, filtering, aggregation, partitioning or other “Big Data” style processing. Yet experience indicates that it is difficult, and often […]
Mar, 25

Interpolation with Radial Basis Functions on GPGPUs using CUDA

This report gives a brief introduction to interpolation with radial basis functions and its application to the deformation of computational grids. The FGP algorithm is quoted as an iterative method for the calculation of the interpolation coefficients. A multipole method is described for the efficient approximation of the required matrix-vector product. Results are presented […]
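As a rough illustration of what the interpolation coefficients are, the sketch below fits a one-dimensional radial basis function interpolant by solving the dense system directly. It stands in for neither the FGP iteration nor the multipole method mentioned in the report. The Gaussian kernel, the shape parameter `eps`, and the helper names `solve`, `rbf_fit` and `rbf_eval` are all assumptions made for this example.

```python
import math


def gaussian(r, eps=1.0):
    """Gaussian radial basis function (kernel choice is an assumption)."""
    return math.exp(-(eps * r) ** 2)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rbf_fit(centers, values, eps=1.0):
    """Coefficients c with sum_j c_j * phi(|x_i - x_j|) = values_i."""
    a = [[gaussian(abs(xi - xj), eps) for xj in centers] for xi in centers]
    return solve(a, values)


def rbf_eval(x, centers, coeffs, eps=1.0):
    """Evaluate the interpolant at a single point x."""
    return sum(c * gaussian(abs(x - xj), eps) for c, xj in zip(coeffs, centers))


centers = [0.0, 1.0, 2.0, 3.0]
values = [0.0, 1.0, 0.0, -1.0]
coeffs = rbf_fit(centers, values)
```

The direct solve costs cubic time in the number of centers, which is the motivation for iterative solvers and fast matrix-vector products on large grids.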
* * *

Free GPU computing nodes at

Registered users can now run their OpenCL application at We provide 1 minute of computer time per run on two nodes with two AMD and one nVidia graphics processing units, respectively. There are no restrictions on the number of starts.

The platforms are

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 @ 2.8GHz 1055T
  • RAM: 12GB
  • OS: OpenSUSE 11.4
  • SDK: AMD APP SDK 2.8
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 5.0.35, AMD APP SDK 2.8

A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there); terminal output logs from compilation and execution will be provided to the user.

The information sent to will be treated according to our Privacy Policy

HGPU group © 2010-2014

All rights belong to the respective authors
