Posts
Mar, 29
Hardware/Software Vectorization for Closeness Centrality on Multi-/Many-Core Architectures
Centrality metrics have been shown to be highly correlated with the importance and loads of the nodes in a network. Given the scale of today’s social networks, it is essential to use efficient algorithms and high performance computing techniques for their fast computation. In this work, we exploit hardware and software vectorization in combination with fine-grain […]
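For reference, the closeness centrality of a vertex v in a connected graph is commonly defined as (n − 1) / Σ_u d(v, u), where d is the shortest-path distance. The plain-Python sketch below assumes an unweighted, undirected graph given as adjacency lists. It contrasts one BFS per source with a bit-parallel multi-source BFS in which bit s of an integer mask means "source s has reached this vertex", so one OR updates many sources at once. This bitset trick is only our illustrative stand-in for software vectorization, not the paper's kernels, and both function names are hypothetical.

```python
from collections import deque


def closeness_bfs(adj):
    """Reference version: one BFS per source (unweighted, undirected graph)."""
    n = len(adj)
    scores = []
    for s in range(n):
        dist = [-1] * n
        dist[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if dist[w] < 0:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(d for d in dist if d > 0)
        reached = sum(1 for d in dist if d > 0)
        # reached / total equals (n - 1) / sum of distances for a connected graph
        scores.append(reached / total if total else 0.0)
    return scores


def closeness_bitparallel(adj):
    """All sources at once: bit s of a mask means 'source s has reached this vertex'."""
    n = len(adj)
    visited = [1 << v for v in range(n)]  # every vertex is reached by itself at distance 0
    frontier = visited[:]
    far = [0] * n      # sum of distances, per source
    reached = [0] * n  # vertices reached, per source (excluding the source itself)
    level = 0
    while any(frontier):
        level += 1
        nxt = [0] * n
        for v in range(n):
            mask = 0
            for u in adj[v]:  # pull step: OR the neighbours' frontier masks
                mask |= frontier[u]
            nxt[v] = mask & ~visited[v]  # keep only sources that had not reached v yet
        for v in range(n):
            bits = nxt[v]
            visited[v] |= bits
            while bits:  # credit 'level' to every source that reached v at this level
                low = bits & -bits
                s = low.bit_length() - 1
                far[s] += level
                reached[s] += 1
                bits ^= low
        frontier = nxt
    return [reached[s] / far[s] if far[s] else 0.0 for s in range(n)]
```

On the path graph 0–1–2 (`[[1], [0, 2], [1]]`) both functions return approximately `[0.667, 1.0, 0.667]`. The pull step relies on the adjacency lists being symmetric; a directed graph would need in-neighbour lists instead.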
Mar, 28
Improving Cache Locality for GPU-based Volume Rendering
We present a cache-aware method for accelerating texture-based volume rendering on a graphics processing unit (GPU). Because a GPU has a hierarchical architecture in terms of processing and memory units, cache optimization is important to maximize performance for memory-intensive applications. Our method localizes texture memory references according to the location of the viewpoint and dynamically selects […]
Mar, 28
GPU-accelerated automatic identification of robust beam setups for proton and carbon-ion radiotherapy
We demonstrate graphics processing unit (GPU) acceleration of the automatic identification of robust particle therapy beam setups, which minimize the negative dosimetric effects of Bragg peak displacement caused by treatment-time patient positioning errors. Our particle therapy research toolkit, RobuR, was extended with OpenCL support and used to implement the GPU calculation of the Port Homogeneity Index, a […]
Mar, 28
Implementation of Just In Time Value Specialization for the Optimization of Data Parallel Kernels
This dissertation explores just-in-time (JIT) specialization as an optimization for OpenCL data-parallel compute kernels. It describes the implementation and performance of two extensions to OpenCL, Bacon and Specialization Annotated OpenCL (SOCL). Bacon is a replacement interface for OpenCL that provides improved usability and has JIT specialization built in. SOCL is a simple extension to OpenCL […]
Mar, 28
Pulse-coupled neural network performance for real-time identification of vegetation during forced landing
Safety concerns in the operation of autonomous aerial systems require that safe-landing protocols be followed in situations where the mission must be aborted due to mechanical or other failure. This article presents a pulse-coupled neural network (PCNN) to assist with vegetation classification in a vision-based landing site detection system for an unmanned aircraft. We propose […]
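The abstract does not say which PCNN variant the article uses. As a rough illustration of how a PCNN responds to an image, the sketch below implements one common simplified form: the feeding input is the pixel intensity, the linking input sums the pulses of the 8 neighbours, and each neuron has a dynamic threshold that decays every step and jumps after a pulse. The function name and every parameter value are our own assumptions, not the article's.

```python
import math


def pcnn_fire_times(image, n_iter=10, beta=0.2, v_l=1.0, a_l=1.0,
                    v_e=20.0, a_e=0.3, e0=1.0):
    """Return, per pixel, the first iteration at which its neuron pulses (0 = never)."""
    rows, cols = len(image), len(image[0])
    link = [[0.0] * cols for _ in range(rows)]
    thresh = [[e0] * cols for _ in range(rows)]
    pulse = [[0] * cols for _ in range(rows)]
    first = [[0] * cols for _ in range(rows)]
    decay_l, decay_e = math.exp(-a_l), math.exp(-a_e)
    for n in range(1, n_iter + 1):
        new_pulse = [[0] * cols for _ in range(rows)]
        for i in range(rows):
            for j in range(cols):
                # Linking input: decayed previous value plus neighbour pulses from step n-1
                neigh = sum(pulse[k][m]
                            for k in range(max(i - 1, 0), min(i + 2, rows))
                            for m in range(max(j - 1, 0), min(j + 2, cols))
                            if (k, m) != (i, j))
                link[i][j] = decay_l * link[i][j] + v_l * neigh
                # Internal activity: feeding (= intensity) modulated by linking
                u = image[i][j] * (1.0 + beta * link[i][j])
                if u > thresh[i][j]:  # compare with the threshold from step n-1
                    new_pulse[i][j] = 1
                    if first[i][j] == 0:
                        first[i][j] = n
        for i in range(rows):
            for j in range(cols):
                # Threshold decays every step and jumps by v_e after a pulse
                thresh[i][j] = decay_e * thresh[i][j] + v_e * new_pulse[i][j]
        pulse = new_pulse
    return first
```

With these parameters, `pcnn_fire_times([[0.9, 0.1], [0.1, 0.1]])` gives `[[2, 9], [9, 9]]`: the bright pixel pulses first and the similar dark pixels pulse together. Grouping pixels by firing time in this way is one way a PCNN can separate regions of similar intensity, though how the article maps pulses to vegetation classes is not stated here.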
Mar, 27
Jacobian-free Newton-Krylov methods with GPU acceleration for computing nonlinear ship wave patterns
The nonlinear problem of steady free-surface flow past a submerged source is considered as a case study for three-dimensional ship wave problems. Of particular interest is the distinctive wedge-shaped wave pattern that forms on the surface of the fluid. By reformulating the governing equations with a standard boundary-integral method, we derive a system of nonlinear […]
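The key idea behind a Jacobian-free Newton-Krylov solver is that Krylov methods such as GMRES only need Jacobian-vector products, which can be approximated by a finite difference, J(u)v ≈ (F(u + εv) − F(u)) / ε, so the Jacobian is never formed. The sketch below demonstrates this on a toy 2×2 system rather than the ship-wave equations. The unpreconditioned, unrestarted GMRES, the fixed step `eps`, and all function names are our assumptions, and GPU acceleration is not modelled.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def gmres(matvec, b, tol=1e-10, max_iter=50):
    """Unrestarted GMRES from x0 = 0; only needs the action v -> A v."""
    n = len(b)
    beta = norm(b)
    if beta == 0.0:
        return [0.0] * n
    V = [[x / beta for x in b]]  # orthonormal Krylov basis
    H = []                       # columns of the rotated (upper-triangular) Hessenberg matrix
    cs, sn = [], []              # Givens rotation coefficients
    g = [beta]                   # rotated right-hand side; |g[-1]| is the residual norm
    for k in range(min(max_iter, n)):
        w = matvec(V[k])
        h = []
        for j in range(k + 1):   # modified Gram-Schmidt (Arnoldi step)
            hj = dot(w, V[j])
            w = [wi - hj * vj for wi, vj in zip(w, V[j])]
            h.append(hj)
        h_next = norm(w)
        for j in range(k):       # apply earlier rotations to the new column
            t = cs[j] * h[j] + sn[j] * h[j + 1]
            h[j + 1] = -sn[j] * h[j] + cs[j] * h[j + 1]
            h[j] = t
        r = math.hypot(h[k], h_next)  # new rotation that zeroes h_next
        cs.append(h[k] / r)
        sn.append(h_next / r)
        h[k] = r
        g.append(-sn[k] * g[k])
        g[k] = cs[k] * g[k]
        H.append(h)
        if abs(g[k + 1]) < tol * beta or h_next == 0.0:
            break
        V.append([x / h_next for x in w])
    m = len(H)
    y = [0.0] * m                # back substitution: R y = g
    for i in range(m - 1, -1, -1):
        y[i] = (g[i] - sum(H[j][i] * y[j] for j in range(i + 1, m))) / H[i][i]
    return [sum(V[j][i] * y[j] for j in range(m)) for i in range(n)]


def jfnk(F, u0, newton_tol=1e-10, max_newton=30, eps=1e-7):
    """Newton's method whose linear solves use GMRES with finite-difference J*v."""
    u = list(u0)
    for _ in range(max_newton):
        Fu = F(u)
        if norm(Fu) < newton_tol:
            break

        def jv(v, u=u, Fu=Fu):
            # Jacobian-free directional derivative: J(u) v ~ (F(u + eps v) - F(u)) / eps
            shifted = [ui + eps * vi for ui, vi in zip(u, v)]
            return [(a - b) / eps for a, b in zip(F(shifted), Fu)]

        du = gmres(jv, [-x for x in Fu])  # solve J du = -F(u) without forming J
        u = [ui + di for ui, di in zip(u, du)]
    return u
```

For example, `jfnk(lambda u: [u[0]**2 + u[1]**2 - 4.0, u[0] - u[1]], [1.0, 0.5])` converges to approximately `(1.41421, 1.41421)`. In a realistic solver the Krylov step would usually be preconditioned and restarted.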
Mar, 27
2014 Workshop on Heterogeneous and Unconventional Cluster Architectures and Applications, HUCAA 2014, in conjunction with ICPP2014
The workshop on Heterogeneous and Unconventional Cluster Architectures and Applications aims to gather recent work on heterogeneous and unconventional cluster architectures and applications that might have a significant impact on future cluster architectures. This includes any cluster architecture that is not based on the usual commodity components and therefore makes use of some special hard- […]
Mar, 26
Accelerating GPU Implementation of Contourlet Transform
The widespread use of the contourlet transform (CT) and today’s real-time requirements demand faster execution of CT. Solutions are available, but because of their lack of portability or their computational intensity, they are ill-suited to real-time applications. In this paper we take advantage of modern GPUs for acceleration. GPUs are well suited to data-parallel computation applications […]
Mar, 26
A New Parallel Implementation of DSI Based Disparity Computation Using CUDA
Stereo matching techniques are used to extract 3D information from pairs of 2D stereo images. They can be classified into feature-based, window (area)-based, and optimization-based approaches. The feature-based approach generally generates a sparse disparity map with high accuracy and low execution time. The window-based approach produces a dense disparity map with low […]
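As background for the window-based approach, a disparity space image (DSI) stores a matching cost for every pixel (x, y) and candidate disparity d; a winner-take-all step then picks the cheapest d per pixel. The CPU sketch below assumes rectified grayscale images as lists of lists, a sum-of-absolute-differences (SAD) cost over a square window, and marks pixels with no valid window as -1. The cost function, window radius, and function names are our assumptions rather than the paper's. Every (d, y, x) cell is computed independently, which is one reason DSI construction maps naturally onto one-thread-per-cell GPU kernels.

```python
def build_dsi(left, right, max_disp, radius=1):
    """DSI with dsi[d][y][x] = SAD over a (2*radius+1)^2 window (inf if out of bounds)."""
    h, w = len(left), len(left[0])
    inf = float("inf")
    dsi = [[[inf] * w for _ in range(h)] for _ in range(max_disp + 1)]
    for d in range(max_disp + 1):
        for y in range(h):
            for x in range(w):
                cost, valid = 0, True
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        yy, xl, xr = y + dy, x + dx, x + dx - d  # left pixel x matches right x - d
                        if not (0 <= yy < h and 0 <= xl < w and 0 <= xr < w):
                            valid = False
                            break
                        cost += abs(left[yy][xl] - right[yy][xr])
                    if not valid:
                        break
                if valid:
                    dsi[d][y][x] = cost
    return dsi


def winner_take_all(dsi):
    """Pick the disparity with the lowest cost per pixel; -1 where no window was valid."""
    n_disp, h, w = len(dsi), len(dsi[0]), len(dsi[0][0])
    disp = [[-1] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            costs = [dsi[d][y][x] for d in range(n_disp)]
            best = min(range(n_disp), key=lambda d: costs[d])
            if costs[best] != float("inf"):
                disp[y][x] = best
    return disp
```

On a synthetic textured pair in which the right image is the left image shifted by two pixels, the interior of the returned map is 2. Near the left border only small disparities have valid windows, so the winner-take-all result there is unreliable, a known weakness of purely local matching.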
Mar, 25
BigKernel — High Performance CPU-GPU Communication Pipelining for Big Data-style Applications
GPUs offer an order of magnitude higher compute power and memory bandwidth than CPUs. GPUs therefore might appear to be well suited to accelerate computations that operate on voluminous data sets in independent ways; e.g., for transformations, filtering, aggregation, partitioning or other “Big Data” style processing. Yet experience indicates that it is difficult, and often […]
Mar, 25
Interpolation with Radial Basis Functions on GPGPUs using CUDA
This report gives a brief introduction to interpolation with radial basis functions and its application to the deformation of computational grids. The FGP algorithm is presented as an iterative method for calculating the interpolation coefficients. A multipole method is described for the efficient approximation of the required matrix-vector product. Results are presented […]
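For orientation, an RBF interpolant has the form s(x) = Σ_j λ_j φ(‖x − x_j‖), and the coefficients λ come from the linear system Φλ = f with Φ_ij = φ(‖x_i − x_j‖). The report computes λ iteratively with the FGP algorithm and approximates the matrix-vector product with a multipole method; the sketch below instead swaps in a plain direct solve, which is only practical for a handful of centres. The Gaussian kernel, its `shape` parameter, and the function names are our assumptions.

```python
import math


def solve(A, b):
    """Dense Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gaussian(r, shape):
    # Gaussian RBF; its interpolation matrix is positive definite for distinct centres
    return math.exp(-(shape * r) ** 2)


def rbf_fit(centres, values, shape=1.0):
    """Solve Phi @ coeffs = values for the interpolation coefficients."""
    A = [[gaussian(math.dist(p, q), shape) for q in centres] for p in centres]
    return solve(A, values)


def rbf_eval(x, centres, coeffs, shape=1.0):
    """Evaluate s(x) = sum_j coeffs[j] * phi(|x - centres[j]|)."""
    return sum(c * gaussian(math.dist(x, p), shape) for p, c in zip(centres, coeffs))
```

By construction the interpolant reproduces the given values at the centres. For grid deformation, one typical use is to fit one interpolant per displacement component from the known boundary-node displacements and then evaluate it at every interior node.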
Mar, 25
Distortion correction algorithm for UAV remote sensing image based on CUDA
In China, natural disasters are characterized by wide distribution, severe destruction and high impact range, and they cause significant property damage and casualties every year. Following a disaster, timely and accurate acquisition of geospatial information can provide an important basis for disaster assessment, emergency relief, and reconstruction. In recent years, Unmanned Aerial Vehicle (UAV) remote […]

