
Posts

Nov, 5

Using GPUs for Machine Learning Algorithms

Using dedicated hardware to do machine learning typically ends up in disaster because of cost, obsolescence, and poor software. The popularization of Graphics Processing Units (GPUs), which are now available on every PC, provides an attractive alternative. We propose a generic 2-layer fully connected neural network GPU implementation which yields over 3X speedup for both […]
Nov, 5

Clustering billions of data points using GPUs

In this paper, we report our research on using GPUs to accelerate clustering of very large data sets, which are common in today’s real-world applications. While many published works have shown that GPUs can be used to accelerate various general-purpose applications with respectable performance gains, few attempts have been made to tackle very […]
Nov, 5

Hierarchical parallel processing of large scale data clustering on a PC cluster with GPU co-processing

This paper presents an effective scheme for clustering a huge data set using a PC cluster system, in which each PC is equipped with a commodity programmable graphics processing unit (GPU). The proposed scheme is devised to achieve three-level hierarchical parallel processing of massive data clustering. The divide-and-conquer approach to parallel data clustering is employed […]
Nov, 5

On Dynamic Load Balancing on Graphics Processors

To get maximum performance on many-core graphics processors, it is important to have an even balance of the workload so that all processing units contribute equally to the task at hand. This can be hard to achieve when the cost of a task is not known beforehand and when new sub-tasks are created dynamically […]
Nov, 5

Coarse grain parallelization of evolutionary algorithms on GPGPU cards with EASEA

This paper presents a straightforward implementation of a standard evolutionary algorithm that evaluates its population in parallel on a GPGPU card. Tests done on a benchmark and a real-world problem using an old NVIDIA 8800GTX card and a newer but not top-of-the-range GTX260 card show a roughly 30x (resp. 100x) speedup […]
Nov, 5

MPI within a GPU

GPUs offer high-performance floating-point computation at commodity prices, but their usage is hindered by programming models which expose the user to irregularities in the current shared-memory environments and require learning new interfaces and semantics. This thesis will demonstrate that the message-passing paradigm can be conceptually cleaner than the current data-parallel models for programming GPUs because […]
Nov, 5

Interactive machinability analysis of free-form surfaces using multiple-view image space techniques on the GPU

In this paper we present a set of graphics hardware accelerated algorithms to interactively evaluate the machinability of complex free-form surfaces. These algorithms work in image space and easily interface with all common formats available on CAD systems. The running time of these algorithms is independent of the complexity of the surface to be analyzed […]
Nov, 5

An intelligent semi-automatic application porting system for application accelerators

Work involving the use of application acceleration devices is showing great promise; however, there are still major obstacles preventing their widespread adoption. Currently, the process of porting applications to an accelerator requires expertise in both the computer science and application domains, due to the lack of abstraction available. We present our work associated with the […]
Nov, 5

An experimental approach to performance measurement of heterogeneous parallel applications using CUDA

Heterogeneous parallel systems using GPU devices for application acceleration have garnered significant attention in the supercomputing community. However, to realize the full potential of GPU computing, application developers will require tools to measure and analyze accelerator performance with respect to the parallel execution as a whole. A performance measurement technology for the NVIDIA CUDA platform […]
Nov, 5

Streamlining GPU applications on the fly: thread divergence elimination through runtime thread-data remapping

Because of their tremendous computing power and remarkable cost efficiency, GPUs (graphics processing units) have quickly emerged as an influential platform for high performance computing. However, as GPUs are designed for massive data-parallel computing, their performance is subject to the presence of conditional statements in a GPU application. On a conditional branch where […]
Nov, 5

GPU-Accelerated Nearest Neighbor Search for 3D Registration

Nearest Neighbor Search (NNS) is employed by many computer vision algorithms. The computational complexity is large and constitutes a challenge for real-time capability. The basic problem is in rapidly processing a huge amount of data, which is often addressed by means of highly sophisticated search methods and parallelism. We show that NNS-based vision algorithms […]
Nov, 5

Debugging GPU stream programs through automatic dataflow recording and visualization

We present a novel framework for debugging GPU stream programs through automatic dataflow recording and visualization. Our debugging system can help programmers locate errors that are common in general purpose stream programs but very difficult to debug with existing tools. A stream program is first compiled into an instrumented program using a compiler. This instrumenting […]

* * *


HGPU group © 2010-2018 hgpu.org

All rights belong to the respective authors
