Posts

Mar, 12

Parallelization Research of Circle Detection Based on Hough Transform

Circle detection with the Hough transform suffers from long computation times. In this paper, two parallel methods are given, based on Threading Building Blocks (TBB) and CUDA; by utilizing multi-core CPUs and the GPU, the most time-consuming part of circle detection is parallelized. Experimental results show that the circle detection algorithms […]
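For orientation, here is a minimal serial sketch of the voting step of a Hough circle transform for a known radius. It is not taken from the paper: the function name `hough_circle_centers`, the `n_angles` parameter, and the fixed-radius simplification are all assumptions made for illustration. Each edge point votes independently, which is one reason this kind of loop is a natural candidate for multi-core or GPU parallelization.

```python
import math
from collections import Counter


def hough_circle_centers(edge_points, radius, n_angles=360):
    """Accumulate votes for circle centres of a known radius (hypothetical helper).

    Every edge point votes for each integer grid cell that lies at distance
    `radius` from it; cells where many votes coincide are likely centres.
    """
    votes = Counter()
    for x, y in edge_points:
        for k in range(n_angles):
            theta = 2.0 * math.pi * k / n_angles
            # Candidate centre that would place (x, y) on the circle at angle theta.
            cx = round(x - radius * math.cos(theta))
            cy = round(y - radius * math.sin(theta))
            votes[(cx, cy)] += 1
    return votes


# Synthetic edge points on a circle centred at (10, 10) with radius 5.
edge_points = [
    (10 + 5 * math.cos(2.0 * math.pi * m / 36),
     10 + 5 * math.sin(2.0 * math.pi * m / 36))
    for m in range(36)
]
accumulator = hough_circle_centers(edge_points, radius=5)
best_center, best_votes = accumulator.most_common(1)[0]
```

In a parallel version the outer loop over edge points would typically be split across threads, with the accumulator updates made atomic or merged from per-thread copies; that design choice is likewise an assumption, not a description of the paper's implementation.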
Mar, 12

Just-in-time Acceleration of JavaScript

JavaScript has seen tremendous growth in popularity driven by increasingly interactive web sites and sophisticated web interfaces. However, the performance of JavaScript continues to be a hurdle in using it for tasks that are computationally intensive, such as gaming, simulations, and visualization. JavaScript has also been slow to exploit the available parallelism on modern computers. […]
Mar, 12

Comprehensive Analysis of High-Performance Computing Methods for Filtered Back-Projection

This paper provides an extensive runtime, accuracy, and noise analysis of Computed Tomography (CT) reconstruction algorithms using various High-Performance Computing (HPC) frameworks such as "conventional" multi-core, multi-threaded CPUs, Compute Unified Device Architecture (CUDA), and DirectX or OpenGL graphics pipeline programming. The proposed algorithms exploit various built-in hardwired features of GPUs such as rasterization and […]
Mar, 12

Parallel spatial data structures for interactive rendering

The main question explored in this thesis is how to define novel parallel random-access data structures for surface and image spatial data with efficient construction, storage, and query memory access patterns. Our main contribution is a set of parallel-efficient methods to evaluate irregular, sparse or even implicit geometries and textures in different applications: a method […]
Mar, 12

High Performance GPU Accelerated Local Optimization in TSP

This paper presents a high-performance GPU-accelerated implementation of the 2-opt local search algorithm for the Traveling Salesman Problem (TSP). GPU usage significantly decreases the execution time needed for tour optimization; however, it also requires a complicated and well-tuned implementation. With the problem size growing, the time spent on local optimization comparing the graph […]
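As background, the 2-opt move itself can be sketched in a few lines of serial Python. This is a textbook formulation, not the paper's GPU implementation; the helper names `tour_length` and `two_opt` and the small tolerance `1e-12` are assumptions chosen for the example.

```python
import math


def tour_length(points, tour):
    """Length of the closed tour that visits `points` in the order given by `tour`."""
    n = len(tour)
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n)
    )


def two_opt(points, tour):
    """Reverse tour segments for as long as a reversal shortens the tour."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # the two edges share a city, so there is nothing to swap
                a, b = points[tour[i]], points[tour[i + 1]]
                c, d = points[tour[j]], points[tour[(j + 1) % n]]
                # Change in length if edges (a, b) and (c, d) become (a, c) and (b, d).
                delta = (math.dist(a, c) + math.dist(b, d)
                         - math.dist(a, b) - math.dist(c, d))
                if delta < -1e-12:
                    tour[i + 1:j + 1] = tour[i + 1:j + 1][::-1]
                    improved = True
    return tour


# A unit square visited in a self-crossing order; 2-opt should uncross it.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
crossed = [0, 2, 1, 3]
optimized = two_opt(square, crossed)
```

Evaluating `delta` for every pair `(i, j)` is the quadratic inner work of each pass, and the pairs can be evaluated independently of one another, which is what makes the search attractive for GPU acceleration.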
Mar, 12

A Scalable Heterogeneous Parallelization Framework for Iterative Local Searches

This paper describes and evaluates a highly-scalable framework for running iterative local searches on heterogeneous HPC platforms. The user only needs to provide serial CPU or single-GPU code that implements a simple interface. The framework then executes this code in parallel using MPI between compute nodes and OpenMP and multi-GPU support within nodes. It handles […]
Mar, 12

3D Modeling, Distance and Gradient Computation for Motion Planning: A Direct GPGPU Approach

The Kinect sensor and KinectFusion algorithm have revolutionized environment modeling. We bring these advances to optimization-based motion planning by computing the obstacle and self-collision avoidance objective functions and their gradients directly from the KinectFusion model on the GPU without ever transferring any model to the CPU. Based on this, we implement a proof-of-concept motion planner […]
Mar, 12

Performance Traps in OpenCL for CPUs

With its design concept of cross-platform portability, OpenCL can be used not only on GPUs (for which it is quite popular), but also on CPUs. Whether porting GPU programs to CPUs, or simply writing new code for CPUs, using OpenCL brings up the performance issue, usually raised in one of two forms: "OpenCL is not […]
Mar, 12

Automatic efficient data layout for multithreaded stencil codes on CPUs and GPUs

Stencil-based computation on structured grids is a kernel at the heart of a large number of scientific applications. The variety of stencil kernels used in practice makes this computation pattern difficult to assemble into a high-performance computing library. With the multiplication of cores on a single chip, answering architectural alignment requirements became an […]
Mar, 12

Morph Algorithms on GPUs

There is growing interest in using GPUs to accelerate graph algorithms such as breadth-first search, computing page-ranks, and finding shortest paths. However, these algorithms do not modify the graph structure, so their implementation is relatively easy compared to general graph algorithms like mesh generation and refinement, which morph the underlying graph in non-trivial ways by […]
Mar, 9

Speeding Up Model Building for ECGA on CUDA Platform

Parallelization is a straightforward approach to enhancing the efficiency of evolutionary computation due to its inherently parallel nature. Since NVIDIA released the Compute Unified Device Architecture (CUDA), graphics processing units have enabled many scalable parallel programs in a wide range of fields. However, parallelization of model building for EDAs is rarely studied. In this […]
Mar, 9

Signal Processing and General Purpose Computing on GPU

Graphics processing units (GPUs) have been growing in popularity due to their impressive processing capabilities and, with general-purpose programming interfaces such as NVIDIA’s CUDA, are becoming the platform of choice in the scientific computing community. Today the research community successfully uses GPUs to solve a broad range of computationally demanding, complex problems. This […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
