Posts

Jan, 6

CodePy

CodePy [2], the C/C++ metaprogramming toolkit for Python [16], is analysed with respect to its source code generation capabilities and the way it generates extension modules for Python. Combining the two makes it possible to generate C code in a Python script and execute it from within the same script. Insights are given on how this roundtrip […]
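The first half of the roundtrip described above, generating C source from a Python script, can be sketched with the standard library alone. This is only an illustration of the idea and does not use CodePy's own API; the names `C_FUNCTION` and `generate_c_function` are hypothetical and chosen for this example.

```python
from string import Template

# Hypothetical template for a one-expression C function (not part of CodePy).
C_FUNCTION = Template("""\
$rtype $name($args)
{
    return $expr;
}
""")


def generate_c_function(name, rtype, params, expr):
    """Return C source text for a function that returns a single expression.

    params is a list of (c_type, parameter_name) pairs.
    """
    args = ", ".join(f"{ctype} {pname}" for ctype, pname in params)
    return C_FUNCTION.substitute(rtype=rtype, name=name, args=args, expr=expr)


# Generate the source of `int add(int x, int y)` as a Python string.
source = generate_c_function("add", "int", [("int", "x"), ("int", "y")], "x + y")
print(source)
```

The sketch stops at source generation. Compiling the generated text into an extension module and loading it back into the running script is the second half of the roundtrip, which the excerpt attributes to CodePy.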
Jan, 6

Low-power Task Scheduling for GPU Energy Reduction

Graphics processing units (GPUs) have been used intensively by high-performance computing applications. However, the GPU's high power consumption is a major issue that accompanies its high parallelism. Although Dynamic Voltage and Frequency Scaling (DVFS) [1] has been heavily studied and successfully applied in real products to reduce CPU power consumption, DVFS is still relatively new for […]
Jan, 6

Multiple-GPU Scalability of Phase-Field Simulation for Dendritic Solidification

Mechanical properties of metallic materials such as steel depend on the solidification process. In order to study the morphology of the microstructure in these materials, the phase-field model derived from non-equilibrium statistical physics is applied and the interface dynamics are solved by GPU computing. Since very high performance is required, 3-dimensional simulations have not been […]
Jan, 6

Efficient 3D reconstruction of large-scale urban environments from street-level video

Recovering the 3-dimensional (3D) structure of a scene from 2-dimensional (2D) images is a fundamental problem in computer vision. This technology has many applications in computer graphics, entertainment, robotics, transportation, manufacturing, security, etc. One application is 3D mapping. For example, Google Earth and Microsoft Bing Maps provide a 3D virtual replica of many of the […]
Jan, 6

Applying OOC Techniques in the Reduction to Condensed Form for Very Large Symmetric Eigenproblems on GPUs

In this paper we address the reduction of a dense matrix to tridiagonal form for the solution of symmetric eigenvalue problems on a graphics processor (GPU) when the data is too large to fit into the accelerator memory. We apply out-of-core techniques to a three-stage algorithm, carefully redesigning the first stage to reduce the number […]
Jan, 5

A GPU Implementation of Inclusion-based Points-to Analysis

Graphics Processing Units (GPUs) have emerged as powerful accelerators for many regular algorithms that operate on dense arrays and matrices. In contrast, we know relatively little about using GPUs to accelerate highly irregular algorithms that operate on pointer-based data structures such as graphs. For the most part, research has focused on GPU implementations of graph […]
Jan, 5

PARRAY: A Unifying Array Representation for Heterogeneous Parallelism

This paper introduces a programming interface called PARRAY (or Parallelizing ARRAYs) that supports system-level succinct programming for heterogeneous parallel systems like GPU clusters. The current practice of software development requires combining several low-level libraries like Pthread, OpenMP, CUDA and MPI. Achieving productivity and portability is hard with different numbers and models of GPUs. PARRAY extends […]
Jan, 5

Selecting the Best Tridiagonal System Solver Projected on Multi-Core CPU and GPU Platforms

Nowadays multicore processors and graphics cards are commodity hardware that can be found in personal computers. Both CPU and GPU are capable of performing high-end computations. In this paper we present and compare parallel implementations of two tridiagonal system solvers. We analyze the cyclic reduction method, as an example of fine-grained parallelism, and Bondeli’s algorithm, […]
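As a rough illustration of the cyclic reduction method named above, the following is a minimal sequential sketch in Python. It is not the paper's implementation, and the function name and argument layout are assumptions made for this example. It solves `a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i]`, assuming `a[0] == 0`, `c[n-1] == 0`, and a matrix (for instance a diagonally dominant one) for which no pivoting is needed.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by recursive cyclic reduction.

    a: sub-diagonal (a[0] must be 0), b: main diagonal,
    c: super-diagonal (c[-1] must be 0), d: right-hand side.
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]

    # Forward reduction: eliminate the even-indexed unknowns, leaving a
    # tridiagonal system of half the size in the odd-indexed unknowns.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        has_next = i + 1 < n
        alpha = -a[i] / b[i - 1]
        beta = -c[i] / b[i + 1] if has_next else 0.0
        ra.append(alpha * a[i - 1])
        rb.append(b[i] + alpha * c[i - 1] + (beta * a[i + 1] if has_next else 0.0))
        rc.append(beta * c[i + 1] if has_next else 0.0)
        rd.append(d[i] + alpha * d[i - 1] + (beta * d[i + 1] if has_next else 0.0))

    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)

    # Back substitution: each even-indexed unknown depends only on its
    # already-known odd-indexed neighbours.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x


# Usage: a 5x5 system with diagonals (-1, 2, -1).
x = cyclic_reduction(
    [0.0, -1.0, -1.0, -1.0, -1.0],
    [2.0, 2.0, 2.0, 2.0, 2.0],
    [-1.0, -1.0, -1.0, -1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 6.0],
)
print(x)
```

Within each reduction level, the loop iterations are independent of one another, which is what makes the method a natural example of fine-grained parallelism. This sketch runs them sequentially.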
Jan, 5

Implementation of a Fast Image Coding and Retrieval System Using a GPU

Sparse coding of image patches is a compact but computationally expensive method of representing images. As part of our SenSIP consortium industry projects, we implement the Orthogonal Matching Pursuit algorithm using a single CUDA kernel on a GPU, and sparse codes for image patches are obtained in parallel. Image-based "exact search" and "visually similar search" […]
Jan, 5

Fully 3D list-mode time-of-flight PET image reconstruction on GPUs using CUDA

PURPOSE: List-mode processing is an efficient way of dealing with the sparse nature of positron emission tomography (PET) data sets and is the processing method of choice for time-of-flight (ToF) PET image reconstruction. However, the massive amount of computation involved in forward projection and backprojection limits the application of list-mode reconstruction in practice, and makes […]
Jan, 5

BFROST: Binary Features from Robust Orientation Segment Tests accelerated on the GPU

We propose a fast local image feature detector and descriptor that is implementable on the GPU. Our method is the first GPU implementation of the popular FAST detector. A simple but novel method of feature orientation estimation which can be calculated in constant time is proposed. The robustness and reliability of our orientation estimation is […]
Jan, 5

A Parallel Supercomputer Implementation of a Biological Inspired Neural Network and its use for Pattern Recognition

A parallel implementation of a large spiking neural network is proposed and evaluated. The neural network implements the binding by synchrony process using the Oscillatory Dynamic Link Matcher (ODLM). Scalability, speed and performance are compared for 2 implementations: Message Passing Interface (MPI) and Compute Unified Device Architecture (CUDA) running on clusters of multicore supercomputers and […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors