Posts

May, 2

Real-time Simultaneous Pose and Shape Estimation for Articulated Objects Using a Single Depth Camera

In this paper we present a novel real-time algorithm for simultaneous pose and shape estimation for articulated objects, such as human beings and animals. The key to our pose estimation component is to embed the articulated deformation model with exponential-maps-based parametrization into a Gaussian Mixture Model. Benefiting from the probabilistic measurement model, our algorithm requires […]
May, 2

3D FFT on a Single FPGA

The 3D FFT is critical in many physical simulations and image processing applications. On FPGAs, however, the 3D FFT was thought to be inefficient relative to other methods such as convolution-based implementations of multigrid. We find the opposite: a simple design, operating at a conservative frequency, takes 4ms for 16^3, 21ms for 32^3, and 215ms […]
May, 2

Analysis of SuperLU Solvers on Intel MIC Architecture

Intel Xeon Phi is a coprocessor with sixty-one cores on a single chip. The chip has a more powerful FPU that contains 512-bit SIMD registers. The Intel Xeon Phi chip can benefit from algorithms that operate on large vectors. In this work, sequential, multithreaded and distributed versions of SuperLU solvers are tested on the […]
May, 2

Address Selection for Efficient Barriers on the Intel Xeon Phi

Synchronization primitives are a long-standing issue in parallel programming. Barriers in particular are ubiquitous, as common paradigms such as OpenMP make extensive use of them by ending all parallel sections on a barrier by default. The rising number of simultaneous threads in commodity hardware only exacerbates the problem, as for a given amount of computation […]
May, 2

Digital Marbling: a GPU Approach with Precomputed Velocity Field

Paper marbling is the art of creating intricate designs on an aqueous surface. We present an interactive digital marbling system by simulating fluid dynamics on the GPU using precomputed solutions to the Navier-Stokes equations. Experimental results have shown that our approach is not only significantly faster, but also sufficiently accurate compared to existing approaches.
Apr, 30

Thickness computation of trimmed B-Rep model using GPU ray tracing

This paper demonstrates the use of direct ray tracing of large industrial CAD models on the GPU. The ray tracing kernel is a building block used to compute and assess the validity of the mechanical design of CAD models. A high precision and efficient solution to handle trimmed surfaces with holes is discussed. The central […]
Apr, 30

Accelerating Ant Colony Optimization-based Edge Detection on the GPU using CUDA

Ant Colony Optimization (ACO) is a nature-inspired metaheuristic that can be applied to a wide range of optimization problems. In this paper we present the first parallel implementation of an ACO-based (image processing) edge detection algorithm on the Graphics Processing Unit (GPU) using NVIDIA CUDA. We extend recent work so that we are able to […]
Apr, 30

KernelInterceptor: automating GPU kernel verification by intercepting kernels and their parameters

GPUVerify is a static analysis tool for verifying that GPU kernels are free from data races and barrier divergence. It is intended as an automatic tool, but its usability is impaired by the fact that the user must explicitly supply the kernel source code, the number of threads, and some kernel arguments. Extracting this information […]
Apr, 30

Atmospheric Chemistry

Atmospheric chemistry is a multidisciplinary research field with respect to emissions, interactions and physical and chemical transformation of chemical compounds in the atmosphere. It also investigates the key physical processes occurring in the atmosphere. This eBook provides a comprehensive review of this topic showing recent advances in this field e.g., parallelization of mathematical models with […]
Apr, 30

Exercising high-level parallel programming on streams: a systems biology use case

The stochastic modelling of biological systems, coupled with Monte Carlo simulation of models, is an increasingly popular technique in Bioinformatics. The simulation-analysis workflow may result in a computationally expensive task, reducing the interactivity required in model tuning. In this work, we advocate high-level software design as a vehicle for building efficient and portable parallel […]
Apr, 29

SUPERGLUE: A Shared Memory Framework Using Data Versioning for Dependency-Aware Task-Based Parallelization

In computational science, making efficient use of modern multicore-based computer hardware is necessary in order to deal with complex real-life application problems. However, with increased hardware complexity, the cost in man-hours of writing and re-writing software to adapt to evolving computer systems is becoming prohibitive. Task-based parallel programming models aim to allow […]
Apr, 29

Heterogeneous Computing and Grid Scheduling with Hierarchically Parallel Evolutionary Algorithms

This work presents a novel parallel evolutionary algorithm (EA) for task scheduling in distributed heterogeneous computing and grid environments, NP-hard problems of capital relevance in distributed computing. Parallelization of the biologically inspired heuristics is hierarchically designed and integrates with the two traditional parallel models (master-slave models and island models). The method has been specifically implemented […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors