Posts

Apr, 19

A Task-centric Memory Model for Scalable Accelerator Architectures

This paper presents a task-centric memory model for 1000-core compute accelerators. Visual computing applications are emerging as an important class of workloads that can exploit 1000-core processors. In these workloads, we observe data sharing and communication patterns that can be leveraged in the design of memory systems for future 1000-core processors. Based on these insights, […]
Apr, 19

Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware

Recent advances in graphics processing units (GPUs) have resulted in massively parallel hardware that is easily programmable and widely available in today’s desktop and notebook computer systems. GPUs typically use single-instruction, multiple-data (SIMD) pipelines to achieve high performance with minimal overhead for control hardware. Scalar threads running the same computing kernel are grouped together into […]
Apr, 19

Dynamic detection of uniform and affine vectors in GPGPU computations

We present a hardware mechanism which dynamically detects uniform and affine vectors used in SPMD architectures such as Graphics Processing Units, to minimize pressure on the register file and reduce power consumption with minimal architectural modifications. A preliminary experimental analysis conducted with the Barra simulator shows that this optimization can benefit up to 34% […]
Apr, 19

Parallel calculation of the median and order statistics on GPUs with application to robust regression

We present and compare various approaches to a classical selection problem on Graphics Processing Units (GPUs). The selection problem consists in selecting the $k$-th smallest element from an array of size $n$, called the $k$-th order statistic. We focus on calculating the median of a sample, the $n/2$-th order statistic. We introduce a new method based […]
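
To make the problem statement concrete, here is a minimal sequential sketch of selecting the $k$-th order statistic with quickselect. It is not the method introduced in the paper (the excerpt is truncated before that method is named), and it does not run on a GPU. The function names `kth_smallest` and `median_order_statistic` are hypothetical, and rounding $n/2$ up for odd $n$ is an assumption.

```python
import random


def kth_smallest(values, k):
    """Return the k-th smallest element of values (k is 1-indexed).

    Plain sequential quickselect, used here only to illustrate the
    selection problem; it is not the GPU method from the paper.
    """
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    items = list(values)
    while True:
        pivot = random.choice(items)
        lower = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        if k <= len(lower):
            # The answer lies among the elements below the pivot.
            items = lower
        elif k <= len(lower) + len(equal):
            # The pivot itself is the k-th smallest element.
            return pivot
        else:
            # Skip the lower and equal elements and search above the pivot.
            k -= len(lower) + len(equal)
            items = [x for x in items if x > pivot]


def median_order_statistic(values):
    """Return the n/2-th order statistic (assumption: round up for odd n)."""
    n = len(values)
    return kth_smallest(values, (n + 1) // 2)
```

For a sample such as `[7, 2, 9, 4, 1]`, `median_order_statistic` selects the third smallest element.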
Apr, 19

A pseudospectral matrix method for time-dependent tensor fields on a spherical shell

We construct a pseudospectral method for the solution of time-dependent, non-linear partial differential equations on a three-dimensional spherical shell. The problem we address is the treatment of tensor fields on the sphere. As a test case we consider the evolution of a single black hole in numerical general relativity. A natural strategy would be the […]
Apr, 18

Building a Personal High Performance Computer with Heterogeneous Processors

A personal high performance computer (PHPC) requires low cost and high performance. Teraflops PHPC systems with special accelerator units such as GPGPUs have been presented, but they pose difficulties in programming, compatibility and applicability. In this paper, we present HPP-PHPC, a hybrid architecture of heterogeneous processors connected by a non-coherent off-chip system bus. The performance of HPP-PHPC […]
Apr, 18

Practical Pre-stack Kirchhoff Time Migration of Seismic Processing on General Purpose GPU

In this paper, we introduce three prototype GPGPU solutions on an NVIDIA GeForce 8800 GT for a practical Pre-stack Kirchhoff Time Migration program. We show how to redesign and reimplement the original CPU code as efficient GPU code. The prototypes are up to 7.2 times faster than the CPU version on Intel’s P4 3.0G.
Apr, 18

GPU detectors for interference cancellation in chaos-based CDMA communications

Multi-user detection is an effective technique to reduce the mutual interference between users in code division multiple access (CDMA) communications at the cost of a larger number of arithmetic operations. It is shown that multi-user detection can be efficiently computed on graphics processors using a GPGPU approach. Specifically, two GPU parallel interference cancellation detectors for […]
Apr, 18

Efficient characterizations of composite materials electrical properties based on GPU accelerated finite difference method

In this paper, a GPU-accelerated three-dimensional finite difference method is presented as an efficient approach to performing fast parallel simulations of composite materials. Using an NVIDIA GeForce 9800 series GPGPU and an optimized CUDA implementation, a considerable speed-up (>20) was observed for simulations of large problems. Further performance improvements could be achieved […]
Apr, 18

Many-Core vs. Many-Thread Machines: Stay Away From the Valley

We study the tradeoffs between many-core machines like Intel’s Larrabee and many-thread machines like Nvidia and AMD GPGPUs. We define a unified model describing a superposition of the two architectures, and use it to identify operation zones for which each machine is more suitable. Moreover, we identify an intermediate zone in which both machines deliver […]
Apr, 18

Linear optimization on modern GPUs

Optimization algorithms are becoming increasingly important in many areas, such as finance and engineering. Typically, real problems involve several hundred variables and are subject to as many constraints. Several methods have been developed to reduce the theoretical time complexity. Nevertheless, when problems exceed reasonable sizes they end up being very computationally intensive. […]
Apr, 18

Real-time 3D reconstruction and pose estimation for human motion analysis

In this paper, we present a markerless 3D motion capture system based on a volume reconstruction technique for non-rigid bodies. It introduces a new approach to pose estimation in order to fit an articulated body model to the information captured in real time. We aim at analyzing athletes’ movements in real time within a 3D interactive graphics […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors