
Posts

Aug, 27

directCell: hybrid systems with tightly coupled accelerators

The Cell Broadband Engine (Cell/B.E.) processor is a hybrid IBM PowerPC processor. In blade servers and PCI Express card systems, it has been used primarily in a server context, with Linux as the operating system. Because neither Linux as an operating system nor a PowerPC processor-based architecture is the preferred choice for all applications, some […]
Aug, 27

A breadth-first course in multicore and manycore programming

The technique of scaling hardware performance through increasing the number of cores on a chip requires programmers to learn to write parallel code that can exploit this hardware. In order to expose students to a variety of multicore programming models, our university offered a breadth-first introduction to multicore and manycore programming for upper-level undergraduates. Our […]
Aug, 27

A structured parallel periodic Arnoldi shooting algorithm for RF-PSS analysis based on GPU platforms

Recent multi/many-core CPUs or GPUs provide an ideal parallel computing platform to accelerate the time-consuming analysis of radio-frequency/millimeter-wave (RF/MM) integrated circuits (ICs). This paper develops a structured shooting algorithm that can take full advantage of parallelism in periodic steady-state (PSS) analysis. Utilizing the periodic structure of the state matrix of RF/MM-IC simulation, a […]
Aug, 27

Automatic contention detection and amelioration for data-intensive operations

To take full advantage of the parallelism offered by a multi-core machine, one must write parallel code. Writing parallel code is difficult. Even when one writes correct code, there are numerous performance pitfalls. For example, an unrecognized data hotspot could mean that all threads effectively serialize their access to the hotspot, and throughput is dramatically […]
Aug, 27

Motion planning for autonomous driving with a conformal spatiotemporal lattice

We present a motion planner for autonomous highway driving that adapts the state lattice framework pioneered for planetary rover navigation to the structured environment of public roadways. The main contribution of this paper is a search space representation that allows the search algorithm to systematically and efficiently explore both spatial and temporal dimensions in real […]
Aug, 27

Fast and sleek glyph rendering for interactive HARDI data exploration

High angular resolution diffusion imaging (HARDI) is an emerging magnetic resonance imaging (MRI) technique that overcomes some decisive limitations of its predecessor diffusion tensor imaging (DTI). HARDI can resolve locally more than one direction in the diffusion pattern of water molecules and thereby opens up the opportunity to display and track crossing fibers. Showing the […]
Aug, 27

Rapid RNA Folding: Analysis and Acceleration of the Zuker Recurrence

RNA folding is a compute-intensive task that lies at the core of search applications in bioinformatics such as RNAfold and UNAFold. In this work, we analyze the Zuker RNA folding algorithm, which is challenging to accelerate because it is resource intensive and has a large number of variable-length dependencies. We use a technique of Lyngso […]
Aug, 27

A New Approach for Color Character Extraction Based on Parallel Clustering

A new approach to fast color character extraction is proposed. A clustering algorithm was adopted in our method to differentiate between objective character regions and background regions on the premise that character regions are nearly monochromatic. However, the key point of this approach was how to select suitable elements’ features based upon the original image information […]
Aug, 27

A new adaptive model for real-time fluid simulation with complex boundaries

In this paper, we present a new adaptive model for real-time fluid simulation with complex boundaries based on the smoothed particle hydrodynamics (SPH) framework. Firstly, we introduce an adaptive SPH framework that is based on our character field function composed of 4 factors: geometrical complexity, boundary condition, physical complexity and complementary condition in terms of the […]
Aug, 26

Parallel Fast Gauss Transform

We present fast adaptive parallel algorithms to compute the sum of N Gaussians at N points. Direct sequential computation of this sum would take $O(N^2)$ time. The parallel time complexity estimates for our algorithms are $O(N/n_p)$ for uniform point distributions and $O\big((N/n_p)\log(N/n_p) + n_p\log n_p\big)$ for nonuniform distributions using $n_p$ CPUs. We incorporate a […]
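For orientation, the direct sequential computation mentioned in the excerpt can be sketched as below. This is only the quadratic-time baseline, not the paper's parallel algorithm. The function name `direct_gauss_sum`, the one-dimensional points, and the bandwidth convention `exp(-(t - s)^2 / h^2)` are assumptions made for illustration.

```python
import math

def direct_gauss_sum(sources, targets, weights, h):
    """Direct O(N*M) evaluation of a weighted sum of Gaussians.

    For each target t, computes sum_i w_i * exp(-(t - s_i)^2 / h^2).
    Points are 1-D floats here; the bandwidth convention is an assumption.
    """
    results = []
    for t in targets:
        total = 0.0
        for s, w in zip(sources, weights):
            total += w * math.exp(-((t - s) ** 2) / (h * h))
        results.append(total)
    return results

# Every target visits every source, which is where the quadratic cost comes from.
values = direct_gauss_sum([0.0, 1.0], [0.0, 0.5], [1.0, 1.0], h=1.0)
```

With N sources and N targets the double loop performs N^2 kernel evaluations, which is the cost that fast transforms aim to avoid.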
Aug, 26

PFunc: modern task parallelism for modern high performance computing

HPC today faces new challenges due to paradigm shifts in both hardware and software. The ubiquity of multi-cores, many-cores, and GPGPUs is forcing traditional serial as well as distributed-memory parallel applications to be parallelized for these architectures. Emerging applications in areas such as informatics are placing unique requirements on parallel programming tools that have not […]
Aug, 26

Challenging cloning related problems with GPU-based algorithms

Graphics Processing Units (GPUs) have been around for a while. Although they are primarily used for high-end 3D graphics processing, their use is now acknowledged for general massive parallel computing. This paper presents an original technique based on [10] to compute many instances of the longest common subsequence problem on a generic GPU architecture using […]
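As background on the problem named in the excerpt, a minimal CPU sketch of the longest common subsequence length is shown below. It uses the textbook dynamic-programming recurrence, not the GPU technique of the paper (which the excerpt does not describe). The name `lcs_length` and the sample strings are illustrative assumptions.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b.

    Classic dynamic programming, keeping only the previous row:
    O(len(a) * len(b)) time and O(len(b)) memory.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the best subsequence ending before both symbols.
                curr.append(prev[j - 1] + 1)
            else:
                # Drop one symbol from either sequence and keep the better result.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

# "Many instances" are independent of each other, so a batch is just a map over pairs.
pairs = [("AGGTAB", "GXTXAYB"), ("abc", "abc"), ("", "abc")]
lengths = [lcs_length(a, b) for a, b in pairs]
```

Because the instances in a batch do not depend on one another, they are a natural fit for parallel hardware.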

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
