Posts

Nov, 22

Parallel Position Weight Matrices Algorithms

Position Weight Matrices (PWMs) are broadly used in computational biology. The basic problems, Scan and Multiscan, aim to find all occurrences of a given PWM or a set of PWMs in long sequences. Several other PWM tasks share a common NP-hard subproblem, ScoreDistribution. The existing algorithms rely on the enumeration of a large set […]
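The Scan problem can be illustrated with a naive sliding-window scorer. It sums the per-position scores of every window whose length matches the PWM and reports the windows that reach a threshold. This is a minimal sketch, not the parallel algorithms of the post. The dict-per-position PWM layout, `scan`, `exact_match_pwm`, and the toy ±1 scores are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical PWM layout: one dict per motif position, mapping base -> score.
PWM = List[Dict[str, float]]

def exact_match_pwm(motif: str) -> PWM:
    """Toy PWM: +1 for the consensus base at each position, -1 otherwise."""
    return [{b: (1.0 if b == c else -1.0) for b in "ACGT"} for c in motif]

def scan(pwm: PWM, sequence: str, threshold: float) -> List[Tuple[int, float]]:
    """Naive Scan: score every window of length len(pwm), keep those >= threshold.

    Takes O(n * m) time for a sequence of length n and a PWM of length m.
    """
    m = len(pwm)
    hits = []
    for start in range(len(sequence) - m + 1):
        score = sum(pwm[j][sequence[start + j]] for j in range(m))
        if score >= threshold:
            hits.append((start, score))
    return hits

if __name__ == "__main__":
    print(scan(exact_match_pwm("ACG"), "TTACGACGT", 3.0))  # [(2, 3.0), (5, 3.0)]
```

Multiscan, in this naive form, is just `scan` repeated once per PWM. Every window is also scored independently, which is what makes the problem amenable to parallelization.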
Nov, 22

Memory Access Optimized Implementation of Cyclic and Quasi-Cyclic LDPC Codes on a GPGPU

Software-based decoding of low-density parity-check (LDPC) codes frequently takes a very long time, so general-purpose graphics processing units (GPGPUs), which support massively parallel processing, can be very useful for speeding up the simulation. In LDPC decoding, the parity-check matrix H must be accessed during every node update, and the size of […]
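For quasi-cyclic codes, H has a compact description. It can be generated on the fly from a small base matrix of shift values, so the full matrix never has to be stored. The sketch below assumes a common convention: -1 marks a Z-by-Z zero block, and a value s >= 0 marks the identity cyclically shifted by s. Some codes shift in the opposite direction. `row_neighbors` and `expand` are hypothetical helpers, not the post's implementation.

```python
from typing import List

def row_neighbors(base: List[List[int]], z: int, row: int) -> List[int]:
    """Column indices of the nonzero entries in one row of H.

    base[i][j] == -1 denotes a z-by-z zero block; base[i][j] == s >= 0 denotes
    the z-by-z identity with each row's 1 moved s columns to the right (mod z).
    Only the small base matrix is stored, never H itself.
    """
    block_row, r = divmod(row, z)
    cols = []
    for j, s in enumerate(base[block_row]):
        if s >= 0:
            cols.append(j * z + (r + s) % z)
    return cols

def expand(base: List[List[int]], z: int) -> List[List[int]]:
    """Materialize the dense H (useful only for checking small examples)."""
    n_rows, n_cols = len(base) * z, len(base[0]) * z
    h = [[0] * n_cols for _ in range(n_rows)]
    for row in range(n_rows):
        for c in row_neighbors(base, z, row):
            h[row][c] = 1
    return h

if __name__ == "__main__":
    base = [[0, 1, -1], [2, -1, 0]]
    print(row_neighbors(base, 3, 0))  # [0, 4]
```

A check-node update only needs the neighbor list of its own row, so each GPU thread could compute these indices from the base matrix instead of reading a stored copy of H.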
Nov, 22

State-of-the-art in heterogeneous computing

Node-level heterogeneous architectures have become attractive during the last decade for several reasons: compared to traditional symmetric CPUs, they offer high peak performance and are energy- and/or cost-efficient. With the increase of fine-grained parallelism in high-performance computing, as well as the introduction of parallelism in workstations, there is an acute need for a […]
Nov, 22

On optimization of finite-difference time-domain (FDTD) computation on heterogeneous and GPU clusters

A model for the computational cost of the finite-difference time-domain (FDTD) method, irrespective of implementation details or application domain, is given. The model is used to formalize the problem of optimally distributing the computational load across an arbitrary set of resources in a heterogeneous cluster. We show that the problem can be formulated as a […]
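Suppose the cost per updated cell is constant on each device. Then balancing the load reduces to giving each resource a share of cells proportional to its throughput, so that all resources finish at about the same time. This sketch deliberately ignores communication and is not the post's formulation, which the truncated abstract does not show. `distribute_cells`, `makespan`, and the throughput values are hypothetical.

```python
from typing import List

def distribute_cells(total_cells: int, rates: List[float]) -> List[int]:
    """Split total_cells so that cells_i / rates_i is (nearly) equal for all i.

    rates[i] is an assumed throughput of resource i, in cells updated per second.
    Uses proportional shares plus largest-remainder rounding to stay integral.
    Halo exchange and other communication costs are ignored.
    """
    total_rate = sum(rates)
    exact = [total_cells * r / total_rate for r in rates]
    shares = [int(x) for x in exact]
    leftover = total_cells - sum(shares)
    # Hand the remaining cells to the resources with the largest fractional parts.
    order = sorted(range(len(rates)), key=lambda i: exact[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares

def makespan(shares: List[int], rates: List[float]) -> float:
    """Time per FDTD step under this model: the slowest resource dominates."""
    return max(c / r for c, r in zip(shares, rates))

if __name__ == "__main__":
    shares = distribute_cells(1000, [1.0, 3.0])
    print(shares, makespan(shares, [1.0, 3.0]))  # [250, 750] 250.0
```

A realistic cost model would add a per-resource communication term, after which proportional splitting is no longer optimal in general.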
Nov, 22

GPU implementation of a road sign detector based on particle swarm optimization

Road sign detection is a major goal of Advanced Driving Assistance Systems. Most published work on this problem shares the same approach, in which signs are first detected and then classified in video sequences, even if different techniques are used. While detection is usually performed using classical computer vision techniques based on color and/or […]
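The abstract names particle swarm optimization without describing the update. For reference, the sketch below is the canonical global-best PSO, with inertia weight w and cognitive/social coefficients c1 and c2, minimizing a toy function. The parameter values are commonly used defaults. The sphere objective is a stand-in for the post's detection fitness, which is not described here, and `pso` is a hypothetical name.

```python
import random
from typing import Callable, List, Tuple

def pso(f: Callable[[List[float]], float], dim: int, n_particles: int = 30,
        iters: int = 200, bounds: Tuple[float, float] = (-5.0, 5.0),
        w: float = 0.7298, c1: float = 1.49618, c2: float = 1.49618,
        seed: int = 0) -> Tuple[List[float], float]:
    """Global-best PSO minimizing f; returns (best position, best value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # each particle's best position
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:  # asynchronous variant: update global best at once
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

if __name__ == "__main__":
    print(pso(lambda x: sum(v * v for v in x), dim=2))
```

The fitness evaluations of different particles within one iteration are independent. That plausibly makes the fitness step the natural target for GPU parallelization, especially when it involves image processing.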
Nov, 22

CUDA by Example: An Introduction to General-Purpose GPU Programming

CUDA by Example, written by two senior members of the CUDA software platform team, shows programmers how to employ this new technology. The authors introduce each area of CUDA development through working examples. After a concise introduction to the CUDA platform and architecture, as well as a quick-start guide to CUDA C, the book details […]
Nov, 22

Micropolygon ray tracing with defocus and motion blur

We present a micropolygon ray tracing algorithm that is capable of efficiently rendering high quality defocus and motion blur effects. A key component of our algorithm is a BVH (bounding volume hierarchy) based on 4D hyper-trapezoids that project into 3D OBBs (oriented bounding boxes) in spatial dimensions. This acceleration structure is able to provide tight […]
Nov, 22

Octree-based, GPU implementation of a continuous cellular automaton for the simulation of complex, evolving surfaces

Presently, dynamic surface-based models are required to contain increasingly large numbers of points and to propagate them over longer time periods. For large numbers of surface points, the octree data structure offers a balance between low memory occupation and relatively rapid access to the stored data. For evolution rules that depend on […]
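One common way to get that memory/access balance is a sparse, pointerless octree that stores only occupied cells, keyed by Morton (Z-order) codes. The sketch assumes points normalized to [0, 1)^3. `morton_key`, `build_sparse_octree`, and `parent` are hypothetical names, and the post's actual layout may differ.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]

def morton_key(p: Point, depth: int) -> int:
    """Interleave the bits of the quantized x, y, z coordinates (p in [0, 1)^3)."""
    n = 1 << depth
    ix, iy, iz = (min(int(c * n), n - 1) for c in p)
    key = 0
    for b in range(depth):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key

def build_sparse_octree(points: List[Point], depth: int) -> Dict[int, List[Point]]:
    """Store only occupied leaf cells at the given depth, keyed by Morton code."""
    leaves: Dict[int, List[Point]] = defaultdict(list)
    for p in points:
        leaves[morton_key(p, depth)].append(p)
    return dict(leaves)

def parent(key: int) -> int:
    """Dropping the lowest three bits gives the enclosing cell one level up."""
    return key >> 3

if __name__ == "__main__":
    tree = build_sparse_octree([(0.1, 0.1, 0.1), (0.12, 0.1, 0.1), (0.9, 0.9, 0.9)], 1)
    print(sorted(tree))  # [0, 7]
```

Empty space costs nothing. Because Morton order keeps spatial neighbors close in key order, sorting the keys also tends to improve memory coherence on a GPU.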
Nov, 22

Algorithm level power efficiency optimization for CPU-GPU processing element in data intensive SIMD/SPMD computing

Power efficiency has to be investigated at every level of a High Performance Computing (HPC) system because of the increasing computational demands of scientific and engineering applications. Focusing on the critical design constraints at the software level, which runs on top of a parallel system composed of huge numbers of power-hungry components, we optimize HPC program design […]
Nov, 22

Higher-order CFD and Interface Tracking Methods on Highly-Parallel MPI and GPU systems

A computational investigation of the effects of higher-order accurate schemes on parallel performance was carried out on two different computational systems: a traditional CPU-based MPI cluster and a system of four Graphics Processing Units (GPUs) controlled by a single quad-core CPU. The investigation was based on the solution of the level set equations for […]
Nov, 22

Efficient simulation of agent-based models on multi-GPU and multi-core clusters

An effective latency-hiding mechanism is presented in the parallelization of agent-based model simulations (ABMS) with millions of agents. The mechanism is designed to accommodate the hierarchical organization as well as heterogeneity of current state-of-the-art parallel computing platforms. We use it to explore the computation vs. communication trade-off continuum available with the deep computational and memory […]
Nov, 22

GPU-accelerated elastic 3D image registration for intra-surgical applications

Local motion within intra-patient biomedical images can be compensated for by using elastic image registration. The application of B-spline-based elastic registration during interventional treatment is seriously hampered by its considerable computation time. The graphics processing unit (GPU) can be used to accelerate the calculation of such elastic registrations by using its parallel processing power, and […]
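B-spline elastic registration usually models the deformation as a free-form deformation. Each point's displacement is a weighted sum of nearby control-point displacements, with uniform cubic B-spline weights. The 1-D sketch below shows that evaluation. The control-grid padding convention and the names `bspline_weights` and `displacement` are assumptions. A 3-D version would take the tensor product of the weights along x, y, and z, which gives 64 control points per voxel.

```python
import math
from typing import List

def bspline_weights(t: float) -> List[float]:
    """Uniform cubic B-spline basis functions B0..B3 evaluated at t in [0, 1)."""
    return [
        (1 - t) ** 3 / 6,
        (3 * t**3 - 6 * t**2 + 4) / 6,
        (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6,
        t**3 / 6,
    ]

def displacement(x: float, coeffs: List[float], spacing: float) -> float:
    """1-D free-form-deformation displacement at position x.

    Assumed convention: coeffs[k] is the control-point displacement at grid
    position (k - 1) * spacing, so one padding control point sits on each side
    and every x in [0, (len(coeffs) - 3) * spacing) has four supporting controls.
    """
    u = x / spacing
    i = int(math.floor(u))   # index of the grid cell containing x
    t = u - i                # local coordinate inside that cell
    w = bspline_weights(t)
    return sum(w[k] * coeffs[i + k] for k in range(4))

if __name__ == "__main__":
    # Control displacements equal to control positions reproduce the identity map.
    coeffs = [float(k - 1) for k in range(8)]
    print(displacement(2.5, coeffs, 1.0))
```

Every voxel's displacement depends only on its own four (or, in 3-D, 64) control points. That independence plausibly explains why this step maps well onto one GPU thread per voxel.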

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
