Posts

Jun, 20

Parallel Implementation of the Wu-Manber Algorithm Using the OpenCL Framework

One of the most significant issues in computational biology is multiple pattern matching for locating nucleotide and amino acid sequence patterns in biological databases. Sequential implementations of these processes have become inadequate due to the increasing demand for computational power. Graphics cards offer highly parallel computational power, improving the performance of […]
Jun, 20

An Investigation into Concurrent Expectation Propagation

As statistical machine learning becomes more and more prevalent and models become more complicated and fit to larger amounts of data, approximate inference mechanisms become more and more crucial to their success. Expectation propagation (EP) is one such algorithm for inference in probabilistic graphical models. In this work, we introduce a robustified version of EP […]
Jun, 19

Two Algorithms for Sorting On Heterogeneous Clusters

In the past few years, performance improvements in CPUs and memory technologies have outpaced those of storage systems. When extrapolated to the exascale, this trend places strict limits on the amount of data that can be written to disk for full analysis, resulting in an increased reliance on characterizing in-memory data. Many of these characterizations […]
Jun, 19

Parallel Rendering on Hybrid Multi-GPU Clusters

Achieving efficient, scalable parallel rendering for interactive visualization applications on medium-sized graphics clusters remains a challenging problem. Frame rates of up to 60 Hz require a carefully designed and fine-tuned parallel rendering implementation that fits all required operations into the 16 ms time budget available for each rendered frame. Furthermore, modern commodity hardware increasingly embraces a […]
Jun, 19

Optimizing dataflow applications on heterogeneous environments

The increases in multi-core processor parallelism and in the flexibility of many-core accelerator processors, such as GPUs, have turned traditional SMP systems into hierarchical, heterogeneous computing environments. Fully exploiting these improvements in parallel system design remains an open problem. Moreover, most of the current tools for the development of parallel applications for hierarchical systems concentrate […]
Jun, 19

Efficient simulations of long wave propagation and runup using a LBM approach on GPGPU hardware

We present an efficient implementation of the Lattice Boltzmann method (LBM) for the numerical simulation of the propagation of long ocean waves (e.g., tsunamis), based on the Nonlinear Shallow Water (NSW) wave equation. The LBM solution of NSW equations is fully nonlinear and it is assumed that the surface elevation is single-valued (hence, waves do […]
Jun, 19

Implementing density functional theory (DFT) methods on many-core GPGPU accelerators

Density Functional Theory (DFT) is one of the most widely used quantum mechanical methods for calculating the electronic structure of molecules and surfaces, achieving an excellent balance of accuracy and computational cost. However, for large molecular systems with a few hundred atoms, the computational cost becomes very high. Therefore, there is a fast […]
Jun, 18

Gdev: First-Class GPU Resource Management in the Operating System

Graphics processing units (GPUs) have become a very powerful platform embracing a concept of heterogeneous many-core computing. However, application domains of GPUs are currently limited to specific systems, largely due to a lack of "first-class" GPU resource management for general-purpose multi-tasking systems. We present Gdev, a new ecosystem of GPU resource management in the operating […]
Jun, 18

An Improved CUDA-Based Implementation of Differential Evolution on GPU

Modern GPUs enable widely affordable personal computers to carry out massively parallel computation tasks. NVIDIA’s CUDA technology provides a wieldy parallel computing platform. Many state-of-the-art algorithms arising from different fields have been redesigned based on CUDA to achieve computational speedup. Differential evolution (DE), as a very promising evolutionary algorithm, is highly suitable for parallelization owing […]
Jun, 18

OpenCL for programming shared memory multicore CPUs

Shared memory multicore processor technology is pervasive in mainstream computing. This new architecture challenges programmers to write code that scales over these many cores to exploit the full computational power of these machines. OpenMP and Intel Threading Building Blocks (TBB) are two of the popular frameworks used to program these architectures. Recently, OpenCL has been […]
Jun, 18

Solving the Vlasov equation for one-dimensional models with long range interactions on a GPU

We present a GPU parallel implementation of the numeric integration of the Vlasov equation in one spatial dimension based on a second order time-split algorithm with a local modified cubic-spline interpolation. We apply our approach to three different systems with long-range interactions: the Hamiltonian Mean Field, Ring and the self-gravitating sheet models. Speedups and accuracy […]
Jun, 18

OpenACC – First Experiences with Real-World Applications

Today’s trend toward using accelerators such as GPGPUs in heterogeneous computer systems has given rise to several low-level APIs for accelerator programming. However, programming with these APIs is often tedious and therefore unproductive. To tackle this problem, recent approaches employ directive-based high-level programming for accelerators. In this work, we present our first experiences with OpenACC, an API consisting of […]

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
