Posts

Jul, 8

A Smart GPU Implementation of an Elliptic Kernel for an Ocean Global Circulation Model

In this paper, the preconditioning of an elliptic Laplace problem in a global ocean circulation model is analyzed. We suggest an inverse preconditioning technique to efficiently compute the numerical solution of the elliptic kernel. Moreover, we show how the convergence rate and the performance of the solver are closely linked to the […]
Jul, 8

ParadisEO-MO-GPU: a Framework for Parallel GPU-based Local Search Metaheuristics

In this paper, we propose a pioneering framework called ParadisEO-MO-GPU for the reusable design and implementation of parallel local search metaheuristics (S-Metaheuristics) on Graphics Processing Units (GPU). We revisit the ParadisEO-MO software framework to allow its use on GPU accelerators, focusing on the parallel iteration-level model, the major parallel model for S-Metaheuristics. It consists in […]
Jul, 8

Coalition Structure Generation with the Graphic Processor Unit

Coalition Structure Generation, the problem of finding the optimal set of coalitions, has received considerable attention in recent AI literature. The fastest exact algorithm to solve this problem is IDP-IP*, due to Rahwan et al. (2012). This algorithm is a hybrid of two previous algorithms, namely IDP and IP. As such, it is desirable to […]
Jul, 8

Comparison and Analysis of GPU Energy Efficiency For CUDA and OpenCL

The use of GPUs for processing large sets of parallelizable data has increased sharply in recent years. As the concept of GPU computing is still relatively young, parameters other than computation time, such as energy efficiency, are being overlooked. Two parallel computing platforms, CUDA and OpenCL, provide developers with an interface that they can use […]
Jul, 8

GPU Implementation of Real-Time Biologically Inspired Face Detection using CUDA

In this paper, a massively parallel real-time face detection method based on visual attention and a cortex-like mechanism of a cognitive vision system is presented. As a first step, we use a saliency map model to select salient face regions and the HMAX C1 model to extract features from the salient input image, and then apply a mixture-of-experts neural network […]
Jul, 7

Comparison of Rectangular Matrix Multiplication with and without Border Conditions

Matrix multiplication algorithms are very common and widely used for computation in almost any field. There are many implementations of matrix multiplication on different platforms and programming models. In recent years, GPU devices have become powerful computational units that have entered the high-performance computing segment. In this paper, we analyse two […]
Jul, 7

Solving 3D Anisotropic Elastic Wave Equations on Parallel GPU Devices

Efficiently modelling seismic datasets in complex 3D anisotropic media by solving the 3D elastic wave equation is an important challenge in computational geophysics. Using a stress-stiffness formulation on a regular grid, we present a 3D finite-difference time-domain (FDTD) solver using a 2nd-order temporal and 8th-order spatial accuracy stencil that leverages the massively parallel architecture of […]
Jul, 7

A Comparative Study of Neighborhood Filters for Artifact Reduction in Iterative Low-Dose CT

Iterative CT algorithms have become increasingly popular in recent years. They have been found useful when the projections are limited in number, irregularly spaced, or noisy, conditions often encountered in low-dose CT imaging. One way to cope with the associated streak and noise artifacts is to interleave a regularization objective into the iterative reconstruction […]
Jul, 7

CrowdCL: Web-Based Volunteer Computing with WebCL

We present CrowdCL, an open-source framework for the rapid development of volunteer computing and OpenCL applications on the web. Drawing inspiration from existing GPU libraries like PyCUDA, CrowdCL provides an abstraction layer for WebCL aimed at reducing boilerplate and improving code readability. CrowdCL also provides developers with a framework to easily run computations in the […]
Jul, 7

Comparative study of parallel programming models for multicore computing

Shared-memory multi-core processor technology has developed dramatically, with faster processors and an increasing number of processors per chip. This new architecture challenges programmers to write code that scales across these many cores to exploit the full computational power of these machines. Shared-memory parallel programming paradigms such as OpenMP and Intel Threading Building Blocks (TBB) […]
Jul, 5

Optimize Overall System Performance Through Workload Sequencing for GPUs Data Offloading

With the proliferation of general-purpose computation, GPUs are becoming extremely important for significantly improving the performance of many computing systems, including embedded systems. Running massively parallel kernels on GPUs is challenging for overall system performance, especially when a large number of workloads (kernels) run together. In this paper, we establish a mechanism to […]
Jul, 5

Hybrid Acceleration of a Molecular Dynamics Simulation Using Short-Ranged Potentials

Molecular dynamics simulations are a very useful tool to study the behavior and interaction of atoms and molecules in chemical and bio-molecular systems. With the rapidly rising complexity of such simulations, hybrid systems with both multi-core processors (CPUs) and multiple graphics processing units (GPUs) are becoming increasingly popular. To obtain optimal performance, this […]

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
