Posts

Nov, 22

An improved parallel contrast-aware halftoning

Digital image halftoning is a widely used technique. However, achieving high-fidelity tone reproduction and structural preservation at low computational cost remains a challenging problem. This paper presents a highly parallel algorithm that makes serial structure-preserving error diffusion practical for real-time applications. The contrast-aware halftoning approach is one such technique with superior structure preservation, […]
Nov, 22

An Optimal Offline Permutation Algorithm on the Hierarchical Memory Machine, with the GPU implementation

The Hierarchical Memory Machine (HMM) is a theoretical parallel computing model that captures the essence of computation on CUDA-enabled GPUs. The offline permutation is a task to copy numbers stored in an array a of size n to an array b of the same size along a permutation P given in advance. A conventional algorithm […]
Nov, 22

Optimization of the Oktay-Kronfeld Action Conjugate Gradient Inverter

Improving the Fermilab action to third order in heavy quark effective theory yields the Oktay-Kronfeld action, a promising candidate for precise calculations of the spectra of heavy quark systems and weak matrix elements relevant to searches for new physics. We have optimized the bi-stabilized conjugate gradient inverter in the SciDAC QOPQDP library and are developing […]
Nov, 22

Bohrium: Unmodified NumPy Code on CPU, GPU, and Cluster

In this paper we introduce Bohrium, a runtime system for mapping array operations onto a number of different hardware platforms, from multi-core systems to clusters and GPU-enabled systems. As a result, the Bohrium runtime system enables NumPy code to utilize CPUs, GPUs, and clusters. Bohrium integrates seamlessly into NumPy through the implicit data parallelization of array […]
Nov, 21

Experience with Intel’s Many Integrated Core architecture in ATLAS software

Intel recently released the first commercial boards of its Many Integrated Core (MIC) Architecture. MIC is Intel’s solution for the domain of throughput computing, currently dominated by general purpose programming on graphics processors (GPGPU). MIC allows the use of the more familiar x86 programming model and supports standard technologies such as OpenMP, MPI, and Intel’s […]
Nov, 21

Direct Numeric Simulation of Sheared Convective Boundary Layer Entrainment with GPUs

Sheared convective boundary layers (SCBL) are a frequently observed boundary layer in nature and industry. This paper presents work conducted to validate a numerical fluid model of sheared convective boundary layers implemented in Nvidia’s CUDA programming language for graphics processing units. The code is based on a finite-difference implementation of the SIMPLE algorithm using the […]
Nov, 21

Towards an interactive and automated script feature analysis of 3D scanned cuneiform tablets

Current digitization projects of ancient artifacts in the field of cultural heritage produce large amounts of data that cannot be managed and analyzed in a reasonable amount of time by means of conventional philological methods. Therefore, this paper presents a novel approach to performing a fast and interactive 3D script feature extraction, analysis and […]
Nov, 20

Multi-GPU Support on the Marrow Algorithmic Skeleton Framework

With the proliferation of general-purpose GPUs, workload parallelization and data-transfer optimization have become an increasing concern. The natural evolution from using a single GPU is to multiply the number of available processors, which presents new challenges, such as tuning the workload decomposition and load balancing when dealing with heterogeneous systems. Higher-level programming is a very important asset in […]
Nov, 20

HyPHI – task based hybrid execution C++ library for the Intel Xeon Phi coprocessor

The Intel Threading Building Blocks (TBB) C++ library introduced task parallelism to a wide audience of application developers. The library is easy to use and powerful, but it is limited to shared-memory machines. In this paper we present HyPHI, a novel library for the Intel Xeon Phi coprocessor for building applications which execute using a […]
Nov, 20

International Workshop on OpenCL, IWOCL 2014

The International Workshop on OpenCL (IWOCL) is an annual meeting of OpenCL users, researchers, developers and suppliers to share OpenCL best practise and to promote the evolution and advancement of the OpenCL standard. The meeting is open to anyone who is interested in contributing to, and participating in, the OpenCL community. IWOCL is the premier […]
Nov, 19

Real-time rendering of large surface-scanned range data natively on a GPU

This thesis presents research carried out for the visualisation of surface anatomy data stored as large range images, such as those produced by stereo-photogrammetric and other triangulation-based capture devices. As part of this research, I explored the use of points as a rendering primitive as opposed to polygons, and the use of range images as […]
Nov, 19

Adaptive implementation selection in the SkePU skeleton programming library

In earlier work, we have developed the SkePU skeleton programming library for modern multicore systems equipped with one or more programmable GPUs. The library internally provides four types of implementations (implementation variants) for each skeleton: serial C++, OpenMP, CUDA and OpenCL targeting either CPU or GPU execution respectively. Deciding which implementation would run faster for […]

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org