Posts

Apr, 14

OpenCL-Z Android Released on Google Play

Developers have long used utility tools such as CPU-Z, GPU-Z, CUDA-Z, and OpenCL-Z. These tools provide detailed platform and hardware information and help developers quickly understand hardware capabilities. Recently, as mobile GPUs gain more compute power, most of the latest mobile phones and tablets have added OpenCL support. OpenCL-A […]
Apr, 12

Framework for Batched and GPU-resident Factorization Algorithms Applied to Block Householder Transformations

As modern hardware keeps evolving, an increasingly effective approach to developing energy-efficient, high-performance solvers is to design them to work on many small, independent problems. Many applications already need this functionality, especially on GPUs, which are currently known to be about four to five times more energy efficient than multicore CPUs. […]
Apr, 12

GPU-based digital hologram reconstruction and particle detection

Digital holograms, when combined with tracer particles, can be used for examining otherwise-invisible fluid flows. These holograms can be captured with standard digital imaging equipment; however, processing them to extract tracer or particle locations is computationally expensive. Exacerbating the issue is that hundreds or thousands of holograms must be reconstructed to analyze a single flow. Presented […]
Apr, 12

Batched Matrix Computations on Hardware Accelerators

Scientific applications require solvers that work on many small problems that are independent of each other. At the same time, high-end hardware evolves rapidly and becomes ever more throughput-oriented, and thus there is an increasing need for an effective approach to developing energy-efficient, high-performance codes for these small matrix problems that we call […]
Apr, 12

Robust real time face recognition and tracking on gpu using fusion of rgb and depth image

This paper presents a real-time face recognition system using a Kinect sensor. The algorithm is implemented on a GPU using OpenCL, and significant speed improvements are observed. We use the Kinect depth image to increase the robustness and reduce the computational cost of conventional LBP-based face recognition. The main objective of this paper was to perform robust, high […]
Apr, 12

Model Coupling between the Weather Research and Forecasting Model and the DPRI Large Eddy Simulator for Urban Flows on GPU-accelerated Multicore Systems

In this report we present a novel approach to model coupling for shared-memory multicore systems hosting OpenCL-compliant accelerators, which we call The Glasgow Model Coupling Framework (GMCF). We discuss the implementation of a prototype of GMCF and its application to coupling the Weather Research and Forecasting Model and an OpenCL-accelerated version of the Large Eddy […]
Apr, 9

Automated GPU Kernel Transformations in Large-Scale Production Stencil Applications

This paper proposes an end-to-end framework for automatically transforming stencil-based CUDA programs to exploit inter-kernel data locality. The CUDA-to-CUDA transformation collectively replaces the user-written kernels by auto-generated kernels optimized for data reuse. The transformation is based on two basic operations, kernel fusion and fission, and relies on a series of automated steps: gathering metadata, generating […]
Apr, 8

Finite element numerical integration for first order approximations on multi-core architectures

The paper presents investigations on the implementation and performance of the finite element numerical integration algorithm for first order approximations on three processor architectures popular in scientific computing: classical CPU, Intel Xeon Phi, and NVIDIA Kepler GPU. A unifying programming model and a portable OpenCL implementation are considered for all architectures. Variations of the algorithm due […]
Apr, 8

GPU Accelerated Strong and Branching Bisimilarity Checking

Bisimilarity checking is an important operation to perform explicit-state model checking when the state space of a model under verification has already been generated. It can be applied in various ways: reduction of a state space w.r.t. a particular flavour of bisimilarity, or checking that two given state spaces are bisimilar. Bisimilarity checking is a […]
Apr, 8

Enhancing Fluid Modeling with Turbulence and Acceleration

In this dissertation, we propose solutions to four important and challenging topics in enhancing fluid modeling with turbulence and acceleration: distance field representation of obstacles in fluid, adaptive and controllable turbulence enhancement, Langevin particles, and GPU acceleration in fluid modeling. All these fields aim at creating realistic and fast fluid field which are […]
Apr, 8

Benchmarking the cost of thread divergence in CUDA

All modern processors include a set of vector instructions. While this gives a tremendous boost to the performance, it requires a vectorized code that can take advantage of such instructions. As an ideal vectorization is hard to achieve in practice, one has to decide when different instructions may be applied to different elements of the […]
Apr, 8

Early Experiences Running the 3D Stencil Jacobi Method in Intel Xeon Phi

Iterative stencil computations are an important pattern of computation in fields such as physics and chemistry simulations. A stencil computation repeatedly updates each point of a d-dimensional grid as a function of itself and its near neighbors. As the demand for more and more compute power is growing rapidly in different fields of research, […]
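
To make the stencil update described in the excerpt above concrete, here is a minimal Python sketch of a 3D Jacobi sweep. It is an illustration only, not the implementation from the paper: the 7-point averaging stencil, the fixed boundary values, and the function names `jacobi_sweep` and `jacobi` are all assumptions chosen for this example.

```python
def jacobi_sweep(grid):
    """Return a new 3D grid after one Jacobi sweep (assumed 7-point stencil).

    Each interior point becomes the average of itself and its six face
    neighbors, read from the OLD grid. Boundary points are left unchanged.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    # Copy the grid so every update reads only values from the previous step.
    new = [[row[:] for row in plane] for plane in grid]
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                new[i][j][k] = (
                    grid[i][j][k]
                    + grid[i - 1][j][k] + grid[i + 1][j][k]
                    + grid[i][j - 1][k] + grid[i][j + 1][k]
                    + grid[i][j][k - 1] + grid[i][j][k + 1]
                ) / 7.0
    return new


def jacobi(grid, iterations):
    """Apply the sweep repeatedly, as an iterative stencil computation does."""
    for _ in range(iterations):
        grid = jacobi_sweep(grid)
    return grid
```

Because every point in a sweep depends only on the previous grid, the three inner loops have no dependencies between iterations, which is what makes this pattern a natural fit for many-core hardware.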

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
