13858

Posts

Apr, 14

Image Classification with Pyramid Representation and Rotated Data Augmentation on Torch 7

This project classifies images in Tiny ImageNet Challenge, a dataset with 200 classes and 500 training examples for each class. Three network architectures are experimented: a traditional architecture with 4 convolutional layers + 2 fully-connected layers; a Tiny GoogleNet with 3 inception layers; and a pyramid representation-based network. Tiny GoogleNet achieved the highest top-1 validation […]
Apr, 14

Massively Parallel Ray Tracing Algorithm Using GPU

Ray tracing is a technique for generating an image by tracing the path of light through pixels in an image plane and simulating the effects of high-quality global illumination at a heavy computational cost. Because of the high computation complexity, it can’t reach the requirement of real-time rendering. The emergence of many-core architectures, makes it […]
Apr, 14

OpenCL-Z Android Released on Google Play

Developers have been using utility tools such as CPU-Z, GPU-Z, CUDA-Z, OpenCL-Z for a long time. These tools provide platform and hardware information in details and help developers quickly understand the hardware capabilities. Recently, OpenCL has been supported by most of the latest mobile phones/tablets, as the mobile GPUs are gaining more compute power. OpenCL-A […]
Apr, 12

GPU-based digital hologram reconstruction and particle detection

Digital holograms, when combined with tracer particles, can be used for examining otherwise-invisible fluid flows. These holograms can be captured with standard digital imaging equipment, however processing them to extract tracer or particle locations is computationally expensive. Exacerbating the issue is that hundreds or thousands of holograms must be reconstructed to analyze a single flow.Presented […]
Apr, 12

Framework for Batched and GPU-resident Factorization Algorithms Applied to Block Householder Transformations

As modern hardware keeps evolving, an increasingly effective approach to develop energy efficient and high-performance solvers is to design them to work on many small size and independent problems. Many applications already need this functionality, especially for GPUs, which are currently known to be about four to five times more energy efficient than multicore CPUs. […]
Apr, 12

Batched Matrix Computations on Hardware Accelerators

Scientific applications require solvers that work on many small size problems that are independent from each other. At the same time, the high-end hardware evolves rapidly and becomes ever more throughput-oriented and thus there is an increasing need for effective approach to develop energy efficient, high-performance codes for these small matrix problems that we call […]
Apr, 12

Robust real time face recognition and tracking on gpu using fusion of rgb and depth image

This paper presents a real-time face recognition system using kinect sensor. The algorithm is implemented on GPU using opencl and significant speed improvements are observed. We use kinect depth image to increase the robustness and reduce computational cost of conventional LBP based face recognition. The main objective of this paper was to perform robust, high […]
Apr, 12

Model Coupling between the Weather Research and Forecasting Model and the DPRI Large Eddy Simulator for Urban Flows on GPU-accelerated Multicore Systems

In this report we present a novel approach to model coupling for shared-memory multicore systems hosting OpenCL-compliant accelerators, which we call The Glasgow Model Coupling Framework (GMCF). We discuss the implementation of a prototype of GMCF and its application to coupling the Weather Research and Forecasting Model and an OpenCL-accelerated version of the Large Eddy […]
Apr, 9

Automated GPU Kernel Transformations in Large-Scale Production Stencil Applications

This paper proposes an end-to-end framework for automatically transforming stencil-based CUDA programs to exploit inter-kernel data locality. The CUDA-to-CUDA transformation collectively replaces the user-written kernels by auto-generated kernels optimized for data reuse. The transformation is based on two basic operations, kernel fusion and fission, and relies on a series of automated steps: gathering metadata, generating […]
Apr, 8

Finite element numerical integration for first order approximations on multi-core architectures

The paper presents investigations on the implementation and performance of the finite element numerical integration algorithm for first order approximations and three processor architectures, popular in scientific computing, classical CPU, Intel Xeon Phi and NVIDIA Kepler GPU. A unifying programming model and portable OpenCL implementation is considered for all architectures. Variations of the algorithm due […]
Apr, 8

GPU Accelerated Strong and Branching Bisimilarity Checking

Bisimilarity checking is an important operation to perform explicit-state model checking when the state space of a model under verification has already been generated. It can be applied in various ways: reduction of a state space w.r.t. a particular flavour of bisimilarity, or checking that two given state spaces are bisimilar. Bisimilarity checking is a […]
Apr, 8

Enhancing Fluid Modeling with Turbulence and Acceleration

In this dissertation, we have proposed our solutions to four important and challenging topics in enhancing fluid modeling with turbulence and acceleration: distance field representation of obstacles in fluid, adaptive and controllable turbulence enhancement, Langevin Particles and GPU acceleration in fluid modeling. All these fields aims at creating realistic and fast fluid field which are […]

* * *

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org