11412

Posts

Feb, 12

Yang-Mills lattice on CUDA

The Yang-Mills fields have an important role in the non-Abelian gauge field theory which describes the properties of the quark-gluon plasma. The real time evolution of the classical fields is given by the equations of motion which are derived from the Hamiltonians to contain the term of the SU(2) gauge field tensor. The dynamics of […]
Feb, 11

A scheduling and runtime framework for a cluster of heterogeneous machines with multiple accelerators

We present a system that enables simple and intuitive programming of CPU+GPU clusters. This system relieves the programmer of the burden of load balancing, detailed data communication, task mapping, scheduling, etc. Our programming model is based on bulk synchronous distributed shared memory model, which is suitable for heterogenous multi-GPU clusters, especially so for compute intensive […]
Feb, 11

Confidentiality Issues on a GPU in a Virtualized Environment

General-Purpose computing on Graphics Processing Units (GPGPU) combined to cloud computing is already a commercial success. However, there is little literature that investigates its security implications. Our objective is to highlight possible information leakage due to GPUs in virtualized and cloud computing environments. We provide insight into the different GPU virtualization techniques, along with their […]
Feb, 11

Exploiting GPU Parallelism to Optimize Real-World Problems

Construction of optimal schedule for airline crew-scheduling requires high computation time. The main objective to create this optimal schedule is to assign all the crews to available flights in a minimum amount of time. This is a highly constrained optimization problem. In this paper, we implement co-evolutionary genetic algorithm in order to solve this problem. […]
Feb, 11

Exploring Multiple Levels of Performance Modeling for Heterogeneous Systems

One of the major challenges faced by the HPC community today is user-friendly and accurate heterogeneous performance modeling. Although performance prediction models exist to fine-tune applications, they are seldom easy-to-use and do not address multiple levels of design space abstraction. Our research aims to bridge the gap between reliable performance model selection and user-friendly analysis. […]
Feb, 11

Benchmarking the Intel Xeon Phi Coprocessor

This document summarizes our first experience with the Intel Xeon Phi. This is a coprocessor that uses Intel’s Many Integrated Core (MIC) architecture to speed up highly parallel processes involving intensive numerical computations. The MIC coprocessor communicates with a regular Intel Xeon ("host") processor through its operating system. The Xeon Phi coprocessor is sometimes referred […]
Feb, 11

Point Rendering in CUDA Path Tracer

A novel technique for point rendering in a CUDA path tracer is introduced in this proposal. The approach makes it possible to render point represented geometries with global illumination effects. Octree data structure is combined in order for more efficient intersection determination. Furthermore, Octree enables the users/artists to choose the level of details of the […]
Feb, 11

Accurate Cross-Architecture Performance Modeling for Sparse Matrix-Vector Multiplication (SpMV) on GPUs

This paper presents an integrated analytical and profile-based cross-architecture performance modeling tool to specifically provide inter-architecture performance prediction for Sparse Matrix-Vector Multiplication (SpMV) on NVIDIA GPU architectures. To design and construct the tool, we investigate the inter-architecture relative performance for multiple SpMV kernels. For a sparse matrix, based on its SpMV kernel performance measured on […]
Feb, 11

Implementing the Projected Spatial Rich Features on a GPU

The Projected Spatial Rich Model (PSRM) generates powerful steganalysis features, but requires the calculation of tens of thousands of convolutions with image noise residuals. This makes it very slow: the reference implementation takes an impractical 20{30 minutes per 1 megapixel (Mpix) image. We present a case study which first tweaks the definition of the PSRM […]
Feb, 11

Genetically Improved CUDA C++ Software

Genetic Programming (GP) may dramatically increase the performance of software written by domain experts. GP and autotuning are used to optimise and refactor legacy GPGPU C code for modern parallel graphics hardware and software. Speed ups of more than six times on recent nVidia GPU cards are reported compared to the original kernel on the […]
Feb, 9

GPGPU-Assisted Subpixel Tracking Method for Fiducial Markers

With an aim to realizing highly accurate position estimation, we propose in this paper a method for efficiently and accurately detecting the 3D positions and poses of traditional fiducial markers with black frames in high-resolution images taken by ordinary web cameras. Our tracking method can be efficiently executed utilizing GPGPU computation, and in order to […]
Feb, 9

Benchmarks for Intel MIC Architecture

Intel Many Integrated Core (MIC) Architecture combines about 60 cores onto a single chips. Intel MIC brand named Xeon Phi offers a theoretical maximum of more than 3 double precision GFLOPs than Intel Xeon E5 core. We carry out benchmarks for Intel MIC with a Monte Carlo simulation of LIBOR Market Model. The results show […]

* * *

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org