
Posts

Nov, 19

Real-time rendering of large surface-scanned range data natively on a GPU

This thesis presents research carried out on the visualisation of surface anatomy data stored as large range images, such as those produced by stereo-photogrammetric and other triangulation-based capture devices. As part of this research, I explored the use of points rather than polygons as a rendering primitive, and the use of range images as […]
Nov, 19

Adaptive implementation selection in the SkePU skeleton programming library

In earlier work, we developed the SkePU skeleton programming library for modern multicore systems equipped with one or more programmable GPUs. The library internally provides four types of implementations (implementation variants) for each skeleton: serial C++, OpenMP, CUDA, and OpenCL, targeting either CPU or GPU execution. Deciding which implementation would run faster for […]
Nov, 19

A study of the speed and the accuracy of the Boundary Element Method as applied to the computational simulation of biological organs

In this work, a Fortran code is first developed for three-dimensional linear elastostatics using constant boundary elements; the code is based on a MATLAB code developed earlier by the author. Next, the code is parallelized using BLACS, MPI, and ScaLAPACK. The parallelized code is then used to demonstrate the usefulness of the Boundary Element […]
Nov, 19

Implementation of the twisted mass fermion operator in the QUDA library

We discuss an extension of the QUDA library for the Wilson twisted mass operator. A performance analysis is presented for both degenerate and non-degenerate flavor doublets. The degenerate twisted mass fermion operator runs at up to 190, 487 and 856 Gflops, for double, single and half precisions respectively on recent NVIDIA Kepler GPUs, while our […]
Nov, 19

An implicit multigrid solver for high-order compressible flow simulations on GPUs

The multigrid method has proved to be effective for a large class of numerical methods. In this study, a strategy based on the Full Approximation Storage (FAS) scheme is implemented together with the Full Multigrid (FMG) algorithm to accelerate the convergence of steady-state solutions of the two-dimensional compressible Euler equations on a Graphics Processing Unit (GPU). The Beam […]
Nov, 18

Neurokernel: An Open Scalable Software Framework for Emulation and Validation of Drosophila Brain Models on Multiple GPUs

The brain of the fruit fly Drosophila melanogaster is an extremely attractive model system for reverse engineering the emergent properties of neural circuits because it implements complex sensory-driven behaviors with a nervous system comprising a number of components that is five orders of magnitude smaller than those of mammals. A powerful toolkit of well-developed genetic […]
Nov, 18

Integrating Multi-GPU Execution in an OpenACC Compiler

GPUs have become promising computing devices in current and future computer systems due to their high performance, high energy efficiency, and low price. However, the lack of high-level GPU programming models hinders the widespread adoption of GPU applications. To resolve this issue, OpenACC was developed as the first industry standard for directive-based GPU programming […]
Nov, 18

Specification and verification of GPGPU programs

Graphics Processing Units (GPUs) are increasingly used for general-purpose applications because of their low price, energy efficiency and enormous computing power. Considering the importance of GPU applications, it is vital that the behaviour of GPU programs can be specified and proven correct formally. This paper presents a logic to verify GPU kernels written in OpenCL, […]
Nov, 18

Probing the Statistical Validity of the Ductile-to-Brittle Transition in Metallic Nanowires Using GPU Computing

We perform a large-scale statistical analysis (> 2000 independent simulations) of the elongation and rupture of gold nanowires, probing the validity and scope of the recently proposed ductile-to-brittle transition that occurs with increasing nanowire length [Wu et al., Nano Lett., 12, 910-914 (2012)]. To facilitate a high-throughput simulation approach, we implement the second-moment approximation to […]
Nov, 18

Performance and Power Comparisons Between Fermi and Cypress GPUs

In recent years, modern graphics processing units have been widely adopted in high-performance computing to solve large-scale computation problems. The leading GPU manufacturers, Nvidia and AMD, have introduced a series of products to the market. While sharing many similar design concepts, GPUs from these two manufacturers differ in several aspects of their processor cores […]
Nov, 17

GPGPU-accelerated Interesting Interval Discovery and other Computations on GeoSpatial Datasets – A Summary of Results

For scalable solutions of GIS computations, it is imperative that the modern hybrid architecture comprising a CPU-GPU pair be exploited fully. The existing parallel algorithms and data structures port reasonably well to multicore CPUs, but poorly to GPGPUs because of the latter's atypical fine-grained, single-instruction multiple-thread (SIMT) architecture, extreme memory hierarchy and coalesced access requirements, and […]
Nov, 17

Implementation of Diamond Search Algorithm Using Parallel Processing Architecture

In video communication, the whole content of a video cannot be stored without processing, so the video must be compressed before transmission and storage; this process is called video compression. Video compression plays an important role in real-time scouting/video conferencing applications. Within the entire motion-based video compression process, movement estimation […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org