
Posts

Aug, 28

GPUVerify: A Verifier for GPU Kernels

We present a technique for verifying race- and divergence-freedom of GPU kernels that are written in mainstream kernel programming languages such as OpenCL and CUDA. Our approach is founded on a novel formal operational semantics for GPU programming termed synchronous, delayed visibility (SDV) semantics. The SDV semantics provides a precise definition of barrier divergence in […]
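To make "race-freedom" concrete: two threads race when they access the same shared location with no barrier between the accesses and at least one access is a write. The sketch below is a naive dynamic check over one concrete barrier interval, assuming a hypothetical representation of each thread's reads and writes as address sets. It is not GPUVerify's method, which reasons statically about all executions under the SDV semantics.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical representation of one thread's accesses between two consecutive
# barriers: {"reads": {addresses}, "writes": {addresses}}
Accesses = Dict[str, Set[int]]


def find_races(interval: Dict[int, Accesses]) -> List[Tuple[int, int, int]]:
    """Return (thread_a, thread_b, address) triples that race in one barrier interval.

    Accesses in the same interval are unordered, so two distinct threads race
    on an address if at least one of them writes it.
    """
    races = []
    for a, b in combinations(sorted(interval), 2):
        acc_a, acc_b = interval[a], interval[b]
        # write-write and read-write conflicts, in either direction
        conflicts = (
            (acc_a["writes"] & acc_b["writes"])
            | (acc_a["writes"] & acc_b["reads"])
            | (acc_a["reads"] & acc_b["writes"])
        )
        for addr in sorted(conflicts):
            races.append((a, b, addr))
    return races
```

For example, if thread 0 reads address 1 and writes address 0 while thread 1 reads address 2 and writes address 1 in the same interval, `find_races` reports `(0, 1, 1)`: a barrier is needed between thread 0's read and thread 1's write.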
Aug, 28

Intelligent Edge Detection using a CUDA Simulator of Multilayer Neural Network Based on Multi-Valued Neurons

In this paper, we consider the edge detection problem using an intelligent approach. We use a multilayer neural network based on multi-valued neurons (MLMVN) as an intelligent edge enhancer. MLMVN is a complex-valued neural network and it has many advantages over classical neural networks. It significantly outperforms a classical multilayer feedforward neural network in terms […]
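For readers unfamiliar with multi-valued neurons: an MVN computes a complex weighted sum z and maps it onto the unit circle. The sketch below assumes the activations usually given in the MVN literature: the discrete k-valued form, which returns the k-th root of unity for the angular sector containing arg(z), and the continuous form z/|z|. The function names are hypothetical, and the abstract does not say which variant the paper uses in each layer.

```python
import cmath
import math
from typing import Optional, Sequence


def mvn_discrete(z: complex, k: int) -> complex:
    """k-valued MVN activation: the k-th root of unity whose sector contains arg(z)."""
    angle = cmath.phase(z) % (2 * math.pi)            # phase() is in (-pi, pi]; shift to [0, 2*pi)
    sector = int(k * angle // (2 * math.pi)) % k      # sector index 0..k-1 (% k guards rounding)
    return cmath.exp(2j * math.pi * sector / k)


def mvn_continuous(z: complex) -> complex:
    """Continuous MVN activation: project z onto the unit circle."""
    return z / abs(z)


def mvn_neuron(weights: Sequence[complex], inputs: Sequence[complex],
               k: Optional[int] = None) -> complex:
    """Weighted sum z = w0 + sum(w_i * x_i), then the discrete or continuous activation."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return mvn_continuous(z) if k is None else mvn_discrete(z, k)
```

With k = 4, a sum just above the positive real axis maps to 1 and one just below it maps to -1j, the root of unity for the last sector.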
Aug, 28

Performance Comparison Between Cg-based and CUDA-based Matrix Multiplications

In this paper, we compare the performance of the Cg-based and CUDA-based GPU programming APIs, focusing on square matrix multiplication. We also discuss other aspects of these widely used GPU programming APIs. This work can provide insight into which applications involving matrix multiplication are better suited for a specific […]
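Comparisons like this one are normally checked against a simple CPU reference and reported as throughput. The sketch below is such a reference for n x n matrices, plus a helper that converts a run time to GFLOPS by counting each of the n^3 multiply-adds as two floating-point operations. The helper names are hypothetical, and the abstract does not say which metric the authors report.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Reference O(n^3) product of two n x n matrices (i-k-j loop order)."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            aik = a[i][k]
            row_b, row_c = b[k], c[i]
            for j in range(n):
                row_c[j] += aik * row_b[j]
    return c


def gflops(n: int, seconds: float) -> float:
    """Throughput of an n x n product: n^3 multiply-adds counted as 2 * n^3 flops."""
    return 2 * n ** 3 / seconds / 1e9
```

For example, time a GPU run with a wall-clock timer, compare its output element-wise against `matmul`, and pass n and the elapsed seconds to `gflops`; a 1000 x 1000 product that takes 2 s corresponds to 1 GFLOPS.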
Aug, 28

Optimization Techniques for CUDA Application

In this paper, we summarize the results of our experiments applying various optimization techniques to CUDA applications running on NVIDIA Fermi GPUs. Our experiments on matrix multiplication and breadth-first search algorithms show that optimization techniques such as coalesced global memory access, conflict-free shared memory access and data prefetching improve the performance of applications running on […]
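The abstract names coalesced global loads and shared-memory access; a common way to exploit both in CUDA matrix multiplication is tiling, where each thread block computes one output tile and repeatedly stages matching tiles of A and B in shared memory. The Python sketch below only mirrors that loop structure (it cannot show coalescing or bank conflicts), and the tile size and function name are assumptions, not the authors' kernel.

```python
from typing import List

Matrix = List[List[float]]


def tiled_matmul(a: Matrix, b: Matrix, tile: int = 2) -> Matrix:
    """n x n product computed tile by tile, mirroring shared-memory tiling."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for ti in range(0, n, tile):            # output tile row (one thread block per tile)
        for tj in range(0, n, tile):        # output tile column
            for tk in range(0, n, tile):    # walk along the shared dimension
                # "Load" one tile of A and one tile of B. On a GPU these would be
                # copied into shared memory by the block with coalesced global reads.
                a_tile = [row[tk:tk + tile] for row in a[ti:ti + tile]]
                b_tile = [row[tj:tj + tile] for row in b[tk:tk + tile]]
                # Each (i, j) pair plays the role of one thread accumulating its element.
                for i, a_row in enumerate(a_tile):
                    for j in range(len(b_tile[0])):
                        c[ti + i][tj + j] += sum(
                            a_row[k] * b_tile[k][j] for k in range(len(b_tile))
                        )
    return c
```

Slicing handles matrices whose size is not a multiple of the tile size, which corresponds to the bounds checks a CUDA kernel needs on edge tiles.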
Aug, 28

A Research of MapReduce with GPU Acceleration

MapReduce is an efficient distributed computing model for large data sets. Data processing is fully distributed across a huge number of nodes, and a MapReduce cluster is highly scalable. However, single-node performance is gradually becoming a bottleneck in compute-intensive jobs, which makes it difficult to extend the MapReduce model to wider application fields […]
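The map, shuffle and reduce stages are where per-node compute time goes, and so are the natural targets for GPU acceleration. The sketch below is the classic word-count example run in a single process; the function names are hypothetical, and a real MapReduce runtime would distribute the input splits across nodes and partition the shuffle.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(split: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values that share one key."""
    return key, sum(values)


def word_count(splits: List[str]) -> Dict[str, int]:
    """Run map over every split, shuffle, then reduce each key group."""
    intermediate = (pair for split in splits for pair in map_phase(split))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
```

For example, `word_count(["the GPU maps", "the CPU reduces the"])` counts "the" three times and every other word once.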
Aug, 27

A Unified Optimizing Compiler Framework for Different GPGPU Architectures

This paper presents a novel optimizing compiler for general purpose computation on graphics processing units (GPGPU). It addresses two major challenges of developing high performance GPGPU programs: effective utilization of GPU memory hierarchy and judicious management of parallelism. The input to our compiler is a naive GPU kernel function, which is functionally correct but without […]
Aug, 27

Low-Latency Elliptic Curve Scalar Multiplication

This paper presents a low-latency algorithm designed for parallel computer architectures to compute the scalar multiplication of elliptic curve points based on approaches from cryptographic side-channel analysis. A graphics processing unit implementation using a standardized elliptic curve over a 224-bit prime field, complying with the new 112-bit security level, computes the scalar multiplication in 1.9 […]
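As background for side-channel-aware scalar multiplication, the sketch below implements the Montgomery ladder, a standard method that performs one point addition and one doubling per scalar bit regardless of the bit's value. It uses the textbook toy curve y^2 = x^3 + 2x + 2 over GF(17) with generator (5, 1) of order 19, not the 224-bit curve from the paper, and it is not the authors' low-latency algorithm. Python branches and integer arithmetic are not constant-time, so this illustrates only the regular operation sequence (Python 3.8+ is assumed for modular inverses via `pow`).

```python
from typing import Optional, Tuple

Point = Optional[Tuple[int, int]]  # None represents the point at infinity

# Toy curve y^2 = x^3 + A*x + 2 over GF(P_MOD); NOT the paper's 224-bit curve.
P_MOD, A = 17, 2


def add(p: Point, q: Point) -> Point:
    """Affine point addition and doubling on a short Weierstrass curve."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p + (-p) is infinity; also covers doubling a point with y = 0
    if p == q:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return x3, (lam * (x1 - x3) - y1) % P_MOD


def ladder(k: int, p: Point) -> Point:
    """Montgomery ladder for k*p: keeps r1 - r0 == p and does one addition
    plus one doubling per bit, whatever the bit value."""
    r0, r1 = None, p
    for bit in bin(k)[2:]:  # most significant bit first
        if bit == "1":
            r0, r1 = add(r0, r1), add(r1, r1)
        else:
            r0, r1 = add(r0, r0), add(r0, r1)
    return r0
```

For example, `ladder(2, (5, 1))` gives `(6, 3)`, and `ladder(19, (5, 1))` gives `None`, the point at infinity, because the generator has order 19.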
Aug, 27

An Implementation of Coincidence Algorithm on Graphic Processing Units

Genetic Algorithms (GAs) are powerful search techniques. However, when they are applied to complex problems, they consume a large amount of computational power. One way to make them faster is to use a parallel implementation. This paper presents a parallel implementation of Combinatorial Optimisation with Coincidence Algorithm (COIN) on Graphics Processing Units. COIN is a modern […]
Aug, 27

Perceptually Optimized Real-Time Computer Graphics

Perceptual optimization, the application of human visual perception models to remove imperceptible components in a graphics system, has been proven effective in achieving significant computational speedup. Previous implementations of this technique have focused on spatial level of detail reduction, which typically results in noticeable degradation of image quality. This thesis introduces refresh rate modulation (RRM), […]
Aug, 27

A Novel Approach to Visualizing Dark Matter Simulations

Over the last decades, cosmological N-body dark matter simulations have enabled ab initio studies of the formation of structure in the Universe. Gravity amplified small density fluctuations generated shortly after the Big Bang, leading to the formation of galaxies in the cosmic web. These calculations have led to a growing demand for methods to analyze […]
Aug, 26

GPU Accelerated Nonlinear Optimization in Radio Interferometric Calibration

We present the GPU-based acceleration of two well-known nonlinear optimization routines, Levenberg-Marquardt (LM) and Limited Memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS), in radio interferometric calibration. Radio interferometric calibration is a heavily compute-intensive operation in which the same nonlinear optimization problem has to be solved over many time intervals, with different data. We achieve a speedup of […]
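The core of Levenberg-Marquardt is a damped Gauss-Newton step: solve (J^T J + lambda I) delta = J^T r for the residuals r = y - f(x), then shrink lambda when the step lowers the cost and grow it when it does not. The sketch below fits a hypothetical two-parameter model y = a * exp(b * x) on the CPU. Calibration problems have far more (complex-valued) parameters, and the factor-of-10 lambda update is a common heuristic assumed here, not taken from the paper.

```python
import math
from typing import Sequence, Tuple


def model(a: float, b: float, x: float) -> float:
    """Hypothetical model f(x) = a * exp(b * x)."""
    return a * math.exp(b * x)


def cost(params: Tuple[float, float], xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sum of squared residuals."""
    a, b = params
    return sum((y - model(a, b, x)) ** 2 for x, y in zip(xs, ys))


def levenberg_marquardt(xs: Sequence[float], ys: Sequence[float],
                        params: Tuple[float, float] = (1.0, 0.0),
                        lam: float = 1e-3, iters: int = 100) -> Tuple[float, float]:
    a, b = params
    for _ in range(iters):
        # Accumulate J^T J (2x2) and J^T r (2-vector), with J = df/d(a, b)
        jtj = [[0.0, 0.0], [0.0, 0.0]]
        jtr = [0.0, 0.0]
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            jac = (e, a * x * e)  # df/da, df/db
            r = y - a * e
            for p in range(2):
                jtr[p] += jac[p] * r
                for q in range(2):
                    jtj[p][q] += jac[p] * jac[q]
        # Solve (J^T J + lam * I) delta = J^T r by Cramer's rule
        m00, m01 = jtj[0][0] + lam, jtj[0][1]
        m10, m11 = jtj[1][0], jtj[1][1] + lam
        det = m00 * m11 - m01 * m10
        da = (jtr[0] * m11 - m01 * jtr[1]) / det
        db = (m00 * jtr[1] - m10 * jtr[0]) / det
        if cost((a + da, b + db), xs, ys) < cost((a, b), xs, ys):
            a, b, lam = a + da, b + db, lam / 10  # accept: behave more like Gauss-Newton
        else:
            lam *= 10                             # reject: behave more like gradient descent
    return a, b
```

On noise-free samples of 2 * exp(-0.7 * x), starting from (1, 0), the loop recovers a = 2 and b = -0.7. Because each time interval in calibration is an independent problem of this kind, many such solves can run concurrently.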
Aug, 26

Efficient Dynamic Program Monitoring on Multi-Core Platforms

Software security and reliability have become increasingly important in the modern world. An effective approach to enforcing software security and reliability is to monitor a program’s execution at run time. However, an instrumentation-based implementation of a dynamic program monitor on single-core systems suffers from significant performance overhead. As multi-core architectures become more mainstream, implementing efficient dynamic program […]
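One common way to cut monitoring overhead on multi-core systems is to let the application only record events while a separate thread, ideally on another core, checks them. The sketch below shows that producer/consumer split with a bounded queue and a hypothetical bounds-checking policy. It is an assumption about the general structure, not the paper's design, and in CPython the global interpreter lock prevents the two threads from running Python code truly in parallel.

```python
import queue
import threading
from typing import List, Optional, Tuple

Event = Optional[Tuple[str, int]]  # (access kind, address); None signals end of execution


def monitor(events: "queue.Queue[Event]", limit: int, violations: List[Tuple[str, int]]) -> None:
    """Consumer thread: check each event against a simple policy."""
    while True:
        event = events.get()
        if event is None:
            break
        _, addr = event
        # Hypothetical policy: every access must stay inside [0, limit)
        if not 0 <= addr < limit:
            violations.append(event)


def run_monitored(trace: List[Tuple[str, int]], limit: int) -> List[Tuple[str, int]]:
    """Producer: the monitored program only enqueues events; the checker runs concurrently."""
    events: "queue.Queue[Event]" = queue.Queue(maxsize=1024)
    violations: List[Tuple[str, int]] = []
    checker = threading.Thread(target=monitor, args=(events, limit, violations))
    checker.start()
    for event in trace:
        events.put(event)  # blocks only if the checker falls 1024 events behind
    events.put(None)       # signal completion
    checker.join()
    return violations
```

For example, `run_monitored([("write", 5), ("write", 16)], 16)` reports only `("write", 16)`. The bounded queue keeps memory use fixed but throttles the application when the checker cannot keep up.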

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
