Posts
Mar, 10
Realtime phase-based optical flow on the GPU
Phase-based optical flow algorithms are characterized by high precision and robustness, but also by high computational requirements. Using the CUDA platform, we have implemented a phase-based algorithm that maps exceptionally well onto the GPU architecture. This optical flow algorithm revolves around a reliability measure that evaluates the consistency of phase information over time. By exploiting […]
Mar, 10
An Efficient SAR Processor Based on GPU via CUDA
A novel and efficient Synthetic Aperture Radar (SAR) processor is introduced in this paper. This new processor is implemented on the Graphics Processing Unit (GPU). The GPU is traditionally used for graphics rendering, but in recent years it has rapidly evolved into a highly parallel processor with tremendous computation capability and ultra-high memory bandwidth. The algorithm of […]
Mar, 10
Using a GPU to accelerate die and mold fabrication
The authors present a GPU-based method for generating and verifying cutter paths for numerically controlled milling. A CAM system based on this technology is now employed in production at Mazda Motor Corporation for manufacturing stamping dies. This system can compute cutter paths more than 20 times faster than previous methods.
Mar, 10
A Predictive Shutdown Technique for GPU Shader Processors
As technology continues to shrink, reducing leakage is critical to achieving energy efficiency. Previous work on low-power GPUs (graphics processing units) focuses on techniques for dynamic power reduction, such as DVFS (Dynamic Voltage/Frequency Scaling) and clock gating. In this paper, we explore the potential of adopting architecture-level power gating techniques for leakage reduction on GPUs. […]
Mar, 10
Performance analysis and optimization strategies for a D3Q19 lattice Boltzmann kernel on nVIDIA GPUs using CUDA
This paper presents implementation strategies and optimization approaches for a D3Q19 lattice Boltzmann flow solver on nVIDIA graphics processing units (GPUs). Using the STREAM benchmarks we demonstrate the GPU parallelization approach and obtain an upper limit for the flow solver performance. We discuss the GPU-specific implementation of the solver with a focus on memory alignment […]
Mar, 9
Gyrokinetic Particle-in-Cell Optimization on Emerging Multi- and Manycore Platforms
The next decade of high-performance computing (HPC) systems will see a rapid evolution and divergence of multi- and manycore architectures as power and cooling constraints limit increases in microprocessor clock speeds. Understanding efficient optimization methodologies on diverse multicore designs in the context of demanding numerical methods is one of the greatest challenges faced today by […]
Mar, 9
Fast exhaustive search for polynomial systems in F_2
We analyze how fast we can solve general systems of multivariate equations of various low degrees over F_2; this is a well-known hard problem which is important both in itself and as part of many types of algebraic cryptanalysis. Compared to the standard exhaustive search technique, our improved approach is more efficient both asymptotically […]
Mar, 9
Ocelot: a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems
Ocelot is a dynamic compilation framework designed to map the explicitly data parallel execution model used by NVIDIA CUDA applications onto diverse multithreaded platforms. Ocelot includes a dynamic binary translator from Parallel Thread eXecution ISA (PTX) to many-core processors that leverages the Low Level Virtual Machine (LLVM) code generator to target x86 and other ISAs. […]
Mar, 9
Comparison of FPGA and GPU implementations of real-time stereo vision
Real-time stereo vision systems have many applications – from autonomous navigation for vehicles through surveillance to materials handling. Accurate scene interpretation depends on an ability to process high-resolution images in real time, but, although the calculations for stereo matching are basically simple, a practical system needs to evaluate at least 10^9 disparities every second – […]
Mar, 9
Benchmarking GPU and CPU codes for Heisenberg spin glass overrelaxation
We present a set of possible implementations for Graphics Processing Units (GPUs) of the Overrelaxation technique applied to the 3D Heisenberg spin glass model. The results show that a carefully tuned code can achieve more than 100 GFlops of sustained performance and update a single spin in about 0.6 nanoseconds. A multi-hit technique that exploits […]
Mar, 9
GPU Computing Gems: Emerald Edition
Graphics Processing Units (GPUs) are designed to be parallel, with hundreds of cores compared to the handful in traditional CPUs. Increasingly, you can leverage GPU power for many computationally intensive applications – not just for graphics. If you’re facing the challenge of programming systems to effectively use these massively parallel processors to achieve efficiency and performance goals, GPU Computing […]
Mar, 9
Visualization of level-of-detail meshes on the GPU
Extensive research has been carried out in multiresolution models for many decades. The tendency in recent years has been to harness the potential of GPUs to perform the level-of-detail extraction on graphics hardware. The aim of this work is to present a new level-of-detail scheme based on triangles which is both simple and efficient. In […]