3282

Posts

Mar, 10

Real-Time Crowd Rendering and Interactions on GPU

The simulation of large crowds of characters is important in many fields of virtual reality, as they can increase the credibility of the virtual environments. Rendering large crowds of characters requires a great amount of computational power. To increase rendering efficiency, we propose a GPU-based crowd rendering method. We present a novel […]
Mar, 10

A GPU-enhanced cluster for accelerated FMS

The forces modeling and simulation (FMS) community has often been hampered by constraints in computing: not enough resolution, not enough entities, not enough behavioral variants. High performance computing can ameliorate those constraints. The use of Linux clusters is one path to higher performance; the use of graphics processing units (GPU) as accelerators is another. Merging […]
Mar, 10

Realtime phase-based optical flow on the GPU

Phase-based optical flow algorithms are characterized by high precision and robustness, but also by high computational requirements. Using the CUDA platform, we have implemented a phase-based algorithm that maps exceptionally well onto the GPU's architecture. This optical flow algorithm revolves around a reliability measure that evaluates the consistency of phase information over time. By exploiting […]
Mar, 10

An Efficient SAR Processor Based on GPU via CUDA

A novel and efficient Synthetic Aperture Radar (SAR) processor is introduced in this paper. This new processor is implemented on the Graphics Processing Unit (GPU). The GPU is traditionally used for graphics rendering, but in recent years it has rapidly evolved into a highly parallel processor with tremendous computational capability and ultra-high memory bandwidth. The algorithm of […]
Mar, 10

Using a GPU to accelerate die and mold fabrication

The authors present a GPU-based method for generating and verifying cutter paths for numerically controlled milling. A CAM system based on this technology is now employed in production at Mazda Motor Corporation for manufacturing stamping dies. This system can compute cutter paths more than 20 times faster than previous methods.
Mar, 10

A Predictive Shutdown Technique for GPU Shader Processors

As technology continues to shrink, reducing leakage is critical to achieving energy efficiency. Previous work on low-power GPUs (graphics processing units) focuses on techniques for dynamic power reduction, such as DVFS (Dynamic Voltage/Frequency Scaling) and clock gating. In this paper, we explore the potential of adopting architecture-level power gating techniques for leakage reduction on GPUs. […]
Mar, 10

Performance analysis and optimization strategies for a D3Q19 lattice Boltzmann kernel on nVIDIA GPUs using CUDA

This paper presents implementation strategies and optimization approaches for a D3Q19 lattice Boltzmann flow solver on nVIDIA graphics processing units (GPUs). Using the STREAM benchmarks we demonstrate the GPU parallelization approach and obtain an upper limit for the flow solver performance. We discuss the GPU-specific implementation of the solver with a focus on memory alignment […]
Mar, 9

Gyrokinetic Particle-in-Cell Optimization on Emerging Multi- and Manycore Platforms

The next decade of high-performance computing (HPC) systems will see a rapid evolution and divergence of multi- and manycore architectures as power and cooling constraints limit increases in microprocessor clock speeds. Understanding efficient optimization methodologies on diverse multicore designs in the context of demanding numerical methods is one of the greatest challenges faced today by […]
Mar, 9

Fast exhaustive search for polynomial systems in F_2

We analyze how fast we can solve general systems of multivariate equations of various low degrees over F_2; this is a well-known hard problem which is important both in itself and as part of many types of algebraic cryptanalysis. Compared to the standard exhaustive search technique, our improved approach is more efficient both asymptotically […]
Mar, 9

Ocelot: a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems

Ocelot is a dynamic compilation framework designed to map the explicitly data parallel execution model used by NVIDIA CUDA applications onto diverse multithreaded platforms. Ocelot includes a dynamic binary translator from Parallel Thread eXecution ISA (PTX) to many-core processors that leverages the Low Level Virtual Machine (LLVM) code generator to target x86 and other ISAs. […]
Mar, 9

Comparison of FPGA and GPU implementations of real-time stereo vision

Real-time stereo vision systems have many applications – from autonomous navigation for vehicles through surveillance to materials handling. Accurate scene interpretation depends on an ability to process high resolution images in real-time, but, although the calculations for stereo matching are basically simple, a practical system needs to evaluate at least 10^9 disparities every second – […]
Mar, 9

Benchmarking GPU and CPU codes for Heisenberg spin glass overrelaxation

We present a set of possible implementations for Graphics Processing Units (GPU) of the Overrelaxation technique applied to the 3D Heisenberg spin glass model. The results show that a carefully tuned code can achieve more than 100 GFlops/sec. of sustained performance and update a single spin in about 0.6 nanoseconds. A multi-hit technique that exploits […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org