Posts

Mar, 10

Skeletal rigid skinning with blending patches on the GPU

In this paper, we present a novel skeletal rigid skinning approach. First, we introduce a skeleton extraction technique that produces refined skeletons appropriate for animation from decomposed solid models. Then, to avoid the artifacts generated in previous rigid skinning approaches and the associated high training costs, we develop an efficient and robust rigid skinning technique […]
Mar, 10

Implementing Ultrasound Beamforming on the GPU using CUDA

Today's ultrasound equipment consists mainly of a PC that is attached to several large cards that process the received signals in hardware. These cards take up a lot of space and are costly to develop. As the processing power of PCs increases, it becomes possible to move some of this signal processing from specialized hardware to standard […]
Mar, 10

Real-Time Crowd Rendering and Interactions on GPU

The simulation of large crowds of characters is important in many fields of virtual reality, as they can increase the credibility of the virtual environments. Rendering large crowds of characters requires a great amount of computational power. To make this rendering more efficient, we propose a GPU-based crowd rendering method. We present a novel […]
Mar, 10

A GPU-enhanced cluster for accelerated FMS

The forces modeling and simulation (FMS) community has often been hampered by constraints in computing: not enough resolution, not enough entities, not enough behavioral variants. High performance computing can ameliorate those constraints. The use of Linux clusters is one path to higher performance; the use of graphics processing units (GPUs) as accelerators is another. Merging […]
Mar, 10

Realtime phase-based optical flow on the GPU

Phase-based optical flow algorithms are characterized by high precision and robustness, but also by high computational requirements. Using the CUDA platform, we have implemented a phase-based algorithm that maps exceptionally well onto the GPU's architecture. This optical flow algorithm revolves around a reliability measure that evaluates the consistency of phase information over time. By exploiting […]
Mar, 10

An Efficient SAR Processor Based on GPU via CUDA

A novel and efficient Synthetic Aperture Radar (SAR) processor is introduced in this paper. This new processor is implemented on the Graphics Processing Unit (GPU). The GPU is traditionally used for graphics rendering, but in recent years it has rapidly evolved into a highly parallel processor with tremendous computation capability and ultra-high memory bandwidth. The algorithm of […]
Mar, 10

Using a GPU to accelerate die and mold fabrication

The authors present a GPU-based method for generating and verifying cutter paths for numerically controlled milling. A CAM system based on this technology is now employed in production at Mazda Motor Corporation for manufacturing stamping dies. This system can compute cutter paths more than 20 times faster than previous methods.
Mar, 10

A Predictive Shutdown Technique for GPU Shader Processors

As technology continues to shrink, reducing leakage is critical to achieving energy efficiency. Previous work on low-power GPUs (graphics processing units) focuses on techniques for dynamic power reduction, such as DVFS (Dynamic Voltage/Frequency Scaling) and clock gating. In this paper, we explore the potential of adopting architecture-level power gating techniques for leakage reduction on GPUs. […]
Mar, 10

Performance analysis and optimization strategies for a D3Q19 lattice Boltzmann kernel on nVIDIA GPUs using CUDA

This paper presents implementation strategies and optimization approaches for a D3Q19 lattice Boltzmann flow solver on nVIDIA graphics processing units (GPUs). Using the STREAM benchmarks we demonstrate the GPU parallelization approach and obtain an upper limit for the flow solver performance. We discuss the GPU-specific implementation of the solver with a focus on memory alignment […]
Mar, 9

Gyrokinetic Particle-in-Cell Optimization on Emerging Multi- and Manycore Platforms

The next decade of high-performance computing (HPC) systems will see a rapid evolution and divergence of multi- and manycore architectures as power and cooling constraints limit increases in microprocessor clock speeds. Understanding efficient optimization methodologies on diverse multicore designs in the context of demanding numerical methods is one of the greatest challenges faced today by […]
Mar, 9

Fast exhaustive search for polynomial systems in F2

We analyze how fast we can solve general systems of multivariate equations of various low degrees over F2; this is a well-known hard problem which is important both in itself and as part of many types of algebraic cryptanalysis. Compared to the standard exhaustive search technique, our improved approach is more efficient both asymptotically […]
Mar, 9

Ocelot: a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems

Ocelot is a dynamic compilation framework designed to map the explicitly data parallel execution model used by NVIDIA CUDA applications onto diverse multithreaded platforms. Ocelot includes a dynamic binary translator from Parallel Thread eXecution ISA (PTX) to many-core processors that leverages the Low Level Virtual Machine (LLVM) code generator to target x86 and other ISAs. […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
