Posts
Mar, 10
Performance analysis and optimization strategies for a D3Q19 lattice Boltzmann kernel on nVIDIA GPUs using CUDA
This paper presents implementation strategies and optimization approaches for a D3Q19 lattice Boltzmann flow solver on nVIDIA graphics processing units (GPUs). Using the STREAM benchmarks we demonstrate the GPU parallelization approach and obtain an upper limit for the flow solver performance. We discuss the GPU-specific implementation of the solver with a focus on memory alignment […]
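As a rough illustration of how a STREAM-measured bandwidth can bound solver performance, the sketch below assumes a D3Q19 update that loads and stores each of the 19 distribution values exactly once per cell and causes no other memory traffic. The function name and the example bandwidth are hypothetical and are not taken from the paper.

```python
def lbm_upper_limit_mlups(bandwidth_gb_s, q=19, bytes_per_value=4):
    """Bandwidth-bound estimate in million lattice updates per second (MLUPS).

    Assumption: every cell update reads q values and writes q values,
    so it moves 2 * q * bytes_per_value bytes through main memory.
    """
    bytes_per_update = 2 * q * bytes_per_value
    updates_per_second = bandwidth_gb_s * 1e9 / bytes_per_update
    return updates_per_second / 1e6


# Hypothetical sustained bandwidth of 76 GB/s, single precision (4 bytes).
print(lbm_upper_limit_mlups(76.0))
```

Under these assumptions, switching to double precision (`bytes_per_value=8`) halves the estimate, because the bytes moved per update double.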
Mar, 9
Gyrokinetic Particle-in-Cell Optimization on Emerging Multi- and Manycore Platforms
The next decade of high-performance computing (HPC) systems will see a rapid evolution and divergence of multi- and manycore architectures as power and cooling constraints limit increases in microprocessor clock speeds. Understanding efficient optimization methodologies on diverse multicore designs in the context of demanding numerical methods is one of the greatest challenges faced today by […]
Mar, 9
Fast exhaustive search for polynomial systems in F2
We analyze how fast we can solve general systems of multivariate equations of various low degrees over F2; this is a well known hard problem which is important both in itself and as part of many types of algebraic cryptanalysis. Compared to the standard exhaustive search technique, our improved approach is more efficient both asymptotically […]
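For orientation, the standard exhaustive search that the abstract uses as its baseline can be sketched as follows. This is only the naive technique, not the improved approach of the paper. The polynomial representation (a list of monomials, each a tuple of variable indices) and the function names are assumptions made for this example.

```python
from itertools import product


def evaluate(poly, x):
    """Evaluate a polynomial over F2 at the point x.

    poly is a list of monomials; each monomial is a tuple of variable
    indices, and the empty tuple stands for the constant 1.
    """
    return sum(all(x[i] for i in mono) for mono in poly) % 2


def exhaustive_search(system, n):
    """Return every x in {0,1}^n at which all polynomials evaluate to 0."""
    return [x for x in product((0, 1), repeat=n)
            if all(evaluate(p, x) == 0 for p in system)]


# Example system in three variables:
#   x0*x1 + x2 + 1 = 0
#   x0 + x1        = 0
system = [[(0, 1), (2,), ()], [(0,), (1,)]]
print(exhaustive_search(system, 3))
```

The loop visits all 2^n assignments and re-evaluates every polynomial from scratch at each one, which is the cost an improved method would aim to reduce.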
Mar, 9
Ocelot: a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems
Ocelot is a dynamic compilation framework designed to map the explicitly data parallel execution model used by NVIDIA CUDA applications onto diverse multithreaded platforms. Ocelot includes a dynamic binary translator from Parallel Thread eXecution ISA (PTX) to many-core processors that leverages the Low Level Virtual Machine (LLVM) code generator to target x86 and other ISAs. […]
Mar, 9
Comparison of FPGA and GPU implementations of real-time stereo vision
Real-time stereo vision systems have many applications – from autonomous navigation for vehicles through surveillance to materials handling. Accurate scene interpretation depends on an ability to process high resolution images in real-time, but, although the calculations for stereo matching are basically simple, a practical system needs to evaluate at least 10^9 disparities every second – […]
Mar, 9
Benchmarking GPU and CPU codes for Heisenberg spin glass overrelaxation
We present a set of possible implementations for Graphics Processing Units (GPUs) of the Overrelaxation technique applied to the 3D Heisenberg spin glass model. The results show that a carefully tuned code can achieve more than 100 GFLOPS of sustained performance and update a single spin in about 0.6 nanoseconds. A multi-hit technique that exploits […]
Mar, 9
GPU Computing Gems: Emerald Edition
Graphics Processing Units (GPUs) are designed to be parallel – having hundreds of cores versus traditional CPUs. Increasingly, you can leverage GPU power for many computationally-intense applications – not just for graphics. If you’re facing the challenge of programming systems to effectively use these massively parallel processors to achieve efficiency and performance goals, GPU Computing […]
Mar, 9
Visualization of level-of-detail meshes on the GPU
Extensive research has been carried out in multiresolution models for many decades. The tendency in recent years has been to harness the potential of GPUs to perform the level-of-detail extraction on graphics hardware. The aim of this work is to present a new level-of-detail scheme based on triangles which is both simple and efficient. In […]
Mar, 9
GPU-RMAP: Accelerating Short-Read Mapping on Graphics Processors
Next-generation, high-throughput sequencers are now capable of producing hundreds of billions of short sequences (reads) in a single day. The task of accurately mapping the reads back to a reference genome is of particular importance because it is used in several other biological applications, e.g., genome re-sequencing, DNA methylation, and ChIP sequencing. On a personal […]
Mar, 9
Classical Simulation of Quantum Adiabatic Algorithms using Mathematica on GPUs
In this paper we present a simulation environment enhanced with parallel processing which can be used on personal computers, based on a high-level user interface developed on Mathematica© which is connected to C++ code in order to make our platform capable of communicating with a Graphics Processing Unit. We introduce the reader to the behavior […]
Mar, 8
Using common graphics hardware for multi-agent traffic simulation with CUDA
Today’s graphics processing units (GPUs) have tremendous resources when it comes to raw computing power. The simulation of large groups of agents in transport simulation demands a great deal of computation time. Therefore it seems reasonable to try to harvest this computing power for traffic simulation. Unfortunately, simulating a network of traffic is inherently connected […]