Posts
Nov, 8
Graphics Processing Unit Utilization in Circuit Simulation
Today’s graphics processing units (GPUs) include hundreds of multithreaded, multicore processors and a complex, high-bandwidth memory architecture, which makes them a good option for speeding up general-purpose parallel computations in which large quantities of data are processed by the same functions. Several successful applications of GPU computing have also been presented in the field of circuit simulation. The […]
Nov, 8
20th Euromicro International Conference on Parallel, Distributed and Network-Based Computing, PDP 2012
The Special Session on GPU Computing and Hybrid Computing aims to provide a forum for researchers and engineers working on hot topics in GPU computing and hybrid computing, with special emphasis on applications, performance analysis, programming models, and mechanisms for mapping codes. Topics: GPU computing, multi-GPU processing, hybrid computing; programming models, programming frameworks, […]
Nov, 8
Innovative Parallel Computing: Foundations & Applications of GPU, Manycore, and Heterogeneous Systems, InPar 2012
InPar 2012 is co-located with NVIDIA’s GPU Technology Conference. This new conference provides a first-tier academic venue for peer-reviewed publications in the emerging fields of parallel computing, encompassing GPU computing, manycore computing, and heterogeneous computing. InPar has a dual focus on “Foundations”, the fundamental advances in parallel computing itself, and “Applications”, […]
Nov, 8
Performance analysis of a hybrid MPI/CUDA implementation of the NAS LU benchmark
We present the performance analysis of a port of the LU benchmark from the NAS Parallel Benchmark (NPB) suite to NVIDIA’s Compute Unified Device Architecture (CUDA), and report on the optimisation efforts employed to take advantage of this platform. Execution times are reported for several different GPUs, ranging from low-end consumer-grade products to high-end HPC-grade […]
Nov, 8
A Class of Hybrid LAPACK Algorithms for Multicore and GPU Architectures
Three of the top four supercomputers in the November 2010 TOP500 list of the world’s most powerful supercomputers use NVIDIA GPUs to accelerate computations. Ninety-five systems on the list use processors with six or more cores, 365 systems use quad-core processors, and 37 systems use dual-core processors. The large-scale enabling of hybrid […]
Nov, 8
High performance massively parallel direct N-body simulations on large GPU clusters
We present direct astrophysical N-body simulations with up to six million bodies using our parallel MPI/CUDA code on large GPU clusters in China, with different kinds of GPU hardware. These clusters are directly linked under the Chinese Academy of Sciences special GPU cluster program. We reach about one third of the peak GPU performance for […]
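The abstract does not show the authors’ kernel, but the core of any *direct* N-body code is the O(N²) pairwise force sum. The sketch below is a generic illustration, not the paper’s implementation; the Plummer-style `softening` length and the choice of units with G = 1 are assumptions. On a GPU, a common mapping is one thread per body `i` running the inner loop over `j`.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def accelerations(positions: Sequence[Vec3], masses: Sequence[float],
                  G: float = 1.0, softening: float = 1e-3) -> List[List[float]]:
    """Direct O(N^2) summation of gravitational accelerations.

    Assumed units: G = 1 (N-body units). `softening` (hypothetical default)
    keeps the force finite when two bodies come very close.
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    eps2 = softening * softening
    for i in range(n):              # on a GPU: typically one thread per i
        xi, yi, zi = positions[i]
        for j in range(n):          # every other body contributes
            if i == j:
                continue
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            dz = positions[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))   # 1 / |r|^3 (softened)
            s = G * masses[j] * inv_r3
            acc[i][0] += s * dx
            acc[i][1] += s * dy
            acc[i][2] += s * dz
    return acc
```

Because each pair is evaluated with the same softened distance, the forces obey Newton’s third law, so the total momentum change, the sum of mᵢaᵢ, stays at zero up to rounding.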
Nov, 8
The Infrared behavior of SU(3) Nf=12 gauge theory: about the existence of conformal fixed point
Combined with twisted boundary conditions, Polyakov loop correlators can be used to define a renormalized coupling. We employ this scheme for the step-scaling method (with step size s = 2) in the search for a conformal fixed point of SU(3) gauge theory with 12 massless flavors. Staggered fermions and the plaquette gauge action are used in […]
Nov, 8
Speculative Parallel Evaluation Of Classification Trees On GPGPU Compute Engines
We examine the problem of optimizing classification tree evaluation for on-line and real-time applications by using GPUs. Looking at trees with continuous attributes often used in image segmentation, we first put the existing algorithms for serial and data-parallel evaluation on solid footings. We then introduce a speculative parallel algorithm designed for single instruction, multiple data […]
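To make “serial” versus “data-parallel” evaluation concrete, here is a minimal sketch. It is not the paper’s speculative algorithm. `eval_serial` follows one root-to-leaf path through a pointer-based tree. `eval_branch_free` assumes a *complete* tree stored in heap order (children of node k at 2k+1 and 2k+2), a layout chosen here for illustration: every sample takes exactly `depth` steps and the comparison becomes index arithmetic, which suits lockstep SIMD-style execution over many samples. The `Node` layout and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Node:
    feature: int        # index of the continuous attribute tested here
    threshold: float    # go right if x[feature] > threshold
    left: int = -1      # child indices; -1 marks a leaf
    right: int = -1
    label: int = 0      # class label stored at leaves


def eval_serial(nodes: List[Node], x: Sequence[float]) -> int:
    """Serial evaluation: follow a single root-to-leaf path."""
    i = 0
    while nodes[i].left != -1:
        n = nodes[i]
        i = n.right if x[n.feature] > n.threshold else n.left
    return nodes[i].label


def eval_branch_free(features: Sequence[int], thresholds: Sequence[float],
                     leaf_labels: Sequence[int], depth: int,
                     x: Sequence[float]) -> int:
    """Evaluate a complete tree of the given depth stored in heap order.

    Every sample performs exactly `depth` steps, and the branch is turned
    into arithmetic on the node index.
    """
    k = 0
    for _ in range(depth):
        k = 2 * k + 1 + int(x[features[k]] > thresholds[k])
    # internal nodes occupy indices 0 .. 2**depth - 2; leaves follow
    return leaf_labels[k - (2 ** depth - 1)]
```

Data-parallel evaluation then amounts to applying `eval_branch_free` independently to every sample in a batch, for example one GPU thread per sample.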
Nov, 7
Flocking Implementation for the Blender Game Engine
In this thesis, we discuss the development of a new Boids system that simulates flocking behavior inside the Blender Game Engine and within the framework of the Real-Time Particles System (RTPS) library developed by Ian Johnson. The collective behavior of Boids is characterized as emergent behavior arising from three steering behaviors: separation, alignment, […]
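The teaser is cut off after the second steering behavior. The sketch below assumes Reynolds’ classic boids model, whose three rules are separation, alignment, and cohesion, and does not reproduce the thesis’s RTPS code. The neighbor `radius`, the weights `w_sep`, `w_ali`, `w_coh`, the time step, and all function names are hypothetical. For clarity it uses an O(N²) neighbor search; a real-time system would typically use a spatial grid instead.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec = Tuple[float, float]


@dataclass
class Boid:
    pos: Vec
    vel: Vec


def _add(a: Vec, b: Vec) -> Vec:
    return (a[0] + b[0], a[1] + b[1])


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1])


def _scale(a: Vec, s: float) -> Vec:
    return (a[0] * s, a[1] * s)


def steer(boids: List[Boid], i: int, radius: float = 5.0,
          w_sep: float = 1.5, w_ali: float = 1.0, w_coh: float = 1.0) -> Vec:
    """Weighted sum of separation, alignment, and cohesion for boid i."""
    me = boids[i]
    sep = ali = coh = (0.0, 0.0)
    count = 0
    for j, other in enumerate(boids):
        if j == i:
            continue
        d = _sub(other.pos, me.pos)
        dist = math.hypot(*d)
        if dist == 0.0 or dist > radius:
            continue
        count += 1
        sep = _sub(sep, _scale(d, 1.0 / (dist * dist)))  # push away, stronger when close
        ali = _add(ali, other.vel)
        coh = _add(coh, other.pos)
    if count == 0:
        return (0.0, 0.0)                                # no neighbors: no steering
    ali = _sub(_scale(ali, 1.0 / count), me.vel)         # match the average velocity
    coh = _sub(_scale(coh, 1.0 / count), me.pos)         # move toward the local centre
    return _add(_add(_scale(sep, w_sep), _scale(ali, w_ali)), _scale(coh, w_coh))


def step(boids: List[Boid], dt: float = 0.1) -> List[Boid]:
    """Advance the flock one time step (semi-implicit Euler)."""
    accs = [steer(boids, i) for i in range(len(boids))]  # read old state only
    out = []
    for b, a in zip(boids, accs):
        v = _add(b.vel, _scale(a, dt))
        out.append(Boid(_add(b.pos, _scale(v, dt)), v))
    return out
```

All steering vectors are computed from the old state before any boid moves, so the update does not depend on iteration order. This is also what makes a one-thread-per-boid parallel version straightforward.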
Nov, 7
High-Level Design for FPGA-based Multiprocessor Accelerators
Field programmable gate arrays (FPGAs) have the potential to accelerate scientific computing applications due to their highly parallel architecture. However, programming these architectures efficiently requires hardware description languages (HDLs) such as Verilog or VHDL. Many application developers are not familiar with HDLs, because they traditionally develop their applications using […]
Nov, 7
GPUinspiral – a low-latency, high-performance implementation of the matched-filter gravitational wave search algorithm
A very high-performance pipeline has been developed to search for gravitational-wave signals originating from coalescing compact binary systems in the M < 35 M☉ mass range. The goal of this research is to provide a solution for some data analysis methods that have so far been computationally infeasible, for example the filtering […]
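As a rough illustration of what “matched filter” means, the sketch below slides a unit-norm template over a data stream and reports the lag with the strongest correlation. It is a simplified stand-in, not GPUinspiral’s code. It assumes white noise, so no whitening or noise-spectrum weighting is applied, and it works in the time domain; production gravitational-wave searches typically filter in the frequency domain with FFTs and weight by the detector’s noise power spectral density. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def matched_filter(data: Sequence[float], template: Sequence[float]) -> List[float]:
    """Correlate `data` with a unit-norm copy of `template` at every lag.

    Assumes white noise (no whitening / PSD weighting).
    """
    norm = math.sqrt(sum(t * t for t in template))
    if norm == 0.0:
        raise ValueError("template must be non-zero")
    h = [t / norm for t in template]
    m = len(h)
    return [sum(data[k + i] * h[i] for i in range(m))
            for k in range(len(data) - m + 1)]


def best_match(data: Sequence[float], template: Sequence[float]) -> Tuple[int, float]:
    """Return (lag, filter output) at the largest-magnitude response."""
    out = matched_filter(data, template)
    k = max(range(len(out)), key=lambda i: abs(out[i]))
    return k, out[k]
```

For a template buried at some offset with amplitude A, the output at the correct lag is A times the template norm. A search bank repeats this for many templates (for example, different component masses), which is the embarrassingly parallel workload that makes GPUs attractive here.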
Nov, 7
MELT – a Translated Domain Specific Language Embedded in the GCC Compiler
The free GCC compiler is a very large piece of software, compiling source code in several languages for many targets on various systems. It can be extended by plugins, which may take advantage of its power to provide extra, specific functionality (warnings, optimizations, source refactoring or navigation) by processing various GCC internal representations (Gimple, Tree, …). Writing plugins […]