Posts
Jun, 9
Sparse LU Factorization for Parallel Circuit Simulation on GPU
The sparse solver has become the bottleneck of SPICE simulators. There has been little work on GPU-based sparse solvers because of the high data dependency. This strong data dependency means that parallel sparse LU factorization runs efficiently on shared-memory computing devices, but the number of CPU cores sharing the same memory is often limited. The state of the […]
Jun, 8
Decoupling Algorithms from Schedules for Easy Optimization of Image Processing Pipelines
Using existing programming tools, writing high-performance image processing code requires sacrificing readability, portability, and modularity. We argue that this is a consequence of conflating what computations define the algorithm, with decisions about storage and the order of computation. We refer to these latter two concerns as the schedule, including choices of tiling, fusion, recomputation vs. […]
Jun, 8
Ameliorating Memory Contention of OLAP operators on GPU Processors
Implementations of database operators on GPU processors have shown dramatic performance improvement compared to multicore-CPU implementations. GPU threads can cooperate using shared memory, which is organized in interleaved banks and is fast only when threads read and modify addresses belonging to distinct memory banks. Therefore, data processing operators implemented on a GPU, in addition to […]
Jun, 8
A Comparison of Algebraic Multigrid Preconditioners using Graphics Processing Units and Multi-Core Central Processing Units
The influence of multi-core central processing units and graphics processing units on several algebraic multigrid methods is investigated in this work. Different performance metrics traditionally employed for algebraic multigrid are reconsidered and reevaluated on these novel computing architectures. Our benchmark results show that with the use of graphics processing units for the solver phase, it […]
Jun, 8
Astrophysical Particle Simulations on Heterogeneous CPU-GPU Systems
Heterogeneous CPU-GPU nodes are becoming popular in HPC clusters. We need to rethink algorithms and optimization techniques for such systems depending on the relative performance of CPU vs. GPU. In this paper, we report a performance-optimized particle simulation code, "OTOO", that is based on the octree method, for heterogeneous systems. Main applications of […]
Jun, 8
Parallel random variates generator for GPUs based on normal numbers
Pseudorandom number generators are required for many computational tasks, such as stochastic modelling and simulation. This paper investigates the serial CPU and parallel GPU implementation of a Linear Congruential Generator based on the binary representation of the normal number $\alpha_{2,3}$. We adapted two methods of modular reduction which allowed us to perform most operations in […]
Jun, 6
DMA-Assisted, Intranode Communication in GPU Accelerated Systems
Accelerator awareness has become a pressing issue in data movement models, such as MPI, because of the rapid deployment of systems that utilize accelerators. In our previous work, we developed techniques to enhance MPI with accelerator awareness, thus allowing applications to easily and efficiently communicate data between accelerator memories. In this paper, we extend this […]
Jun, 6
Classical Mechanical Hard-Core Particles Simulated in a Rigid Enclosure using Multi-GPU Systems
Hard-core interacting particle methods are of increasing importance for simulations and game applications as well as a tool supporting animations. We develop a high accuracy numerical integration technique for managing hard-core colliding particles of various physical properties such as differing interaction species and hard-core radii using multiple Graphical Processing Unit (m-GPU) computing techniques. We report […]
Jun, 6
The Tradeoffs of Fused Memory Hierarchies in Heterogeneous Computing Architectures
With the rise of general purpose computing on graphics processing units (GPGPU), the influence from consumer markets can now be seen across the spectrum of computer architectures. In fact, many of the high-ranking Top500 HPC systems now include these accelerators. Traditionally, GPUs have connected to the CPU via the PCIe bus, which has proved to […]
Jun, 6
Relativistic Hydrodynamics on Graphic Cards
We show how to accelerate relativistic hydrodynamics simulations using graphic cards (graphic processing units, GPUs). These improvements are of highest relevance e.g. to the field of high-energetic nucleus-nucleus collisions at RHIC and LHC where (ideal and dissipative) relativistic hydrodynamics is used to calculate the evolution of hot and dense QCD matter. The results reported here […]
Jun, 6
Parallel Spherical Harmonic Transforms on heterogeneous architectures (GPUs/multi-core CPUs)
Spherical Harmonic Transforms (SHT) are at the heart of many scientific and practical applications ranging from climate modelling to cosmological observations. In many of these areas new, cutting-edge science goals have been recently proposed requiring simulations and analyses of experimental or observational data at very high resolutions and of unprecedented volumes. Both these aspects pose […]
Jun, 5
European Seminar on Computing, ESCO 2012
ESCO 2012 is the 3rd event in a successful series of interdisciplinary meetings dedicated to modern methods and practices of scientific computing. Main thematic areas include: Multiphysics coupled problems, Higher-order computational methods, Computing with Python, GPU computing, and Cloud computing. Theoretical results as well as applications are welcome. Application areas include, but are not limited […]