Posts
Jul, 17
Interactively Simulating Fluid based on SPH and CUDA
In this paper, we propose a novel method for interactive fluid simulation based on SPH (Smoothed Particle Hydrodynamics) and implement it on CUDA (Compute Unified Device Architecture). First, we use SPH theory to simulate the motion of fluids. Second, we propose a method for interaction between fluid and rigid objects. We treat the rigid objects as […]
Jul, 17
CUSIMANN: An optimized simulated annealing software for GPUs
CUSIMANN (CUDA SIMULATED ANNEALING) is a free/open-source library for global optimization that provides a parallel implementation of the simulated annealing algorithm in CUDA.
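For readers unfamiliar with the algorithm that CUSIMANN parallelizes, here is a minimal serial sketch of simulated annealing in Python. It does not use or reproduce the CUSIMANN API. The function name `simulated_annealing`, its parameters, the geometric cooling schedule, and the uniform neighbor proposal are all assumptions made for illustration.

```python
import math
import random


def simulated_annealing(cost, x0, t0=1.0, cooling=0.995, steps=5000,
                        step_size=0.5, seed=0):
    """Minimize a 1-D cost function (hypothetical helper, not CUSIMANN's API)."""
    rng = random.Random(seed)
    x, fx = x0, cost(x0)
    best_x, best_fx = x, fx
    t = t0
    for _ in range(steps):
        # Propose a random neighbor of the current point.
        cand = x + rng.uniform(-step_size, step_size)
        fc = cost(cand)
        # Metropolis criterion: always accept improvements, and accept
        # worse candidates with probability exp(-(fc - fx) / t).
        if fc <= fx or rng.random() < math.exp((fx - fc) / t):
            x, fx = cand, fc
            if fx < best_fx:
                best_x, best_fx = x, fx
        # Geometric cooling (an assumed schedule).
        t *= cooling
    return best_x, best_fx


if __name__ == "__main__":
    # Toy objective with its minimum at x = 3.
    bx, bf = simulated_annealing(lambda v: (v - 3.0) ** 2, x0=-10.0)
    print(bx, bf)
```

A parallel implementation would typically run many such chains at once, for example one per GPU thread. How CUSIMANN organizes its chains is not stated in this excerpt.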
Jul, 17
Scalable Molecular Dynamics Simulation Using FPGAs and Multicore Processors
While Molecular Dynamics Simulation (MD) uses a large fraction of the world’s High Performance Compute cycles, the modeling of many physical phenomena remains far out of reach. Improving the cost-effectiveness of MD has therefore received much attention, especially in using accelerators or modifying the computation itself. While both approaches have demonstrated great potential, scalability has […]
Jul, 17
Multicore and Manycore Algorithms for Octrees
Octrees and compressed octrees are frequently used to represent data in a hierarchical form for high performance computing, graphics and database applications. Applications like N-body problems require building octrees multiple times. Therefore, efficient construction of octrees is critical to the efficiency of the entire application. With ever-increasing data size, there is a requirement to […]
Jul, 16
Optimizing MapReduce for GPUs with effective shared memory usage
Accelerators and heterogeneous architectures in general, and GPUs in particular, have recently emerged as major players in high performance computing. For many classes of applications, MapReduce has emerged as the framework for easing parallel programming and improving programmer productivity. There have already been several efforts on implementing MapReduce on GPUs. In this paper, we propose […]
Jul, 16
Sparse Matrix-Vector Multiplication on NVIDIA GPU
In this paper, we present our work on developing a new matrix format and a new sparse matrix-vector multiplication algorithm. The matrix format, HEC, is a hybrid format. It is efficient for sparse matrix-vector multiplication and is friendly to preconditioners. Numerical experiments show that our sparse matrix-vector multiplication algorithm is efficient […]
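The excerpt does not describe the layout of the HEC format, so the sketch below uses the standard CSR (compressed sparse row) format as a point of reference. It only illustrates what a sparse matrix-vector product computes and is not the paper's algorithm. The function name `csr_spmv` and the example matrix are assumptions for illustration.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form (reference sketch)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Nonzeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


if __name__ == "__main__":
    # A = [[4, 0, 1],
    #      [0, 2, 0],
    #      [3, 0, 5]]
    values = [4.0, 1.0, 2.0, 3.0, 5.0]
    col_idx = [0, 2, 1, 0, 2]
    row_ptr = [0, 2, 3, 5]
    print(csr_spmv(values, col_idx, row_ptr, [1.0, 2.0, 3.0]))  # [7.0, 4.0, 18.0]
```

On a GPU, the outer loop over rows is the natural unit of parallelism. Rows of very different lengths cause load imbalance, which is one common motivation for hybrid formats.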
Jul, 16
Sparse Matrix Matrix Multiplication on Hybrid CPU+GPU Platforms
Sparse matrix-sparse matrix and sparse matrix-dense matrix multiplications (spgemm and csrmm) are used, among other applications, in various matrix formulations of graph problems. GPU-based supercomputers are presently experiencing severe performance issues on the Graph-500 benchmarks, a new HPC benchmark suite focusing on graph algorithms. Considering the difficulties in executing graph problems and the duality between graphs and matrices, […]
Jul, 16
A Yoke of Oxen and a Thousand Chickens for Heavy Lifting Graph Processing
Large, real-world graphs are famously difficult to process efficiently. Not only do they have a large memory footprint, but most graph processing algorithms entail memory access patterns with poor locality, data-dependent parallelism, and a low compute-to-memory access ratio. Additionally, most real-world graphs have a low diameter and a highly heterogeneous node degree distribution. Partitioning these graphs […]
Jul, 16
Distributed OpenCL: a platform for distributed, heterogeneous computing for domain scientists
It is possible to purchase, for as little as $10,000, a cluster of computers with the capability to rival the supercomputers of only a few years ago. Now, users who have little to no experience developing distributed applications or managing a cluster are in a position to do so. To allow domain scientists to effectively […]
Jul, 15
Coupling between Meshless FEM Modeling and Rendering on GPU for Real-time Physically-based Volumetric Deformation
For real-time rendering of physically-based volumetric deformation, a meshless finite element method (FEM) is proposed and implemented on the new-generation Graphics Processing Unit (GPU). A tightly coupled deformation and rendering pipeline is defined for seamless modeling and rendering: First, the meshless FEM model exploits the vertex shader stage and the transform feedback mechanism of the […]
Jul, 15
ab-Stream: A Framework for programming Many-core
The common approach to programming many-core processors is to write processor-specific code with low-level APIs for each processor. This can achieve good performance but results in serious portability issues: programmers are required to write a specific version of the code for each target architecture. Therefore, we present ab-Stream, an extensible framework for programming many-threaded processors based […]
Jul, 15
Implementing a Code Generator for Fast Matrix Multiplication in OpenCL on the GPU
This paper presents results of an implementation of a code generator for fast general matrix multiply (GEMM) kernels. When a set of parameters is given, the code generator produces the corresponding GEMM kernel written in OpenCL. The produced kernels are optimized for high-performance implementation on GPUs from AMD. Access latency to GPU global memory is the […]