Posts
Nov, 11
Explorations of the Viability of ARM and Xeon Phi for Physics Processing
We report on our investigations into the viability of the ARM processor and the Intel Xeon Phi co-processor for scientific computing. We describe our experience porting software to these processors and running benchmarks using real physics applications to explore the potential of these processors for production physics processing.
Nov, 11
A High Performance Random Number Generator Using Heterogeneous Computing Platform
The power of high performance computing (HPC) depends heavily on the ability to efficiently exploit huge amounts of parallelism. Random or pseudo-random numbers are very important for the efficient implementation of stochastic algorithms. Multi-core CPUs and many-core Graphics Processing Units (GPUs) are well suited as accelerators for producing large quantities of random numbers. Nevertheless, GPU does […]
Nov, 11
First Steps Towards More Numerical Reproducibility
Questions about whether numerical simulations are reproducible have been raised in several sensitive applications. Failures of numerical reproducibility mainly come from the finite precision of computer arithmetic. The results of floating-point computations depend on the arithmetic precision and on the order of arithmetic operations. Massively parallel HPC which merges, for instance, many-core CPU and GPU, […]
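The order dependence described in this abstract is easy to reproduce on a single core. The following is a minimal sketch, not taken from the paper: the input list and the helper `sum_left_to_right` are hypothetical and chosen only to make the effect visible with IEEE 754 double precision, which is what Python floats use on common platforms.

```python
import math

# Hypothetical inputs: two huge values that cancel, plus two small ones.
values = [1e16, 1.0, -1e16, 1.0]

def sum_left_to_right(xs):
    """Naive accumulation in the order given, as a sequential loop would do."""
    total = 0.0
    for x in xs:
        total += x
    return total

# Same numbers, two different summation orders.
forward = sum_left_to_right(values)
reordered = sum_left_to_right(sorted(values))

# math.fsum tracks the rounding error and returns the correctly rounded sum.
exact = math.fsum(values)

print(forward, reordered, exact)
```

The two naive sums disagree with each other and with `math.fsum`, although the inputs are identical. A parallel reduction that changes the grouping of operations from run to run can show the same behavior.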
Nov, 10
Optimization of real-time ultrasound PCIe data streaming and OpenCL processing for SAFT imaging
Our goal is to develop a complete ultrasound platform based on real-time SAFT (Synthetic Aperture Focusing Technique) GPU processing. We plan to integrate all the ultrasound modules and processing resources (GPU) in a single rack enclosure with a PCIe switch fabric backplane. The first developed module (RX64) provides acquisition and streaming of 64 ultrasound […]
Nov, 10
Toward Better Computation Models for Modern Machines
Modern computers are not random access machines (RAMs). They have a memory hierarchy, multiple cores, and virtual memory. We address the computational cost of address translation in virtual memory and the difficulties in designing parallel algorithms on modern many-core machines. The starting point for our work on virtual memory is the observation that […]
Nov, 10
Performance Analysis of Sobel Edge Detection Filter on GPU using CUDA & OpenGL
CUDA (Compute Unified Device Architecture) is a novel technology for general-purpose computing on the GPU (Graphics Processing Unit), which lets users develop general-purpose GPU programs easily. GPUs are emerging as the platform of choice for parallel high performance computing. GPUs are good at data-intensive parallel processing, helped by the availability of software development platforms such as CUDA (developed […]
Nov, 10
Implementation of Spectral Angle Mapper (SAM) Algorithm on a Graphic processing unit (GPU)
The need for hyperspectral images in the exploration of oil and other minerals is massive. We can tap the high computational power now available to locate those underground minerals faster. In this paper, we implement an algorithm called the Spectral Angle Mapper (SAM) using the Compute Unified Device Architecture (CUDA) framework on a GPU. The SAM algorithm […]
Nov, 10
Towards a Portable and Future-proof Particle-in-Cell Plasma Physics Code
We present the first reported OpenCL implementation of EPOCH3D, an extensible particle-in-cell plasma physics code developed at the University of Warwick. We document the challenges and successes of this porting effort, and compare the performance of our implementation executing on a wide variety of hardware from multiple vendors. The focus of our work is on […]
Nov, 8
Tiled QR Decomposition and Its Optimization on CPU and GPU Computing System
There can be many types of heterogeneous computing systems, and the most useful one is the CPU and GPU computing system. In this system, we try to run QR decomposition, which expresses a standard real matrix as a product of two matrices. For a tiled QR decomposition algorithm, which is a parallelized version of QR […]
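For readers unfamiliar with the factorization, the two matrices are a matrix Q with orthonormal columns and an upper triangular matrix R, so that A = QR. The sketch below is a plain, non-tiled illustration using modified Gram-Schmidt in pure Python. It is not the tiled algorithm from the paper, and the function names and sample matrix are hypothetical. It assumes the input has full column rank.

```python
import math

def qr_gram_schmidt(a):
    """Factor a (m x n, full column rank) into q (m x n) and r (n x n) with a = q r."""
    m, n = len(a), len(a[0])
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        # Remove the components along the already computed orthonormal columns.
        for k in range(j):
            r[k][j] = sum(q_cols[k][i] * v[i] for i in range(m))
            v = [v[i] - r[k][j] * q_cols[k][i] for i in range(m)]
        # Normalize what is left; its length becomes the diagonal entry of r.
        r[j][j] = math.sqrt(sum(x * x for x in v))
        q_cols.append([x / r[j][j] for x in v])
    q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return q, r

def matmul(x, y):
    """Dense matrix product of two lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

a = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
q, r = qr_gram_schmidt(a)
product = matmul(q, r)  # should reproduce a up to rounding error
```

Tiled variants split the matrix into blocks so that independent blocks can be factored and updated concurrently, which is what makes them attractive on CPU and GPU systems.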
Nov, 8
GPU-Based Space-Time Adaptive Processing (STAP) for Radar
Space-time adaptive processing (STAP) utilizes a two-dimensional adaptive filter to detect targets within a radar data set with speeds similar to the background clutter. While adaptively optimal solutions exist, they are prohibitively computationally intensive. Thus, researchers have developed alternative algorithms with nearly optimal filtering performance and greatly reduced computational intensity. While such alternatives reduce the […]
Nov, 8
Accelerating a Novel Particle-based Fluid Simulation on the GPU
Stochastic Rotation Dynamics (SRD) is a novel particle-based simulation method that can be used to model complex fluids [1], [2], such as binary and ternary mixtures [3], and polymer solutions [4]-[6], in either two or three dimensions. Although SRD is efficient compared to traditional methods, it is still computationally expensive for large system sizes, e.g. […]
Nov, 8
Architectural improvements and 28 nm FPGA implementation of the APEnet+ 3D Torus network for hybrid HPC systems
Modern Graphics Processing Units (GPUs) are now considered accelerators for general purpose computation. A tight interaction between the GPU and the interconnection network is the strategy for realizing the full capability-computing potential of a multi-GPU system on large HPC clusters; that is why an efficient and scalable interconnect is a key […]

