
Posts

Nov, 11

Explorations of the Viability of ARM and Xeon Phi for Physics Processing

We report on our investigations into the viability of the ARM processor and the Intel Xeon Phi co-processor for scientific computing. We describe our experience porting software to these processors and running benchmarks using real physics applications to explore the potential of these processors for production physics processing.
Nov, 11

A High Performance Random Number Generator Using Heterogeneous Computing Platform

The power of high performance computing (HPC) depends heavily on the ability to efficiently exploit huge amounts of parallelism. Random or pseudo-random numbers are very important for the efficient implementation of stochastic algorithms. Multi-core CPUs and many-core Graphics Processing Units (GPUs) are well suited as accelerators for producing large quantities of random numbers. Nevertheless, GPU does […]
Nov, 11

First Steps Towards More Numerical Reproducibility

Questions about whether numerical simulations are reproducible have been raised in several sensitive applications. Numerical reproducibility failures mainly come from the finite precision of computer arithmetic. Results of floating-point computation depend on the computer arithmetic precision and on the order of arithmetic operations. Massively parallel HPC which merges, for instance, many-core CPU and GPU, […]
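The dependence on operation order mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the paper; the input values and the helper name `sum_in_order` are assumptions chosen for illustration. The same four numbers are summed in two different orders, as might happen when a parallel reduction is scheduled differently, and compared against Python's correctly rounded `math.fsum`.

```python
import math

def sum_in_order(values):
    """Accumulate left to right in double precision (hypothetical helper)."""
    total = 0.0
    for v in values:
        total += v
    return total

# Illustrative inputs (an assumption, not data from the paper).
values = [1e16, 1.0, -1e16, 1.0]

# Order 1: the first 1.0 is absorbed by 1e16, because adjacent doubles
# near 1e16 are 2 apart, so 1e16 + 1.0 rounds back to 1e16.
forward = sum_in_order(values)

# Order 2: the large terms cancel first, so both small terms survive.
reordered = sum_in_order([1e16, -1e16, 1.0, 1.0])

# Reference: math.fsum tracks the lost low-order bits and returns the
# correctly rounded sum regardless of input order.
exact = math.fsum(values)

print(forward, reordered, exact)
```

Both orderings are valid floating-point evaluations of the same mathematical sum, yet only one matches the correctly rounded result.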
Nov, 10

Optimization of real-time ultrasound PCIe data streaming and OpenCL processing for SAFT imaging

Our goal is to develop a complete ultrasound platform based on real-time SAFT (Synthetic Aperture Focusing Technique) GPU processing. We are planning to integrate all the ultrasound modules and processing resources (GPU) in a single rack enclosure with the PCIe switch fabric backplane. The first developed module (RX64) provides acquisition and streaming of 64 ultrasound […]
Nov, 10

Toward Better Computation Models for Modern Machines

Modern computers are not random access machines (RAMs). They have a memory hierarchy, multiple cores, and a virtual memory. We address the computational cost of address translation in virtual memory and the difficulties in designing parallel algorithms on modern many-core machines. The starting point for our work on virtual memory is the observation that […]
Nov, 10

Performance Analysis of Sobel Edge Detection Filter on GPU using CUDA & OpenGL

CUDA (Compute Unified Device Architecture) is a novel technology for general-purpose computing on the GPU that lets users develop general GPU (Graphics Processing Unit) programs easily. GPUs are emerging as the platform of choice for parallel high-performance computing. GPUs are good at data-intensive parallel processing, with software development platforms available such as CUDA (developed […]
Nov, 10

Implementation of Spectral Angle Mapper (SAM) Algorithm on a Graphic processing unit (GPU)

The need for hyperspectral images in the exploration of oil and other minerals is massive. We can tap the high computational power now available for faster detection of those minerals underground. In this paper, we implement an algorithm called Spectral Angle Mapper (SAM) using the Compute Unified Device Architecture (CUDA) framework on a GPU. The SAM algorithm […]
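As a rough illustration of what SAM computes, here is a minimal CPU-side sketch in Python. It assumes the commonly used definition of the spectral angle, the arccosine of the normalized dot product between a pixel spectrum and a reference spectrum. It is not the paper's CUDA implementation, and the function names, material names, and reflectance values are hypothetical.

```python
import math

def spectral_angle(pixel, reference):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm_p = math.sqrt(sum(p * p for p in pixel))
    norm_r = math.sqrt(sum(r * r for r in reference))
    cos_theta = dot / (norm_p * norm_r)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)

def classify(pixel, references):
    """Return the name of the reference spectrum with the smallest angle."""
    return min(references, key=lambda name: spectral_angle(pixel, references[name]))

# Hypothetical three-band reference spectra (illustrative values only).
references = {
    "mineral_a": [0.1, 0.4, 0.8],
    "mineral_b": [0.9, 0.5, 0.1],
}

# A brighter copy of mineral_a: the angle ignores overall scaling.
pixel = [0.2, 0.8, 1.6]
label = classify(pixel, references)
print(label)
```

Because each pixel is scored independently of every other pixel, this kind of computation maps naturally onto one GPU thread per pixel.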
Nov, 10

Towards a Portable and Future-proof Particle-in-Cell Plasma Physics Code

We present the first reported OpenCL implementation of EPOCH3D, an extensible particle-in-cell plasma physics code developed at the University of Warwick. We document the challenges and successes of this porting effort, and compare the performance of our implementation executing on a wide variety of hardware from multiple vendors. The focus of our work is on […]
Nov, 8

Tiled QR Decomposition and Its Optimization on CPU and GPU Computing System

There can be many types of heterogeneous computing systems, and the most useful one is the CPU and GPU computing system. In this system, we try to run QR decomposition, which expresses a standard real matrix as a product of two matrices. For a tiled QR decomposition algorithm, which is a parallelized version of QR […]
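To make the factorization concrete, the sketch below computes A = QR for a small matrix, where Q has orthonormal columns and R is upper triangular. It uses modified Gram-Schmidt in plain Python purely for illustration. This is not the tiled algorithm the paper studies, and the helper names and the example matrix are assumptions.

```python
import math

def qr_gram_schmidt(a):
    """Factor a (list of rows, linearly independent columns) into (q, r)
    using modified Gram-Schmidt. Illustrative only, not the tiled method."""
    m, n = len(a), len(a[0])
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        # Remove the components along the already computed q columns.
        for k in range(j):
            r[k][j] = sum(q_cols[k][i] * v[i] for i in range(m))
            v = [v[i] - r[k][j] * q_cols[k][i] for i in range(m)]
        # Normalize what remains; its length goes on the diagonal of r.
        r[j][j] = math.sqrt(sum(x * x for x in v))
        q_cols.append([x / r[j][j] for x in v])
    q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return q, r

def matmul(x, y):
    """Plain triple-loop matrix product (hypothetical helper)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

# Example matrix chosen for illustration.
a = [[12.0, -51.0, 4.0],
     [6.0, 167.0, -68.0],
     [-4.0, 24.0, -41.0]]

q, r = qr_gram_schmidt(a)
reconstructed = matmul(q, r)  # should match a up to rounding error
```

A tiled variant applies the same idea block by block so that independent tiles can be processed in parallel on CPU and GPU.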
Nov, 8

GPU-Based Space-Time Adaptive Processing (STAP) for Radar

Space-time adaptive processing (STAP) utilizes a two-dimensional adaptive filter to detect targets within a radar data set with speeds similar to the background clutter. While adaptively optimal solutions exist, they are prohibitively computationally intensive. Thus, researchers have developed alternative algorithms with nearly optimal filtering performance and greatly reduced computational intensity. While such alternatives reduce the […]
Nov, 8

Accelerating a Novel Particle-based Fluid Simulation on the GPU

Stochastic Rotation Dynamics (SRD) is a novel particle-based simulation method that can be used to model complex fluids [1], [2], such as binary and ternary mixtures [3], and polymer solutions [4]-[6], in either two or three dimensions. Although SRD is efficient compared to traditional methods, it is still computationally expensive for large system sizes, e.g. […]
Nov, 8

Architectural improvements and 28 nm FPGA implementation of the APEnet+ 3D Torus network for hybrid HPC systems

Modern Graphics Processing Units (GPUs) are now considered accelerators for general purpose computation. Tight interaction between the GPU and the interconnection network is the way to realize the full capability-computing potential of a multi-GPU system on large HPC clusters; this is why an efficient and scalable interconnect is a key […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
