Posts

Jul, 5

GPU-based Assembly of Stiffness Matrices in the Parallel Multilevel Partition of Unity Method

Many real world problems can be modeled with Partial Differential Equations (PDEs). Since for many PDEs no exact solution can be found, there exists a variety of methods which give an approximate solution to those PDEs. One method which can be applied to find an approximate solution for elliptic PDEs is the Parallel Multilevel Partition […]
Jul, 5

A Massively Parallel Adaptive Fast Multipole Method on Heterogeneous Architectures

We describe a parallel fast multipole method (FMM) for highly nonuniform distributions of particles. We employ both distributed memory parallelism (via MPI) and shared memory parallelism (via OpenMP and GPU acceleration) to rapidly evaluate two-body nonoscillatory potentials in three dimensions on heterogeneous high performance computing architectures. We have performed scalability tests with up to 30 […]
Jul, 5

Navigating An Evolutionary Fast Path to Exascale – Expanded Version

The computing community is in the midst of a disruptive architectural change. The advent of manycore and heterogeneous computing nodes forces us to reconsider every aspect of the system software and application stack. To address this challenge there is a broad spectrum of approaches, which we roughly classify as either revolutionary or evolutionary. With the […]
Jul, 5

Analyzing the CUDA Applications with its Latency and Bandwidth Tolerance

The CUDA scalable parallel programming model provides readily-understood abstractions that free programmers to focus on efficient parallel algorithms. It uses a hierarchy of thread groups, shared memory, and barrier synchronization to express fine-grained and coarse-grained parallelism, using sequential C code for one thread. This paper explores the scalability of CUDA applications on systems with varying […]
Jul, 5

Interactive Quantum Chemistry: A Divide-and-Conquer ASED-MO Method

We present interactive quantum chemistry simulation at the atom superposition and electron delocalization molecular orbital (ASED-MO) level of theory. Our method is based on the divide-and-conquer (D&C) approach, which we show is accurate and efficient for this non-self-consistent semiempirical theory. The method has a linear complexity in the number of atoms, scales well with the […]
Jul, 4

A Fast GPU-Based Motion Estimation Algorithm for H.264/AVC

H.264/AVC is the most recent predictive video compression standard to outperform other existing video coding standards by means of higher computational complexity. In recent years, heterogeneous computing has emerged as a cost-efficient solution for high-performance computing. In the literature, several algorithms have been proposed to accelerate video compression, but so far there have not been […]
Jul, 4

GPU Parallelization of an Unstructured Overset Grid Incompressible Navier-Stokes Solver for Moving Bodies

In pursuit of high-fidelity solutions to the fluid flow equations in a short span of time, Graphics Processing Units (GPUs), which were originally intended for gaming applications, are currently being used to accelerate Computational Fluid Dynamics codes. With a high peak throughput of about 1 TFLOPS on a PC, GPUs seem […]
Jul, 3

The 19th IEEE International Symposium on High Performance Computer Architecture Collocated with PPoPP-2013 and CGO-2013, HPCA-2013

The International Symposium on High-Performance Computer Architecture provides a high-quality forum for scientists and engineers to present their latest research findings in this rapidly-changing field. Authors are invited to submit papers on all aspects of high-performance computer architecture. Topics of interest include, but are not limited to: * Processor, cache, and memory architectures * Parallel […]
Jul, 3

Automatic Optimization of In-Flight Memory Transactions for GPU Accelerators based on a Domain-Specific Language for Medical Imaging

An efficient memory bandwidth utilization for GPU accelerators is crucial for memory bound applications. In medical imaging, the performance of many kernels is limited by the available memory bandwidth since only a few operations are performed per pixel. For such kernels only a fraction of the compute power provided by GPU accelerators can be exploited […]
Jul, 3

Two Stage Data Mining Technique for Fast Monsoon Onset Prediction

The onset of the monsoon is eagerly awaited in the Indian subcontinent, as it has a deep impact on the economic and social domains, and hence it has been monitored and studied in great depth. With the advent of satellite imagery, it’s now possible to monitor the different parameters which affect or are affected by the monsoon in […]
Jul, 3

Parallel Processing using FPGAs and GPUs

This report covers the use of parallel architectures such as Graphics Processing Units (GPUs) for general-purpose computation. It also covers filter design using Field Programmable Gate Arrays (FPGAs), exploiting their inherently parallel nature. Least Mean Square filters, an adaptive filter algorithm, are implemented on a Xilinx Virtex 5 FPGA and tested […]
Jul, 3

Using OpenCL: Programming Massively Parallel Computers

In 2011 many computer users were exploring the opportunities and the benefits of the massive parallelism offered by heterogeneous computing. In 2000 the Khronos Group, a not-for-profit industry consortium, was founded to create standard open APIs for parallel computing, graphics and dynamic media. Among them has been OpenCL, an open system for programming heterogeneous computers […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
