Posts

Aug, 8

GPU implementation of a shell element structural solver aimed at fluid-structure interaction problems

The study of thin structures is very common nowadays and useful in different fields. An important example is the analysis of sail dynamics. In this context, accurate simulations of the interaction between the sail and the wind are also required. However, this kind of fluid-structure interaction problem is computationally very expensive. First objective of this […]
Aug, 8

An Energy Optimization of a GPU Application by Grid Design Space Exploration

Power and energy consumption are also becoming important design criteria. Consequently, software designers have to consider power and energy consumption together with performance when developing software. In this paper, we perform a design space exploration on a commercial GPU, the NVIDIA GTX 660, to investigate the best configuration of a kernel grid structure in a […]
Aug, 8

Levy Flights for Particle Swarm Optimisation Algorithms on Graphical Processing Units

Particle Swarm Optimisation (PSO) is a powerful algorithm for space search problems such as parametric optimisation. Particles with Levy flights have a long-tailed probability of outlier jumps in the problem space that provide a good compromise between local space exploration and local minima avoidance. Generating many particles and their trajectories with Levy random deviates is […]
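As an illustration of the long-tailed jumps described above, the following is a minimal sketch of drawing Levy-distributed step lengths. It assumes Mantegna's algorithm with a stability index `beta` of 1.5; the excerpt does not say which generator the paper uses, and the names `mantegna_sigma` and `levy_steps` are hypothetical.

```python
import math
import random

def mantegna_sigma(beta):
    """Scale of the numerator Gaussian in Mantegna's algorithm (0 < beta < 2)."""
    num = math.gamma(1.0 + beta) * math.sin(math.pi * beta / 2.0)
    den = math.gamma((1.0 + beta) / 2.0) * beta * 2.0 ** ((beta - 1.0) / 2.0)
    return (num / den) ** (1.0 / beta)

def levy_steps(n, beta=1.5, seed=None):
    """Draw n Levy-like step lengths: step = u / |v|**(1/beta)."""
    rng = random.Random(seed)
    sigma = mantegna_sigma(beta)
    steps = []
    while len(steps) < n:
        u = rng.gauss(0.0, sigma)
        v = rng.gauss(0.0, 1.0)
        if v == 0.0:
            continue  # avoid division by zero (practically never happens)
        steps.append(u / abs(v) ** (1.0 / beta))
    return steps

if __name__ == "__main__":
    steps = levy_steps(10000, seed=0)
    # Most steps are small, but a few are very large (the heavy tail).
    print(max(abs(s) for s in steps))
```

In a PSO setting, such a step would typically scale a particle's displacement, so that occasional large jumps help it escape local minima. Each particle's draw is independent, which is what makes the generation a natural fit for one GPU thread per particle.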
Aug, 8

Compiler-based Data Prefetching and Streaming Non-temporal Store Generation for the Intel Xeon Phi Coprocessor

The Intel Xeon Phi coprocessor has software prefetching instructions to hide memory latencies and special store instructions to save bandwidth on streaming nontemporal store operations. In this work, we provide details on compiler-based generation of these instructions and evaluate their impact on the performance of the Intel Xeon Phi coprocessor using a wide range of […]
Aug, 8

Improving the GPU space of computation under triangular domain problems

There is a stage in the GPU computing pipeline where a grid of thread-blocks is mapped to the problem domain. Normally, this grid is a k-dimensional bounding box that covers a k-dimensional problem no matter its shape. Threads that fall inside the problem domain perform computations; otherwise, they are discarded at runtime. For problems with […]
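To make the cost of the bounding-box approach concrete, the sketch below counts how many blocks of an n x n grid fall outside a lower-triangular domain, and shows one well-known way to enumerate only the triangle from a linear block index. It is an illustration under stated assumptions, not necessarily the mapping proposed in the paper, whose excerpt is truncated; the function names are hypothetical.

```python
import math

def bounding_box_waste(n):
    """Return (total, useful, wasted) blocks for an n x n grid over a lower triangle."""
    total = n * n
    useful = n * (n + 1) // 2  # blocks with col <= row
    return total, useful, total - useful

def linear_to_triangle(k):
    """Map a linear block index k >= 0 to (row, col) with col <= row."""
    # Invert k = row * (row + 1) / 2 + col using an exact integer square root.
    row = (math.isqrt(8 * k + 1) - 1) // 2
    col = k - row * (row + 1) // 2
    return row, col

if __name__ == "__main__":
    print(bounding_box_waste(1024))
    print([linear_to_triangle(k) for k in range(6)])
```

With the linear mapping, a launch needs only n(n+1)/2 blocks instead of n², so close to half of the bounding-box grid is saved for large n, at the price of a square root per block.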
Aug, 7

Exploring Microcontrollers in GPUs

Recent graphics processing units (GPUs) integrate wimpy microcontrollers on a chip. They are often used to execute firmware code configuring the functional units of GPUs. This paper opens up the programming of these microcontrollers and explores how to utilize them for GPU resource management. Our prototype system provides a compiler suite for NVIDIA’s GPU microcontrollers […]
Aug, 7

Finite Difference Time-Domain Modelling of Metamaterials: GPU Implementation of Cylindrical Cloak

The finite difference time-domain (FDTD) technique can be used to model metamaterials by treating them as dispersive materials. The Drude or Lorentz model can be incorporated into the standard FDTD algorithm for modelling negative permittivity and permeability. The FDTD algorithm is readily parallelisable and can take advantage of GPU acceleration to achieve speed-ups of 5x-50x depending on hardware […]
Aug, 7

Fast Morphological Image Processing on GPU using CUDA

Mathematical morphology is used as a tool for extracting image components that are useful in the representation and description of region shape. The mathematical morphology operations of dilation, erosion, opening, and closing are important building blocks of many other image processing algorithms. Data-parallel programming provides an opportunity for performance acceleration using highly […]
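The four operations named above can be sketched on a small binary image as follows. This is a plain CPU reference, assuming a 3x3 square structuring element and windows clipped at the image border; it is not the paper's CUDA implementation, and the helper names are hypothetical.

```python
def _apply(img, pick):
    """Apply pick (max or min) over each pixel's 3x3 neighbourhood, clipped at borders."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[yy][xx]
                      for yy in range(max(0, y - 1), min(h, y + 2))
                      for xx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = pick(window)
    return out

def dilate(img):
    return _apply(img, max)  # a pixel is set if any neighbour is set

def erode(img):
    return _apply(img, min)  # a pixel stays set only if all neighbours are set

def opening(img):
    return dilate(erode(img))  # removes small bright specks

def closing(img):
    return erode(dilate(img))  # fills small dark holes

if __name__ == "__main__":
    speck = [[0] * 5 for _ in range(5)]
    speck[2][2] = 1
    for row in dilate(speck):
        print(row)
```

Each output pixel depends only on a fixed neighbourhood of the input, so every pixel can be computed independently. That independence is why these operations map well onto one GPU thread per pixel.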
Aug, 7

GPU Accelerated Pattern Matching Algorithm for DNA Sequences to Detect Cancer using CUDA

Cancer is one of the severe diseases, causing one in eight deaths worldwide. It can be cured if detected at the very first stage, when the cancer cells stay fixed in their area. In stage two, it starts to spread. When it spreads to the muscles, it enters the third stage. It may cause organ failure. […]
Aug, 7

Automatic Skeleton-Based Compilation through Integration with an Algorithm Classification

This paper presents a technique to fully automatically generate efficient and readable code for parallel processors. We base our approach on skeleton-based compilation and "algorithmic species", an algorithm classification of program code. We use a tool to automatically annotate C code with species information where possible. The annotated program code is subsequently fed into the […]
Aug, 6

2D Triangulation of Polygons on CUDA

General Purpose computing on Graphics Processor Units (GPGPU) brings massively parallel computing (hundreds of compute cores) to the desktop at a reasonable cost, but requires that algorithms be carefully designed to take advantage of this power. The present work explores the possibilities of CUDA (NVIDIA Compute Unified Device Architecture) using a GPGPU approach for 2D Triangulation […]
Aug, 6

Portable Parallel Kernels for High-Speed Beamforming in Synthetic Aperture Ultrasound Imaging

In medical ultrasound, synthetic aperture (SA) imaging is well regarded as a novel image formation technique for achieving resolution superior to that offered by existing scanners. However, its intensive processing load is known to be a challenging factor. To address such a computational demand, this paper proposes a new parallel approach based on the design of […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
