Posts

Jan, 11

Parallel Computing of Discrete Element Method on GPU

We investigate the applicability of GPUs to the discrete element method (DEM). NVIDIA’s code outperformed the CPU in computation time. However, the contact-force model in NVIDIA’s code is too simple for practical use, so we replace it with a practical model. The simulation shows that the practical model obtains the computing speed 6 times […]
Jan, 11

Parallel Algorithms for Constructing Data Structures for Fast Multipole Methods

We present efficient algorithms to build the data structures and lists needed for fast multipole methods (FMM). The algorithms can be implemented efficiently on serial, data-parallel GPU, and distributed architectures. With these algorithms it is possible to map the FMM efficiently onto the GPU or distributed heterogeneous CPU-GPU systems. Further, […]
Jan, 10

On Graphs, GPUs, and Blind Dating: A Workload to Processor Matchmaking Quest

Graph processing has gained renewed attention. The increasingly large scale and wealth of connected data, such as those accrued by social network applications, demand the design of new techniques and platforms to efficiently derive actionable information from large scale graphs. Hybrid systems that host processing units optimized for both fast sequential processing and bulk processing […]
Jan, 10

GPU-Based Super-union for Minkowski Sum

We present an efficient and robust algorithm to approximate the 3D Minkowski sum of two arbitrary polyhedra on a Graphics Processing Unit (GPU). Our algorithm makes use of the idea of super-union, in which we decompose the two polyhedra into convex pieces as usual, but the way we perform pairwise convex Minkowski sum and merge the […]
Jan, 10

Multi-Platform LU-Decomposition Solution in OpenCL

The purpose of our project was to write a fast OpenCL LU-Decomposition (LUD) solution for the Intel/AMD CPU/GPU and Altera’s FPGA and record the amount of recoding required to optimize the algorithm for these platforms. LUD is the mathematical operation which factors a given matrix into the multiplication of a lower triangular and an upper […]
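
As a point of reference for the factorization this abstract describes, here is a minimal serial sketch in Python. It is not the project's OpenCL code: it assumes the Doolittle variant (unit diagonal in the lower factor) with no row pivoting, and the names `lu_decompose` and `matmul` are hypothetical helpers chosen for illustration.

```python
def lu_decompose(a):
    """Doolittle LU factorization without pivoting (an assumed variant).

    Returns (lower, upper) such that lower * upper reproduces `a`,
    where `lower` has ones on its diagonal.
    """
    n = len(a)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Fill row i of the upper factor.
        for j in range(i, n):
            upper[i][j] = a[i][j] - sum(lower[i][k] * upper[k][j] for k in range(i))
        if upper[i][i] == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does no row pivoting")
        # Fill column i of the lower factor, below the diagonal.
        for j in range(i + 1, n):
            partial = sum(lower[j][k] * upper[k][i] for k in range(i))
            lower[j][i] = (a[j][i] - partial) / upper[i][i]
    return lower, upper


def matmul(x, y):
    """Plain triple-loop matrix product, used only to check the factors."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


matrix = [[4.0, 3.0], [6.0, 3.0]]
lower, upper = lu_decompose(matrix)
product = matmul(lower, upper)
```

Multiplying the two factors back together gives a quick correctness check. A production solver would normally add partial pivoting for numerical stability, which this sketch deliberately omits.
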
Jan, 10

A Fast and Accurate GHT Implementation on CUDA

Generalized Hough Transform (GHT) is a well-known but seldom-used algorithm for object detection. The merit of this algorithm is its ability to detect an object’s location and pose accurately. However, the algorithm has the major drawback of high memory and computational requirements. As a result, usage of this algorithm for object detection […]
Jan, 9

An implementation for quad-tree based solid object coloring using CUDA

We propose an implementation for quad-tree based solid object coloring using Compute Unified Device Architecture (CUDA). There are numerous different techniques in use for solid object coloring. One commonly used technique is the quad-tree, which has evolved from work in different fields. A quad-tree is a tree data structure in which each internal node has […]
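
To make the data structure concrete, the following is a small serial sketch of a region quad-tree in Python. It is not the paper's CUDA implementation and does no coloring. It assumes the common textbook form in which an internal node has four children, one per quadrant, and it assumes a square binary grid whose side is a power of two. The function name `build_quadtree` and the nested-list representation are hypothetical.

```python
def build_quadtree(grid, x=0, y=0, size=None):
    """Build a region quad-tree over a square binary grid.

    A uniform square becomes a leaf holding its cell value (0 or 1).
    A mixed square becomes a list of four subtrees in the order
    top-left, top-right, bottom-left, bottom-right.
    """
    if size is None:
        size = len(grid)  # assumption: side length is a power of two
    first = grid[y][x]
    uniform = all(grid[y + dy][x + dx] == first
                  for dy in range(size) for dx in range(size))
    if uniform:
        return first  # a 1x1 square is always uniform, so recursion ends
    half = size // 2
    return [
        build_quadtree(grid, x, y, half),                # top-left
        build_quadtree(grid, x + half, y, half),         # top-right
        build_quadtree(grid, x, y + half, half),         # bottom-left
        build_quadtree(grid, x + half, y + half, half),  # bottom-right
    ]


image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
tree = build_quadtree(image)
```

Here three of the four top-level quadrants are uniform and collapse to single leaves, while the bottom-right quadrant is mixed and is split once more.
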
Jan, 8

A Multi-GPU Programming Library for Real-Time Applications

We present MGPU, a C++ programming library targeted at single-node multi-GPU systems. Such systems combine disproportionate floating point performance with high data locality and are thus well suited to implement real-time algorithms. We describe the library design, programming interface and implementation details in light of this specific problem domain. The core concepts of this work […]
Jan, 8

Heat Load Modelling for District Heating Plants Using an OpenCL-based Algorithm

This research paper explores an OpenCL-based algorithm to aid heat load modelling for district heating plants. Previous studies have proven that heat loads mostly depend on the external temperatures (temperature dependency component) and the time of the day (time dependency component). In this research we have used the sum of two truncated exponential functions to […]
Jan, 8

Low cost approach to real-time vehicle to vehicle communication using parallel CPU and GPU processing

This paper proposes a novel Vehicle to Vehicle (V2V) communication system for collision avoidance which merges four different wireless devices (GPS, Wi-Fi, ZigBee and 3G) with a low power embedded Single Board Computer (SBC) in order to increase processing speed while maintaining a low cost. The three major technical challenges with such combinations are the […]
Jan, 8

Optimizations in Bioinformatics using GPU Processing on Binary Data

This experiment explores the performance of GPUs in genetic algorithms using binary data. The experiment executes a genetic algorithm which works with binary sequences that are processed on the GPU. The hypothesis is that an optimal maximum number of threads (likely large rather than small) is required to achieve an optimal runtime. The results show that […]
Jan, 8

Portable Mapping of Data Parallel Programs to OpenCL for Heterogeneous Systems

General-purpose GPU-based systems are highly attractive as they give potentially massive performance at little cost. Realizing such potential is challenging due to the complexity of programming. This paper presents a compiler-based approach to automatically generate optimized OpenCL code from data-parallel OpenMP programs for GPUs. Such an approach brings together the benefits of […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
