Posts

May, 11

Data Regression with Normal Equation on GPU using CUDA

Demand in the consumer market for graphics hardware that accelerates rendering of 3D images has resulted in graphics cards that are capable of delivering astonishing levels of performance. These results were achieved by specifically tailoring the hardware for the target domain. As graphics accelerators become increasingly programmable, however, this performance has made them an attractive […]
May, 11

Large scale parallel state space search utilizing graphics processing units and solid state disks

The evolution of science is a double-track process composed of theoretical insights on the one hand and practical inventions on the other. While in most cases new theoretical insights motivate hardware developers to produce systems following the theory, in some cases the available hardware solutions force theoretical research to forecast the results to expect. […]
May, 11

CAPRI: Prediction of Compaction-Adequacy for Handling Control-Divergence in GPGPU Architectures

Wide SIMD-based GPUs have evolved into a promising platform for running general purpose workloads. Current programmable GPUs allow even code with irregular control to execute well on their SIMD pipelines. To do this, each SIMD lane is considered to execute a logical thread where hardware ensures that control flow is accurate by automatically applying masked […]
May, 10

Enhancing data parallelism for Ant Colony Optimization on GPUs

Graphics Processing Units (GPUs) have evolved into highly parallel and fully programmable architectures over the past five years, and the advent of CUDA has facilitated their use in many real-world applications. In this paper, we deal with a GPU implementation of Ant Colony Optimisation (ACO), a population-based optimisation method which comprises two major stages: Tour […]
May, 10

A GPU-Accelerated Algorithm for Self-Organizing Maps in a Distributed Environment

In this paper we introduce a MapReduce-based implementation of self-organizing maps that performs compute-bound operations on distributed GPUs. The kernels are optimized to ensure coalesced memory access and effective use of shared memory. We have performed extensive tests of our algorithms on a cluster of eight nodes with two NVidia Tesla M2050 attached to each, […]
May, 10

An Efficient Common Substrings Algorithm for On-the-Fly Behavior-Based Malware Detection and Analysis

It is well known that malware (worms, botnets, etc.) thrives on communication systems. The process of detecting and analyzing malware has high latency and is not well suited to real-time application, which is critical especially for propagating malware. For this reason, recent methods identify similarities among malware dynamic trace logs to extract malicious behavior snippets. These snippets […]
May, 10

Constructing Natural Neighbor Interpolation Based Grid DEM Using CUDA

Constructing a digital elevation model (DEM) from dense LiDAR points is becoming increasingly important. Natural Neighbor Interpolation (NNI) is a popular approach to DEM construction from point datasets but is computationally intensive. In this study, we present a set of General Purpose computing on Graphics Processing Unit (GPGPU) based algorithms that can significantly speed up the process. Evaluating three real […]
May, 10

GHOSTM: A GPU-Accelerated Homology Search Tool for Metagenomics

BACKGROUND: A large number of sensitive homology searches are required for mapping DNA sequence fragments to known protein sequences in public and private databases during metagenomic analysis. BLAST is currently used for this purpose, but its calculation speed is insufficient, especially for analyzing the large quantities of sequence data obtained from a next-generation sequencer. However, […]
May, 9

Exploration of Optimization Options for Increasing Performance of a GPU Implementation of a Three-Dimensional Bilateral Filter

This report explores using GPUs as a platform for performing high-performance medical image data processing, specifically smoothing using a 3D bilateral filter, which performs anisotropic, edge-preserving smoothing. The algorithm consists of running a specialized 3D convolution kernel over a source volume to produce an output volume. Overall, our objective is to understand what […]
May, 9

An Overview of Selected Hybrid and Reconfigurable Architectures

Node level heterogeneous architectures have become attractive in recent years for several reasons: Compared to traditional symmetric CPUs, they offer high performance for real applications, and can be energy and/or cost efficient. In this paper, we give an overview of the state-of-the-art in heterogeneous computing, focusing on some common architectures: The NVidia and the ATI […]
May, 9

Automatic Discovery of Algorithms for Multi-Agent Systems

Automatic algorithm generation for large-scale distributed systems is one of the holy grails of artificial intelligence and agent-based modeling. It has direct applicability in future engineered (embedded) systems, such as mesh networks of sensors and actuators where there is a high need to harness their capabilities via algorithms that have good scalability characteristics. NetLogo has […]
May, 9

Enabling task-level scheduling on heterogeneous platforms

OpenCL is an industry standard for parallel programming on heterogeneous devices. With OpenCL, compute-intensive portions of an application can be offloaded to a variety of processing units within a system. OpenCL is the first standard that focuses on portability, allowing programs to be written once and run seamlessly on multiple, heterogeneous devices, regardless of vendor. […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
