Posts

Dec, 6

Parallel tree-ensemble algorithms for GPUs using CUDA

We present two new parallel implementations of the tree-ensemble algorithms Random Forest (RF) and Extremely randomized trees (ERT) for emerging many-core platforms, e.g., contemporary graphics cards suitable for general-purpose computing (GPGPU). Random Forest and Extremely randomized trees are ensemble learners for classification and regression. They operate by constructing a multitude of decision trees at training […]
Dec, 6

Speeding up the small progress measures algorithm for parity games using the GPU

Solving parity games is interesting because it is equivalent to model checking for mu-calculus. The small progress measures (SPM) algorithm by Jurdzinski is originally a sequential algorithm for solving parity games. The nature of this algorithm allows easy parallelization, and previous research has already adapted it to work on multi-core machines. Here, SPM is adapted […]
Dec, 6

PyFR: An Open Source Framework for Solving Advection-Diffusion Type Problems on Streaming Architectures using the Flux Reconstruction Approach

High-order numerical methods for unstructured grids combine the superior accuracy of high-order spectral or finite difference methods with the geometric flexibility of low-order finite volume or finite element schemes. The Flux Reconstruction (FR) approach unifies various high-order schemes for unstructured grids within a single framework. Additionally, the FR approach exhibits a significant degree of element […]
Dec, 6

Similarity Search in Metric Spaces on Parallel multi-core and multi-GPU Platforms

This thesis has proposed a set of algorithms and strategies to solve similarity searches in metric spaces using different parallel platforms. In the first part of the thesis, we have used a multi-core platform, where we found that particular strategies are more suitable depending on the traffic query, obtaining a high speed-up (up to 7.9x […]
Dec, 6

A Fast Implementation of Parallel Discrete-Event Simulation on GPGPU

Modern General Purpose Graphics Processing Units (GPGPUs) offer much more computational power than recent CPUs by providing a vast number of simple, data-parallel, multithreaded cores. In this study, we focus on the use of a GPGPU to perform parallel discrete-event simulation. Our approach is to use a modified service time distribution function to allow more […]
Dec, 6

Using Shared Memory as a Cache in Cellular Automata Water Flow Simulations on GPUs

Graphics processing units (GPUs) have recently gained a lot of interest as an efficient platform for general-purpose computation. The Cellular Automata approach, which is inherently parallel, offers the opportunity to implement high-performance simulations. This paper presents how shared memory on a GPU can be used to improve performance for Cellular Automata models. In […]
Dec, 6

Applications of Many-Core Technologies to On-line Event Reconstruction in High Energy Physics Experiments

Interest in many-core architectures applied to real-time selections is growing in High Energy Physics (HEP) experiments. In this paper we describe performance measurements of many-core devices when applied to a typical HEP online task: the selection of events based on the trajectories of charged particles. We use as a benchmark a scaled-up version of the […]
Dec, 6

Fast quantum Monte Carlo on a GPU

We present a scheme for the parallelization of quantum Monte Carlo on graphics processing units, focusing on bosonic systems and variational Monte Carlo. We use asynchronous execution schemes with shared memory persistence, and obtain an excellent acceleration. Compared with single-core execution, GPU-accelerated code runs over 100x faster. The CUDA code is provided along with […]
Dec, 4

Computing OpenSURF on OpenCL and General Purpose GPU

The Speeded-Up Robust Features (SURF) algorithm is widely used for image feature detection and matching in computer vision. Open Computing Language (OpenCL) is a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors. This paper introduces how to implement an open-source SURF program, namely OpenSURF, on general purpose […]
Dec, 4

GPU-based Multi-stream Analyzer on Application Layer for Service-oriented Router

Service-oriented router (SoR) is a new router architecture for providing rich services to Internet users by utilizing useful information extracted from network traffic. In SoR, stream reconstruction and selection is a fundamental process for providing the services in the application layer. After real-time reconstruction of stream data, SoR used a software character string analyzer to […]
Dec, 4

Exploiting Heterogeneous Systems: Keccak on OpenCL

Using graphics processing units (GPUs) in high-performance parallel computing continues to become more prevalent, often as part of a heterogeneous system. CUDA and OpenCL are APIs that enable programmers to develop GPGPU applications and software for massively parallel processors. On October 2, 2012, NIST announced the winner of its five-year competition to select a new […]
Dec, 4

Cluster-SkePU: A Multi-Backend Skeleton Programming Library for GPU Clusters

SkePU is a C++ template library with a simple and unified interface for expressing data parallel computations in terms of generic components, called skeletons, on multi-GPU systems using CUDA and OpenCL. The smart containers in SkePU, such as Matrix and Vector, perform data management with a lazy memory copying mechanism that reduces redundant data communication. […]

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
