
Jan, 14

A Case for Work-stealing on FPGAs with OpenCL Atomics

We provide a case study of work-stealing, a popular method for run-time load balancing, on FPGAs. Following the Cederman-Tsigas implementation for GPUs, we synchronize workitems not with locks, mutexes or critical sections, but instead with the atomic operations provided by Altera’s OpenCL SDK. We evaluate work-stealing for FPGAs by synthesizing a K-means clustering algorithm on […]
Jan, 14

Classification of Higgs Boson Tau-Tau decays using GPU accelerated Neural Networks

In particle physics, Higgs Boson to tau-tau decay signals are notoriously difficult to identify due to the presence of severe background noise generated by other decaying particles. Our approach uses neural networks to classify events as signals or background noise.
Jan, 14

A Survey Of Techniques for Approximate Computing

Approximate computing trades off computation quality against the effort expended, and as rising performance demands confront plateauing resource budgets, approximate computing has become not merely attractive but imperative. In this paper, we present a survey of techniques for approximate computing (AC). We discuss strategies for finding approximable program portions and monitoring output quality, […]
Jan, 12

GPU Remote Memory Access Programming

High performance computing studies the construction and programming of computing systems with tremendous computational power, playing a key role in scientific computing and research across disciplines. The graphics processing unit (GPU), developed for fast 2D and 3D visualizations, has turned into a programmable general purpose accelerator device boosting today’s high performance clusters. Leveraging these computational […]
Jan, 12

A Workload Balanced MapReduce Framework on GPU Platforms

The MapReduce framework is a programming model proposed by Google to process large datasets. It is an efficient framework that can be used in many areas, such as social networks, scientific research, and electronic business. Hence, more and more MapReduce frameworks are implemented on different platforms, including Phoenix (based on multicore CPU), MapCG (based on […]
Jan, 12

Real-Time Dedispersion for Fast Radio Transient Surveys, using Auto Tuning on Many-Core Accelerators

Dedispersion, the removal of deleterious smearing of impulsive signals by the interstellar matter, is one of the most intensive processing steps in any radio survey for pulsars and fast transients. Here we present a study of the parallelization of this algorithm on many-core accelerators, including GPUs from AMD and NVIDIA, and the Intel Xeon Phi. […]
Jan, 12

Study of low density nuclear matter with quantum molecular dynamics: the role of the symmetry energy

We study the effect of isospin-dependent nuclear forces on the pasta phase in the inner crust of neutron stars. To this end we model the crust within the framework of quantum molecular dynamics (QMD). For maximizing the numerical performance, the newly developed code has been implemented on GPU processors. As a first application of the […]
Jan, 7

GPU-Based Fuzzy C-Means Clustering Algorithm for Image Segmentation

In this paper, a fast and practical GPU-based implementation of the Fuzzy C-Means (FCM) clustering algorithm for image segmentation is proposed. First, an extensive analysis is conducted to study the dependency among the image pixels in the algorithm for parallelization. The proposed GPU-based FCM has been tested on a simulated digital brain dataset to segment white matter (WM), […]
Jan, 7

Computationally Efficient Tsunami Modelling on Graphics Processing Units (GPU)

Tsunamis generated by earthquakes commonly propagate as long waves in the deep ocean and develop into sharp-fronted surges moving rapidly towards the coast in shallow water, which may be effectively simulated by hydrodynamic models solving the nonlinear shallow water equations (SWEs). However, most of the existing tsunami models suffer from long simulation times for large-scale […]
Jan, 7

Verifying CUDA Programs using SMT-Based Context-Bounded Model Checking

We present ESBMC-GPU, an extension to the ESBMC model checker that is aimed at verifying GPU programs written for the CUDA framework. ESBMC-GPU uses an operational model for the verification, i.e., an abstract representation of the standard CUDA libraries that conservatively approximates their semantics. ESBMC-GPU verifies CUDA programs by explicitly exploring the possible interleavings (up […]
Jan, 7

DeepLearningKit – an Open Source Deep Learning Framework for Apple’s iOS, OS X and tvOS developed in Metal and Swift

In this paper we present DeepLearningKit – an open source framework that supports using pre-trained deep learning models (convolutional neural networks) for iOS, OS X and tvOS. DeepLearningKit is developed in Metal in order to utilize the GPU efficiently and Swift for integration with applications, e.g. iOS-based mobile apps on iPhone/iPad, tvOS-based apps for […]
Jan, 7

Faster GPU Based Genetic Programming Using A Two Dimensional Stack

Genetic Programming (GP) is a computationally intensive technique which also has a high degree of natural parallelism. Parallel computing architectures have become commonplace, especially with regard to Graphics Processing Units (GPUs). Hence, versions of GP have been implemented that utilise these highly parallel computing platforms, enabling significant gains in the computational speed of GP to be […]
