high performance computing on graphics processing units: hgpu.org

Posts

Jul, 28

Ensemble K-Means on Modern Many Core Hardware

Clustering involves partitioning a set of objects into subsets called clusters so that objects in the same cluster are similar according to some metric. Clustering is widely used in many fields, such as machine learning, data mining, pattern recognition and bioinformatics. The K-means algorithm is the most popular clustering algorithm, and it uses distance as the […]

OpenCL

Jul, 28

Acceleration of Variance of Color Differences-Based Demosaicing Using CUDA

Image demosaicing algorithms are used to reconstruct a full color image from the incomplete color samples output (RAW data) of an image sensor overlaid with a Color Filter Array (CFA). Better demosaicing algorithms are superior in terms of acuity, dynamic range, signal to noise ratio, and artifact suppression, which make them suitable for high quality […]

CUDA

Jul, 28

Speeding up LIP-Canny with CUDA programming

The LIP-Canny algorithm outperforms traditional Canny edge detection in terms of edge detection under varying illumination. This method is based on a robust mathematical model (LIP paradigm), which is closer to the human vision system. However, this model requires more computations and more complex operations than the traditional paradigm. Non-parallel implementations of LIP-Canny do not […]

CUDA

Jul, 28

Computational modeling of synthetic microbial biofilms

Microbial biofilms are complex, self-organized communities of bacteria, which employ physiological cooperation and spatial organization to increase both their metabolic efficiency and their resistance to changes in their local environment. These properties make biofilms an attractive target for engineering, particularly for the production of chemicals such as pharmaceutical ingredients or biofuels, with the potential to […]

OpenCL

Jul, 27

A virtual memory based runtime to support multi-tenancy in clusters with GPUs

Graphics Processing Units (GPUs) are increasingly becoming part of HPC clusters. Nevertheless, cloud computing services and resource management frameworks targeting heterogeneous clusters including GPUs are still in their infancy. Further, GPU software stacks (e.g., CUDA driver and runtime) currently provide very limited support for concurrency. In this paper, we propose a runtime system that provides […]

CUDA

Jul, 27

Speeding up Large-Scale Point-in-Polygon Test Based Spatial Join on GPUs

The Point-in-Polygon (PIP) test is fundamental to spatial databases and GIS. Motivated by the slow response times in joining large-scale point locations with polygons using traditional spatial databases and GIS and the massively data parallel computing power of commodity GPU devices, we have designed and developed an end-to-end system completely on GPUs to associate points with […]

CUDA

Jul, 27

High-Performance Spatial Join Processing on GPGPUs with Applications to Large-Scale Taxi Trip Data

Spatially joining GPS-recorded locations with infrastructure data, such as points of interest, road networks, land cover and different types of zones, and assigning a point to its nearest polyline or polygon is a prerequisite for trip-related analysis, which is becoming increasingly important in ubiquitous computing. However, existing spatial databases and GIS are incapable […]

CUDA

Jul, 27

CUDA Kernel Design for GPU-Based Beam Dynamics Simulations

Efficient implementation of general purpose particle tracking on GPUs can result in significant performance benefits to large scale particle tracking and tracking-based accelerator optimization simulations. We present our work on accelerating Argonne National Lab’s accelerator simulation code ELEGANT [1, 2] using CUDA-enabled GPUs [3]. In particular, we provide an overview of beamline elements ported to […]

CUDA

Jul, 27

Using OpenGL State History for Graphics Debugging

Graphics programming sees widespread use across a number of industries, from video games to medical imaging. Because of the unique needs of graphics programming, specialised tools are required to aid in the debugging of graphics programs. While there are a number of well-maintained and widespread general-purpose debugging tools, the range of available graphics debuggers is […]

OpenGL

Jul, 26

SnuCL: an OpenCL framework for heterogeneous CPU/GPU clusters

In this paper, we propose SnuCL, an OpenCL framework for heterogeneous CPU/GPU clusters. We show that the original OpenCL semantics naturally fit the heterogeneous cluster programming environment, and the framework achieves high performance and ease of programming. The target cluster architecture consists of a designated, single host node and many compute nodes. They are […]

OpenCL

Jul, 26

Efficient Implementation of the CPR Formulation for the Navier-Stokes Equations on GPUs

The correction procedure via reconstruction (CPR) formulation for the Euler and Navier-Stokes equations is implemented on an NVIDIA graphics processing unit (GPU) using CUDA C with both explicit and implicit time-stepping schemes for 2D unstructured triangular grids. For the implicit time integration, a first-order time approximation with Newton iteration and Gauss elimination is used […]

CUDA

Jul, 26

Bidimensional Median Filter for Parallel Computing Architectures

The median filter is a non-linear filter used for removal of salt and pepper noise from images. Each pixel of the image is replaced by the median of its surrounding elements; the median value is calculated by sorting the data. The complexity of the sorting algorithms used in median filters is O(n^2) or O(n), […]

CUDA
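
The sorting-based median filter described in the abstract above can be illustrated with a short sequential sketch. This is not the paper's parallel implementation: the function name `median_filter`, the `radius` parameter, and the choice to clamp coordinates at the image border are all assumptions made for illustration.

```python
def median_filter(image, radius=1):
    """Return a copy of `image` (a list of rows of numbers) in which each
    pixel is replaced by the median of its (2*radius+1)^2 neighborhood.

    Border handling is an assumption of this sketch: out-of-range
    coordinates are clamped to the nearest valid pixel.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Gather the neighborhood around (y, x).
            window = []
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny = min(max(y + dy, 0), height - 1)
                    nx = min(max(x + dx, 0), width - 1)
                    window.append(image[ny][nx])
            # Sort the neighborhood and take the middle element.
            window.sort()
            result[y][x] = window[len(window) // 2]
    return result
```

For example, a 3x3 image of 10s with a single 255 in the center (one "salt" pixel) comes back as a uniform image of 10s, since the outlier never reaches the middle of any sorted window.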