high performance computing on graphics processing units: hgpu.org

Progressive Clustering of Big Data with GPU Acceleration and Visualization

Jun Wang, Eric Papenhausen, Bing Wang, Sungsoo Ha, Alla Zelenyuk, Klaus Mueller

Visual Analytics and Imaging Lab, Computer Science Department, Stony Brook University, Stony Brook, NY, USA

New York Scientific Data Summit (NYSDS), 2017

@article{wang2017progressive,
  title={Progressive Clustering of Big Data with GPU Acceleration and Visualization},
  author={Wang, Jun and Papenhausen, Eric and Wang, Bing and Ha, Sungsoo and Zelenyuk, Alla and Mueller, Klaus},
  year={2017}
}

Clustering has become an unavoidable step in big data analysis. It can be used to arrange data into a compact format, making operations on big data manageable. However, clustering big data requires not only the capability to handle data of large volume and high dimensionality, but also the ability to process streaming data, and most current algorithms are underdeveloped in all of these respects. Furthermore, big data processing is seldom interactive, which is at odds with users who seek answers immediately. The best one can do is to process incrementally, so that partial and, hopefully, accurate results become available relatively quickly and are then progressively refined over time. We propose a clustering framework that uses Multi-Dimensional Scaling (MDS) for layout and GPU acceleration to accomplish these goals. Our domain application is the clustering of mass spectral data of individual aerosol particles: 8 million data points with 450 dimensions each.
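To make the "process incrementally, then progressively refine" idea concrete, here is a minimal sketch of a progressive clustering loop. It assumes a simple leader-style scheme (a point joins the nearest cluster if it lies within a distance `threshold` of that cluster's centroid, otherwise it starts a new cluster) and emits a snapshot of the centroids after every `chunk` points. This is not the authors' algorithm: the scheme, the names `IncrementalClusterer` and `progressive`, and the parameters `threshold` and `chunk` are hypothetical. The MDS layout and the GPU kernels are not shown. The per-point nearest-centroid search is the data-parallel step a GPU implementation would typically accelerate.

```python
import math
from typing import Iterable, Iterator, List, Sequence, Tuple

Vector = Sequence[float]


def _dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class IncrementalClusterer:
    """Hypothetical leader-style clusterer (an assumption, not the paper's method)."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.centroids: List[List[float]] = []
        self.counts: List[int] = []

    def add(self, point: Vector) -> int:
        """Assign one point and return the index of its cluster."""
        # Nearest-centroid search: the step a GPU would parallelize.
        best, best_d = -1, math.inf
        for i, c in enumerate(self.centroids):
            d = _dist(point, c)
            if d < best_d:
                best, best_d = i, d
        if best >= 0 and best_d <= self.threshold:
            # Running-mean update keeps the centroid exact without storing points.
            self.counts[best] += 1
            n = self.counts[best]
            c = self.centroids[best]
            for j, x in enumerate(point):
                c[j] += (x - c[j]) / n
            return best
        # Too far from every existing centroid: open a new cluster.
        self.centroids.append(list(point))
        self.counts.append(1)
        return len(self.centroids) - 1


def progressive(
    points: Iterable[Vector], threshold: float, chunk: int
) -> Iterator[Tuple[int, List[List[float]]]]:
    """Consume a (possibly streaming) input and yield (points_seen, centroids)
    after every `chunk` points, plus a final snapshot for any remainder."""
    model = IncrementalClusterer(threshold)
    seen = 0
    for p in points:
        model.add(p)
        seen += 1
        if seen % chunk == 0:
            yield seen, [c[:] for c in model.centroids]
    if seen % chunk != 0:
        yield seen, [c[:] for c in model.centroids]


if __name__ == "__main__":
    stream = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.2, 5.0), (0.0, 0.2), (10.0, 0.0)]
    for seen, cents in progressive(stream, threshold=1.0, chunk=2):
        print(seen, len(cents))  # prints "2 1", then "4 2", then "6 3"
```

Each snapshot is a usable partial result that a visualization could display immediately. Later snapshots refine it as more of the stream arrives, which mirrors the interactive trade-off described above.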

Tags: Algorithms, big data, Clustering, Computer science, CUDA, nVidia, Tesla K20, Visualization

September 9, 2017 by hgpu
