Mr. Scan: Extreme Scale Density-Based Clustering using a Tree-Based Network of GPGPU Nodes

Benjamin Welton, Evan Samanas, Barton P. Miller
Computer Sciences Department, University of Wisconsin, Madison, WI 53706
University of Wisconsin, 2013
@article{welton2013mr,
   title={Mr. Scan: Extreme Scale Density-Based Clustering using a Tree-Based Network of GPGPU Nodes},
   author={Welton, Benjamin and Samanas, Evan and Miller, Barton P},
   year={2013}
}


Density-based clustering algorithms are a widely used class of data mining techniques that can find irregularly shaped clusters and can cluster data without prior knowledge of the number of clusters it contains. DBSCAN is the best-known density-based clustering algorithm. We introduce our version of DBSCAN, called Mr. Scan, which uses a hybrid parallel implementation that combines the MRNet tree-based distribution network with GPGPU-equipped nodes. This design allows Mr. Scan to efficiently and accurately cluster multi-billion point datasets. Mr. Scan avoids the problems of existing implementations by effectively partitioning the point space and by optimizing DBSCAN’s computation over dense data regions. We tested Mr. Scan on a geolocated Twitter dataset. At its largest scale, Mr. Scan clustered 6.5 billion points from the Twitter dataset on 8,192 GPU nodes of Cray Titan in 17.3 minutes. All other parallel DBSCAN implementations have demonstrated the ability to cluster only up to 100 million points.
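
For readers unfamiliar with DBSCAN, the following is a minimal single-threaded sketch of the textbook algorithm in Python. It is not Mr. Scan's implementation: it has no partitioning, no MRNet tree, and no GPU code, and it uses a brute-force neighbor search. The names `region_query`, `dbscan`, `eps`, and `min_pts` are chosen here for illustration. It shows only why the number of clusters does not have to be specified in advance: clusters grow outward from any point with at least `min_pts` neighbors within distance `eps`.

```python
from math import dist  # Python 3.8+

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            # Not a core point; may still be claimed later as a border point.
            labels[i] = NOISE
            continue
        # points[i] is a core point: start a new cluster and expand it.
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point
    return labels


if __name__ == "__main__":
    # Two small dense squares and one isolated point (made-up sample data).
    sample = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11),
              (50, 50)]
    print(dbscan(sample, eps=1.5, min_pts=3))
```

The brute-force `region_query` makes this sketch quadratic in the number of points, which is one reason large datasets call for the kind of spatial partitioning and parallel hardware the abstract describes.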
HGPU group © 2010-2014 hgpu.org

All rights belong to the respective authors