Regarding the discussion on the K-medoid algorithm, the standard idea is to use one of the actual data points in the cluster as the medoid, so that the cluster is summarized by its most representative data point. How that representative point is chosen may vary, but the most reasonable choice (and one that is resistant to outliers) is the point with the lowest cumulative distance to all the other points in the cluster. Since computing this is costly, it is sometimes done only on a sample of the points in the cluster.
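As a minimal sketch of that selection rule (the function name and the choice of Euclidean distance are my own illustrative assumptions), the medoid of a cluster can be found by computing all pairwise distances and taking the point whose row sum is smallest:

```python
import numpy as np

def medoid(points):
    """Return the point with the lowest cumulative distance to all others.

    points: (n, d) array of the cluster's members. This is O(n^2 * d);
    for large clusters it could be run on a random sample instead,
    as noted above.
    """
    # Pairwise Euclidean distances between all points in the cluster.
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Cumulative distance from each point to every other point.
    totals = dists.sum(axis=1)
    return points[totals.argmin()]
```

Note how an outlier like a single far-away point inflates its own cumulative distance the most, so it will not be chosen, which is the robustness property mentioned above.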

Here is the algorithm:

Basic K-medoid Algorithm:

1. Select K points as the initial medoids.

2. Assign all points to the closest medoid.

3. Check whether any other point would be a "better" medoid (i.e., would have a lower cumulative distance to all the other points in its cluster). Finding a better medoid involves comparing every medoid/non-medoid pair of points and is relatively inefficient, so sampling may be used.

4. Repeat steps 2 and 3 until the medoids don't change.
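The steps above can be sketched as follows. This is a rough illustration under my own assumptions (Euclidean distance, random initial medoids, the function and variable names are invented); it does the exhaustive within-cluster medoid update rather than the sampling shortcut:

```python
import random
import numpy as np

def k_medoids(points, k, max_iter=100, seed=0):
    """Basic K-medoid sketch: returns (medoids, labels) for an (n, d) array."""
    rng = random.Random(seed)
    n = len(points)
    # Step 1: select K points as the initial medoids.
    medoid_idx = rng.sample(range(n), k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to the closest medoid.
        dists = np.linalg.norm(
            points[:, None, :] - points[medoid_idx][None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Step 3: within each cluster, the point with the lowest cumulative
        # distance to the rest of the cluster becomes the new medoid.
        new_idx = []
        for c in range(k):
            members = np.where(labels == c)[0]
            if len(members) == 0:          # empty cluster: keep old medoid
                new_idx.append(medoid_idx[c])
                continue
            d = np.linalg.norm(
                points[members][:, None, :] - points[members][None, :, :],
                axis=-1)
            new_idx.append(members[d.sum(axis=1).argmin()])
        # Step 4: stop once the medoids no longer change.
        if set(new_idx) == set(medoid_idx):
            break
        medoid_idx = new_idx
    return points[medoid_idx], labels
```

On two well-separated blobs this converges in a couple of iterations; the O(n^2) work in step 3 is exactly the cost that motivates the sampling variant mentioned above.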

Rao

ps: Here is a paper that does use this type of clustering:

http://coblitz.codeen.org:3125/citeseer.ist.psu.edu/cache/papers/cs/27046/http:zSzzSzwww-faculty.cs.uiuc.eduzSz~hanjzSzpdfzSzvldb94.pdf/ng94efficient.pdf

## Friday, March 9, 2007

