Tuesday, March 6, 2007

An example of using K-means to do grading..

Folks

In class today, I mentioned how K-means is a good fit for
converting numerical scores into grades (since the number of clusters
is known).

Some years back (back when there were still A/B/C/D/E grades), I sent
the following mail to the 494 class to illustrate the use of K-means
to associate letter grades with their actual mid-term marks. Although
the impact is not as great on you, since they are not _your_ marks
being clustered, you can still, I am sure, feel sympathetic pangs of
understanding.. (this also works well as a nice way to understand all
of K-means' properties)

(You may also note, from the example below, that the distribution of
marks on my exams tends to have very high variance...)

yours in questionable taste
Rao


===============================

The midterm has been graded. The stats are: avg: 29, standard
deviation: 14.5.
33 people took the exam, which was for 63 points (not 65, as I seem to
have miscounted).
There are:
1 above 60
1 between 50-60
4 between 40-50
8 between 30-40
8 between 20-30
7 between 10-20
4 below 10


I will return the exams at the end of class on Monday. Don't ask
me for marks before that.


Now I am sure you all want to know what the "A", "B", etc. cutoffs
would be just for this exam.


I thought we could use the k-means algorithm to do the clustering.
After all, the usual problem with k-means (that it requires us to
pre-specify the number of clusters) is not really a problem here,
since we are going to have a fixed number of letter grades anyway.
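
[[Aside: the program itself is not reproduced in this mail. A
one-dimensional k-means of the kind invoked below works by picking K
starting centers, assigning each score to its nearest center, moving
each center to the mean of its cluster, and repeating until the
assignment stops changing. The sketch that follows is a
reconstruction along those lines; the function bodies, helper names
and the initialization rule are guesses rather than the original code
(though, with the scores in descending order, this initialization
happens to reproduce Case 1's printed starting centers). The real
program also printed each intermediate clustering with ">>>>" and a
dissimilarity number; that bookkeeping is left out of the sketch.]]

(defun nearest-center (x centers)
  "Index of the center in CENTERS closest to the score X."
  (let ((best 0)
        (best-d (abs (- x (first centers)))))
    (loop for c in (rest centers)
          for i from 1
          for d = (abs (- x c))
          when (< d best-d) do (setf best i best-d d))
    best))

(defun k-means (items k &key (key #'identity))
  "Cluster ITEMS into K groups by the numeric value returned by KEY.
Initial centers here are every (length/K)-th score in input order
(just one plausible choice), which is why reordering the input list
changes the starting centers and hence the final clustering."
  (let* ((scores (mapcar key items))
         (step (max 1 (floor (length scores) k)))
         (centers (loop for i from 0 below k
                        collect (nth (* i step) scores)))
         (old nil))
    (loop
      (let ((clusters (make-array k :initial-element nil)))
        ;; Assignment step: each score joins its nearest center's cluster.
        (dolist (x scores)
          (push x (aref clusters (nearest-center x centers))))
        (let ((cluster-list (map 'list #'nreverse clusters)))
          (when (equal cluster-list old)  ; assignments stable: converged
            (return cluster-list))
          (setf old cluster-list)
          ;; Update step: each center moves to the mean of its cluster
          ;; (an empty cluster just keeps a dummy center of 0 here).
          (setf centers
                (mapcar (lambda (c)
                          (if c (/ (reduce #'+ c) (length c)) 0))
                        cluster-list)))))))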


So, I wrote up a little (lisp) program for k-means and ran it on the
list of 33 individual scores. Here, for your edification, are the
results:

Case 1: 5 clusters (A,B,C,D,E)

USER(218): (k-means mlist 5 :key #'mark-val)


>>>>((61.5) (38) (32) (26) (17.5))
>>>>((61.5 55) (48 47.5 47.5 47.5 38 37 35) (34.5 32.5 32.5 32 30 29)
(28 27 27 26 25.5 22.5)
(20.5 19 18 17.5 17 13.5 13 11.5 9.5 8.5 7 4)) --Dissimilarity
Measure:113.07143
>>>>((61.5 55) (48 47.5 47.5 47.5 38) (37 35 34.5 32.5 32.5 32 30 29)
(28 27 27 26 25.5 22.5 20.5)
(19 18 17.5 17 13.5 13 11.5 9.5 8.5 7 4)) --Dissimilarity
Measure:97.791214
>>>>((61.5 55) (48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32 30) (29
28 27 27 26 25.5 22.5 20.5 19)
(18 17.5 17 13.5 13 11.5 9.5 8.5 7 4)) --Dissimilarity
Measure:88.91668
>>>>((61.5 55) (48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32 30) (29
28 27 27 26 25.5 22.5 20.5 19)
(18 17.5 17 13.5 13 11.5 9.5 8.5 7 4)) --Dissimilarity
Measure:88.91668


;;Notice that k-means starts with some default cluster centers and
;;iterates until the clusters stabilize. So the last ">>>>" above is
;;the clustering we get.
;;(if we had stopped after the first iteration, people with 34.5 would
;;have gotten an A ;-)


That dissimilarity measure is reduced from iteration to iteration
in each run.


[[For the rest of the examples (except Case 1'), I don't show the
cluster dissimilarity measure]]
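
[[The mail never says what the dissimilarity measure actually is, but
the printed values are consistent with the total absolute deviation
of the scores from their cluster means: the final Case 1 clustering
works out to about 88.92 and the final Case 1' clustering to about
78.57, matching the numbers printed above and below. A sketch, with a
function name chosen here purely for illustration:]]

(defun cluster-dissimilarity (clusters)
  "Sum, over every score, of its absolute distance to the mean of the
cluster it belongs to (empty clusters contribute nothing); smaller
means tighter clusters."
  (loop for c in clusters
        when c
          sum (let ((mean (/ (reduce #'+ c) (length c))))
                (loop for x in c sum (abs (- x mean))))))

;; e.g. on Case 1's final clusters above this returns ~88.92, matching
;; the transcript's 88.91668.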


Case 2: 4 clusters (A,B,C,D) (assume I decided not to give Es)
USER(162): (k-means mlist 4 :key #'mark-val)


>>>>((61.5) (35) (27) (17.5))
>>>>((61.5 55) (48 47.5 47.5 47.5 38 37 35 34.5 32.5 32.5 32) (30 29
28 27 27 26 25.5 22.5)
(20.5 19 18 17.5 17 13.5 13 11.5 9.5 8.5 7 4))
>>>>((61.5 55) (48 47.5 47.5 47.5 38 37 35 34.5) (32.5 32.5 32 30 29
28 27 27 26 25.5 22.5 20.5)
(19 18 17.5 17 13.5 13 11.5 9.5 8.5 7 4))
>>>>((61.5 55) (48 47.5 47.5 47.5 38 37 35) (34.5 32.5 32.5 32 30 29
28 27 27 26 25.5 22.5 20.5)
(19 18 17.5 17 13.5 13 11.5 9.5 8.5 7 4))
>>>>((61.5 55) (48 47.5 47.5 47.5 38 37) (35 34.5 32.5 32.5 32 30 29
28 27 27 26 25.5 22.5 20.5)
(19 18 17.5 17 13.5 13 11.5 9.5 8.5 7 4))
>>>>((61.5 55) (48 47.5 47.5 47.5 38 37) (35 34.5 32.5 32.5 32 30 29
28 27 27 26 25.5 22.5)
(20.5 19 18 17.5 17 13.5 13 11.5 9.5 8.5 7 4))
>>>>((61.5 55) (48 47.5 47.5 47.5 38 37) (35 34.5 32.5 32.5 32 30 29
28 27 27 26 25.5 22.5)
(20.5 19 18 17.5 17 13.5 13 11.5 9.5 8.5 7 4))


Case 3: 4 clusters (A,B,C,D) but we remove the highest (61.5) from
consideration (assume that we give that person an A+ or just
physically push
that person out of the class for the sake of collective happiness).


USER(208): (k-means (cdr mlist) 4 :key #'mark-val)

>>>>((55) (34.5) (27) (17))
>>>>((55 48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32) (30 29 28 27
27 26 25.5 22.5)
(20.5 19 18 17.5 17 13.5 13 11.5 9.5 8.5 7 4))
>>>>((55 48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32) (30 29 28 27
27 26 25.5 22.5 20.5)
(19 18 17.5 17 13.5 13 11.5 9.5 8.5 7 4))
>>>>((55 48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32) (30 29 28 27
27 26 25.5 22.5 20.5)
(19 18 17.5 17 13.5 13 11.5 9.5 8.5 7 4))

As expected, with 61.5 tossed out of the class, a lot more people get
As ;-)

Case 1': Clustering with a different set of initial centers.
We will repeat Case 1, except we start with different centers.


USER(219): (k-means mlist-r 5 :key #'mark-val)


>>>>((35) (32) (26) (18) (17))
>>>>((61.5 55 48 47.5 47.5 47.5 38 37 35 34.5) (32.5 32.5 32 30 29)
(28 27 27 26 25.5 22.5) (20.5 19 18 17.5)
(17 13.5 13 11.5 9.5 8.5 7 4)) --Dissimilarity Measure:117.0
>>>>((61.5 55 48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32 30 29)
(28 27 27 26 25.5 22.5) (20.5 19 18 17.5 17)
(13.5 13 11.5 9.5 8.5 7 4)) --Dissimilarity Measure:82.19365
>>>>((61.5 55 48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32 30) (29
28 27 27 26 25.5 22.5) (20.5 19 18 17.5 17)
(13.5 13 11.5 9.5 8.5 7 4)) --Dissimilarity Measure:80.37619
>>>>((61.5 55 48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32) (30 29
28 27 27 26 25.5 22.5) (20.5 19 18 17.5 17)
(13.5 13 11.5 9.5 8.5 7 4)) --Dissimilarity Measure:78.55476
>>>>((61.5 55 48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32) (30 29
28 27 27 26 25.5) (22.5 20.5 19 18 17.5 17)
(13.5 13 11.5 9.5 8.5 7 4)) --Dissimilarity Measure:78.571434
>>>>((61.5 55 48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32) (30 29
28 27 27 26 25.5) (22.5 20.5 19 18 17.5 17)
(13.5 13 11.5 9.5 8.5 7 4)) --Dissimilarity Measure:78.571434

Compare this to Case 1: note that we have now converged to an entirely
different clustering just because we started from different centers.


Note
* that the lowest dissimilarity attained depends on the initial
cluster centers. This is a consequence of the fact that K-means is
a greedy algorithm and does not find the clustering with the globally
lowest dissimilarity. (A standard remedy, sketched after these notes,
is to rerun k-means from several different starting centers and keep
the best result.)


* It is nice to see that the clusters found in case 1' are better
(according to the dissimilarity metric) than those found in case 1
(because this means that giving more As is in fact a better
idea according to k-means ;-)
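
[[The restart remedy mentioned in the first note above is not
something the original mail did; here is a sketch of it using the
hypothetical K-MEANS and CLUSTER-DISSIMILARITY functions from the
earlier sketches: shuffle the input so each run starts from different
centers, and keep whichever clustering comes out with the lowest
dissimilarity.]]

(defun shuffle (list)
  "Return a randomly permuted copy of LIST (Fisher-Yates on a vector)."
  (let ((v (coerce list 'vector)))
    (loop for i from (1- (length v)) downto 1
          do (rotatef (aref v i) (aref v (random (1+ i)))))
    (coerce v 'list)))

(defun k-means-restarts (items k n-tries &key (key #'identity))
  "Run the K-MEANS sketch from N-TRIES shuffled orderings of ITEMS and
return the clustering with the smallest CLUSTER-DISSIMILARITY."
  (let ((best nil)
        (best-d nil))
    (loop repeat n-tries
          for clusters = (k-means (shuffle items) k :key key)
          for d = (cluster-dissimilarity clusters)
          when (or (null best-d) (< d best-d))
            do (setf best clusters best-d d))
    best))

;; e.g. (k-means-restarts mlist 5 10 :key #'mark-val) would try ten
;; shuffled starts on the full score list and keep the tightest result.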


case 2': repeat case 2 with different centers

USER(209): (k-means mlist-r 4 :key #'mark-val)

>>>>((35) (22.5) (18) (9.5))
>>>>((61.5 55 48 47.5 47.5 47.5 38 37 35 34.5 32.5 32.5 32 30 29) (28
27 27 26 25.5 22.5 20.5) (19 18 17.5 17)
(13.5 13 11.5 9.5 8.5 7 4))
>>>>((61.5 55 48 47.5 47.5 47.5 38 37 35 34.5) (32.5 32.5 32 30 29 28
27 27 26 25.5 22.5) (20.5 19 18 17.5 17)
(13.5 13 11.5 9.5 8.5 7 4))
>>>>((61.5 55 48 47.5 47.5 47.5 38 37) (35 34.5 32.5 32.5 32 30 29 28
27 27 26 25.5) (22.5 20.5 19 18 17.5 17)
(13.5 13 11.5 9.5 8.5 7 4))
>>>>((61.5 55 48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32 30 29 28
27 27 26 25.5) (22.5 20.5 19 18 17.5 17)
(13.5 13 11.5 9.5 8.5 7 4))
>>>>((61.5 55 48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32 30 29 28
27 27 26 25.5) (22.5 20.5 19 18 17.5 17)
(13.5 13 11.5 9.5 8.5 7 4))

;;Interesting how going from 5 clusters to 4 just merges the old B and C...


case 3': repeat case 3 with different centers (we just removed 61.5)

USER(211): (k-means mlist-r-2 4 :key #'mark-val)

>>>>((35) (30) (22.5) (9.5))
>>>>((55 48 47.5 47.5 47.5 38 37 35 34.5 32.5 32.5) (32 30 29 28 27
27) (26 25.5 22.5 20.5 19 18 17.5 17)
(13.5 13 11.5 9.5 8.5 7 4))
>>>>((55 48 47.5 47.5 47.5 38 37) (35 34.5 32.5 32.5 32 30 29 28 27 27
26 25.5) (22.5 20.5 19 18 17.5 17)
(13.5 13 11.5 9.5 8.5 7 4))
>>>>((55 48 47.5 47.5 47.5 38) (37 35 34.5 32.5 32.5 32 30 29 28 27 27
26 25.5) (22.5 20.5 19 18 17.5 17)
(13.5 13 11.5 9.5 8.5 7 4))
>>>>((55 48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32 30 29 28 27 27
26 25.5) (22.5 20.5 19 18 17.5 17)
(13.5 13 11.5 9.5 8.5 7 4))
>>>>((55 48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32 30 29 28 27 27
26 25.5) (22.5 20.5 19 18 17.5 17)
(13.5 13 11.5 9.5 8.5 7 4))

Notice the non-local and somewhat non-intuitive effect on the
clusters.. A cutoff still remains 47.5, but B now got chopped up..


You would have thought more people would get better grades if the
highest scorer got thrown out... and yet...


In summary, you wouldn't want to trust your grades to K-means-based
clustering, because:


1. its clusters are sensitive to initial centers

2. its clusters are sensitive to outliers

3. deletion of elements might have non-local effects on its clusters


I know, I know. You see this whole thing as a joke in bad taste...

Rao
=============

1 comment:

Kartik said...

This sounds like adding insult to injury if there were actually clustering questions on that midterm :-)