So, here is the standard FAQ on the at-home version of the in-class exam.
Let me know if you have questions (if you are afraid of asking them,
you can use http://rakaposhi.eas.asu.edu/cgi-bin/mail?rao)
0. What are the ground rules for doing this--
Only that (a) you have not talked to anyone about the exam and (b) you
have to submit it at the ***beginning of the class
on Tuesday 4/3***
1. Do I do just the parts I thought I didn't do well or the whole exam?
You have to do the whole exam (see below as to why)
2. Do I lose anything if I don't do it at home?
No (okay--you do lose the satisfaction of doing it twice;-). Your
grade in the in-class test will stand.
3. How is the effective midterm grade computed?
Eff = max( in-class, w*in-class + (1-w)*at-home )
4. What is the range of w?
0.5 < w < 1
(typical values in the past ranged between 0.6 and 0.666)
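(A hypothetical worked example: with w = 0.6, an in-class score of 40 and an at-home score of 60,
Eff = max(40, 0.6*40 + 0.4*60) = max(40, 48) = 48; the 8-point improvement is the extra credit
mentioned in question 5 below.)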
5. But if everyone else does it at home and improves their grade, and
I decide to watch Seinfeld reruns, don't I lose out?
No. First of all, *nobody* ever loses out by watching Seinfeld reruns
(Channel 10; weeknights at 10:30 and again at 11:30; also Channel 14 on
Rao's TV).
The difference between your in-class score and the Eff score will be
considered as your _extra credit_ on the midterm (and thus those
points won't affect grade cutoffs).
6. How do you devise these ludicrously complex schemes?
This is the only way of making Excel do some real work.
------------------------
Rao
Thursday, March 29, 2007
If you are planning to do the exam at home again--here are instructions
Wednesday, March 28, 2007
Fwd: office hours before midterm
---------- Forwarded message ----------
From: Subbarao Kambhampati <subbarao2z2@gmail.com>
Date: Mar 28, 2007 9:23 AM
Subject: office hours before midterm
To: Nanan <nanan9177@gmail.com>
I will be available between 3-4pm today for any consultations. I will also
be available between 1-2pm tomorrow. If I am in my office at other times, you are
welcome to ask me questions.
rao
Dear Dr. Kambhampati,
Will you hold office hours for the midterm exam? Thank you.
Plus, what is the meaning of "A moment of order m exists only if r > m+1"?
The mean is considered the first moment, and the variance the second (central) moment, of a statistical distribution.
You can define higher-order moments in terms of cubes, fourth powers, etc.
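(To connect this to the "r > m+1" statement: assuming r here is the exponent of a power-law
distribution, p(x) proportional to x^(-r), as in the Zipf/power-law material, the m-th moment is
the integral of x^m * p(x), i.e. of x^(m-r), and that integral converges in the tail only when
m - r < -1, i.e. when r > m+1. If r refers to something else in the reading, this reasoning
would need to be adjusted.)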
Best Wishes,
Nan
Tuesday, March 27, 2007
Reminder: You will have to submit the review/questions of the paper on collaborative filtering in class today
Folks
Please come prepared with a short review and questions regarding the paper on collaborative filtering on Google News. You will use this to bring up questions during class discussion. This sheet will also be turned in at the end of the class and will become part of homework 3.
Rao
Midterm Syllabus + Solutions for Homework 2
The midterm will cover all the topics covered in homeworks 1 and 2. This includes all the lectures up to and including the lecture of March 1st.
Solutions for homework 2 are posted and available from the homework page
(here is the direct link http://rakaposhi.eas.asu.edu/cse494/hw2-s07-solns )
rao
Sunday, March 25, 2007
Re: specimen midterm for CSE494
Both these URLs will work now
http://rakaposhi.eas.asu.edu/s07-specimen-exam.pdf
http://rakaposhi.eas.asu.edu/cse494/s07-specimen-exam.pdf
The link doesn't work... thanks
Zheshen (Jessie)
2007/3/25, Subbarao Kambhampati <rao@asu.edu>:
Folks
The specimen midterm from a previous offering of the course is made available at
http://rakaposhi.eas.asu.edu/s07-specimen-exam.pdf
This should give you some idea about how the midterm might be structured.
regards
rao
--
---------------------------------------------------
Wang, Zheshen
Computer Science Department
Arizona State University
specimen midterm for CSE494
Folks
The specimen midterm from a previous offering of the course is made available at
http://rakaposhi.eas.asu.edu/s07-specimen-exam.pdf
This should give you some idea about how the midterm might be structured.
regards
rao
Friday, March 23, 2007
Midterm on next Thursday--in class
As I mentioned in Tuesday's class, we will have the midterm next Thursday in class.
(Since 3/30 is the drop date, I think it makes sense to have the exam before the drop date).
regards
rao
Thursday, March 22, 2007
Blog Task for next class + Discouraging dim sum dining (in-class attendance..)
Please come prepared with a short review and questions regarding the paper on collaborative filtering on Google News. You will use this to bring up questions during class discussion. This sheet will also be turned in at the end of the class and will become part of homework 3.
Also, here is a link to the adversarial classification problem I mentioned (I also added it to the readings list)
http://www.cs.washington.edu/homes/pedrod/papers/kdd04.pdf
Finally, I would like to remind you that attendance is not optional in this class. If you are registered and/or have requested permission to attend, you are expected to attend consistently (just as I am expected to show up consistently).
regards
Rao
Saturday, March 17, 2007
Reading for Tuesday's class-- Chapter 13 from the IR manuscript
Please use
http://nlp.stanford.edu/IR-book/pdf/12-nbayes.pdf
as the background material for Tuesday's class.
rao
Thursday, March 15, 2007
Article on video search
Nice article on the current trend of video search. Read it when you have time. http://www.msnbc.msn.com/id
Wednesday, March 14, 2007
**important** regarding project B
Friday, March 9, 2007
Clarification on K-medoid algorithm
The idea is to consider one of the data points in the cluster as the medoid (so it is the most representative
data point). The way the most representative data point is chosen may vary, but the most reasonable
idea (one that is resistant to outliers) is to pick the point that has the lowest cumulative distance to all other
points. Since this is a costly operation, sometimes it is done only on a sample of the points in the cluster.
Here is the algorithm:
Basic K-medoid Algorithm:
1. Select K points as the initial medoids.
2. Assign all points to the closest medoid.
3. See if any other point is a "better" medoid (i.e., has the lowest average distance to all other points).
Finding a better medoid involves comparing all pairs of medoid and non-medoid points and is relatively inefficient;
sampling may be used.
4. Repeat steps 2 and 3 until the medoids don't change.
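To make the steps concrete, here is a minimal Common Lisp sketch of one reading of this loop, for
1-D numeric data with absolute difference as the distance. It is only an illustration: the function
names, the seeding with the first K points, and the per-cluster medoid update are placeholder
choices, not code from the class (and it assumes no cluster goes empty along the way).

;; Minimal sketch of the basic K-medoid loop above, for 1-D numeric data
;; (distance = absolute difference). Names and seeding are placeholders.

(defun total-distance (candidate cluster)
  "Cumulative distance from CANDIDATE to every point in CLUSTER."
  (reduce #'+ (mapcar (lambda (p) (abs (- candidate p))) cluster)))

(defun best-medoid (cluster)
  "The point in CLUSTER with the lowest cumulative distance to the rest."
  (let ((best (first cluster)))
    (dolist (p (rest cluster) best)
      (when (< (total-distance p cluster) (total-distance best cluster))
        (setf best p)))))

(defun assign-to-medoids (points medoids)
  "Step 2: group POINTS by their nearest medoid; returns a list of clusters."
  (let ((clusters (make-list (length medoids))))
    (dolist (x points clusters)
      (let* ((nearest (reduce (lambda (a b)
                                (if (< (abs (- x a)) (abs (- x b))) a b))
                              medoids))
             (i (position nearest medoids)))
        (push x (nth i clusters))))))

(defun k-medoid (points k)
  "Steps 1-4: seed with the first K points as medoids, iterate to a fixpoint."
  (let ((medoids (subseq points 0 k)))
    (loop
      (let* ((clusters (assign-to-medoids points medoids))
             (new-medoids (mapcar #'best-medoid clusters)))
        (when (equal new-medoids medoids)
          (return clusters))
        (setf medoids new-medoids)))))

;; e.g. (k-medoid '(4 7 8.5 17 19 29 30 55) 3)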
Rao
ps: Here is a paper that does use this type of clustering:
http://coblitz.codeen.org:3125/citeseer.ist.psu.edu/cache/papers/cs/27046/http:zSzzSzwww-faculty.cs.uiuc.eduzSz~hanjzSzpdfzSzvldb94.pdf/ng94efficient.pdf
Thursday, March 8, 2007
Median of 2 Dimensional Vectors
However, it turns out that even for 1-dimensional data, whenever we have an even number of elements in the set, the median always turns out to be the mean of the 2 middle elements (when the set is arranged in non-increasing or non-decreasing order).
Therefore the idea put forward in class seems reasonable: even if the "median vector" itself is not part of the set, it can be thought of as the median.
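(A hypothetical illustration: for the 1-D set {1, 2, 3, 4} the median is (2+3)/2 = 2.5, which is not an element of the set; similarly, for the 2-D points (1,4), (2,1), (3,3) the component-wise median vector is (2,3), which is not one of the three points but can still be treated as the median.)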
Homework 3 socket opened..
You may want to print it on laminated paper so it won't get wet as you are working on
the problems while relaxing on the beach...
rao
Tuesday, March 6, 2007
On the importance of skepticism in reading papers....
For those of you who have uploaded your comments on the anatomy paper, thanks. The rest of you, you should!
One thing I should tell you all is that you need to be more critical and skeptical in reading papers. Too many of the
reviews seem to be "hagiographies" and those are not particularly useful...
cheers
rao
An example of using K-means to do grading..
In class today, I mentioned how K-means is a good fit for
converting numerical scores into grades (since the number of clusters
is known).
Some years back (back when there were still A/B/C/D/E grades), I sent
the following mail to the 494 class to illustrate the use of K-means
to associate letter grades with their actual midterm marks. Although
the impact is not as great on you, since it is not _your_ marks that
are being clustered, you can still, I am sure, feel sympathetic pangs
of understanding.. (this also works well as a nice way to understand
the properties of K-means)
(You may also note, from the example below, that the distribution of
marks on my exams tends to have very high variance...)
yours in questionable taste
Rao
===============================
The midterm has been graded. The stats are: avg: 29, standard
deviation: 14.5.
33 people took the exam, which was for 63 points (not 65, as I seem to
have miscounted).
There is 1 above 60
1 between 50-60
4 between 40-50
8 between 30-40
8 between 20-30
7 between 10-20
4 below 10
I will return the exams at the end of the class on Monday. Don't ask
me for marks before that.
Now I am sure you all want to know what the "A", "B", etc. cutoffs would be
just for this exam.
I thought we could use the k-means algorithm to do the clustering. After
all, the usual problem with k-means--that it requires us to
pre-specify the number of clusters--is not really a problem here, since we
are going to have a fixed number of letter grades anyway.
So, I wrote up a little (lisp) program for k-means and ran it on the
list of 33 individual scores. Here, for your edification are the
results:
Case 1: 5 clusters (A,B,C,D,E)
USER(218): (k-means mlist 5 :key #'mark-val)
>>>>((61.5) (38) (32) (26) (17.5))
>>>>((61.5 55) (48 47.5 47.5 47.5 38 37 35) (34.5 32.5 32.5 32 30 29)
(28 27 27 26 25.5 22.5)
(20.5 19 18 17.5 17 13.5 13 11.5 9.5 8.5 7 4)) --Dissimilarity
Measure:113.07143
>>>>((61.5 55) (48 47.5 47.5 47.5 38) (37 35 34.5 32.5 32.5 32 30 29)
(28 27 27 26 25.5 22.5 20.5)
(19 18 17.5 17 13.5 13 11.5 9.5 8.5 7 4)) --Dissimilarity
Measure:97.791214
>>>>((61.5 55) (48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32 30) (29
28 27 27 26 25.5 22.5 20.5 19)
(18 17.5 17 13.5 13 11.5 9.5 8.5 7 4)) --Dissimilarity
Measure:88.91668
>>>>((61.5 55) (48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32 30) (29
28 27 27 26 25.5 22.5 20.5 19)
(18 17.5 17 13.5 13 11.5 9.5 8.5 7 4)) --Dissimilarity
Measure:88.91668
;;Notice that k-means starts with some default cluster centers and
;;iterates until the clusters stabilize. So the last ">>>>" above is the
;;clustering we get.
;;(if we had stopped with the first iteration, people with 34.5 would have
;;gotten an A ;-)
The dissimilarity measure is reduced from iteration to iteration
in each run.
[[For the rest of the examples--except Case 1'--I don't show the cluster
dissimilarity measure]]
Case 2: 4 clusters (A,B,C,D) (assume I decided not to give Es)
USER(162): (k-means mlist 4 :key #'mark-val)
>>>>((61.5) (35) (27) (17.5))
>>>>((61.5 55) (48 47.5 47.5 47.5 38 37 35 34.5 32.5 32.5 32) (30 29
28 27 27 26 25.5 22.5)
(20.5 19 18 17.5 17 13.5 13 11.5 9.5 8.5 7 4))
>>>>((61.5 55) (48 47.5 47.5 47.5 38 37 35 34.5) (32.5 32.5 32 30 29
28 27 27 26 25.5 22.5 20.5)
(19 18 17.5 17 13.5 13 11.5 9.5 8.5 7 4))
>>>>((61.5 55) (48 47.5 47.5 47.5 38 37 35) (34.5 32.5 32.5 32 30 29
28 27 27 26 25.5 22.5 20.5)
(19 18 17.5 17 13.5 13 11.5 9.5 8.5 7 4))
>>>>((61.5 55) (48 47.5 47.5 47.5 38 37) (35 34.5 32.5 32.5 32 30 29
28 27 27 26 25.5 22.5 20.5)
(19 18 17.5 17 13.5 13 11.5 9.5 8.5 7 4))
>>>>((61.5 55) (48 47.5 47.5 47.5 38 37) (35 34.5 32.5 32.5 32 30 29
28 27 27 26 25.5 22.5)
(20.5 19 18 17.5 17 13.5 13 11.5 9.5 8.5 7 4))
>>>>((61.5 55) (48 47.5 47.5 47.5 38 37) (35 34.5 32.5 32.5 32 30 29
28 27 27 26 25.5 22.5)
(20.5 19 18 17.5 17 13.5 13 11.5 9.5 8.5 7 4))
Case 3: 4 clusters (A,B,C,D) but we remove the highest (61.5) from
consideration (assume that we give that person an A+ or just
physically push
that person out of the class for the sake of collective happiness).
USER(208): (k-means (cdr mlist) 4 :key #'mark-val)
>>>>((55) (34.5) (27) (17))
>>>>((55 48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32) (30 29 28 27
27 26 25.5 22.5)
(20.5 19 18 17.5 17 13.5 13 11.5 9.5 8.5 7 4))
>>>>((55 48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32) (30 29 28 27
27 26 25.5 22.5 20.5)
(19 18 17.5 17 13.5 13 11.5 9.5 8.5 7 4))
>>>>((55 48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32) (30 29 28 27
27 26 25.5 22.5 20.5)
(19 18 17.5 17 13.5 13 11.5 9.5 8.5 7 4))
As expected, with 61.5 tossed out of the class, a lot more people get
As ;-)
Case 1': Clustering with a different set of initial centers
we will repeat case 1, except we start with different centers
USER(219): (k-means mlist-r 5 :key #'mark-val)
>>>>((35) (32) (26) (18) (17))
>>>>((61.5 55 48 47.5 47.5 47.5 38 37 35 34.5) (32.5 32.5 32 30 29)
(28 27 27 26 25.5 22.5) (20.5 19 18 17.5)
(17 13.5 13 11.5 9.5 8.5 7 4)) --Dissimilarity Measure:117.0
>>>>((61.5 55 48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32 30 29)
(28 27 27 26 25.5 22.5) (20.5 19 18 17.5 17)
(13.5 13 11.5 9.5 8.5 7 4)) --Dissimilarity Measure:82.19365
>>>>((61.5 55 48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32 30) (29
28 27 27 26 25.5 22.5) (20.5 19 18 17.5 17)
(13.5 13 11.5 9.5 8.5 7 4)) --Dissimilarity Measure:80.37619
>>>>((61.5 55 48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32) (30 29
28 27 27 26 25.5 22.5) (20.5 19 18 17.5 17)
(13.5 13 11.5 9.5 8.5 7 4)) --Dissimilarity Measure:78.55476
>>>>((61.5 55 48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32) (30 29
28 27 27 26 25.5) (22.5 20.5 19 18 17.5 17)
(13.5 13 11.5 9.5 8.5 7 4)) --Dissimilarity Measure:78.571434
>>>>((61.5 55 48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32) (30 29
28 27 27 26 25.5) (22.5 20.5 19 18 17.5 17)
(13.5 13 11.5 9.5 8.5 7 4)) --Dissimilarity Measure:78.571434
Compare this to case 1--see that we have now converged to an entirely
different clustering just because we started from new centers.
Note
* that the lowest dissimilarity attained depends on the original
cluster centers. This is a consequence of the fact that K-means is
a greedy algorithm and does not find the clustering with the globally
lowest cluster dissimilarity.
* It is nice to see that the clusters found in case 1' are better
(according to the dissimilarity metric) than those found in case 1
(because this means that giving more As is in fact a better
idea according to k-means ;-)
case 2': repeat case 2 with different centers
USER(209): (k-means mlist-r 4 :key #'mark-val)
>>>>((35) (22.5) (18) (9.5))
>>>>((61.5 55 48 47.5 47.5 47.5 38 37 35 34.5 32.5 32.5 32 30 29) (28
27 27 26 25.5 22.5 20.5) (19 18 17.5 17)
(13.5 13 11.5 9.5 8.5 7 4))
>>>>((61.5 55 48 47.5 47.5 47.5 38 37 35 34.5) (32.5 32.5 32 30 29 28
27 27 26 25.5 22.5) (20.5 19 18 17.5 17)
(13.5 13 11.5 9.5 8.5 7 4))
>>>>((61.5 55 48 47.5 47.5 47.5 38 37) (35 34.5 32.5 32.5 32 30 29 28
27 27 26 25.5) (22.5 20.5 19 18 17.5 17)
(13.5 13 11.5 9.5 8.5 7 4))
>>>>((61.5 55 48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32 30 29 28
27 27 26 25.5) (22.5 20.5 19 18 17.5 17)
(13.5 13 11.5 9.5 8.5 7 4))
>>>>((61.5 55 48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32 30 29 28
27 27 26 25.5) (22.5 20.5 19 18 17.5 17)
(13.5 13 11.5 9.5 8.5 7 4))
;;interesting how going from 5 to 4 clusters just merges the old B and C...
case 3': repeat case 3 with different centers (we just removed 61.5)
USER(211): (k-means mlist-r-2 4 :key #'mark-val)
>>>>((35) (30) (22.5) (9.5))
>>>>((55 48 47.5 47.5 47.5 38 37 35 34.5 32.5 32.5) (32 30 29 28 27
27) (26 25.5 22.5 20.5 19 18 17.5 17)
(13.5 13 11.5 9.5 8.5 7 4))
>>>>((55 48 47.5 47.5 47.5 38 37) (35 34.5 32.5 32.5 32 30 29 28 27 27
26 25.5) (22.5 20.5 19 18 17.5 17)
(13.5 13 11.5 9.5 8.5 7 4))
>>>>((55 48 47.5 47.5 47.5 38) (37 35 34.5 32.5 32.5 32 30 29 28 27 27
26 25.5) (22.5 20.5 19 18 17.5 17)
(13.5 13 11.5 9.5 8.5 7 4))
>>>>((55 48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32 30 29 28 27 27
26 25.5) (22.5 20.5 19 18 17.5 17)
(13.5 13 11.5 9.5 8.5 7 4))
>>>>((55 48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32 30 29 28 27 27
26 25.5) (22.5 20.5 19 18 17.5 17)
(13.5 13 11.5 9.5 8.5 7 4))
Notice the non-local and somewhat non-intuitive effect on the
clusters.. The A cutoff still remains 47.5, but B now got chopped up..
You would have thought more people would get better grades if the
highest guy got thrown out... and yet...
In summary, you wouldn't trust your grades to K-means-based
clustering because:
1. its clusters are sensitive to the initial centers
2. its clusters are sensitive to outliers
3. deletion of elements might have non-local effects on its clusters
I know I know. You see this whole thing as a joke in bad taste....
Rao
=============
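For the curious, here is a minimal sketch of what a 1-D k-means routine along these lines could
look like in Common Lisp. It is an illustrative reconstruction only, not the program that produced
the transcript above (in particular, it does not print the per-iteration clusters or the
dissimilarity measure), and the function names and the seeding with the first K scores are
placeholder choices.

;; Minimal 1-D k-means sketch (illustrative reconstruction only; not the
;; program that produced the transcript above).

(defun nearest-center (x centers)
  "Index of the center in CENTERS closest to the score X."
  (let ((best 0))
    (loop for c in centers
          for i from 0
          when (< (abs (- x c)) (abs (- x (nth best centers))))
            do (setf best i))
    best))

(defun k-means-1d (scores k)
  "Cluster SCORES (a list of numbers) into K clusters, seeding the centers
with the first K scores; returns the clusters once the centers stop changing."
  (let ((centers (subseq scores 0 k)))
    (loop
      ;; assignment step: put each score with its nearest center
      (let ((clusters (make-list k)))
        (dolist (x scores)
          (push x (nth (nearest-center x centers) clusters)))
        ;; update step: each new center is the mean of its cluster
        ;; (an empty cluster keeps its old center)
        (let ((new-centers
                (mapcar (lambda (cl c)
                          (if cl (/ (reduce #'+ cl) (length cl)) c))
                        clusters centers)))
          (when (equal new-centers centers)
            (return (mapcar (lambda (cl) (sort cl #'>)) clusters)))
          (setf centers new-centers))))))

;; e.g. (k-means-1d '(61.5 55 48 47.5 38 30 17 4) 3)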
Saturday, March 3, 2007
Pattern for Reading Web Content
http://www.useit.com/alertbox/reading_pattern.html
Friday, March 2, 2007
Clustering readings..
The clustering readings have been changed to chapters 16 and 17 of the IR book. Please read 16
for Tuesday and 17 for Thursday.
Rao
Class survey results (summary as sent by Qihong Shao)
Here are the results of the survey (thanks to Qihong for converting
the hardcopy datasheets into this). The percentages are color-coded--purple
for cse494, yellow for 598, and some sort of red-magenta for the totals.
Other than the comment regarding slides (which is legitimate.. I often
wind up re-ordering and adding slides after class to reflect things that happened in the class),
I didn't see any big red lights. If you do, feel free to alert me, either directly or anonymously.
regards
Rao
Comments:
- The discussion is very helpful; I like how the class consists of lectures and discussions. It allows the class to read papers and ask questions during the discussion.
- Could you please keep the content/order of the slides more consistent? I find the slides change from time to time, and I spent quite some time figuring out the differences between the latest version and the one I had printed out and read.