Monday, February 12, 2007

An academic paper on whether Google (information extraction) invades your privacy

I mentioned an anecdote last class about how Google's cache inadvertently outed
a colleague's as-yet-not-public plans to move from his home institution.

Here is a paper on whether information extraction of the sort exemplified by AdWords
can be seen as invading your privacy.

http://rakaposhi.eas.asu.edu/cse494/google-privacy-ijcai.pdf


I mentioned this paper to a couple of you. Here it is for everyone. We will read it
formally at some point, but until then, you might enjoy the arguments.
The one I like best is that as the NLP techniques used in information extraction grow more
sophisticated, and as search engines and portals extract and keep large volumes of
text tucked away in their cluster farms, they can be *held liable* for not acting on information they have
if, for example, people use email to hatch plots and then carry them out. This is one reason why
a growing number of institutions now have a standing policy of deleting all email every year.

Rao
PS: Sergei related an anecdote where Google started giving out addresses of various people because
it had access to a large library of resumes and did some elementary address extraction.

2 comments:

oneuponzero said...

The fact that an email provider reads our mail is interesting, but
that this has caused a serious legal discussion seems very amusing to me.

Particularly when public-key cryptography is a well-established protocol, people should not trust any service provider with their privacy, and sending plain text in email seems too naive an idea.

(At least from an academic view, one can easily see that using a standard crypto protocol makes reading any message practically impossible for any machine, however many years it has to do so. Commercial implications are beyond my scope, though :-)

In this world of hackers and intelligent agents, the technology should be used for our privacy too.

Subbarao Kambhampati said...

I don't think encryption completely solves the problem.

First off, Google basically generates AdWords at the time you open and read the message. Since even encrypted messages are decrypted on the other end before being read, as long as you are reading them in the Gmail reader, you will get the same "privacy violation" issues.

Secondly, the way the law works is that you don't get to break into my home *even if I have left my doors open*. So, whether or not Google is invading privacy does not depend on whether it is decrypting a message or just reading an unencrypted one.

rao