Tuesday, February 6, 2007

[discussion/aggregation topic]: Please add your answers to Qn 0 on the homework as comments to this blog entry


Homework 1; Qn 0:

Think of and list 3 queries (or activities) that you would like to do on the
Web that current-day search engines (e.g., Google) don't quite
support.

A quote to get you inspired:


"Some people see things as they are and say why? I dream things that
never were and say why not?"
-(Mis)attributed to Robert Kennedy
who paraphrased from Bernard Shaw

19 comments:

Sergei said...

Here're my big ideas for homework question 0:

1. "Big" Idea

When I search for something, in many cases I am searching for a specific type (or types) of information. For example, if I search for "computer," I may be looking for computer stores, information about computer-related education, the history of computers, how-tos, etc. Or, if I search for "king kong," although I most likely want movie information, I might be looking for trailers, photos, or other movie-related information; but I might also be looking for reviews (by critics, users, or both), fan pages, etc. The search results, however, are presented to me as one list. I want many lists. I want to be presented with a hierarchy of categories and subcategories with sample top results and category descriptions, i.e., a hierarchy I can browse. For example, a search for "Tolkien" would return something like this (category descriptions + top results included):

1. Tolkien-related organizations
- fan web sites
- publishers
- movie studios/official movie sites
2. Fan sites
3. Books
4. Movies
5. Shopping
6. Education
- Tolkien-related programs and degrees
etc...

2. "Bigger" Idea

Often, I am not looking for web sites, but for specific information I can obtain from these web sites. A classic example would be searching for a product to buy. Another, less obvious example would be a faculty web page: as a prospective student I visited lots of such pages, and on each one I was looking for essentially the same types of information: research interests, projects, papers, current and former students, classes taught, personal stuff, etc. If I know that I am looking for a specific type of information, I'd like my search engine to extract this relevant information from web sites and present it to me in a format I specify. It may be a simple listing of content. For example, for a search for "cs faculty" with an indication of the type of pages I am looking for (i.e., faculty pages or web sites), I'd like to see an initial list (maybe categorized by some criteria). However, once I click one of the results, instead of taking me to the specific web page, the engine would present the information according to a simple default layout, or a layout I specified using a simple drag-and-drop utility (or by rearranging a sample page). There are many engines which do that for e-commerce, but I want to be able to structure my own results for anything at all, not just books or t-shirts.

An obvious consequence of such a service would be the ability to save the retrieved results. So, in minutes, I could build my own web sites containing information relevant to me alone. If I felt adventurous, I could even add my favorite design so that it actually looked like a web site. These web sites would be saved as queries, so they could be instantly rebuilt on request and would therefore be accessible from any computer.

Another obvious extension would be to automatically monitor such web sites, or collections, for any significant changes - like an RSS newsfeed.

3. "The Biggest" Idea

I would like my search engine to recognize my needs when I type in one word, without compromising my privacy. I.e., I want Google to know what I need because Google knows more about me than my query terms, but I do not want Google to know about the real me. I'd like to be able to construct a profile for myself. I'd even be willing to go through a long set of questions, like identifying my background, my favorite books/movies/whatever, my education, etc. Based on all these details, I want Google to determine the most relevant results for me. However, to protect my privacy, as well as out of sheer curiosity, I'd like to be able to have as many profiles as I like, and to apply any one of them to my search query (like a "choose your profile" drop-down box). I'd like to be able to compare the results returned for different profiles, including the default profile. And, of course, I'd like to see what people with similar profiles prefer to look at.
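The "choose your profile" idea above can be sketched as re-ranking a single result list under different user-chosen profiles. This is a minimal sketch under stated assumptions, not how Google or any other engine works: it assumes each result already carries a query-only score and a set of topic labels, and the `Result` type, `rerank` function, profile weights, and URLs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    base_score: float  # engine's query-only relevance score (assumed given)
    topics: set[str]   # topic labels attached to the page (assumed given)


def rerank(results, profile, weight=1.0):
    """Order results by base score plus the profile's interest in each page's topics."""
    def score(r):
        boost = sum(profile.get(t, 0.0) for t in r.topics)
        return r.base_score + weight * boost
    return sorted(results, key=score, reverse=True)


# Hypothetical results for the query "tolkien" and three hypothetical profiles.
results = [
    Result("books.example/tolkien", 0.9, {"books"}),
    Result("movies.example/lotr", 0.8, {"movies"}),
    Result("univ.example/medieval-lit", 0.7, {"education", "books"}),
]
profiles = {
    "default": {},                      # no personalization
    "student": {"education": 0.5},
    "film buff": {"movies": 0.3},
}

# Compare the ranking produced under each profile.
for name, profile in profiles.items():
    print(name, [r.url for r in rerank(results, profile)])
```

Because a profile is just a parameter to `rerank`, comparing profiles (including the empty default) means calling it once per profile, as the loop does; with these made-up numbers the "student" profile lifts the education page to the top and the "film buff" profile lifts the movie page.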

Vishal said...

1. Current-day search engines do not search for incomplete URLs.
E.g., there are some results for the search http://www.asu.edu/gpsa,
but the incomplete URLs http://www.asu.edu/g,
http://www.asu.edu/gp, and http://www.asu.edu/gps do not get any results.

2. Search results for "couldn't" and "could not" are different. Since "couldn't" is a contraction of "could not", the number of search results should not vary. Other such pairs are "shouldn't"/"should not", "wouldn't"/"would not", and many others.

3. When users like a Word or PDF file, they generally save it; when they like a page, they may copy it into Notepad or something similar. How long the user looks at a page and what action they take are good indicators of whether the user liked it. If the user presses the back button within a very short time (below some defined threshold), that suggests they did not like the page; if instead the user copies or saves the page, that suggests the page was important to them. This would help estimate how relevant a document is, which would improve future results.
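The contraction problem in point 2 amounts to applying the same normalization to documents at indexing time and to queries at search time. A minimal sketch, assuming a small hand-written contraction table; the `CONTRACTIONS` dict and `normalize` function are hypothetical, and the table is far from complete:

```python
import re

# Illustrative contraction table (hypothetical, not exhaustive).
CONTRACTIONS = {
    "couldn't": "could not",
    "shouldn't": "should not",
    "wouldn't": "would not",
    "can't": "can not",
    "won't": "will not",
}


def normalize(text):
    """Lower-case, expand known contractions, and split into word tokens."""
    text = text.lower().replace("\u2019", "'")  # treat curly apostrophes as straight ones
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text)
    expanded = []
    for tok in tokens:
        # Unknown tokens map to themselves; known contractions expand to two words.
        expanded.extend(CONTRACTIONS.get(tok, tok).split())
    return expanded


print(normalize("I couldn't find it"))   # ['i', 'could', 'not', 'find', 'it']
print(normalize("I could not find it"))  # ['i', 'could', 'not', 'find', 'it']
```

If both the index and the query go through the same `normalize`, the two phrasings produce identical token sequences and therefore identical matches.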
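The implicit-feedback idea in point 3 can be sketched as turning each visit into a vote and summing votes per URL. This is a minimal sketch, assuming the browser or engine can observe dwell time, back-button use, and save/copy actions; the 10-second threshold, the `Visit` fields, and the function names are hypothetical choices, not values from the comment:

```python
from dataclasses import dataclass

# Hypothetical threshold: going back within this many seconds counts as a "quick back".
QUICK_BACK_SECONDS = 10.0


@dataclass
class Visit:
    url: str
    dwell_seconds: float   # time between clicking the result and leaving the page
    saved_or_copied: bool  # user saved the file or copied text from the page
    pressed_back: bool     # user left by going back to the result list


def implicit_feedback(visit):
    """Map one visit to +1 (liked), -1 (disliked), or 0 (no clear signal)."""
    if visit.saved_or_copied:
        return 1
    if visit.pressed_back and visit.dwell_seconds < QUICK_BACK_SECONDS:
        return -1
    return 0


def relevance_votes(visits):
    """Sum feedback per URL; a ranker could use these totals as a boost or penalty."""
    votes = {}
    for v in visits:
        votes[v.url] = votes.get(v.url, 0) + implicit_feedback(v)
    return votes


visits = [
    Visit("a.example/paper.pdf", 120.0, True, False),
    Visit("b.example/spam", 3.0, False, True),
    Visit("b.example/spam", 5.0, False, True),
    Visit("c.example/page", 40.0, False, True),
]
print(relevance_votes(visits))
# {'a.example/paper.pdf': 1, 'b.example/spam': -2, 'c.example/page': 0}
```

A long visit that ends with the back button gets 0 here rather than +1, since reading a page for a while and then leaving is ambiguous; that is a design choice of this sketch.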

Zheshen(Jessie) said...

My queries:

1. Give me texts in which the author makes negative comments about Bill Clinton.

2. “Gone with the Wind”, “Legends of the Fall”, and “The Matrix” are three of my favorite movies; give me the 10 movies I am most likely to like.

3. Give me all web pages about Arizona State University written in French, Hindi, or Spanish. (The query itself is purely in English.)

Raju said...

1. Semantic search on images, speech, etc.: "my latest picture taken on a seashore", assuming the pictures are not tagged.

2. "cheap used car" returns pages containing this phrase or web forms for searching used cars. Why can't the engine search these web databases itself, sort the cars by price, and return me a list of cheap cars?

3. "Links in the blog page http://abc.." returns pages containing this exact phrase at high rank, rather than going to "http://abc" and returning the links on that page. Remember, links are something crawlers can identify easily.

Yang Qin said...

1) Image search: Sometimes I really want to search for images and pictures that are similar to a given one. Most current image search is based on tags or similar metadata, and unfortunately not a single search engine supports image search based on the actual content of the pictures. E.g., if I want to find pictures in which a man is standing beside the sea, how can I do that?
2) Music search: I like music very much. Often I remember only the theme (the melody) of a song, and I'm eager to listen to the whole song. However, the search engines cannot help me at all.
3) Human search: Suppose I have a picture of somebody and want to know who he/she is, his/her name, etc. Right now, I don't know how to do this.

VJ said...

Query 1: Find a list of profitable businesses I could start if I had a plot of land on Mill Avenue and a budget of $500,000.

Query 2: Find the list of full professors who teach a course in Information Retrieval and have research interests in Information Retrieval and semi-structured querying.

Query 3: List all people who have completed a Master's degree in Computer Science and founded a software company within 5 years of completing it.

Nanan said...

1. Communicating with the search engine in natural language: instead of a few keywords, you could ask a natural sentence such as “Who is the most popular NBA star?” If my question is not precise enough, the search engine would interact with me to arrive at a more exact query.
2. Treat documents as articles with content instead of bags of words, meaning the search engine can understand what they mean.
3. Instead of using a similarity function over words, compare the meaning of the query with the meaning of the document to get the exact result. When necessary, take the user's tendencies into account.

oneuponzero said...

3 areas that could add much more power to web search:

1. NLP.

Consider finding out similar proverbs
Ex: 'There is no gain without pain': find similar documents.
This would fetch me quotes or sentences with the same meaning.
Though finding proverbs does not seem an interesting task, this capability could be used for many meaningful searches.

2. Multimedia Processing

Searching for a tune in a song database.
I may not know the words of a song, but if I know the tune, the search could suggest similar songs. (This may require me to input audio or musical notes.) This is a query-by-example type of search.

As with songs, more advanced image/video search queries will be expected of search engines.

3. Integrating the web databases

Different sites list the top songs from music albums on their web pages.
But if someone wants the total sales/TRP/hits for a given album across all possible sources,
such queries will need a mediator that integrates tuples from the various sources.
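The mediator described in point 3 can be sketched as: each source is wrapped so that it yields uniform (album, hits) tuples, and the mediator unions them and aggregates per album. This is a minimal sketch; the source names, the numbers, and the crude case-folding used to match album titles across sites are all illustrative assumptions:

```python
from collections import defaultdict

# Hypothetical wrapper output: each source yields (album, hits) tuples.
SOURCES = {
    "site_a": [("Album X", 1200), ("Album Y", 300)],
    "site_b": [("album  x", 800)],
    "site_c": [("Album Y", 50), ("Album Z", 10)],
}


def normalize_title(title):
    """Crude entity resolution: titles differing only in case/whitespace are the same album."""
    return " ".join(title.lower().split())


def total_hits(sources):
    """Mediator step: union the tuples from every source and sum hits per album."""
    totals = defaultdict(int)
    for rows in sources.values():
        for album, hits in rows:
            totals[normalize_title(album)] += hits
    return dict(totals)


print(total_hits(SOURCES))  # {'album x': 2000, 'album y': 350, 'album z': 10}
```

Real sources would disagree on naming far more than case and spacing, so matching albums across sites is the hard part a real mediator would have to solve.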

feelingbird said...

1) When I issued the same query long ago, was I satisfied with the results?
2) Based on my search history, give the best ranking of results for my query. (According to the TA, this query can be handled by Google. A surprise! So I have one more: which search engine is the best choice for my next query?)
3) Am I using the search engine efficiently? For example, how could I improve the way I express my queries so they are more accurate?

Sanjay said...

The three queries are:
1) Currently only text tags are used to retrieve images from the web through search engines. If the tag is irrelevant, the engine displays an image the user did not ask for. An image as such cannot be searched for.

2) Searching for documents with similar meaning. Though LSI can be used, are there cheaper methods?

3) A precise answer is expected for a query, rather than a list of relevant documents containing the answer. Imagine an interactive search engine intelligent enough to answer any of your queries!

Bhushan said...

Hello,

It might seem a bit funny and weird, but I wanted to know whether animals can understand human language. Pet animals generally obey their owners' orders, so I thought of asking the query "Animals understand human language".

Second, I wanted to know "the heaviest known object on the earth", but Google was not able to come up with a proper result.

Finally, I wanted to get an idea of how a person can walk on his hands, a kind of acrobatic feat! So I queried "How does man walk on hands".

Shankar said...

1. Why search? news.google.com is a great aggregator site.
But why should I take the trouble of visiting news sites at all?
Why not aggregate all the news that "I would find interesting" and show it to me?
Of course, this would require user profiling, but I would not mind it if it's sufficiently secure.

2. Video search: Search for the scenes in all the videos containing a specific person.

3. Natural language search and aggregating data from many sites:
"Tell me the cheapest way to travel from Sunnyvale to Phoenix in under 6 hours."
Currently, search engines just give links to sites selling flight and bus tickets.


PS: One of the comments mentioned searching for a person based on a photo. This is supported by riya.com, where other photos of the same person can be searched.

snehith said...

1) There is no provision for topical search. The results of a query, especially the top few, are not always what I am looking for.
2) None of the search engines, including Google, lets me input an image file and search for similar images or for descriptions of that image.
3) None of the search engines provides personalized search, analogous to personalized web pages, that keeps track of and caters to a user's specific needs and search patterns.

Newton Alex said...

Here are my queries:

1. Professors with between 10 and 15 years of experience at ASU
-- This one is more like a database query, focusing on accuracy (precision). Future search engines should possess this quality.

2. Doctors in the Tempe region, preferably paediatricians, available between 6pm and 8pm
-- This query requires the search engine to find a lot of information, integrate it, and finally present only the relevant results.

3. Hotels located exactly 2 to 5 miles from the airport
-- This again focuses on the accuracy of the result.
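The hotel query is essentially an exact range predicate on a computed attribute. A minimal sketch, assuming the airport's and hotels' coordinates have already been extracted from the web (the coordinates and hotel names below are made up, and `hotels_in_ring` is a hypothetical name); it uses the standard haversine great-circle formula with a mean Earth radius and treats both bounds as inclusive:

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius
MILES_PER_DEGREE_LAT = 2 * math.pi * EARTH_RADIUS_MILES / 360


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def hotels_in_ring(hotels, center, min_miles, max_miles):
    """Exact filter: keep hotels whose distance from center lies in [min_miles, max_miles]."""
    lat0, lon0 = center
    return [name for name, lat, lon in hotels
            if min_miles <= haversine_miles(lat0, lon0, lat, lon) <= max_miles]


# Hypothetical airport location, with hotels placed due north of it at known distances.
airport = (33.0, -112.0)
hotels = [
    (f"Hotel {d} mi", airport[0] + d / MILES_PER_DEGREE_LAT, airport[1])
    for d in (1.0, 3.5, 4.5, 8.0)
]
print(hotels_in_ring(hotels, airport, 2.0, 5.0))  # ['Hotel 3.5 mi', 'Hotel 4.5 mi']
```

The hard part the comment points at, reliably extracting structured locations from web pages, is assumed away here; once the data is structured, "exactly between 2 and 5 miles" is a yes/no filter rather than a ranking.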

Aravind Krishna K said...

1. Precisely speaking, it looks like a lot of people expect search to imitate the more structured, database style of querying,
but with the query typed in natural language rather than SQL. To achieve this, the engine should try to bring order to the web by identifying objects/entities and relationships, database style. (Efforts are going on in this direction, like 'DBlife'/'ExDB', but making this work at a very large scale and generically is an outstanding challenge!)

I.e., the search engine should act like a human assistant: understand the intention
and fetch appropriate results.


2. Searching these:

- Math equations on the web
- Semantic multimedia search

3. I have always been somewhat confused by, and have lost trust in, the summaries Google provides below its links, and I end up glancing at the page myself for a quick overview of whether it's relevant.

Example: Try searching "Kambhampati" in Google, and see what summary it gives below Dr. Rao's link.

"Ph.D. Fulton Fellow for 2006-07; University Graduate Scholar 2006-07. ...",

It looks like the description of a student (which is actually some way down the page, among a bunch of students). Though the instructor clearly has "Professor" in a big font under his name, what makes Google display this random text?

Thanks,
Aravind
--

Aditya Kanitkar said...

1) Links to live events like cricket, soccer, etc. can never be found by search engines.

2) As search engines do not understand natural language, we are forced to express queries as keywords rather than questions (which would be easier for us).

3) Search engines do not handle queries in other languages efficiently.

k.r.a.k.t.i.k said...

1. Can blogs "argue" with us?

Kind of like the saree-shop salesperson example: you don't want the search engine to return results based only on "here is what I think you are looking for"; rather, you want it to be able to point out why it chose a particular result ahead of another.

2. Searching the search engine corpus of user conducted searches:

I daresay this is easy to implement, but it would be nice to search ON the searches that are being performed.

3. That tune in my head ...:

I'd love to have a search engine that could get hold of this tune that's been in my head since the morning, and supply me with complete information on the song / tune. Ditto with videos.