CSE494/598 Spring 2007 Blog: project 1

Friday, February 2, 2007

project 1

Hi,

How is Project 1 going? Are you having any difficulty understanding the code?

I wanted to let you know something regarding the structure of the index.

You need to read the entire index in order to precompute the norms of the documents.
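As a rough illustration of why a full pass is needed, here is a minimal sketch of the norm computation. It does not use Lucene: a plain in-memory map (term -> document number -> frequency) stands in for the index, and the class name DocNorms and the helper docFreqs are made up for this example. It also assumes raw term frequencies as the weights; if your weighting scheme is different (for example tf-idf), the squared value would change accordingly.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: a plain in-memory map stands in for the Lucene index.
class DocNorms {

    // postings: term text -> (document number -> term frequency in that document).
    // Returns document number -> Euclidean norm of its raw term-frequency vector.
    static Map<Integer, Double> computeNorms(Map<String, Map<Integer, Integer>> postings) {
        Map<Integer, Double> sumOfSquares = new TreeMap<>();
        // One pass over every term, mirroring a full scan of the index.
        for (Map<Integer, Integer> docFreqs : postings.values()) {
            for (Map.Entry<Integer, Integer> e : docFreqs.entrySet()) {
                double tf = e.getValue();
                sumOfSquares.merge(e.getKey(), tf * tf, Double::sum);
            }
        }
        // The norm is the square root of the accumulated sum of squares.
        Map<Integer, Double> norms = new TreeMap<>();
        for (Map.Entry<Integer, Double> e : sumOfSquares.entrySet()) {
            norms.put(e.getKey(), Math.sqrt(e.getValue()));
        }
        return norms;
    }

    // Hypothetical helper: builds a doc -> freq map from (doc, freq) pairs.
    static Map<Integer, Integer> docFreqs(int... pairs) {
        Map<Integer, Integer> m = new HashMap<>();
        for (int i = 0; i + 1 < pairs.length; i += 2) {
            m.put(pairs[i], pairs[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, Integer>> postings = new HashMap<>();
        postings.put("retrieval", docFreqs(0, 3, 1, 1)); // doc 0: 3 times, doc 1: once
        postings.put("index", docFreqs(0, 4));           // doc 0: 4 times
        System.out.println(computeNorms(postings));
    }
}
```

In this toy data, document 0 has frequencies 3 and 4, so its norm is 5; document 1 has a single frequency of 1, so its norm is 1. Since each term contributes to the norm of every document it occurs in, no norm is final until all terms have been visited.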

The index contains terms of several types (fields): "contents", "title", "modified", "uid", etc. You need to consider only the terms of type "contents". You can find the type of a term using termval.field(), as shown in the following code from VectorViewer.java.

while (termenum.next())
{
    count++;
    Term termval = termenum.term();
    System.out.println("Type: " + termval.field() + " The Term :" + termval.text() + " Frequency :" + termenum.docFreq());
    /*
     * Add the following here.
     *
     * To retrieve the <docNo, Freq> pairs for each term, call:
     *     TermDocs termdocs = reader.termDocs(termval);
     *
     * To retrieve <docNo, Freq, <pos1, ..., posn>>, call:
     *     TermPositions termpositions = reader.termPositions(termval);
     */
}
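
The "contents" check itself can be sketched in isolation as follows. This is only an illustration, not code from VectorViewer.java: FieldTerm is a hypothetical stand-in for Lucene's Term that offers the same field() and text() accessors, so that the example runs without the Lucene jar. In the real loop the same test would be applied to termval.field().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: FieldTerm is a hypothetical stand-in for Lucene's Term,
// exposing the same two accessors used in the loop above.
class ContentsFilter {

    static final class FieldTerm {
        private final String field;
        private final String text;

        FieldTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }

        String field() { return field; }
        String text() { return text; }
    }

    // Returns the text of every term whose field is "contents", in input order.
    static List<String> contentsTerms(List<FieldTerm> terms) {
        List<String> kept = new ArrayList<>();
        for (FieldTerm t : terms) {
            // Constant on the left, so a null field cannot throw.
            if (!"contents".equals(t.field())) {
                continue; // skip "title", "modified", "uid", ...
            }
            kept.add(t.text());
        }
        return kept;
    }

    public static void main(String[] args) {
        List<FieldTerm> terms = Arrays.asList(
            new FieldTerm("contents", "retrieval"),
            new FieldTerm("title", "home"),
            new FieldTerm("uid", "0001"),
            new FieldTerm("contents", "vector"));
        System.out.println(contentsTerms(terms));
    }
}
```

Note that the comparison uses equals() rather than ==, since comparing Java strings with == tests object identity and not their text.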

Bhaumik


Blog for ASU Course: Information Retrieval/Mining/Integration
