Friday, February 2, 2007

project 1

Hi,
How is project 1 going? Are you having any difficulty understanding the code?
 
I wanted to let you know something regarding the structure of the index.
 
You need to read the entire index to precompute the norms of the documents.
The index contains terms from several fields, such as "contents", "title", "modified", and "uid", but you need to consider only the terms in the "contents" field. You can find out which field a term belongs to using termval.field(), as shown in the following code from VectorViewer.java.
 
while (termenum.next())
{
    count++;
    Term termval = termenum.term();
    // field() is the field the term belongs to; docFreq() is the number of documents containing the term
    System.out.println("Type: " + termval.field() + "  The Term :" + termval.text() + " Frequency :" + termenum.docFreq());
    /**
      Add the following here.

      To retrieve the <docNo,Freq> pair for each term, call
        TermDocs termdocs = reader.termDocs(termval);

      To retrieve the <docNo,Freq,<pos1,......posn>> entry, call
        TermPositions termpositions = reader.termPositions(termval);
      (note that this is a method of the reader, not of termval)
    **/
}
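
For the norm computation itself, here is a rough sketch of one possible approach. It does not use Lucene at all: the Posting class is a hypothetical stand-in for the <docNo,Freq> pairs you would read from the index, and NormSketch and computeNorms are names made up for this example. It also assumes that the weight of a term in a document is its raw frequency and that the norm is the Euclidean length of the document vector; if the project asks for a different weighting (for example tf-idf), the squared weight would change accordingly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for one <docNo,Freq> pair of a term, as read from the index.
class Posting {
    final String field; // field the term belongs to, e.g. "contents" or "title"
    final String text;  // the term text
    final int docNo;    // document number
    final int freq;     // frequency of the term in that document

    Posting(String field, String text, int docNo, int freq) {
        this.field = field;
        this.text = text;
        this.docNo = docNo;
        this.freq = freq;
    }
}

class NormSketch {
    // Returns docNo -> Euclidean norm, using raw term frequency as the weight (an assumption).
    static Map<Integer, Double> computeNorms(List<Posting> postings) {
        Map<Integer, Double> sumOfSquares = new TreeMap<>();
        for (Posting p : postings) {
            // Only terms of the "contents" field contribute to the norm.
            if (!"contents".equals(p.field)) {
                continue;
            }
            sumOfSquares.merge(p.docNo, (double) p.freq * p.freq, Double::sum);
        }
        // Take the square root once per document, after all terms have been seen.
        Map<Integer, Double> norms = new TreeMap<>();
        for (Map.Entry<Integer, Double> e : sumOfSquares.entrySet()) {
            norms.put(e.getKey(), Math.sqrt(e.getValue()));
        }
        return norms;
    }

    public static void main(String[] args) {
        List<Posting> postings = new ArrayList<>();
        postings.add(new Posting("contents", "lucene", 0, 3));
        postings.add(new Posting("contents", "index", 0, 4));
        postings.add(new Posting("title", "lucene", 0, 1)); // skipped: not "contents"
        postings.add(new Posting("contents", "index", 1, 2));
        System.out.println(computeNorms(postings)); // prints {0=5.0, 1=2.0}
    }
}
```

Because the sums are accumulated per document while walking term by term, a single pass over the index is enough, which matches the order in which the term enumeration delivers the data.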
 
Bhaumik
