Hi,
How is Project 1 going? Any difficulties understanding the code?
I wanted to let you know something regarding the structure of the index.
You need to read the entire index to precompute the norms of the documents.
The index contains terms from several fields, such as "contents", "title", "modified", and "uid", but you need to consider only the terms in "contents". You can find out which field a term belongs to using termval.field(), as in the following code from VectorViewer.java.
while(termenum.next())
{
    count++;
    Term termval = termenum.term();
    System.out.println("Type: " + termval.field() + " The Term :" + termval.text() + " Frequency :"+termenum.docFreq());
    /**
    Add the following here.
    To retrieve the <docNo,Freq> pair for each term, call
        TermDocs termdocs = reader.termDocs(termval);
    To retrieve the <docNo,Freq,<pos1,......posn>> tuple, call
        TermPositions termpositions = reader.termPositions(termval);
    **/
}
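To make the norm precomputation concrete, here is a rough sketch of the accumulation step. It is only an illustration, not the project solution: Posting and ContentsNormSketch are hypothetical names that stand in for the values you would read from the index, and the norm here is the plain Euclidean norm of raw term frequencies, which is an assumption (if your weighting uses idf as well, apply it to the weight before squaring).

```java
// Sketch only: Posting is a hypothetical stand-in for the values read from
// the index while enumerating terms; it is not part of the Lucene API.
class ContentsNormSketch {

    /** One (field, term, docId, freq) entry, as read from the index. */
    static final class Posting {
        final String field;
        final String term;
        final int docId;
        final int freq;

        Posting(String field, String term, int docId, int freq) {
            this.field = field;
            this.term = term;
            this.docId = docId;
            this.freq = freq;
        }
    }

    /**
     * Returns, for each document, the Euclidean norm of its raw
     * term-frequency vector, counting only terms in the "contents" field.
     * Raw tf weighting is an assumption made for this sketch.
     */
    static double[] computeNorms(Posting[] postings, int numDocs) {
        double[] sumOfSquares = new double[numDocs];
        for (Posting p : postings) {
            // Skip "title", "modified", "uid" and any other field.
            if (!"contents".equals(p.field)) {
                continue;
            }
            sumOfSquares[p.docId] += (double) p.freq * p.freq;
        }
        double[] norms = new double[numDocs];
        for (int d = 0; d < numDocs; d++) {
            norms[d] = Math.sqrt(sumOfSquares[d]);
        }
        return norms;
    }
}
```

In the real code you would not build a Posting array. Inside the loop over termenum you would check termval.field(), walk the TermDocs for that term, and add the squared frequency to the entry for each document.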
Bhaumik