opendap.crawler
Class URLClassifier

java.lang.Object
  extended by opendap.crawler.URLClassifier

public class URLClassifier
extends java.lang.Object

Scan a collection of DDX URLs and group them using equivalence classes formed from their path components.
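The sketch below shows what grouping URLs into equivalence classes can look like. It is illustrative only. It assumes, hypothetically, that two URLs are equivalent when they share a host and have the same path once every run of digits is replaced by `#`. The criteria `URLClassifier` actually uses are not documented here. `EquivalenceSketch`, `classKey` and `group` are hypothetical names.

```java
import java.net.URI;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class EquivalenceSketch {
    // Hypothetical equivalence key: host plus path with every run of digits
    // replaced by "#". URLs with the same key fall into the same class.
    static String classKey(String url) {
        URI uri = URI.create(url);
        String path = uri.getPath() == null ? "" : uri.getPath();
        return uri.getHost() + path.replaceAll("\\d+", "#");
    }

    // Partition the URLs into equivalence classes, sorted by key.
    static Map<String, List<String>> group(List<String> urls) {
        return urls.stream().collect(
                Collectors.groupingBy(EquivalenceSketch::classKey, TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        // Hypothetical example URLs.
        List<String> urls = List.of(
                "http://example.org/data/2001/sst_01.nc.ddx",
                "http://example.org/data/2002/sst_02.nc.ddx",
                "http://example.org/data/climatology.nc.ddx");
        group(urls).forEach((key, members) -> System.out.println(key + " -> " + members.size()));
    }
}
```

Under this key, the first two URLs form one class (`example.org/data/#/sst_#.nc.ddx`). The climatology file forms a class of its own.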

Author:
jimg

Constructor Summary
URLClassifier(java.lang.String cacheName, boolean readOnly)
           
 
Method Summary
 int assignUrlsToInitialGroups(java.util.Enumeration<java.lang.String> ddxURLs)
          For all of the URLs, assign them to an initial set of URLGroups.
 int classifyURLs(java.io.PrintStream ps)
           
 void lookForDates()
          Look for dates in the equivalence classes that define each group.
static void main(java.lang.String[] args)
           
 void printClassifications(java.io.PrintStream ps)
          Print simple information about a classification.
 void printCompleteClassifications(java.io.PrintStream ps)
          Print more information about each classification.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

URLClassifier

public URLClassifier(java.lang.String cacheName,
                     boolean readOnly)
              throws java.lang.Exception
Throws:
java.lang.Exception
Method Detail

main

public static void main(java.lang.String[] args)

classifyURLs

public int classifyURLs(java.io.PrintStream ps)
                 throws java.lang.Exception
Throws:
java.lang.Exception

assignUrlsToInitialGroups

public int assignUrlsToInitialGroups(java.util.Enumeration<java.lang.String> ddxURLs)
                              throws java.lang.Exception
For all of the URLs, assign them to an initial set of URLGroups.

Returns:
The number of URLs processed
Throws:
java.lang.Exception - If the URLComponents object cannot be built
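This is a minimal sketch of the initial assignment step. It consumes an `Enumeration<String>` and returns the number of URLs processed, as documented above. It assumes, hypothetically, that a URL's initial group is determined by its host and the number of path components. A plain map stands in for `URLGroup`, and `URI` parsing stands in for building a `URLComponents` object. Neither class's real definition appears here, and `InitialGroupingSketch` and `splitPath` are illustrative names.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class InitialGroupingSketch {
    // Hypothetical stand-in for URLGroup storage: group key -> member URLs.
    final Map<String, List<String>> groups = new LinkedHashMap<>();

    // Split a URL path into its non-empty components: "/a/b/c.nc" -> [a, b, c.nc].
    static List<String> splitPath(URI uri) {
        List<String> parts = new ArrayList<>();
        String path = uri.getPath();
        if (path == null) return parts;
        for (String p : path.split("/")) {
            if (!p.isEmpty()) parts.add(p);
        }
        return parts;
    }

    // Assign each URL to an initial group and return the number of URLs processed.
    // A URL that cannot be parsed raises URISyntaxException, analogous to the
    // documented failure when a URLComponents object cannot be built.
    int assignUrlsToInitialGroups(Enumeration<String> ddxURLs) throws URISyntaxException {
        int count = 0;
        while (ddxURLs.hasMoreElements()) {
            URI uri = new URI(ddxURLs.nextElement());
            String key = uri.getHost() + ":" + splitPath(uri).size();
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(uri.toString());
            ++count;
        }
        return count;
    }
}
```

To obtain an `Enumeration` from an ordinary list, pass the list to `java.util.Collections.enumeration(list)`.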

lookForDates

public void lookForDates()
                  throws java.lang.Exception
Look for dates in the equivalence classes that define each group.

Throws:
java.lang.Exception
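This page does not say how `lookForDates` recognizes a date. The sketch below shows one plausible approach and is an assumption, not the method's documented behavior. It tests whether every value seen at one path-component position parses as a four-digit year or as a `yyyyMMdd` or `yyyy-MM-dd` date. The pattern list, the year range and the class name `DateDetectionSketch` are all hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;

class DateDetectionSketch {
    // Hypothetical set of full-date layouts to try; real data may need others.
    private static final List<DateTimeFormatter> FORMATS = List.of(
            DateTimeFormatter.BASIC_ISO_DATE,   // yyyyMMdd, e.g. 20010315
            DateTimeFormatter.ISO_LOCAL_DATE);  // yyyy-MM-dd, e.g. 2001-03-15

    // True if the component is a plausible four-digit year or a valid full date.
    static boolean looksLikeDate(String component) {
        if (component.matches("\\d{4}")) {
            int year = Integer.parseInt(component);
            return year >= 1900 && year <= 2100; // hypothetical plausible range
        }
        for (DateTimeFormatter f : FORMATS) {
            try {
                LocalDate.parse(component, f);
                return true;
            } catch (DateTimeParseException e) {
                // Not this layout; try the next one.
            }
        }
        return false;
    }

    // True if every value observed at one component position looks like a date.
    static boolean allDates(List<String> values) {
        return !values.isEmpty() && values.stream().allMatch(DateDetectionSketch::looksLikeDate);
    }
}
```

Requiring every value at a position to parse is a conservative choice. A single non-date value, such as a version directory, disqualifies the whole position.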

printClassifications

public void printClassifications(java.io.PrintStream ps)
Print simple information about a classification. For each group, print the path components used to form the equivalence classes.

Parameters:
ps - the PrintStream to which the output is written
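This is a minimal sketch of the output step. It assumes, hypothetically, that each group is represented by the list of path components that define it. It then prints each group's components, one per line, to the supplied `PrintStream`. `PrintSketch` and the map representation are illustrative only. `ps` can be `System.out` or a `PrintStream` wrapped around a file.

```java
import java.io.PrintStream;
import java.util.List;
import java.util.Map;

class PrintSketch {
    // Hypothetical representation: group name -> the path components that define it.
    static void printClassifications(Map<String, List<String>> groups, PrintStream ps) {
        for (Map.Entry<String, List<String>> e : groups.entrySet()) {
            ps.println("Group " + e.getKey() + ":");
            List<String> comps = e.getValue();
            for (int i = 0; i < comps.size(); i++) {
                // Print each component with its position in the path.
                ps.printf("    [%d] %s%n", i, comps.get(i));
            }
        }
        ps.flush();
    }
}
```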

printCompleteClassifications

public void printCompleteClassifications(java.io.PrintStream ps)
Print more information about each classification. This sorts the URLs in the group and prints the first and last ones. It also prints histogram information for each equivalence class used to form the group.

Parameters:
ps - the PrintStream to which the output is written
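The sketch below reproduces the three outputs described above for a single group: the sorted URLs, the first and last URL, and a per-position histogram of values. It assumes, hypothetically, that each equivalence class corresponds to one path-component position. The histogram therefore counts how often each value occurs at that position. `CompletePrintSketch`, `components`, `histogram` and `printGroup` are illustrative names, not part of `URLClassifier`.

```java
import java.io.PrintStream;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CompletePrintSketch {
    // Split a URL's path into its non-empty components.
    static List<String> components(String url) {
        List<String> parts = new ArrayList<>();
        String path = URI.create(url).getPath();
        if (path != null) {
            for (String p : path.split("/")) {
                if (!p.isEmpty()) parts.add(p);
            }
        }
        return parts;
    }

    // Histogram: how often each value occurs at component position `pos`.
    static Map<String, Integer> histogram(List<String> urls, int pos) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String u : urls) {
            List<String> parts = components(u);
            if (pos < parts.size()) counts.merge(parts.get(pos), 1, Integer::sum);
        }
        return counts;
    }

    // Print the group's size, its first and last URLs in sorted order,
    // and a histogram for every path-component position.
    static void printGroup(String name, List<String> urls, PrintStream ps) {
        if (urls.isEmpty()) return;
        List<String> sorted = new ArrayList<>(urls);
        sorted.sort(null); // natural (lexicographic) String order
        ps.println("Group " + name + ": " + sorted.size() + " URLs");
        ps.println("  first: " + sorted.get(0));
        ps.println("  last:  " + sorted.get(sorted.size() - 1));
        int width = 0;
        for (String u : urls) width = Math.max(width, components(u).size());
        for (int pos = 0; pos < width; pos++) {
            ps.println("  component " + pos + ": " + histogram(urls, pos));
        }
        ps.flush();
    }
}
```

A component position whose histogram has a single entry is constant across the group. A position with many entries is where the members actually differ, for example a date or a sequence number.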