opendap.crawler
Class ThreddsCatalogUtil

java.lang.Object
  extended by opendap.crawler.ThreddsCatalogUtil

public class ThreddsCatalogUtil
extends java.lang.Object


Nested Class Summary
static class ThreddsCatalogUtil.SERVICE
           
 class ThreddsCatalogUtil.ThreddsCrawlerEnumeration
          Implements a modified depth-first traversal of a thredds catalog.
 
Constructor Summary
ThreddsCatalogUtil(java.lang.String namePrefix, boolean readOnly)
          Constructor.
 
Method Summary
 java.lang.String getCachedCatalog(java.lang.String url)
          Return the THREDDS catalog associated with the given URL from the local cache.
 java.util.Enumeration<java.lang.String> getCachedCatalogEnumeration()
          Get access to all of the THREDDS Catalogs in the cache.
 ThreddsCatalogUtil.ThreddsCrawlerEnumeration getCatalogEnumeration()
          Resume an interrupted crawl.
 ThreddsCatalogUtil.ThreddsCrawlerEnumeration getCatalogEnumeration(java.lang.String topCatalog)
          Crawl a thredds catalog.
 java.util.Vector<java.lang.String> getCatalogRefURLs(java.lang.String catalogUrlString)
          Returns all of the THREDDS catalog URLs contained in the THREDDS catalog located at the passed URL.
 java.util.Vector<java.lang.String> getDDXUrls(java.lang.String catalogUrlString)
          Return all of the DDX urls to data sources referenced by the given thredds catalog.
 void saveCatalogCache()
          Save the 'visited' cache.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ThreddsCatalogUtil

public ThreddsCatalogUtil(java.lang.String namePrefix,
                          boolean readOnly)
                   throws java.lang.Exception
Constructor. This constructor gives the finest control over the caching operations performed. Because some sites publish a large number of catalogs, caching all of them can require a lot of space. Even when caching everything is impractical, it is still useful to detect (or avoid) loops.

Parameters:
writeToCache - True if caching should be used
namePrefix - The name of the cache files
readFromCache - Arrange for the TCU class to read Thredds catalogs from the postgres cache.
Throws:
java.lang.Exception
Method Detail

getDDXUrls

public java.util.Vector<java.lang.String> getDDXUrls(java.lang.String catalogUrlString)
                                              throws java.lang.Exception
Return all of the DDX URLs to data sources referenced by the given THREDDS catalog. The THREDDS catalog is referenced using a URL, which is either accessed directly or read from the cache, depending on how the instance of TCU was built.

Parameters:
catalogUrlString - The THREDDS catalog to access
Returns:
A Vector of strings, each element a DDX URL.
Throws:
java.lang.Exception

getCatalogRefURLs

public java.util.Vector<java.lang.String> getCatalogRefURLs(java.lang.String catalogUrlString)
                                                     throws java.lang.Exception
Returns all of the THREDDS catalog URLs contained in the THREDDS catalog located at the passed URL.

Parameters:
catalogUrlString - The URL from where the catalog was retrieved.
Returns:
A vector of fully qualified URL Strings each of which points to a THREDDS catalog document. If the catalog returned by dereferencing catalogUrlString is 'bad' (e.g., the server returns a 404 response), then the Vector result will be empty.
Throws:
java.lang.Exception
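
A minimal sketch of how getCatalogRefURLs and getDDXUrls might be combined to gather the DDX URLs of a catalog and its direct child catalogs. To keep the example self-contained, CatalogSource is a hypothetical stand-in interface that only mirrors the two documented method signatures; it is not part of opendap.crawler, and ddxOneLevelDown is an illustrative helper name.

```java
import java.util.Vector;

public class DdxCollectSketch {

    /** Hypothetical stand-in that mirrors the two documented method signatures. */
    public interface CatalogSource {
        Vector<String> getCatalogRefURLs(String catalogUrlString) throws Exception;
        Vector<String> getDDXUrls(String catalogUrlString) throws Exception;
    }

    /** DDX URLs of the catalog itself, followed by those of its direct child catalogs. */
    public static Vector<String> ddxOneLevelDown(CatalogSource source, String catalogUrl)
            throws Exception {
        Vector<String> result = new Vector<>(source.getDDXUrls(catalogUrl));
        // An empty Vector may mean "no child catalogs" or "bad catalog";
        // in both cases the loop body is simply skipped.
        for (String child : source.getCatalogRefURLs(catalogUrl)) {
            result.addAll(source.getDDXUrls(child));
        }
        return result;
    }
}
```

Because a 'bad' catalog yields an empty Vector rather than an exception, a caller that needs to tell the two cases apart would have to check the catalog some other way.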

getCatalogEnumeration

public ThreddsCatalogUtil.ThreddsCrawlerEnumeration getCatalogEnumeration(java.lang.String topCatalog)
                                                                   throws java.lang.Exception
Crawl a thredds catalog. This implements a modified depth-first traversal of the 'tree' of catalogs with 'topCatalog' as the root. In reality, a thredds catalog is a directed graph, but the Enumeration returned is smart enough to avoid loops, so the resulting traversal has a tree-like feel. The algorithm is like a depth-first traversal, but it has been modified so that HTTP accesses are limited to one per call to nextElement(). When nextElement() is called, its value (call it 'C') is both returned and crawled so that subsequent calls return the children of 'C'. A real depth-first traversal would instead descend all the way to the leaf nodes (thredds catalogs that contain only references to data sets and not other catalogs).

Parameters:
topCatalog - The THREDDS catalog that will serve as the root node
Returns:
An Enumeration of Strings that will visit all of the catalogs in that tree, bound up in a ThreddsCrawlerEnumeration.
Throws:
java.lang.Exception - Thrown if the cache cannot be configured
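
The traversal described above can be illustrated with a small, self-contained sketch. It is not the ThreddsCrawlerEnumeration source: the in-memory map stands in for fetching a catalog over HTTP, and the class name, the stack, and the 'seen' set are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

/** Illustrative only: a lazy, loop-avoiding, depth-first-like Enumeration. */
public class LazyCrawlSketch implements Enumeration<String> {
    // Stand-in for "fetch this catalog and list its catalog references".
    private final Map<String, List<String>> children;
    private final Deque<String> toVisit = new ArrayDeque<>();
    private final Set<String> seen = new HashSet<>();

    public LazyCrawlSketch(Map<String, List<String>> children, String topCatalog) {
        this.children = children;
        toVisit.push(topCatalog);
        seen.add(topCatalog);
    }

    @Override
    public boolean hasMoreElements() {
        return !toVisit.isEmpty();
    }

    @Override
    public String nextElement() {
        if (toVisit.isEmpty()) {
            throw new NoSuchElementException();
        }
        String current = toVisit.pop();
        // Exactly one lookup (one simulated HTTP access) per call.
        List<String> refs = children.getOrDefault(current, Collections.<String>emptyList());
        // Push in reverse so the first child is returned next; skip catalogs already seen.
        for (int i = refs.size() - 1; i >= 0; i--) {
            String ref = refs.get(i);
            if (seen.add(ref)) {
                toVisit.push(ref);
            }
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("top", Arrays.asList("a", "b"));
        graph.put("a", Arrays.asList("c", "top")); // "top" again: a loop
        graph.put("b", Arrays.asList("c"));
        LazyCrawlSketch crawl = new LazyCrawlSketch(graph, "top");
        while (crawl.hasMoreElements()) {
            System.out.println(crawl.nextElement());
        }
    }
}
```

With the graph in main, the sketch visits top, a, c, b: each catalog appears once even though 'a' refers back to 'top' and both 'a' and 'b' refer to 'c'.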

getCatalogEnumeration

public ThreddsCatalogUtil.ThreddsCrawlerEnumeration getCatalogEnumeration()
                                                                   throws java.lang.Exception
Resume an interrupted crawl. Suppose that the previous crawl ended with some elements still on the stack of catalogs to be visited - then that stack will be saved and used as a starting point when this method is used. This provides some protection in case the network fails during a long crawl.

Returns:
An enumeration of Strings, bound up in a ThreddsCrawlerEnumeration object.
Throws:
java.lang.Exception - Thrown if the cache cannot be configured
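
One way to picture the save-and-resume behaviour is the sketch below, which writes the pending stack to a text file and reads it back. The file format, the class name ResumeStackSketch, and its methods are assumptions for illustration; the documentation above does not say how or where the real stack is stored.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustrative only: persist the stack of catalogs still to be visited. */
public class ResumeStackSketch {

    /** Write the pending catalog URLs, top of stack first, one per line. */
    public static void save(Deque<String> toVisit, Path file) throws IOException {
        Files.write(file, toVisit, StandardCharsets.UTF_8);
    }

    /** Rebuild the stack; a missing file yields an empty stack (nothing to resume). */
    public static Deque<String> restore(Path file) throws IOException {
        Deque<String> toVisit = new ArrayDeque<>();
        if (Files.exists(file)) {
            for (String url : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                if (!url.isEmpty()) {
                    toVisit.addLast(url); // keeps the first line on top of the stack
                }
            }
        }
        return toVisit;
    }
}
```

A resumed crawl would then continue popping from the restored stack instead of starting from a root catalog.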

getCachedCatalogEnumeration

public java.util.Enumeration<java.lang.String> getCachedCatalogEnumeration()
Get access to all of the THREDDS Catalogs in the cache. Note that these URLs are returned in a random order, not the order in which they were added to the cache. Also note that they are not the actual URLs in the Postgres cache, but the URLs saved in the 'Visited' cache, which is a separate collection of URLs maintained to eliminate looping during a crawl.

Returns:
An Enumeration of the THREDDS Catalog URLs crawled so far.

getCachedCatalog

public java.lang.String getCachedCatalog(java.lang.String url)
                                  throws java.lang.Exception
Return the THREDDS catalog associated with the given URL from the local cache.

Parameters:
url - The URL of the THREDDS catalog to look up in the cache
Returns:
The THREDDS catalog
Throws:
java.lang.Exception

saveCatalogCache

public void saveCatalogCache()
                      throws java.lang.Exception
Save the 'visited' cache. This is actually a ConcurrentHashMap and holds each URL and the last time that URL was accessed.

Throws:
java.lang.Exception
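
A minimal sketch of a 'visited' map of this kind, assuming the values are access times in milliseconds and that the map is saved with standard Java serialization. Both are assumptions made for illustration; the class name VisitedCacheSketch and its methods are hypothetical, and the real storage mechanism is not described above.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative only: URL -> last access time, with save and load. */
public class VisitedCacheSketch {
    private final ConcurrentHashMap<String, Long> visited = new ConcurrentHashMap<>();

    /** Record an access; returns true if the URL had not been seen before. */
    public boolean markVisited(String url, long timeMillis) {
        return visited.put(url, timeMillis) == null;
    }

    /** Last recorded access time, or null if the URL was never visited. */
    public Long lastAccessed(String url) {
        return visited.get(url);
    }

    /** Write the whole map to a file using Java serialization. */
    public void save(File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(visited);
        }
    }

    /** Read a previously saved map back into a new cache instance. */
    @SuppressWarnings("unchecked")
    public static VisitedCacheSketch load(File file) throws IOException, ClassNotFoundException {
        VisitedCacheSketch cache = new VisitedCacheSketch();
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            cache.visited.putAll((ConcurrentHashMap<String, Long>) in.readObject());
        }
        return cache;
    }
}
```

The return value of markVisited doubles as the loop check: a crawler would only follow a catalog reference when it returns true.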