xsmeral.semnet.crawler
Class URLManager

java.lang.Object
  extended by xsmeral.semnet.crawler.URLManager

public class URLManager
extends Object

URL manager for HTMLCrawler. Responsible for the persistence of URLs; provides methods for querying, locking, updating and adding URL entries.

See Also:
HTMLCrawler

Nested Class Summary
static interface URLManager.LimitClause
          The LIMIT clause of an SQL statement.
static interface URLManager.OrderClause
          The ORDER BY clause of an SQL statement.
static interface URLManager.Query
          A complete SQL query.
static interface URLManager.QueryBuilder
          A builder of queries.
 class URLManager.QueryBuilderImpl
          Implementation of QueryBuilder for URL entries.
static interface URLManager.WhereClause
          The WHERE clause of an SQL statement.
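The nested types suggest a staged builder: a WHERE clause may be followed by ORDER BY and then LIMIT, and each stage can serve as a complete Query (getQueryForHost returns a WhereClause, while fetchEntries accepts a Query). The sketch below illustrates that idea under stated assumptions. The stage methods (where, orderBy, limit, toSql) and the table and column names are hypothetical, because the real QueryBuilder methods are not listed on this page.

```java
// Hypothetical sketch of a staged SQL query builder. The interfaces are named
// after URLManager's nested types, but their methods are assumptions.
final class QueryDemo {
    public static void main(String[] args) {
        // Compare getQueryForHost(hostId): a WhereClause that is also a Query.
        Query q = QuerySketch.where("urls", "host_id = 7").orderBy("last_visit").limit(10);
        System.out.println(q.toSql());
    }
}

// Each stage exposes only the clauses that may legally follow it,
// so the compiler enforces the order WHERE -> ORDER BY -> LIMIT.
interface Query { String toSql(); }
interface LimitClause extends Query { }
interface OrderClause extends Query { LimitClause limit(int n); }
interface WhereClause extends OrderClause { OrderClause orderBy(String column); }

final class QuerySketch implements WhereClause, LimitClause {
    private final StringBuilder sql;

    private QuerySketch(String sql) { this.sql = new StringBuilder(sql); }

    // Entry point: a SELECT restricted by a WHERE condition (table name is made up).
    static WhereClause where(String table, String condition) {
        return new QuerySketch("SELECT * FROM " + table + " WHERE " + condition);
    }

    @Override
    public OrderClause orderBy(String column) {
        sql.append(" ORDER BY ").append(column);
        return this;
    }

    @Override
    public LimitClause limit(int n) {
        sql.append(" LIMIT ").append(n);
        return this;
    }

    @Override
    public String toSql() { return sql.toString(); }
}
```

Because WhereClause extends OrderClause, ORDER BY may be skipped (where(...).limit(5) compiles), but limit(...).orderBy(...) does not. The conditions here are raw strings for brevity; since URLManager works with prepared statements, a real implementation would bind values as parameters rather than concatenating them.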
 
Constructor Summary
URLManager(RDBLayer db)
          Creates an instance for the specified DB layer.
 
Method Summary
 int addEntries(Collection<URLEntry> entries)
          Adds the given entries to the DB.
 boolean addEntry(URLEntry entry)
          Adds the given entry to the DB.
 void close()
          Closes the DB connection and all prepared statements.
 Collection<URLEntry> fetchEntries(URLManager.Query q, int ownerId)
          Retrieves URLs based on the given query.
 URLManager.WhereClause getQueryForHost(int hostId)
          Returns a query for the host with the given ID.
 Collection<URLEntry> listBroken()
          Returns a list of URLs marked as not working.
 Collection<URLEntry> listLocked()
          Returns a list of locked entries.
 void returnEntry(URLEntry entry)
          Updates and unlocks the given entry in the DB.
 boolean unlockAll()
          Unlocks all locked URLs.
 boolean unlockUrl(URLEntry entry)
          Unlocks the URL specified by the given entry.
 boolean unlockUrls(Collection<URLEntry> entries)
          Unlocks the URLs specified by the given entries. See unlockUrl.
 void updateEntry(URLEntry entry)
          Updates the given entry in the DB.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

URLManager

public URLManager(RDBLayer db)
           throws SQLException
Creates an instance for the specified DB layer.

Throws:
SQLException - If an error occurs when connecting to the DB or preparing statements.
Method Detail

getQueryForHost

public URLManager.WhereClause getQueryForHost(int hostId)
Returns a query for the host with the given ID.


fetchEntries

public Collection<URLEntry> fetchEntries(URLManager.Query q,
                                         int ownerId)
Retrieves URLs based on the given query and locks the returned entries, so that no other thread can obtain the same URLs at the same time. Every URL retrieved should therefore be returned (and thus unlocked) by calling returnEntry(entry).

Parameters:
q - The Query to use
ownerId - An identifier of the entity that retrieves and locks these URLs
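The lock-then-return contract can be modelled in memory as follows. This is a hypothetical sketch, not URLManager itself: URLs are plain strings instead of URLEntry objects, a simple limit stands in for the Query parameter, and a map from URL to ownerId stands in for the lock state kept in the DB. The important part is the try/finally around each URL, which guarantees that returnEntry is called even when processing fails.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FetchReturnDemo {
    public static void main(String[] args) {
        InMemoryUrlStore store = new InMemoryUrlStore(List.of("http://a.example/", "http://b.example/"));
        Collection<String> batch = store.fetchEntries(2, 1);  // worker 1 locks both URLs
        System.out.println(store.fetchEntries(2, 2).size());  // worker 2 gets nothing: 0
        for (String url : batch) {
            try {
                // ... download and parse url here ...
            } finally {
                store.returnEntry(url);                      // always unlock, even on failure
            }
        }
        System.out.println(store.fetchEntries(2, 2).size());  // URLs are available again: 2
    }
}

// Hypothetical in-memory model of the documented fetch/return contract.
final class InMemoryUrlStore {
    private final List<String> urls;
    private final Map<String, Integer> lockOwner = new HashMap<>(); // url -> ownerId

    InMemoryUrlStore(List<String> urls) { this.urls = new ArrayList<>(urls); }

    // Returns up to `limit` unlocked URLs and locks them for ownerId in one step.
    synchronized Collection<String> fetchEntries(int limit, int ownerId) {
        List<String> result = new ArrayList<>();
        for (String url : urls) {
            if (result.size() == limit) break;
            if (!lockOwner.containsKey(url)) {
                lockOwner.put(url, ownerId);
                result.add(url);
            }
        }
        return result;
    }

    // Unlocks the URL (the real returnEntry also writes the updated entry back).
    synchronized void returnEntry(String url) { lockOwner.remove(url); }

    synchronized Collection<String> listLocked() { return new ArrayList<>(lockOwner.keySet()); }
}
```

The methods are synchronized so that selecting and locking happen atomically in this model; in the DB-backed class the equivalent guarantee would have to come from the database (for example a transaction), which this page does not describe.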

returnEntry

public void returnEntry(URLEntry entry)
Updates and unlocks the given entry in the DB.


addEntry

public boolean addEntry(URLEntry entry)
Adds the given entry to the DB.

Returns:
true if the entry was added (i.e. it did not already exist).
See Also:
addEntries(java.util.Collection)

addEntries

public int addEntries(Collection<URLEntry> entries)
Adds the given entries to the DB. Entries already present in the DB are ignored.

Returns:
Number of modified rows, i.e. the number of entries actually added.
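Both methods behave like an insert-if-absent. The hypothetical in-memory model below uses strings in place of URLEntry objects and a set in place of the DB table; it shows why addEntry returns a boolean while addEntries returns a count. In SQL this is commonly done with statements such as MySQL's INSERT IGNORE or PostgreSQL's INSERT ... ON CONFLICT DO NOTHING, whose update count includes only rows actually inserted; which mechanism URLManager uses is not stated here.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class AddEntriesDemo {
    public static void main(String[] args) {
        UrlSet urls = new UrlSet();
        System.out.println(urls.addEntry("http://a.example/")); // true
        System.out.println(urls.addEntry("http://a.example/")); // false, already present
        System.out.println(urls.addEntries(
            List.of("http://a.example/", "http://b.example/", "http://c.example/"))); // 2
    }
}

// Hypothetical model of insert-if-absent semantics; not the real URLManager.
final class UrlSet {
    private final Set<String> stored = new LinkedHashSet<>();

    // true if the URL was new, mirroring addEntry's documented return value.
    boolean addEntry(String url) { return stored.add(url); }

    // Counts only URLs that were actually inserted; duplicates are ignored.
    int addEntries(Collection<String> urls) {
        int added = 0;
        for (String url : urls) {
            if (stored.add(url)) added++;
        }
        return added;
    }
}
```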

updateEntry

public void updateEntry(URLEntry entry)
Updates the given entry in the DB.


close

public void close()
Closes the DB connection and all prepared statements.
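The class page lists no implemented interfaces, so try-with-resources does not apply to URLManager directly; a try/finally block ensures close() runs even when crawling fails. In the sketch below, ManagerStub is a hypothetical stand-in for URLManager (the real class needs an RDBLayer and a live database) that only records whether close() was called.

```java
import java.sql.SQLException;

final class LifecycleDemo {
    // Runs the given work against a manager and always closes it afterwards,
    // even if the work throws.
    static void runSession(ManagerStub manager, Runnable work) {
        try {
            work.run(); // fetchEntries / returnEntry / addEntries calls would go here
        } finally {
            manager.close(); // releases the connection and prepared statements
        }
    }

    public static void main(String[] args) throws SQLException {
        ManagerStub manager = new ManagerStub(); // if this throws, there is nothing to close
        try {
            runSession(manager, () -> { throw new IllegalStateException("crawl failed"); });
        } catch (IllegalStateException e) {
            System.out.println("work failed: " + e.getMessage());
        }
        System.out.println("closed = " + manager.closed);
    }
}

// Hypothetical stand-in for URLManager that only tracks close().
final class ManagerStub {
    boolean closed;

    ManagerStub() throws SQLException {
        // The real constructor connects to the DB and prepares statements,
        // and throws SQLException if either step fails.
    }

    void close() { closed = true; }
}
```

The constructor call stays outside the try block: if it throws SQLException, no manager exists and there is nothing to close.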


unlockUrl

public boolean unlockUrl(URLEntry entry)
Unlocks the URL specified by the given entry.

Returns:
true if the URL was successfully unlocked

unlockUrls

public boolean unlockUrls(Collection<URLEntry> entries)
Unlocks the URLs specified by the given entries. See unlockUrl.


listLocked

public Collection<URLEntry> listLocked()
Returns a list of locked entries.


listBroken

public Collection<URLEntry> listBroken()
Returns a list of URLs marked as not working.


unlockAll

public boolean unlockAll()
Unlocks all locked URLs. Should only be called after a crash that left some locked URLs without being unlocked.

Returns:
true if all URLs were successfully unlocked
See Also:
lockUrl, unlockUrl
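Since unlockAll also releases locks held by live crawlers, a natural place for it is a recovery step run once at startup, before any thread calls fetchEntries. The sketch below assumes that usage. LockTable is a hypothetical in-memory stand-in for the lock state URLManager keeps in the DB, and its lock method stands in for lockUrl, which is not documented on this page.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

final class StartupRecoveryDemo {
    // Run once at startup, before any crawler thread calls fetchEntries.
    // Returns the number of stale locks that were released.
    static int recoverStaleLocks(LockTable table) {
        Collection<String> stale = table.listLocked();
        if (!stale.isEmpty() && !table.unlockAll()) {
            throw new IllegalStateException("could not unlock " + stale.size() + " URLs");
        }
        return stale.size();
    }

    public static void main(String[] args) {
        LockTable table = new LockTable();
        table.lock("http://a.example/"); // simulate locks left behind by a crash
        table.lock("http://b.example/");
        System.out.println("released " + recoverStaleLocks(table) + " stale locks");
        System.out.println(table.listLocked().isEmpty());
    }
}

// Hypothetical stand-in for the lock state kept by URLManager.
final class LockTable {
    private final Set<String> locked = new HashSet<>();

    void lock(String url) { locked.add(url); } // stand-in for the undocumented lockUrl
    Collection<String> listLocked() { return new ArrayList<>(locked); }
    boolean unlockAll() { locked.clear(); return true; }
}
```

Checking listLocked first makes the recovery observable (the count can be logged) and skips the unlock when nothing is stale.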