xsmeral.semnet.crawler
Class HostManager

java.lang.Object
  extended by xsmeral.semnet.crawler.HostManager

public class HostManager
extends Object

Host manager for HTMLCrawler. Manages HostDescriptors and EntityDescriptors, maps hosts to their IDs and persists hosts in the DB.
Mapping hosts to IDs is mainly a performance measure.
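To illustrate the idea behind the ID mapping, here is a hypothetical in-memory stand-in for HostManager.Mapper. The class name `InMemoryHostMapper` and its methods are illustrative assumptions, not part of this API; the real mapper persists the mapping through RDBLayer. Once a host has an int ID, per-host data can be keyed by that small integer instead of the full host string.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-memory host <-> ID table illustrating the idea behind
 * HostManager.Mapper. Not the real implementation, which is DB-backed.
 */
class InMemoryHostMapper {
    private final Map<String, Integer> idsByHost = new HashMap<>();
    private final Map<Integer, String> hostsById = new HashMap<>();
    private int nextId = 1; // assumption: IDs start at 1, leaving 0 free to mean "no ID"

    /** Returns the ID of the host, assigning a new one on first use. */
    int getId(String host) {
        Integer id = idsByHost.get(host);
        if (id == null) {
            id = nextId++;
            idsByHost.put(host, id);
            hostsById.put(id, host);
        }
        return id;
    }

    /** Returns the host mapped to the ID, or null if the ID is unknown. */
    String getHost(int id) {
        return hostsById.get(id);
    }
}
```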


Nested Class Summary
static class HostManager.Mapper
          Mapper is responsible for mapping hosts to IDs.
 
Constructor Summary
HostManager(RDBLayer db)
          Creates a manager instance for the specified DB layer.
 
Method Summary
 int addHost(String address)
          Adds the host with the specified address to the DB.
 void close()
          Closes the DB connection.
 Collection<HostDescriptor> getDescriptors()
          Returns descriptors of the managed hosts.
 EntityDescriptor getEntityDescriptor(int hostId, String pattern)
          Returns the EntityDescriptor associated with the given URL pattern, or null if the pattern does not represent an entity in the given host.
 Map<Pattern,EntityDescriptor> getEntityDescriptorMap(int hostId)
          Returns the map between entity URL patterns and their entity descriptors for the host with the given ID, or null if the ID is not mapped to any managed host.
 HostDescriptor getHostDescriptor(int id)
          Returns the descriptor of the host associated with the given ID, or null if the ID is not mapped to any managed host.
 Collection<Integer> getHostIds()
          Returns IDs of the managed hosts.
 HostMapper getMapper()
          Returns the mapper instance associated with this HostManager.
static HostMapper getMapper(RDBLayer db)
          Returns a mapper instance for the specified DB.
 Pattern getPattern(int hostId, String relativeUrl)
          Returns the Pattern (entity or source) that matches the given relative URL or null if no match is found.
 Map<Pattern,Integer> getSourceURLMap(int hostId)
          Returns the map between source URL patterns and their update frequencies for the host with the given ID, or null if the ID is not mapped to any managed host.
 boolean isEntity(int hostId, Pattern pattern)
          Indicates whether the specified pattern represents an entity in the given host.
 boolean isSource(int hostId, Pattern pattern)
          Indicates whether the specified pattern represents a source URL in the given host.
 void loadHosts(Collection<HostDescriptor> hosts)
          Initializes the manager with the given set of hosts.
static void main(String[] args)
          Provides a CLI for simple management of hosts.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HostManager

public HostManager(RDBLayer db)
            throws SQLException
Creates a manager instance for the specified DB layer.

Throws:
SQLException - In case of a problem with the DB layer.
Method Detail

loadHosts

public void loadHosts(Collection<HostDescriptor> hosts)
               throws SQLException
Initializes the manager with the given set of hosts. Any host described by a descriptor that is not yet managed is added to the DB, so the synchronization is one-way only (descriptors -> DB).

Throws:
SQLException - In case of a problem with the DB layer.
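The one-way synchronization described above can be sketched as follows. This is a hypothetical model, not the actual implementation: plain host addresses stand in for HostDescriptors, and a Set stands in for the hosts table in the DB. The class `OneWayHostSync` is an illustrative name.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Hypothetical sketch of the one-way synchronization performed by loadHosts.
 * Addresses stand in for HostDescriptors; a Set stands in for the DB table.
 */
class OneWayHostSync {
    private final Set<String> dbHosts = new LinkedHashSet<>(); // simulated DB contents
    private final Set<String> managed = new LinkedHashSet<>(); // hosts loaded into the manager

    OneWayHostSync(Collection<String> existingDbHosts) {
        dbHosts.addAll(existingDbHosts);
    }

    /** Loads the given hosts; any host missing from the DB is inserted (descriptors -> DB). */
    void loadHosts(Collection<String> descriptorHosts) {
        for (String host : descriptorHosts) {
            if (!dbHosts.contains(host)) {
                dbHosts.add(host); // in the real class this would be addHost(address)
            }
            managed.add(host);
        }
        // Hosts present only in the DB are neither removed nor loaded:
        // nothing flows back from the DB to the descriptors.
    }

    Set<String> dbHosts() { return dbHosts; }
    Set<String> managedHosts() { return managed; }
}
```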

getMapper

public static HostMapper getMapper(RDBLayer db)
Returns a mapper instance for the specified DB.

See Also:
HostManager.Mapper(db)

getMapper

public HostMapper getMapper()
Returns the mapper instance associated with this HostManager.


addHost

public final int addHost(String address)
                  throws SQLException
Adds the host with the specified address to the DB and returns the newly generated ID of the host.

Parameters:
address - String containing the URL of the host
Returns:
Generated ID of the added host, or 0 if the host was not added
Throws:
SQLException - If a SQL command fails
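The return-value contract of addHost (a generated ID, or 0 when nothing was inserted) can be modelled with a hypothetical in-memory table. The source does not say when a host is "not added"; this sketch assumes it happens when the address is already stored, as a UNIQUE constraint on the address column would enforce. `HostTable` is an illustrative name, not part of this API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-memory stand-in for the hosts table, mimicking the addHost
 * contract: returns the generated ID, or 0 when nothing was inserted.
 * Assumption: duplicates (and null) are rejected, so they yield 0.
 */
class HostTable {
    private final Map<String, Integer> rows = new HashMap<>();
    private int autoIncrement = 0;

    int addHost(String address) {
        if (address == null || rows.containsKey(address)) {
            return 0; // nothing inserted, so no generated key
        }
        int id = ++autoIncrement; // keys start at 1, so 0 never collides with a real ID
        rows.put(address, id);
        return id;
    }
}
```

Callers should therefore treat 0 as "not added" rather than as a valid host ID.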

getDescriptors

public Collection<HostDescriptor> getDescriptors()
Returns descriptors of the managed hosts.


getHostIds

public Collection<Integer> getHostIds()
Returns IDs of the managed hosts.


getHostDescriptor

public HostDescriptor getHostDescriptor(int id)
Returns the descriptor of the host associated with the given ID, or null if the ID is not mapped to any managed host.


getSourceURLMap

public Map<Pattern,Integer> getSourceURLMap(int hostId)
Returns the map between source URL patterns and their update frequencies for the host with the given ID, or null if the ID is not mapped to any managed host.

See Also:
HostDescriptor

getEntityDescriptorMap

public Map<Pattern,EntityDescriptor> getEntityDescriptorMap(int hostId)
Returns the map between entity URL patterns and their entity descriptors for the host with the given ID, or null if the ID is not mapped to any managed host.


getPattern

public Pattern getPattern(int hostId,
                          String relativeUrl)
Returns the Pattern (entity or source) that matches the given relative URL or null if no match is found.

Parameters:
hostId - ID of the host to search for patterns.
relativeUrl - URL to match
See Also:
HostDescriptor, EntityDescriptor
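A minimal sketch of the lookup getPattern performs for a single host, using java.util.regex. Two details are assumptions not stated above: entity patterns are tried before source patterns, and a pattern must match the whole relative URL (`Matcher.matches()` rather than `find()`). `PatternLookup` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical per-host sketch of getPattern. Assumed order: entity patterns
 * first, then source patterns; assumed semantics: whole-string match.
 */
class PatternLookup {
    private final List<Pattern> entityPatterns = new ArrayList<>();
    private final List<Pattern> sourcePatterns = new ArrayList<>();

    void addEntityPattern(String regex) { entityPatterns.add(Pattern.compile(regex)); }
    void addSourcePattern(String regex) { sourcePatterns.add(Pattern.compile(regex)); }

    /** Returns the first pattern matching the whole relative URL, or null if none does. */
    Pattern getPattern(String relativeUrl) {
        for (Pattern p : entityPatterns) {
            if (p.matcher(relativeUrl).matches()) return p;
        }
        for (Pattern p : sourcePatterns) {
            if (p.matcher(relativeUrl).matches()) return p;
        }
        return null;
    }
}
```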

isEntity

public boolean isEntity(int hostId,
                        Pattern pattern)
Indicates whether the specified pattern represents an entity in the given host.

Parameters:
hostId - ID of the given host
pattern - The pattern to check
See Also:
EntityDescriptor

isSource

public boolean isSource(int hostId,
                        Pattern pattern)
Indicates whether the specified pattern represents a source URL in the given host.

Parameters:
hostId - ID of the given host
pattern - The pattern to check
See Also:
HostDescriptor
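Given the Map<Pattern,...> return types of getEntityDescriptorMap and getSourceURLMap, isEntity and isSource can plausibly be pictured as key lookups in those maps; that is an assumption, not a documented fact. One consequence worth knowing: java.util.regex.Pattern does not override equals or hashCode, so such a lookup only succeeds with the same Pattern instance that was stored (for example, one obtained from getPattern), not with a freshly compiled equivalent. `PatternKinds` is a hypothetical name, and String stands in for EntityDescriptor.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical per-host sketch of isEntity/isSource as map key lookups.
 * Pattern uses identity equality, so only the stored instance is found.
 */
class PatternKinds {
    final Map<Pattern, String> entityDescriptors = new HashMap<>();  // String stands in for EntityDescriptor
    final Map<Pattern, Integer> sourceFrequencies = new HashMap<>(); // source pattern -> update frequency

    boolean isEntity(Pattern pattern) { return entityDescriptors.containsKey(pattern); }
    boolean isSource(Pattern pattern) { return sourceFrequencies.containsKey(pattern); }
}
```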

getEntityDescriptor

public EntityDescriptor getEntityDescriptor(int hostId,
                                            String pattern)
Returns the EntityDescriptor associated with the given URL pattern, or null if the pattern does not represent an entity in the given host.

Parameters:
hostId - ID of the host
pattern - The URL pattern
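Unlike isEntity, this method takes the pattern as a String. A hedged sketch of how such a lookup could work, assuming the string is the regular-expression source text: since Pattern keys cannot be found by an equal-looking new instance, the sketch compares `Pattern.pattern()` strings instead. `EntityLookup` is a hypothetical name, and String stands in for EntityDescriptor.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical per-host sketch of getEntityDescriptor(String).
 * Assumption: the argument is the regex source, compared via Pattern.pattern().
 */
class EntityLookup {
    private final Map<Pattern, String> entityMap = new LinkedHashMap<>(); // String stands in for EntityDescriptor

    void put(String regex, String descriptor) {
        entityMap.put(Pattern.compile(regex), descriptor);
    }

    /** Returns the descriptor whose pattern source equals the given string, or null. */
    String getEntityDescriptor(String pattern) {
        for (Map.Entry<Pattern, String> e : entityMap.entrySet()) {
            if (e.getKey().pattern().equals(pattern)) return e.getValue();
        }
        return null;
    }
}
```

Note that passing a concrete URL such as "/movie/42" returns null here; matching URLs is the job of getPattern.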

close

public void close()
Closes the DB connection.

See Also:
RDBLayer

main

public static void main(String[] args)
Provides a CLI for simple management of hosts. The available commands are list, remove and reset. Running the class without arguments displays more details.
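A hypothetical sketch of how a CLI with the list, remove and reset commands might dispatch its arguments. The argument syntax (for example `remove <id>`), the usage text and the behaviour of reset are all assumptions, since the source only names the commands; a TreeMap stands in for the hosts stored in the DB. `HostCli` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of a host-management CLI with list, remove and reset.
 * Syntax and reset semantics are assumptions; a TreeMap stands in for the DB.
 */
class HostCli {
    final Map<Integer, String> hosts = new TreeMap<>(); // host ID -> address

    /** Runs one command and returns the lines it would print. */
    List<String> run(String... args) {
        List<String> out = new ArrayList<>();
        if (args.length == 0) {
            out.add("Usage: HostManager list | remove <id> | reset"); // assumed usage text
            return out;
        }
        switch (args[0]) {
            case "list":
                hosts.forEach((hostId, addr) -> out.add(hostId + "\t" + addr));
                break;
            case "remove":
                if (args.length < 2) {
                    out.add("remove: missing host ID");
                    break;
                }
                int id = Integer.parseInt(args[1]); // throws NumberFormatException on bad input
                out.add(hosts.remove(id) != null ? "Removed " + id : "No host with ID " + id);
                break;
            case "reset":
                hosts.clear(); // assumption: reset drops all stored hosts
                out.add("All hosts removed");
                break;
            default:
                out.add("Unknown command: " + args[0]);
        }
        return out;
    }
}
```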