A B C D E F G H I J L M N O P Q R S T U V W X

G

getAttached() - Method in class xsmeral.semnet.manager.ProcessingJob
Returns collection of attached processors.
getBaseURL() - Method in class xsmeral.semnet.crawler.model.EntityDocument
Returns the base URL of the host where this document originated.
getBaseURL() - Method in class xsmeral.semnet.crawler.model.HostDescriptor
Returns base URL of this host - the root level for crawling.
getCharset() - Method in class xsmeral.semnet.crawler.model.HostDescriptor
Returns the (user-defined) charset used by this host.
getClazz() - Method in class xsmeral.semnet.manager.Configuration
Returns class of the object processor.
getConnection() - Method in class xsmeral.semnet.crawler.RDBLayer
Returns a new connection to the database.
getConnection(URL) - Static method in class xsmeral.semnet.crawler.util.ConnectionManager
Returns an HttpURLConnection set up with the defined settings.
getConnection() - Method in class xsmeral.semnet.query.QueryInterface
Returns a connection to the underlying repository.
getConnTimeout() - Static method in class xsmeral.semnet.crawler.util.ConnectionManager
Corresponds to URLConnection.getConnectTimeout()
getCrawlDelay() - Method in class xsmeral.semnet.crawler.model.HostDescriptor
Returns the crawl delay.
getCrawlDelay() - Method in class xsmeral.semnet.crawler.util.RobotsPolicy
Returns the crawl delay in seconds.
getCrawlDelayMillis() - Method in class xsmeral.semnet.crawler.util.RobotsPolicy
Returns the crawl delay in milliseconds.
getDBLayer() - Method in class xsmeral.semnet.crawler.model.CrawlerConfiguration
The relational DB layer used by the crawler for state persistence (URL storage)
getDescription() - Method in class xsmeral.semnet.manager.ProcessingJob
Returns description of the job.
getDescriptors() - Method in class xsmeral.semnet.crawler.HostManager
Returns descriptors of the managed hosts.
getDocument() - Method in class xsmeral.semnet.crawler.model.EntityDocument
Returns the TagNode (HtmlCleaner) containing the document tree.
getDriver() - Method in class xsmeral.semnet.crawler.RDBLayer
 
getEntityDescriptor(int, String) - Method in class xsmeral.semnet.crawler.HostManager
Returns the EntityDescriptor associated with this URL pattern, or null if the pattern does not represent an entity in the provided host.
getEntityDescriptor() - Method in class xsmeral.semnet.crawler.model.EntityDocument
Returns the entity descriptor describing this document.
getEntityDescriptorMap(int) - Method in class xsmeral.semnet.crawler.HostManager
Returns the map between entity URL patterns and their entity descriptors for the host with given ID, or null if the ID is not mapped to any managed host.
getEntityDescriptors() - Method in class xsmeral.semnet.crawler.model.HostDescriptor
Returns the EntityDescriptors that represent entities in this host (pages that will be scraped).
getEntry(AssociationRole, String) - Method in class xsmeral.semnet.mapper.Mapping
 
getGlobalCrawlDelayMinimum() - Method in class xsmeral.semnet.crawler.HTMLCrawler
 
getGlobalCrawlDelayMinimum() - Method in class xsmeral.semnet.crawler.model.CrawlerConfiguration
Minimal crawl delay in milliseconds
getHost() - Method in class xsmeral.semnet.crawler.model.URLEntry
Returns host path of the URL.
getHostDesc() - Method in class xsmeral.semnet.crawler.model.EntityDescriptor
Returns the owning HostDescriptor.
getHostDescriptor(int) - Method in class xsmeral.semnet.crawler.HostManager
Returns descriptor of the host associated with the given ID or null if the ID is not mapped to any managed host.
getHostId(String) - Method in class xsmeral.semnet.crawler.HostManager.Mapper
 
getHostId(String) - Method in interface xsmeral.semnet.crawler.HostMapper
Returns ID of the host with the specified URL.
getHostIds() - Method in class xsmeral.semnet.crawler.HostManager
Returns IDs of the managed hosts.
getHostName(int) - Method in class xsmeral.semnet.crawler.HostManager.Mapper
 
getHostName(int) - Method in interface xsmeral.semnet.crawler.HostMapper
Returns host name for the specified ID.
getHosts() - Method in class xsmeral.semnet.crawler.model.CrawlerConfiguration
Hosts crawled by the crawler
getId() - Method in class xsmeral.semnet.crawler.model.URLEntry
Returns ID (generated by DB).
getInputStream(URL, int, String) - Static method in class xsmeral.semnet.crawler.util.ConnectionManager
Returns an InputStream to the given URL, possibly retrying the connection.
getInputStream(URL, int) - Static method in class xsmeral.semnet.crawler.util.ConnectionManager
Returns an InputStream to the given URL, possibly retrying the connection.
getInputStream(URL) - Static method in class xsmeral.semnet.crawler.util.ConnectionManager
Returns an InputStream to the given URL.
getJob() - Method in class xsmeral.semnet.manager.JobRunner
Returns the supplied processing job.
getLastVisited() - Method in class xsmeral.semnet.crawler.model.URLEntry
Returns a Date indicating when this URL was last visited by the crawler.
getMap() - Method in class xsmeral.semnet.mapper.Mapping
 
getMapper(RDBLayer) - Static method in class xsmeral.semnet.crawler.HostManager
Returns a mapper instance for the specified DB.
getMapper() - Method in class xsmeral.semnet.crawler.HostManager
Returns the mapper instance associated with this HostManager.
getMapping() - Method in class xsmeral.semnet.mapper.StatementMapper
 
getName() - Method in class xsmeral.semnet.crawler.model.HostDescriptor
Returns (arbitrary, user-assigned) name of this host.
getName() - Method in class xsmeral.semnet.manager.ProcessingJob
Returns name of the job.
getNamespace() - Method in class xsmeral.semnet.scraper.AbstractScraper
Returns namespace used by this scraper.
getParams() - Method in class xsmeral.semnet.manager.Configuration
Returns the parameter map that is used to initialize the processor.
getPassword() - Method in class xsmeral.semnet.crawler.RDBLayer
 
getPath() - Method in class xsmeral.semnet.crawler.model.URLEntry
Returns path part of the URL, relative to the host name.
getPattern(int, String) - Method in class xsmeral.semnet.crawler.HostManager
Returns the Pattern (entity or source) that matches the given relative URL or null if no match is found.
getPattern() - Method in class xsmeral.semnet.crawler.model.URLEntry
Returns a regex pattern (as a string) that matches path of this URL and identifies the corresponding EntityDescriptor.
getPipe() - Method in class xsmeral.semnet.manager.JobRunner
Returns the Pipe created from processors in the supplied job.
getProcessorChain() - Method in class xsmeral.semnet.manager.ProcessingJob
Returns list of object processors.
getProperties() - Method in class xsmeral.semnet.sink.RepositoryFactory
Returns the initialization Properties.
getQuery() - Method in interface xsmeral.semnet.crawler.URLManager.Query
Constructs the query.
getQuery() - Method in class xsmeral.semnet.crawler.URLManager.QueryBuilderImpl
 
getQueryForHost(int) - Method in class xsmeral.semnet.crawler.URLManager
Returns a Query for host with given ID.
getReadTimeout() - Static method in class xsmeral.semnet.crawler.util.ConnectionManager
Corresponds to URLConnection.getReadTimeout()
getRepository() - Method in class xsmeral.semnet.query.QueryInterface
Returns the underlying repository.
getRepository() - Method in class xsmeral.semnet.sink.RepositoryFactory
Returns the initialized repository; should be called only after initialization.
getSchema() - Method in class xsmeral.semnet.crawler.RDBLayer
 
getScore() - Method in class xsmeral.semnet.crawler.model.URLEntry
Returns a number indicating how likely the URL is to work.
getScrapers() - Method in class xsmeral.semnet.crawler.model.EntityDescriptor
Returns the scraper classes that process this entity type.
getSourceURLMap(int) - Method in class xsmeral.semnet.crawler.HostManager
Returns the map between source URL patterns and their update frequencies for the host with given ID, or null if the ID is not mapped to any managed host.
getSourceURLPatterns() - Method in class xsmeral.semnet.crawler.model.HostDescriptor
Returns patterns of source URLs mapped to corresponding update frequencies.
getStatement() - Method in interface xsmeral.semnet.crawler.URLManager.Query
Returns the constructed SQL prepared statement.
getStatement() - Method in class xsmeral.semnet.crawler.URLManager.QueryBuilderImpl
 
getText(Object) - Static method in class xsmeral.semnet.util.XPathUtil
Returns text content of a node.
getThreadsPerHost() - Method in class xsmeral.semnet.crawler.model.CrawlerConfiguration
Number of crawling threads per host
getUpdateFreq() - Method in class xsmeral.semnet.crawler.model.EntityDescriptor
Returns the update frequency for this entity type, in seconds.
getUpdateFreq() - Method in class xsmeral.semnet.crawler.model.URLEntry
Returns update frequency in seconds.
getUrl() - Method in class xsmeral.semnet.crawler.model.EntityDocument
Returns absolute URL of the document.
getUrl() - Method in class xsmeral.semnet.crawler.model.URLEntry
Returns the full URL.
getURL() - Method in class xsmeral.semnet.crawler.RDBLayer
 
getUrlPattern() - Method in class xsmeral.semnet.crawler.model.EntityDescriptor
Returns the URL pattern that identifies this entity type.
getUser() - Method in class xsmeral.semnet.crawler.RDBLayer
 
getUserAgent() - Static method in class xsmeral.semnet.crawler.util.ConnectionManager
Returns the User-agent.
getValueFactory() - Method in class xsmeral.semnet.query.QueryInterface
Returns a value factory instance.
getValueFactory() - Static method in class xsmeral.semnet.scraper.AbstractScraper
Returns a ValueFactory (instantiated at initialization).
getVisitCount() - Method in class xsmeral.semnet.crawler.model.URLEntry
Returns number of times this URL has been visited by the crawler.
getVocabulary() - Method in class xsmeral.semnet.scraper.AbstractScraper
Returns map of all terms and their definitions in this scraper's vocabulary (fields of type URI annotated with Term).
getWeight() - Method in class xsmeral.semnet.crawler.model.EntityDescriptor
Returns the "weight" of this entity, a measure of preference.
