java.lang.Object
  xsmeral.semnet.crawler.util.RobotsPolicy
public class RobotsPolicy
Represents the site crawling policy defined by the Robots Exclusion Protocol for one host.
Provides methods for checking a URI against the policy.
This implementation supports the non-standard but widely used extensions Allow and Crawl-delay, as well as wildcards in URIs.
The parser is lenient, ignoring non-matching lines and unknown fields.
A more specific rule overrides a less specific rule (if a rule exists for one specific user agent, it overrides the rule for *).
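The wildcard and user-agent rules described above can be illustrated with a minimal sketch. It is not the code of RobotsPolicy itself: the class name RobotsMatchingSketch, its methods, and the agent name "examplebot" are hypothetical, and the semantics assumed here (`*` matches any run of characters, a trailing `$` anchors the end of the URI, rules are otherwise prefix matches, user agents are compared by lower-cased exact name) follow the common convention rather than anything this page specifies.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

public class RobotsMatchingSketch {

    /**
     * Converts a robots.txt path rule into a regular expression.
     * Assumed semantics: '*' matches any character sequence, a trailing '$'
     * anchors the end of the URI, and unanchored rules are prefix matches.
     */
    public static Pattern toRegex(String rule) {
        boolean anchored = rule.endsWith("$");
        String body = anchored ? rule.substring(0, rule.length() - 1) : rule;
        String[] parts = body.split("\\*", -1); // keep trailing empty parts
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*"); // each '*' in the rule becomes '.*'
            }
            regex.append(Pattern.quote(parts[i])); // literal text is escaped
        }
        if (!anchored) {
            regex.append(".*"); // prefix match: anything may follow
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** Returns true if the relative URI is covered by the rule. */
    public static boolean matches(String rule, String relativeUri) {
        return toRegex(rule).matcher(relativeUri).matches();
    }

    /**
     * Picks the rule group for a user agent: the group named after the agent
     * if one exists, otherwise the group for "*". Keys are assumed lower-case.
     */
    public static <T> T selectGroup(Map<String, T> groups, String userAgent) {
        T specific = groups.get(userAgent.toLowerCase(Locale.ROOT));
        return specific != null ? specific : groups.get("*");
    }

    public static void main(String[] args) {
        System.out.println(matches("/private/*", "/private/a.html"));
        System.out.println(matches("/*.pdf$", "/docs/a.pdf?x=1"));

        Map<String, String> groups = new HashMap<>();
        groups.put("*", "rules for all agents");
        groups.put("examplebot", "rules for examplebot"); // hypothetical agent
        System.out.println(selectGroup(groups, "ExampleBot"));
    }
}
```

Translating each rule to a regular expression keeps the sketch short; a real parser may well use a hand-written matcher instead and may resolve conflicts between Allow and Disallow rules differently.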
Constructor Summary | |
---|---|
RobotsPolicy(URL host, String userAgent) | Calls load for the specified host and user agent. |
Method Summary | |
---|---|
boolean | allows(String relativeUri) - Checks whether this relative URI is allowed in this host's robots policy. |
boolean | allowsAll() - Checks whether this policy allows all URLs for this user-agent. |
boolean | disallows(String relativeUri) - Complementary to allows. |
float | getCrawlDelay() - Returns the crawl delay in seconds. |
int | getCrawlDelayMillis() - Returns the crawl delay in milliseconds. |
void | load(URL host) - Tries to load robots.txt at the specified host. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public RobotsPolicy(URL host, String userAgent)
Calls load for the specified host and user agent.
host - The host to get the policy for
userAgent - The user agent whose rules are looked up
Method Detail |
---|
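public final void load(URL host)
Tries to load robots.txt at the specified host.
host - The host to load the policy from

public boolean allows(String relativeUri)
Checks whether this relative URI is allowed in this host's robots policy.
relativeUri - A URI relative to the host

public boolean disallows(String relativeUri)
Complementary to allows.
relativeUri - A URI relative to the host

public boolean allowsAll()
Checks whether this policy allows all URLs for this user-agent.

public float getCrawlDelay()
Returns the crawl delay in seconds.

public int getCrawlDelayMillis()
Returns the crawl delay in milliseconds.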
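
A caller would typically combine allows and getCrawlDelayMillis when walking a host. The following is a rough sketch of that pattern, not code from this package: the class PoliteCrawlSketch and the nested Policy interface are hypothetical stand-ins that only mirror the two documented method signatures, so the example runs without RobotsPolicy on the classpath, and the URIs are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PoliteCrawlSketch {

    /** Hypothetical stand-in mirroring two documented RobotsPolicy methods. */
    public interface Policy {
        boolean allows(String relativeUri);
        int getCrawlDelayMillis();
    }

    /**
     * Returns the URIs the policy permits, in order, sleeping for the crawl
     * delay between consecutive permitted URIs. Stops early if interrupted.
     */
    public static List<String> visitAllowed(Policy policy, List<String> relativeUris) {
        List<String> visited = new ArrayList<>();
        for (String uri : relativeUris) {
            if (!policy.allows(uri)) {
                continue; // skip anything the policy disallows
            }
            if (!visited.isEmpty()) {
                try {
                    Thread.sleep(policy.getCrawlDelayMillis());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // restore the flag
                    break;
                }
            }
            visited.add(uri); // a real crawler would fetch the URI here
        }
        return visited;
    }

    public static void main(String[] args) {
        // Stub policy: blocks /private/ and asks for a 10 ms delay.
        Policy stub = new Policy() {
            @Override
            public boolean allows(String relativeUri) {
                return !relativeUri.startsWith("/private/");
            }

            @Override
            public int getCrawlDelayMillis() {
                return 10;
            }
        };
        List<String> uris = Arrays.asList("/index.html", "/private/x", "/about.html");
        System.out.println(visitAllowed(stub, uris));
    }
}
```

With the real class, the stub would presumably be replaced by an instance created through RobotsPolicy(URL host, String userAgent), whose allows and getCrawlDelayMillis methods have the same signatures.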