xsmeral.semnet.util
Class URLUtil

java.lang.Object
  extended by xsmeral.semnet.util.URLUtil

public class URLUtil
extends Object

Provides methods for URL normalization and host equality comparison.


Constructor Summary
URLUtil()
           
 
Method Summary
static boolean equalHosts(URL host1, URL host2, boolean full)
           
static String fullHost(URL url)
          Returns the scheme and authority part of the URL, with a trailing slash.
static URL normalize(String url)
          Convenience method, calls normalize(new URL(url)).
static URL normalize(URL url)
          A URL normalizer, an important part of every crawler.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

URLUtil

public URLUtil()
Method Detail

fullHost

public static String fullHost(URL url)
Returns the scheme and authority part of the URL, with a trailing slash.
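
The description above can be illustrated with a minimal sketch. It assumes the result is built as scheme + "://" + authority + "/" from the standard java.net.URL accessors; the class FullHostSketch and the helper parse are hypothetical names used only for this illustration and are not part of URLUtil, whose actual implementation may differ.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Illustrative sketch only; not the actual URLUtil source.
class FullHostSketch {

    // Assumed behavior: scheme + "://" + authority + "/".
    // The authority includes user info and port when they are present.
    static String fullHost(URL url) {
        return url.getProtocol() + "://" + url.getAuthority() + "/";
    }

    // Hypothetical helper: wraps the checked exception so callers stay short.
    static URL parse(String s) {
        try {
            return new URL(s);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // prints "http://example.com:8080/"
        System.out.println(fullHost(parse("http://example.com:8080/a/b?x=1")));
    }
}
```

Under this assumption the path, query and fragment are dropped, so two URLs on the same server and port yield the same string.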


equalHosts

public static boolean equalHosts(URL host1,
                                 URL host2,
                                 boolean full)

normalize

public static URL normalize(String url)
                     throws MalformedURLException
Convenience method, calls normalize(new URL(url)). Tries to prepend http:// if the scheme is missing.

Parameters:
url - String containing the URL to be normalized
Returns:
Normalized URL, or the URL parsed from the supplied string, unchanged, if normalization fails
Throws:
MalformedURLException
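
The scheme handling described for this overload might look roughly like the sketch below. It assumes that "missing" means the string contains no "://" separator; the class SchemePrefixSketch and its methods withDefaultScheme and toUrl are hypothetical and only show the prefixing step, after which the real method would go on to call normalize(URL).

```java
import java.net.MalformedURLException;
import java.net.URL;

// Illustrative sketch only; not the actual URLUtil source.
class SchemePrefixSketch {

    // Assumption: a string without "://" is treated as having no scheme.
    static String withDefaultScheme(String url) {
        return url.contains("://") ? url : "http://" + url;
    }

    // Parses the (possibly prefixed) string; a string that still cannot be
    // parsed surfaces as MalformedURLException, as in the documented signature.
    static URL toUrl(String url) throws MalformedURLException {
        return new URL(withDefaultScheme(url.trim()));
    }

    public static void main(String[] args) throws MalformedURLException {
        // prints "http://example.com/a"
        System.out.println(toUrl("example.com/a"));
    }
}
```

A check based on "://" is deliberately simple and would misclassify scheme-only forms such as mailto: addresses, which is usually acceptable for a crawler that follows only HTTP links.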

normalize

public static URL normalize(URL url)
A URL normalizer, an important part of every crawler. Ensures that different representations of the same URL are treated as equivalent.
Adheres mostly to RFC 3986 and http://dblab.ssu.ac.kr/publication/LeKi05a.pdf
Performs these steps: