org.niocchi.urlpools
Class TimeoutURLPool
java.lang.Object
org.niocchi.urlpools.TimeoutURLPool
- All Implemented Interfaces:
- URLPool
public class TimeoutURLPool
- extends java.lang.Object
- implements URLPool
This class is a URLPool wrapper that drops all subsequent queries
from hosts that have reached too many consecutive timeouts.
Implementation detail: the dropQuery method calls
_url_pool.setProcessed with a singleton Resource, on the assumption
that _url_pool will not store it.
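The wrapping behaviour described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the niocchi implementation: MiniPool, ListPool and TimeoutFilterPool are hypothetical stand-ins, a plain host name stands in for a Query, the default threshold of 3 is invented, and resetting the counter after a successful fetch is assumed from the word "consecutive".

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a pool; NOT the niocchi URLPool interface.
interface MiniPool {
    boolean hasNext();
    String next();                                  // a host name stands in for a Query
    void processed(String host, boolean timedOut);  // outcome reported by the crawler
}

// Simple in-memory pool, used only to exercise the wrapper.
class ListPool implements MiniPool {
    private final Deque<String> hosts = new ArrayDeque<>();
    ListPool(String... items) { for (String h : items) hosts.add(h); }
    public boolean hasNext() { return !hosts.isEmpty(); }
    public String next() { return hosts.poll(); }
    public void processed(String host, boolean timedOut) { /* nothing to record */ }
}

// Wrapper that skips hosts with too many consecutive timeouts.
class TimeoutFilterPool implements MiniPool {
    private final MiniPool inner;
    private final Map<String, Integer> consecutiveTimeouts = new HashMap<>();
    private int maxConsecutiveTimeouts = 3; // assumed default, not taken from niocchi

    TimeoutFilterPool(MiniPool inner) { this.inner = inner; }

    void setMaxConsecutiveTimeouts(int max) { this.maxConsecutiveTimeouts = max; }

    private boolean isBlocked(String host) {
        return consecutiveTimeouts.getOrDefault(host, 0) >= maxConsecutiveTimeouts;
    }

    public boolean hasNext() { return inner.hasNext(); }

    // Returns the next host that is not blocked, or null if none is available.
    public String next() {
        while (inner.hasNext()) {
            String host = inner.next();
            if (host == null) return null;
            if (!isBlocked(host)) return host;
            // Dropped: mark it as finished in the wrapped pool without crawling it.
            // Reporting it as timed out is an arbitrary choice in this sketch.
            inner.processed(host, true);
        }
        return null;
    }

    public void processed(String host, boolean timedOut) {
        if (timedOut) {
            consecutiveTimeouts.merge(host, 1, Integer::sum); // extend the streak
        } else {
            consecutiveTimeouts.remove(host);                 // a success resets the streak
        }
        inner.processed(host, timedOut);
    }
}
```

With the threshold set to 2, a host that times out twice in a row is skipped on every later request, while other hosts continue to be served.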
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
TimeoutURLPool
public TimeoutURLPool(URLPool pool_)
hasNextQuery
public boolean hasNextQuery()
- Specified by:
hasNextQuery
in interface URLPool
- Returns:
- true if there is at least one query to crawl (i.e.
getNextQuery will return a query at some point), false
if there are no more queries to crawl (a call to getNextQuery
will then throw a URLPoolException).
getNextQuery
public Query getNextQuery()
throws URLPoolException
- Specified by:
getNextQuery
in interface URLPool
- Returns:
- null if no query is available yet, otherwise a Query.
Throws URLPoolException if no query will ever be
available (in which case a call to hasNextQuery() must
return false).
- Throws:
URLPoolException
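The contract between hasNextQuery and getNextQuery suggests a polling loop on the caller's side. The sketch below illustrates one way to write it; QuerySource, PoolExhaustedException, SlowSource and Drain are hypothetical stand-ins that only mirror the documented contract (they are not niocchi types), and a String stands in for a Query.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for URLPoolException; NOT the niocchi class.
class PoolExhaustedException extends Exception {
    PoolExhaustedException(String msg) { super(msg); }
}

// Hypothetical stand-in mirroring the documented hasNextQuery/getNextQuery contract.
interface QuerySource {
    boolean hasNextQuery();
    String getNextQuery() throws PoolExhaustedException;
}

// Toy source that alternates between returning a query and returning null,
// to imitate a pool whose next query is not always available yet.
class SlowSource implements QuerySource {
    private final List<String> pending;
    private boolean ready = false;

    SlowSource(List<String> urls) { this.pending = new ArrayList<>(urls); }

    public boolean hasNextQuery() { return !pending.isEmpty(); }

    public String getNextQuery() throws PoolExhaustedException {
        if (pending.isEmpty()) throw new PoolExhaustedException("no more queries");
        ready = !ready;
        return ready ? pending.remove(0) : null;
    }
}

class Drain {
    // Collects every query the source yields, retrying when null is returned.
    static List<String> drain(QuerySource source) {
        List<String> seen = new ArrayList<>();
        try {
            while (source.hasNextQuery()) {
                String query = source.getNextQuery();
                if (query == null) {
                    continue; // nothing available yet; a real crawler would wait here
                }
                seen.add(query);
            }
        } catch (PoolExhaustedException e) {
            // The source ran dry between the check and the call; treat it as the end.
        }
        return seen;
    }
}
```

A null return means "try again later", whereas the exception means "stop asking", so the loop treats the two cases differently.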
setProcessed
public void setProcessed(Query query)
- Description copied from interface:
URLPool
- This method is called by the crawler when the query has been
processed. This gives the URL pool the opportunity to
implement specific behaviors (for instance, sending the query
back to the crawler if it got a timeout).
- Specified by:
setProcessed
in interface URLPool
setMaxConsecutiveTimeouts
public void setMaxConsecutiveTimeouts(int max_)