A B C D E F G H I L M O P Q R S T U V W

A

addMonitored(Monitorable) - Method in class org.niocchi.monitor.Monitor
 
address_total_time - Variable in class org.niocchi.core.Crawler
 

B

Base64Coder - Class in org.niocchi.core
A Base64 Encoder/Decoder.
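
For orientation, a minimal sketch of a Base64 encode/decode round trip. It uses the JDK's own java.util.Base64 as a stand-in and does not call Base64Coder, whose exact return types are not shown in this index. The class name Base64RoundTripSketch and the choice of UTF-8 are assumptions made for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Stand-in sketch: uses java.util.Base64 from the JDK, NOT org.niocchi.core.Base64Coder.
final class Base64RoundTripSketch {

    // Encodes a string as Base64 text (UTF-8 is an assumption of this sketch).
    static String encode(String plain) {
        return Base64.getEncoder().encodeToString(plain.getBytes(StandardCharsets.UTF_8));
    }

    // Decodes Base64 text back into the original string.
    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = encode("hello");
        System.out.println(encoded);          // prints "aGVsbG8="
        System.out.println(decode(encoded));  // prints "hello"
    }
}
```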

C

connection_total_time - Variable in class org.niocchi.core.Crawler
 
crawl() - Method in class org.niocchi.gc.GenericCrawler
 
Crawler - Class in org.niocchi.core
 
Crawler(ResourceFactoryInt, int) - Constructor for class org.niocchi.core.Crawler
Create a new Crawler instance.
createResource() - Method in class org.niocchi.core.DiskResourceFactory
Returns an instance of DiskResource.
createResource() - Method in class org.niocchi.core.MemoryResourceFactory
Returns an instance of MemoryResource.
createResource() - Method in interface org.niocchi.core.ResourceFactoryInt
 
createResource() - Method in class org.niocchi.gc.GenericResourceFactory
 

D

decode(String) - Static method in class org.niocchi.core.Base64Coder
Decodes a Base64 string.
decode(char[]) - Static method in class org.niocchi.core.Base64Coder
Decodes Base64 data.
DiskResource - Class in org.niocchi.core
A resource that saves its content (the response body) into a temporary file.
DiskResource() - Constructor for class org.niocchi.core.DiskResource
Resource constructor.
DiskResourceFactory - Class in org.niocchi.core
An implementation of the ResourceFactoryInt interface that creates instances of DiskResource.
DiskResourceFactory() - Constructor for class org.niocchi.core.DiskResourceFactory
 
dump() - Method in class org.niocchi.gc.GenericCrawler
 
dump() - Method in interface org.niocchi.monitor.Monitorable
 

E

encode(String) - Static method in class org.niocchi.core.Base64Coder
Encodes a string into Base64 format.
encode(byte[]) - Static method in class org.niocchi.core.Base64Coder
Encodes a byte array into Base64 format.
execute(String[]) - Method in class org.niocchi.gc.GenericCrawler
 

F

fromString(String) - Static method in enum org.niocchi.core.Query.Method
 

G

GenericCrawler - Class in org.niocchi.gc
A simple crawler given as an implementation example.
It uses a SimpleFileURLPool, which reads URLs from a file, one URL per line.
GenericCrawler() - Constructor for class org.niocchi.gc.GenericCrawler
 
GenericResource - Class in org.niocchi.gc
This class gives an example of how to implement the isValid method.
GenericResource() - Constructor for class org.niocchi.gc.GenericResource
 
GenericResourceFactory - Class in org.niocchi.gc
 
GenericResourceFactory() - Constructor for class org.niocchi.gc.GenericResourceFactory
 
GenericWorker - Class in org.niocchi.gc
A simple worker that saves the resource into a specific file.
GenericWorker(Crawler, String) - Constructor for class org.niocchi.gc.GenericWorker
 
getBody() - Method in class org.niocchi.core.MemoryResource
Return the body as an array of bytes.
getBodyLength() - Method in class org.niocchi.core.Resource
Return the message body length.
getBytes() - Method in class org.niocchi.core.MemoryResource
Legacy method.
getCapacity() - Static method in class org.niocchi.core.MemoryResource
Returns the current capacity.
getConnectionTimeout() - Method in class org.niocchi.core.Crawler
Return the current connection timeout.
getContentEncoding() - Method in class org.niocchi.core.Resource
Returns the content encoding, if any.
getContentMimeSubType() - Method in class org.niocchi.core.Resource
Returns the MIME subtype part of the content type.
getContentMimeType() - Method in class org.niocchi.core.Resource
Returns the MIME type part of the content type.
getContentType() - Method in class org.niocchi.core.Resource
Returns the content type.
getHeader(String) - Method in class org.niocchi.core.Resource
Returns the value associated with a specific HTTP header.
getHeaderNames() - Method in class org.niocchi.core.Resource
Returns an array with the HTTP header names.
getHost() - Method in class org.niocchi.core.Query
Returns the host part of the associated URL (returned by getURL()).
getHTTPStatus() - Method in class org.niocchi.core.Query
Returns the HTTP status, or _UNKNOWN_HTTP_STATUS if no status is available.
getHTTPStatus() - Method in class org.niocchi.core.Resource
Returns the HTTP status.
getInetSocketAddress() - Method in class org.niocchi.core.Query
Returns the InetSocketAddress or null if the query is not resolved.
getMaxRedirections() - Static method in class org.niocchi.core.Query
 
getMaxRedirections() - Method in class org.niocchi.rc.HostRedirectionController
 
getMethod() - Method in class org.niocchi.core.Query
Returns the method.
getNextQuery() - Method in interface org.niocchi.core.URLPool
 
getNextQuery() - Method in class org.niocchi.urlpools.SimpleFileURLPool
 
getNextQuery() - Method in class org.niocchi.urlpools.SimpleListURLPool
 
getNextQuery() - Method in class org.niocchi.urlpools.TimeoutURLPool
 
getOriginalURL() - Method in class org.niocchi.core.Query
Returns the original URL (passed to the constructor).
getPostData() - Method in class org.niocchi.core.Query
Returns the data to send through POST.
getRawDataSize() - Method in class org.niocchi.core.Resource
Returns the size of the data in the response, before processing it.
getReadTimeout() - Method in class org.niocchi.core.Crawler
Return the current read timeout.
getRedirected() - Method in class org.niocchi.core.Query
 
getRedirectionController() - Method in class org.niocchi.core.Crawler
Returns the redirection filter that the crawler is currently using, or null if there is none.
getResource() - Method in class org.niocchi.core.Query
Returns the resource associated with this query.
getSelectTimeout() - Method in class org.niocchi.core.Crawler
Returns the current select timeout.
getStatus() - Method in class org.niocchi.core.Query
Returns the query status.
getTmpFileAbsolutePath() - Method in class org.niocchi.core.DiskResource
Returns the absolute path of the tmp file that holds the body if some content has been crawled, or null otherwise.
getURL() - Method in class org.niocchi.core.Query
Returns the last URL to be crawled (the original URL if no redirection was followed).
getUserAgent() - Method in class org.niocchi.core.Crawler
Returns the user agent.
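
The getContentType(), getContentMimeType() and getContentMimeSubType() entries above describe splitting a content type into its type and subtype parts. A minimal sketch of such a split follows. It is a self-contained illustration and not the library's implementation: the class ContentTypeSketch, the method split and the lower-casing of the result are all assumptions.

```java
import java.util.Locale;

// Hypothetical helper, not part of org.niocchi.core.
final class ContentTypeSketch {

    // Returns {type, subtype} for a Content-Type value such as
    // "text/html; charset=UTF-8", or null if it has no "type/subtype" form.
    static String[] split(String contentType) {
        if (contentType == null) {
            return null;
        }
        // Drop parameters such as "; charset=UTF-8".
        int semi = contentType.indexOf(';');
        String media = (semi >= 0 ? contentType.substring(0, semi) : contentType).trim();
        int slash = media.indexOf('/');
        if (slash <= 0 || slash == media.length() - 1) {
            return null;
        }
        return new String[] {
            media.substring(0, slash).trim().toLowerCase(Locale.ROOT),
            media.substring(slash + 1).trim().toLowerCase(Locale.ROOT)
        };
    }

    public static void main(String[] args) {
        String[] parts = split("text/html; charset=UTF-8");
        System.out.println(parts[0]); // prints "text"
        System.out.println(parts[1]); // prints "html"
    }
}
```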

H

hasNextQuery() - Method in interface org.niocchi.core.URLPool
 
hasNextQuery() - Method in class org.niocchi.urlpools.SimpleFileURLPool
 
hasNextQuery() - Method in class org.niocchi.urlpools.SimpleListURLPool
 
hasNextQuery() - Method in class org.niocchi.urlpools.TimeoutURLPool
 
hasReachedMaxRedirections() - Method in class org.niocchi.core.Query
 
headersToString() - Method in class org.niocchi.core.Resource
Utility function to display the headers.
HostRedirectionController - Class in org.niocchi.rc
A simple controller that implements a basic host name filter strategy. You can choose whether a host name that starts with "www" is considered equal to the same host name without it.
HostRedirectionController() - Constructor for class org.niocchi.rc.HostRedirectionController
 

I

incomplete_count - Variable in class org.niocchi.core.Crawler
 
incRedirected() - Method in class org.niocchi.core.Query
 
init(String) - Method in class org.niocchi.gc.GenericCrawler
 
internal_error_count - Variable in class org.niocchi.core.Crawler
 
interruptCrawling() - Method in class org.niocchi.core.Crawler
Interrupts the crawling in a clean and relatively immediate way.
isAllowed(Query, URL) - Method in interface org.niocchi.core.RedirectionController
 
isAllowed(Query, URL) - Method in class org.niocchi.rc.HostRedirectionController
 
isCompressed() - Method in class org.niocchi.core.Resource
Returns true if the body has been received compressed.
isRemoveWWW() - Method in class org.niocchi.rc.HostRedirectionController
 
isValid() - Method in class org.niocchi.core.Resource
If the crawled content needs to be checked by the URLPool for validity, subclasses can implement this method.
isValid() - Method in class org.niocchi.gc.GenericResource
Returns true if this resource is a valid HTML page.
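
The isAllowed(Query, URL) and isRemoveWWW() entries above relate to host-based redirection filtering. A minimal sketch of that kind of check follows, assuming a redirection is allowed only when the source and target hosts match, optionally after removing a leading "www.". It is self-contained and does not use the library's types: HostFilterSketch, stripWww and sameHost are hypothetical names, and the real HostRedirectionController may behave differently.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical stand-in, not org.niocchi.rc.HostRedirectionController.
final class HostFilterSketch {

    // Removes one leading "www." from a host name (assumed behaviour).
    static String stripWww(String host) {
        String h = host.toLowerCase(Locale.ROOT);
        return h.startsWith("www.") ? h.substring(4) : h;
    }

    // True when a redirection from 'from' to 'to' stays on the same host.
    static boolean sameHost(URI from, URI to, boolean removeWww) {
        String a = from.getHost();
        String b = to.getHost();
        if (a == null || b == null) {
            return false;
        }
        if (removeWww) {
            a = stripWww(a);
            b = stripWww(b);
        }
        return a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        URI from = URI.create("http://example.com/a");
        URI to = URI.create("http://www.example.com/b");
        System.out.println(sameHost(from, to, true));  // prints "true"
        System.out.println(sameHost(from, to, false)); // prints "false"
    }
}
```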

L

logRedirections - Static variable in class org.niocchi.rc.HostRedirectionController
 

M

main(String[]) - Static method in enum org.niocchi.core.QueryStatus
 
main(String[]) - Static method in class org.niocchi.gc.GenericCrawler
 
main(String[]) - Static method in class org.niocchi.monitor.Monitor
 
MAX_REDIRECTION_DEFAULT - Static variable in class org.niocchi.rc.HostRedirectionController
 
MemoryResource - Class in org.niocchi.core
A resource that saves its content (response body) in memory.
MemoryResource() - Constructor for class org.niocchi.core.MemoryResource
Resource constructor.
MemoryResourceFactory - Class in org.niocchi.core
An implementation of the ResourceFactoryInt interface that creates instances of MemoryResource.
MemoryResourceFactory() - Constructor for class org.niocchi.core.MemoryResourceFactory
 
message() - Method in enum org.niocchi.core.QueryStatus
 
Monitor - Class in org.niocchi.monitor
 
Monitor() - Constructor for class org.niocchi.monitor.Monitor
 
Monitor(int) - Constructor for class org.niocchi.monitor.Monitor
 
Monitorable - Interface in org.niocchi.monitor
 

O

org.niocchi.core - package org.niocchi.core
 
org.niocchi.gc - package org.niocchi.gc
 
org.niocchi.monitor - package org.niocchi.monitor
 
org.niocchi.rc - package org.niocchi.rc
 
org.niocchi.urlpools - package org.niocchi.urlpools
 

P

printMonitoredState(PrintStream) - Method in class org.niocchi.core.Crawler
Writes some crawl statistics.
printMonitoredState(PrintStream) - Method in class org.niocchi.gc.GenericCrawler
 
printMonitoredState(PrintStream) - Method in interface org.niocchi.monitor.Monitorable
 
processed_count - Variable in class org.niocchi.core.Crawler
 
processResource(Query) - Method in class org.niocchi.core.Worker
This method will be called on each completed query, whether it ended in error or with a successfully crawled resource.
processResource(Query) - Method in class org.niocchi.gc.GenericWorker
 

Q

Query - Class in org.niocchi.core
An object that encapsulates a URL to be crawled and the returned crawl status.
Query() - Constructor for class org.niocchi.core.Query
 
Query(String) - Constructor for class org.niocchi.core.Query
 
Query(URL) - Constructor for class org.niocchi.core.Query
 
Query(Query) - Constructor for class org.niocchi.core.Query
 
Query.Method - Enum in org.niocchi.core
 
QueryStatus - Enum in org.niocchi.core
 

R

read_total_time - Variable in class org.niocchi.core.Crawler
 
readFields(DataInput) - Method in class org.niocchi.core.Query
 
redirected_count - Variable in class org.niocchi.core.Crawler
 
RedirectionController - Interface in org.niocchi.core
Implement this interface and pass it to the crawler through org.niocchi.core.Crawler.setRedirectionController to control which redirections the crawler is allowed to follow.
removeWWW(String) - Static method in class org.niocchi.rc.HostRedirectionController
 
Resource - Class in org.niocchi.core
 
Resource() - Constructor for class org.niocchi.core.Resource
 
ResourceException - Exception in org.niocchi.core
 
ResourceException() - Constructor for exception org.niocchi.core.ResourceException
 
ResourceException(String) - Constructor for exception org.niocchi.core.ResourceException
 
ResourceException(String, Exception) - Constructor for exception org.niocchi.core.ResourceException
 
ResourceException(Exception) - Constructor for exception org.niocchi.core.ResourceException
 
ResourceFactoryInt - Interface in org.niocchi.core
 
run(URLPool) - Method in class org.niocchi.core.Crawler
Start the crawl.
run() - Method in class org.niocchi.core.Worker
Starts the worker.
run() - Method in class org.niocchi.monitor.Monitor
 

S

save(String) - Method in class org.niocchi.core.DiskResource
If the body is compressed (gzipped, deflated or zipped) it is uncompressed from the tmp file into the target file and the tmp file is deleted.
save(String) - Method in class org.niocchi.core.MemoryResource
Save this resource content to a file.
save(String) - Method in class org.niocchi.core.Resource
Save this resource to a file.
select_total_time - Variable in class org.niocchi.core.Crawler
 
setAllowCompression(boolean) - Method in class org.niocchi.core.Crawler
Set content compression on/off.
setCapacity(int) - Static method in class org.niocchi.core.MemoryResource
Sets the capacity (in bytes) for all resources.
setConnectionTimeout(int) - Method in class org.niocchi.core.Crawler
Set the connection timeout.
setMaxConsecutiveTimeouts(int) - Method in class org.niocchi.urlpools.TimeoutURLPool
 
setMaxRedirections(int) - Static method in class org.niocchi.core.Query
 
setMaxRedirections(int) - Method in class org.niocchi.rc.HostRedirectionController
Sets the maximum number of redirections allowed.
setMethod(Query.Method) - Method in class org.niocchi.core.Query
Set the method (GET or POST).
setNegativeResolutionTTL(int) - Method in class org.niocchi.core.Crawler
 
setPostData(String) - Method in class org.niocchi.core.Query
Set the data to send through POST.
setProcessed(Query) - Method in interface org.niocchi.core.URLPool
This method is called by the crawler when the query has been processed.
setProcessed(Query) - Method in class org.niocchi.urlpools.SimpleFileURLPool
 
setProcessed(Query) - Method in class org.niocchi.urlpools.SimpleListURLPool
 
setProcessed(Query) - Method in class org.niocchi.urlpools.TimeoutURLPool
 
setReadTimeout(int) - Method in class org.niocchi.core.Crawler
Set the read (data reception) timeout.
setRedirectionController(RedirectionController) - Method in class org.niocchi.core.Crawler
Sets the new RedirectionController.
setRemoveWWW(boolean) - Method in class org.niocchi.rc.HostRedirectionController
By default, 'www' is removed from host names before comparison.
setSelectTimeout(int) - Method in class org.niocchi.core.Crawler
Set the timeout for the selection of ready channels.
setStatus(QueryStatus) - Method in class org.niocchi.core.Query
Sets the status of this resource.
setTimeout(int) - Method in class org.niocchi.core.Crawler
Set the connection timeout and the read (data reception) timeout.
setUserAgent(String) - Method in class org.niocchi.core.Crawler
Set the user agent.
setVerbose() - Method in class org.niocchi.core.Crawler
 
SimpleFileURLPool - Class in org.niocchi.urlpools
A simple URL pool that reads URLs from a file, one URL per line.
SimpleFileURLPool(String) - Constructor for class org.niocchi.urlpools.SimpleFileURLPool
 
SimpleListURLPool - Class in org.niocchi.urlpools
A simple URL pool that iterates over a list of URLs.
SimpleListURLPool(Iterator<String>) - Constructor for class org.niocchi.urlpools.SimpleListURLPool
 
SimpleListURLPool(Iterable<String>) - Constructor for class org.niocchi.urlpools.SimpleListURLPool
 
start_time - Variable in class org.niocchi.core.Crawler
 
status() - Method in enum org.niocchi.core.QueryStatus
 
status_200 - Variable in class org.niocchi.core.Crawler
 
status_other - Variable in class org.niocchi.core.Crawler
 

T

timeout_count - Variable in class org.niocchi.core.Crawler
 
TimeoutURLPool - Class in org.niocchi.urlpools
This class is a URLPool wrapper that drops all subsequent queries from hosts that have reached too many consecutive timeouts.
TimeoutURLPool(URLPool) - Constructor for class org.niocchi.urlpools.TimeoutURLPool
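
The TimeoutURLPool description above can be illustrated with a minimal sketch of per-host consecutive-timeout tracking. It is a self-contained stand-in and not the library's code: the class TimeoutTrackerSketch and its methods are hypothetical, and both the reset on a successful response and the "greater than or equal" threshold are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, not org.niocchi.urlpools.TimeoutURLPool.
final class TimeoutTrackerSketch {

    private final int maxConsecutiveTimeouts;
    private final Map<String, Integer> consecutive = new HashMap<>();

    TimeoutTrackerSketch(int maxConsecutiveTimeouts) {
        this.maxConsecutiveTimeouts = maxConsecutiveTimeouts;
    }

    // Records the outcome of one processed query for the given host.
    void record(String host, boolean timedOut) {
        if (timedOut) {
            consecutive.merge(host, 1, Integer::sum);
        } else {
            // Assumption: any non-timeout outcome resets the streak.
            consecutive.remove(host);
        }
    }

    // True when further queries for this host should be dropped.
    boolean shouldDrop(String host) {
        return consecutive.getOrDefault(host, 0) >= maxConsecutiveTimeouts;
    }

    public static void main(String[] args) {
        TimeoutTrackerSketch tracker = new TimeoutTrackerSketch(2);
        tracker.record("slow.example", true);
        tracker.record("slow.example", true);
        System.out.println(tracker.shouldDrop("slow.example")); // prints "true"
        System.out.println(tracker.shouldDrop("fast.example")); // prints "false"
    }
}
```

A wrapper pool built on this idea would consult shouldDrop before handing out the next query and call record when a query is reported as processed.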
 

U

UNKNOWN_HTTP_STATUS - Static variable in class org.niocchi.core.Resource
 
URLPool - Interface in org.niocchi.core
 
URLPoolException - Exception in org.niocchi.core
 
URLPoolException() - Constructor for exception org.niocchi.core.URLPoolException
 
URLPoolException(String) - Constructor for exception org.niocchi.core.URLPoolException
 
URLPoolException(String, Exception) - Constructor for exception org.niocchi.core.URLPoolException
 
URLPoolException(Exception) - Constructor for exception org.niocchi.core.URLPoolException
 

V

valueOf(String) - Static method in enum org.niocchi.core.Query.Method
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum org.niocchi.core.QueryStatus
Returns the enum constant of this type with the specified name.
values() - Static method in enum org.niocchi.core.Query.Method
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum org.niocchi.core.QueryStatus
Returns an array containing the constants of this enum type, in the order they are declared.

W

Worker - Class in org.niocchi.core
Subclass this and implement processResource() to do the work.
Worker(Crawler) - Constructor for class org.niocchi.core.Worker
 
write(DataOutput) - Method in class org.niocchi.core.Query
 
write_total_time - Variable in class org.niocchi.core.Crawler
 
