public class Crawler
extends java.lang.Object
Simple HTML crawler for Lucene
http://lucene.apache.org/java/2_3_0/api/core/index.html
Constructor and Description |
---|
Crawler() |
public void crawl(java.lang.String sBasePath, java.lang.String sFileFilter, java.lang.String sIndexDirectory, boolean bRebuild) throws java.io.IOException, org.apache.oro.text.regex.MalformedPatternException
Adds the contents of the files under sBasePath whose names match sFileFilter to the Lucene index in sIndexDirectory.
Parameters:
sBasePath
- Base path at which crawling starts
sFileFilter
- Perl5 regular expression used to filter file names
sIndexDirectory
- Lucene index target directory
bRebuild
- true if the index must be deleted and fully rebuilt
Throws:
java.io.IOException
java.io.FileNotFoundException
- If the sBasePath directory does not exist
org.apache.oro.text.regex.MalformedPatternException
- If sFileFilter is not a valid Perl5 regular expression pattern
public static void main(java.lang.String[] argv) throws java.lang.NoSuchFieldException, java.io.IOException, java.io.FileNotFoundException, org.apache.oro.text.regex.MalformedPatternException
Throws:
java.lang.NoSuchFieldException
java.io.IOException
java.io.FileNotFoundException
org.apache.oro.text.regex.MalformedPatternException
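The Javadoc does not say how crawl selects the files it indexes. As a rough illustration only, the sketch below shows one way the sBasePath / sFileFilter contract could be met with the standard library: walk the base directory recursively, fail with FileNotFoundException if it does not exist, and keep regular files whose names match the filter. Several assumptions apply. The class FileSelectionSketch and the method selectFiles are hypothetical. java.util.regex stands in for Jakarta ORO's Perl5 engine; the two dialects are close but not identical, and an invalid pattern raises the unchecked PatternSyntaxException rather than MalformedPatternException. The filter is assumed to match the whole file name. Adding the selected files to a Lucene index, and honoring bRebuild, is deliberately left out.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical sketch of a crawl-style file-selection step (not the actual
 * Crawler implementation). java.util.regex replaces Jakarta ORO here.
 */
public class FileSelectionSketch {

    /** Returns base-relative paths of matching files, sorted for stable output. */
    public static List<String> selectFiles(String sBasePath, String sFileFilter)
            throws IOException {
        Path base = Paths.get(sBasePath);
        // Mirrors the documented contract: a missing base directory is an error.
        if (!Files.isDirectory(base)) {
            throw new FileNotFoundException(sBasePath + " does not exist");
        }
        // Compile before any I/O so an invalid pattern fails fast;
        // java.util.regex throws the unchecked PatternSyntaxException here.
        Pattern filter = Pattern.compile(sFileFilter);
        try (Stream<Path> paths = Files.walk(base)) {
            return paths
                    .filter(Files::isRegularFile)
                    // Assumption: the pattern must match the whole file name only.
                    .filter(p -> filter.matcher(p.getFileName().toString()).matches())
                    // Normalize separators so output is the same on every OS.
                    .map(p -> base.relativize(p).toString().replace('\\', '/'))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] argv) throws IOException {
        // Build a small throwaway tree to crawl.
        Path root = Files.createTempDirectory("crawl-demo");
        Files.createDirectories(root.resolve("sub"));
        Files.write(root.resolve("index.html"), "<p>hi</p>".getBytes());
        Files.write(root.resolve("notes.txt"), "skip me".getBytes());
        Files.write(root.resolve("sub/page.htm"), "<p>nested</p>".getBytes());
        // Prints [index.html, sub/page.htm]
        System.out.println(selectFiles(root.toString(), ".*\\.html?"));
    }
}
```

A filter such as `.*\.html?` selects both .html and .htm files while skipping everything else. Compiling the pattern before walking the tree reflects the documented behavior that a malformed filter is reported as an error rather than silently matching nothing.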