java.lang.Object
  de.l3s.boilerpipe.extractors.ExtractorBase
public abstract class ExtractorBase
The base class of Extractors. Also provides some helper methods to quickly retrieve the text that remained after processing.
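As a rough illustration of what "helper methods to retrieve the text that remained after processing" can look like, the sketch below funnels several getText overloads into a single process step. It is not boilerpipe's source: SketchDocument, SketchExtractorBase and TagStrippingExtractor are hypothetical stand-ins, and the tag-stripping filter is a toy.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

/** Small demo entry point for the sketch. */
final class ExtractorBaseSketchDemo {
    public static void main(String[] args) {
        SketchExtractorBase extractor = new TagStrippingExtractor();
        System.out.println(extractor.getText("<p>Hello <b>world</b></p>"));
    }
}

/** Hypothetical stand-in for a parsed document; not boilerpipe's TextDocument. */
final class SketchDocument {
    private String content;
    SketchDocument(String content) { this.content = content; }
    String getContent() { return content; }
    void setContent(String content) { this.content = content; }
}

/** Hypothetical base class: every overload ends up in getText(SketchDocument). */
abstract class SketchExtractorBase {
    /** The filtering step that each concrete extractor supplies. */
    public abstract boolean process(SketchDocument doc);

    /** Core helper: run the filter, then return whatever text is left. */
    public String getText(SketchDocument doc) {
        process(doc);
        return doc.getContent();
    }

    /** Convenience overload for HTML held in a String. */
    public String getText(String html) {
        return getText(new SketchDocument(html));
    }

    /** Convenience overload for HTML read from a Reader. */
    public String getText(Reader r) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try {
            for (int n = r.read(buf); n != -1; n = r.read(buf)) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            // The sketch uses an unchecked wrapper instead of a library exception.
            throw new UncheckedIOException(e);
        }
        return getText(sb.toString());
    }
}

/** Toy filter: drops tags and collapses whitespace. */
final class TagStrippingExtractor extends SketchExtractorBase {
    @Override
    public boolean process(SketchDocument doc) {
        String before = doc.getContent();
        String after = before.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
        doc.setContent(after);
        return !after.equals(before); // true if the document changed
    }
}
```

With this layout a subclass only has to implement process, and callers can pick whichever input type they already have.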
| Constructor Summary |
|---|
| ExtractorBase() |
| Method Summary | |
|---|---|
| java.lang.String | getText(org.xml.sax.InputSource is): Extracts text from the HTML code available from the given InputSource. |
| java.lang.String | getText(java.io.Reader r): Extracts text from the HTML code available from the given Reader. |
| java.lang.String | getText(java.lang.String html): Extracts text from the HTML code given as a String. |
| java.lang.String | getText(TextDocument doc): Extracts text from the given TextDocument object. |
| java.lang.String | getText(java.net.URL url): Extracts text from the HTML code available from the given URL. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |

| Methods inherited from interface de.l3s.boilerpipe.BoilerpipeFilter |
|---|
| process |
| Constructor Detail |
|---|
public ExtractorBase()
| Method Detail |
|---|
public java.lang.String getText(java.lang.String html)
throws BoilerpipeProcessingException
Extracts text from the HTML code given as a String.
Specified by: getText in interface BoilerpipeExtractor
Parameters: html - The HTML code as a String.
Throws: BoilerpipeProcessingException
public java.lang.String getText(org.xml.sax.InputSource is)
throws BoilerpipeProcessingException
Extracts text from the HTML code available from the given InputSource.
Specified by: getText in interface BoilerpipeExtractor
Parameters: is - The InputSource containing the HTML.
Throws: BoilerpipeProcessingException
public java.lang.String getText(java.net.URL url)
throws BoilerpipeProcessingException
Extracts text from the HTML code available from the given URL.
NOTE: This method is mainly intended for demonstration purposes. If you are going to crawl the Web, consider using getText(InputSource) instead.
Parameters: url - The URL pointing to the HTML code.
Throws: BoilerpipeProcessingException
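The note above recommends getText(InputSource) for crawling. A minimal sketch of preparing such an InputSource from bytes that a crawler has already downloaded follows. It assumes the crawler's own HTTP layer supplies the body, the charset name and the page URL. InputSourceSketch and toInputSource are hypothetical names, and only JDK classes are used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import org.xml.sax.InputSource;

final class InputSourceSketch {
    /**
     * Wraps an already-downloaded response body in an InputSource.
     * charsetName and pageUrl are assumed to come from the caller's HTTP layer.
     */
    static InputSource toInputSource(byte[] body, String charsetName, String pageUrl) {
        InputSource is = new InputSource(new ByteArrayInputStream(body));
        is.setEncoding(charsetName); // tell the parser how to decode the bytes
        is.setSystemId(pageUrl);     // records where the content came from
        return is;
    }

    public static void main(String[] args) {
        byte[] body = "<html><body><p>Hi</p></body></html>"
                .getBytes(StandardCharsets.UTF_8);
        InputSource is = toInputSource(body, "UTF-8", "http://example.com/");
        // The InputSource would then be passed to a concrete extractor's
        // getText(InputSource) method.
        System.out.println(is.getEncoding() + " " + is.getSystemId());
    }
}
```

Keeping the download separate from the extraction leaves timeouts, redirects and character-set detection under the crawler's control, which is presumably the reason for the recommendation.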
public java.lang.String getText(java.io.Reader r)
throws BoilerpipeProcessingException
Extracts text from the HTML code available from the given Reader.
Specified by: getText in interface BoilerpipeExtractor
Parameters: r - The Reader containing the HTML.
Throws: BoilerpipeProcessingException
public java.lang.String getText(TextDocument doc)
throws BoilerpipeProcessingException
Extracts text from the given TextDocument object.
Specified by: getText in interface BoilerpipeExtractor
Parameters: doc - The TextDocument.
Throws: BoilerpipeProcessingException