| Class Summary | |
|---|---|
| BoilerplateBlockFilter | Removes TextBlocks which have explicitly been marked as "not content". |
| InvertedFilter | Inverts the "isContent" flag for all TextBlocks. |
| LabelToBoilerplateFilter | Marks all blocks that contain a given label as "boilerplate". |
| LabelToContentFilter | Marks all blocks that contain a given label as "content". |
| MarkEverythingContentFilter | Marks all blocks as content. |
| MinClauseWordsFilter | Keeps only blocks that have at least one segment fragment ("clause") with at least k words (default: 5). |
| MinWordsFilter | Keeps only those content blocks which contain at least k words. |
| SplitParagraphBlocksFilter | Splits TextBlocks at paragraph boundaries. |
The BoilerpipeFilters in this package are straightforward and are probably not specific to English.
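
To make the class summaries more concrete, here is a minimal, self-contained sketch of the kind of logic three of these filters describe (MinWordsFilter, InvertedFilter, BoilerplateBlockFilter). It does not use the boilerpipe API: `SketchBlock`, `SimpleFilterSketch`, and all method names are hypothetical stand-ins, and the sketch assumes that "keeping" a block means leaving its content flag set, that words are whitespace-separated tokens, and that each filter reports whether it changed anything.

```java
import java.util.List;

// Hypothetical stand-in for a text block; this is NOT boilerpipe's TextBlock class.
class SketchBlock {
    final String text;
    boolean isContent;

    SketchBlock(String text, boolean isContent) {
        this.text = text;
        this.isContent = isContent;
    }

    // Counts whitespace-separated tokens; an assumption about what a "word" is.
    int wordCount() {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }
}

// Hypothetical filters modeled on the summaries above; each returns true if it changed anything.
class SimpleFilterSketch {

    // Like the MinWordsFilter summary: content blocks with fewer than k words lose their content flag.
    static boolean minWords(List<SketchBlock> blocks, int k) {
        boolean changed = false;
        for (SketchBlock b : blocks) {
            if (b.isContent && b.wordCount() < k) {
                b.isContent = false;
                changed = true;
            }
        }
        return changed;
    }

    // Like the InvertedFilter summary: flips the content flag of every block.
    static boolean invert(List<SketchBlock> blocks) {
        for (SketchBlock b : blocks) {
            b.isContent = !b.isContent;
        }
        return !blocks.isEmpty();
    }

    // Like the BoilerplateBlockFilter summary: removes blocks marked as "not content".
    // The list must be mutable (for example, an ArrayList).
    static boolean removeNonContent(List<SketchBlock> blocks) {
        return blocks.removeIf(b -> !b.isContent);
    }
}
```

Chaining `minWords` and then `removeNonContent` over a mutable list mirrors the way such simple filters can be combined: the first only adjusts flags, and the second drops whatever is no longer flagged as content.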