atextcrawler/doc/source/devel/related_work.md
2021-11-29 09:16:31 +00:00

crawlers

general

sitemap parsers

url handling

language detection

text extraction

deduplication

Extract more meta tags

Date parsing dependent on language

ICU