TODO
-
parse html time tags
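
A minimal sketch of how this could be done with only the standard library, assuming the goal is to collect the `datetime` attribute and the visible text of each `<time>` element. The names `TimeTagParser` and `extract_times` are hypothetical and not part of the existing code.

```python
from html.parser import HTMLParser


class TimeTagParser(HTMLParser):
    """Collect (datetime attribute, text content) pairs of <time> elements."""

    def __init__(self):
        super().__init__()
        self.times = []    # finished (datetime_attr, text) pairs
        self._attr = None  # datetime attribute of the currently open <time>
        self._text = None  # text chunks of the open <time>, None if not inside one

    def handle_starttag(self, tag, attrs):
        if tag == 'time':
            self._attr = dict(attrs).get('datetime')
            self._text = []

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'time' and self._text is not None:
            self.times.append((self._attr, ''.join(self._text).strip()))
            self._attr, self._text = None, None


def extract_times(html):
    """Return a list of (datetime_attr or None, text) for all <time> tags."""
    parser = TimeTagParser()
    parser.feed(html)
    parser.close()
    return parser.times
```

The `datetime` attribute, when present, is the machine-readable value; the text content is only a fallback and would still need date parsing.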
-
site annotations:
- categories
- historical (no changes in the last n months)
- news
- local focus - geonames: http://download.geonames.org/export/dump/cities15000.zip
- categories
-
allow for tls in elasticsearch config
-
replace dashes, dots and quotes:
3dd9598139/src/calibre/utils/unsmarten.py
'–': '--',
'—': '---',
'…': '...',
'“': '"',
'”': '"',
'„': '"',
'″': '"',
'‘': "'",
'’': "'",
'′': "'",
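
Since every key in the mapping above is a single character, one possible way to apply it is `str.translate`. This is only a sketch; the names `UNSMARTEN` and `unsmarten` are hypothetical, and it is not necessarily how calibre applies the mapping.

```python
# Translation table built from the character mapping above.
# str.maketrans accepts single-character keys and replacement strings of any length.
UNSMARTEN = str.maketrans({
    '–': '--',
    '—': '---',
    '…': '...',
    '“': '"', '”': '"', '„': '"', '″': '"',
    '‘': "'", '’': "'", '′': "'",
})


def unsmarten(text):
    """Replace typographic dashes, ellipses and quotes with ASCII equivalents."""
    return text.translate(UNSMARTEN)
```

For example, `unsmarten('“Hi” – it’s…')` gives `"Hi" -- it's...`.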
-
normalize quotation marks and punctuation in general
-
cancel crawls that take too long
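
A rough sketch, assuming crawls run as asyncio coroutines: `asyncio.wait_for` cancels the awaited coroutine once the timeout expires. `crawl_with_timeout` and `fake_crawl` are hypothetical names used for illustration only.

```python
import asyncio


async def crawl_with_timeout(crawl, timeout):
    """Await the coroutine `crawl`; cancel it after `timeout` seconds.

    Returns (finished, result), where result is None if the crawl was cancelled.
    """
    try:
        return True, await asyncio.wait_for(crawl, timeout)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the crawl at this point.
        return False, None


async def fake_crawl(duration):
    """Stand-in for a real crawl; just sleeps for `duration` seconds."""
    await asyncio.sleep(duration)
    return 'done'
```

A crawl that should clean up after itself can catch `asyncio.CancelledError`, release its resources, and re-raise.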
-
search for "TODO" in code
-
feedparser has supported JSON feeds since commit a5939702b1fd0ec75d2b586255ff0e29e5a8a6fc (as of 2020-10-26 in the "develop" branch, not part of a release); the version names are 'json1' and 'json11'
-
allow site URLs with path, e.g. https://web.archive.org/web/20090320055457/http://www.geocities.com/kk_abacus/
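
One way to decide whether a page belongs to such a site is to compare scheme and host and then require the page path to lie below the base path. This is a sketch under that assumption; `in_site` is a hypothetical helper, and host comparison is case-sensitive here for simplicity.

```python
from urllib.parse import urlsplit


def in_site(base_url, url):
    """Return True if `url` is on the same host as `base_url` and below its path."""
    base, other = urlsplit(base_url), urlsplit(url)
    if (base.scheme, base.netloc) != (other.scheme, other.netloc):
        return False
    # Compare with trailing slashes so that /kk_abacus/ does not match /kk_abacus2/.
    base_path = base.path if base.path.endswith('/') else base.path + '/'
    other_path = other.path if other.path.endswith('/') else other.path + '/'
    return other_path.startswith(base_path)
```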
-
add more languages
Ideas
-
use python-libzim to create ZIM archives