* feat: move differencing test-crawl forward from tzaware-obe.
* fix: port html validation changes from main into test-crawl
* fix: address review comments
* Unicode messages are triggered by both db content and tests
* Make ids unique
* Avoid "No value found" message on page
* Strip HTML from history entries, since it's often broken
* Check HTML sources for occurrences of "** No value found for" and fix them
* Fix another occurrence of "** No value found for"
* Fix more occurrences of "** No value found for"
* Fix document revision stripping
* Force breaks of long (garbage) words
* Check URLs for validity before urlizing them
* Handle some additional corner cases
* Linkify action items
* Don't create profile/email links for System
* Handle headings with HTML elements in them better
* Fix comment
* Fix another occurrence of "** No value found for"
* Better I-D URLization that handles more edge cases. Also, test for them.
* Remove print
* Handle charters better
* Cache for one day
* Update vnu.jar
* Fix py2 -> py3 issue
* Run pyupgrade
* test: Add default-jdk to images
* test: Add option to also validate HTML with vnu.jar
vnu.jar is already installed in bin. This is not done by default, since it
increases the time needed for tests by ~50%.
* fix: Stop the urlizer from urlizing in linkified mailto: text
* More HTML fixes
* More HTML validation fixes
* And more HTML fixes
* Fix floating badge
* Ignore unicode errors
* Only URLize docs that exist
* Final fixes
* Don't URLize everything during test-crawl
* Feed HTML into vnu using python rather than Java to speed things up
* Allow test-crawl to start vnu on a different port
* Increase retry count to vnu. Restore batch size to 30.
* More HTML validation fixes
* Use urllib3 to make requests to vnu, since overriding requests_mock is tricky
* Undo commit of unmodified file
* Also urlize ftp links
* Fix matching of file name
* More HTML fixes
* Add `is_valid_url` filter
* weekday -> data-weekday
* urlencode URLs
* Add and use vnu_fmt_message. Bump vnu max buffer.
* Simplify doc_exists
* Don't add tab link to mail archive if the URL is invalid
* Run urlize_ietf_docs before linkify
Reduces the possibility of generating incorrect HTML
* Undo superfluous change
* Runner fixes
* Consolidate vnu message filtering into vnu_filter_message
* Correctly handle multiple persons with same name
* Minimize diff
* Fix HTML nits
* Print source snippet in vnu_fmt_message
* Only escape if there is something to escape
* Fix snippet
* Skip crufty old IPR declarations
* Only include modal when needed. Add handles.
* Fix wordwrap+linkification
* Update ietf/doc/templatetags/ietf_filters.py
* Update ietf/doc/templatetags/tests_ietf_filters.py
* Don't right-align second column
from html outside the datatracker's control, such as uploaded WG
agendas. Also exempted some pages with known-bad character issues
from html validation, and refined the error reporting for html
validation failures.
- Legacy-Id: 13027
Add option to crawl as a logged-in user (--user).
Add --pedantic option for vnu crawl, which stops the crawl on (most) errors.
Randomize the order in which URLs are crawled, so that repeated crawls don't
hit the same URLs in the same order.
- Legacy-Id: 9785
Note: SVN reference [9765] has been migrated to Git commit 9b4e61049a704127e1200549fcc410326efffddb