Commit graph

54 commits

Author SHA1 Message Date
Robert Sparks b1585124d6 Improve robustness of pdfization. Tune the test crawler. Commit ready for merge.
- Legacy-Id: 19813
2022-01-06 20:17:55 +00:00
Robert Sparks 3697180cc1 Reverted merge of timezone-aware migration efforts.
- Legacy-Id: 18792
2021-01-12 16:54:20 +00:00
Henrik Levkowetz 774e752a54 Snapshot of timezone-aware datatracker code. Tests pass, and the test-crawler shows only expected differences. Trunk changes merged in up to r18768.
- Legacy-Id: 18770
2020-12-16 23:53:37 +00:00
Henrik Levkowetz 7ee6bd4fb4 When doing test-crawling, ignore variations of the 'next=' query arg. (The code ignores other query args if 'next' is given).
- Legacy-Id: 18730
2020-12-04 16:04:01 +00:00
Henrik Levkowetz 5d03afa6aa Reduced the number of htmlization URLs visited further.
- Legacy-Id: 17999
2020-06-16 20:07:11 +00:00
Henrik Levkowetz 221e989754 Fixed a bad regex in test-crawl
- Legacy-Id: 17970
2020-06-11 09:22:26 +00:00
Henrik Levkowetz 516f41e5d7 Excluded a majority of htmlized drafts at /doc/html (but keeping some for testing) in order to reduce the crawl time.
- Legacy-Id: 17918
2020-06-06 20:58:04 +00:00
Henrik Levkowetz 690fb3a370 Added a bunch of drafts for which we don't have text files to the test-crawler exclusion list.
- Legacy-Id: 17805
2020-05-16 13:50:45 +00:00
Henrik Levkowetz 695b6e0e86 Tweaked test-crawl to not visit all 180,000 /html/ pages.
- Legacy-Id: 17763
2020-05-08 18:49:33 +00:00
Henrik Levkowetz 25af6fbfad Updated the test crawler for python3.
- Legacy-Id: 16438
2019-07-08 19:37:10 +00:00
Henrik Levkowetz 8d1d0cda97 Added a no-follow option to the test crawler, in order to be able to easily test a specific list of URLs.
- Legacy-Id: 16188
2019-05-06 13:35:29 +00:00
Henrik Levkowetz b48caef487 Tweaked the test-crawler to not follow redirects to www.ietf.org. Asking the test client for non-datatracker URLs doesn't give back anything meaningful ,:-)
- Legacy-Id: 14930
2018-03-26 13:01:38 +00:00
Henrik Levkowetz ef99946ca9 Fixed a bug in the handling of check failures.
- Legacy-Id: 14477
2017-12-30 18:46:13 +00:00
Henrik Levkowetz 5bcecc7c54 Fixed a bug and added a URL exception for some redirected URLs in the test crawler.
- Legacy-Id: 13992
2017-07-28 12:50:39 +00:00
Henrik Levkowetz eb610d2d94 Increased the test crawler's verbose output.
- Legacy-Id: 13685
2017-06-19 23:31:53 +00:00
Lars Eggert 76a3c8bdc0 Update vnu.jar and fix various HTML5 nits it found during a test crawl.
Commit ready for merge.
 - Legacy-Id: 13118
2017-03-25 20:21:14 +00:00
Henrik Levkowetz 5bb9518b5f Added some new exceptions to the test-crawler; files which are known to not exist, and files with known html character problems.
- Legacy-Id: 13037
2017-03-20 13:46:23 +00:00
Henrik Levkowetz 7296b951ee Refined the test crawler a bit, to avoid extracting URLs to follow
from html outside the datatracker's control, such as uploaded WG
agendas.  Also exempted some pages with known-bad character issues
from html validation, and refined the error reporting for html
validation failures.
 - Legacy-Id: 13027
2017-03-19 19:34:50 +00:00
Henrik Levkowetz a78c419845 Removed a debug print statement.
- Legacy-Id: 12870
2017-02-17 17:53:26 +00:00
Henrik Levkowetz c344a18bdf Fixed an issue with the test-crawler which could cause false positives for URLs containing an apostrophe.
- Legacy-Id: 12851
2017-02-16 09:58:34 +00:00
Henrik Levkowetz 9a3f6b059b Merged Django-1.8 upgrade work to trunk. Adjusted migration names, and added migrations as necessary. Fixed some instances of broken html.
- Legacy-Id: 12507
2016-12-13 05:55:46 +00:00
Henrik Levkowetz 44269f1d73 Added a URL to skip to the test-crawler.
- Legacy-Id: 12500
2016-12-09 13:04:22 +00:00
Henrik Levkowetz fde59c1e1e Removed debugging code.
- Legacy-Id: 12420
2016-11-29 22:20:30 +00:00
Henrik Levkowetz bb9741193c Added a URL to skip (from an uploaded html agenda).
- Legacy-Id: 12400
2016-11-28 13:38:31 +00:00
Henrik Levkowetz 8e11c7cb64 Fixed some invalid html, and tweaked the html validation settings in the test crawler.
- Legacy-Id: 12066
2016-09-30 18:47:56 +00:00
Henrik Levkowetz 4b0a9360f0 Merged in ^/branch/iola/event-saving-refactor-r10291, which refactors document saving to always use doc.save_with_history(events), and requires accompanying events. This branch also provides refactoring of recurring regexes in url patterns into a dictionary. As part of the merge, also refactored new code which didn't use the save_with_history() method.
- Legacy-Id: 11840
2016-08-23 10:52:08 +00:00
Henrik Levkowetz 3d48650c0d Another test-crawler tweak.
- Legacy-Id: 11433
2016-06-20 22:47:04 +00:00
Henrik Levkowetz de0753fa76 Tweaked the test crawler a bit to skip some slow and meaningless checks.
- Legacy-Id: 11431
2016-06-20 22:03:06 +00:00
Henrik Levkowetz aee36651a5 Tweaked the test-crawler to give the same log line format for exception failures as for regular log lines.
- Legacy-Id: 10936
2016-03-16 13:21:02 +00:00
Ole Laursen 86c3a430d1 Merge in ^/branch/iola/event-saving-refactor-r10076, fixing a few problems
- Legacy-Id: 10298
2015-10-27 10:37:06 +00:00
Henrik Levkowetz f41553f3d1 Added 2 new file existence checks to the check framework, since we're now reading email aliases for groups and documents from files. Added a call out to run_checks() in the test-crawler, so we don't see failures due to missing files.
- Legacy-Id: 10204
2015-10-13 19:07:11 +00:00
Henrik Levkowetz cfefc0ae58 Changed the default settings for the test crawler from ietf.settings to ietf.settings_testcrawl.
- Legacy-Id: 10120
2015-10-01 20:54:46 +00:00
Ole Laursen 5e4645d7d2 Summary: Trim the test-crawl imports
- Legacy-Id: 10107
2015-09-29 13:21:24 +00:00
Henrik Levkowetz 11411d2c30 Merged in an update from trunk@9942.
- Legacy-Id: 9961
2015-08-03 14:12:38 +00:00
Henrik Levkowetz f48452853f Changed test-crawl to avoid unnecessary repetitions of the blacklisting message.
- Legacy-Id: 9933
2015-08-01 12:47:03 +00:00
Henrik Levkowetz 948804f73f Added static javascript and image files to the URLs crawled by the test-crawler.
- Legacy-Id: 9913
2015-07-29 17:03:32 +00:00
Henrik Levkowetz 224fef557c Added a --random switch to choose between different test-crawler modes.
- Legacy-Id: 9893
2015-07-27 16:52:26 +00:00
Henrik Levkowetz 8612ce92c0 Merged in [9765] from lars@netapp.com:
Add option to crawl as a logged-in user (--user).
Add --pedantic option for vnu crawl, which stops the crawl on (most) errors.
Randomize the order in which URLs are crawled, so that repeated crawls don't
hit the same URLs in the same order.
 - Legacy-Id: 9785
Note: SVN reference [9765] has been migrated to Git commit 9b4e61049a704127e1200549fcc410326efffddb
2015-07-18 12:00:37 +00:00
Henrik Levkowetz ed66e24e7c Merged in [9726] from lars@netapp.com:
Add HTML5 validation based on validator.nu to test-crawl.
 - Legacy-Id: 9763
Note: SVN reference [9726] has been migrated to Git commit 5826bcbf80
2015-07-18 08:20:35 +00:00
Lars Eggert 5826bcbf80 Add HTML5 validation based on validator.nu to test-crawl. Commit ready for merge.
- Legacy-Id: 9726
2015-07-15 12:41:09 +00:00
Henrik Levkowetz 926b5831d6 Tweaked the test-crawl summary.
- Legacy-Id: 9574
2015-04-27 08:33:36 +00:00
Henrik Levkowetz 60738dc8bd Don't use non-zero exit code for test-crawler runs with nonvalidating html warnings.
- Legacy-Id: 9559
2015-04-25 06:36:22 +00:00
Henrik Levkowetz eadf421fc1 Added a new url folding operation for the html verification.
- Legacy-Id: 9557
2015-04-24 22:11:34 +00:00
Henrik Levkowetz e32af567ef Added html validation to the test crawler; it will now report html which fails validation with 'WARN' indications. Reorganized the code somewhat, collecting functions, globals, etc. in groups.
- Legacy-Id: 9549
2015-04-24 20:30:46 +00:00
Henrik Levkowetz 7c67e26fa4 Added a --logfile switch to the test crawler, in order to be able to control whether a logfile should be used or not. It's not particularly helpful when running on a buildbot slave, which catches stdout anyway.
- Legacy-Id: 9252
2015-03-19 20:28:25 +00:00
Henrik Levkowetz 86997e1e95 Turned the api.py file into a module. Moved the makeresources management command to the api module. Added some api tests. Added crawling of api files to the test-crawler. Adjusted some resource files discovered by the test suite and test-crawler. Removed a bunch of empty model files.
- Legacy-Id: 9144
2015-03-03 20:23:36 +00:00
Henrik Levkowetz 7ecfac6308 Merged in personal/henrik/django-1.7@9020 which upgrades Django from 1.6.0 to 1.7.4 and applies the needed changes to the datatracker code to work with release 1.7.x.
- Legacy-Id: 9028
2015-02-08 21:16:44 +00:00
Henrik Levkowetz 028b7e315a Reverted to [9025] because commit [9026] failed (it was incomplete with a broken working dir).
- Legacy-Id: 9027
Note: SVN reference [9026] has been migrated to Git commit 4a3749a66b
2015-02-08 20:03:16 +00:00
Henrik Levkowetz 4a3749a66b Merged in personal/henrik/django-1.7@9020 which upgrades Django from 1.6.0 to 1.7.4 and applies the needed changes to the datatracker code to work with release 1.7.x.
- Legacy-Id: 9026
2015-02-08 19:16:46 +00:00
Henrik Levkowetz 1210f77604 With django 1.7, standalone scripts need to call django.setup() before doing any operations involving models. Modified all scripts in bin/ and ietf/bin/ which seemed to need it.
- Legacy-Id: 9017
2015-02-07 21:13:38 +00:00