Commit graph

53 commits

Author SHA1 Message Date
Jennifer Richards e91bda7e5e
feat: consolidate HTML sanitizing (#8471)
* refactor: isolate bleach code

* refactor: move html fns to html.py

* refactor: lose the bleach.py module; refactor

* refactor: sanitize_document -> clean_html

Drops <meta charset="utf-8"> addition after cleaning.

* fix: disambiguate import

* feat: restore <meta charset="utf-8"> tag

* chore: comments

* chore(deps): drop lxml_html_clean package

* refactor: on second thought, no meta charset

* refactor: sanitize_fragment -> clean_html

* test: remove check for charset

* chore: fix lint
2025-01-28 11:28:19 -06:00
Robert Sparks 8cb7f3dcae
feat: Import IESG artifacts into the datatracker (#6908)
* chore: remove unused setting

* feat: initial import of iesg minutes

* fix: let the meetings view show older iesg meetings

* feat: iesg narrative minutes

* feat: import bof coordination call minutes

* wip: import commands for iesg appeals and statements

* feat: import iesg statements.

* feat: import iesg artifacts

* feat: many fewer n+1 queries for the group meetings view

* fix: restore chain of elifs in views_doc

* fix: use self.stdout.write vs print in mgmt commands

* fix: use replace instead of astimezone when appropriate

* chore: refactor new migrations into one

* fix: transcode some old files into utf8

* fix: repair overzealous replace

* chore: black

* fix: address minor review comments

* fix: actually capture transcoding work

* fix: handle multiple iesg statements on the same day

* fix: better titles

* feat: pill badge replaced statements

* fix: consolidate source repos to one

* feat: liberal markdown for secretariat controlled content

* fix: handle (and clean) html narrative minutes

* feat: scrub harder

* fix: simplify and improve a scrubber

* chore: reorder migrations
2024-02-20 16:35:08 -06:00
Jennifer Richards ec7c7b3701
chore: Upgrade to bleach v6 (#5021)
* build: Bump bleach requirement to 6.0.0

* fix: Update bleach configuration for compatibility with v6 changes
2023-01-23 13:29:45 -06:00
Lars Eggert a9e3b926f7
fix: rfc2html creates an "a" tag without href for references, handle that (#4975)
2023-01-17 10:20:13 -06:00
Lars Eggert 6eabd4a3a1
chore: Use codespell to fix typos in comments. (#4794)
First part of replacement of #4651
2022-11-28 10:36:36 -06:00
Jennifer Richards 3705bedfcd
feat: Celery support and asynchronous draft submission API (#4037)
* ci: add Dockerfile and action to build celery worker image

* ci: build celery worker on push to jennifer/celery branch

* ci: also build celery worker for main branch

* ci: Add comment to celery Dockerfile

* chore: first stab at a celery/rabbitmq docker-compose

* feat: add celery configuration and test task / endpoint

* chore: run mq/celery containers for dev work

* chore: point to ghcr.io image for celery worker

* refactor: move XML parsing duties into XMLDraft

Move some PlaintextDraft methods into the Draft base class and
implement for the XMLDraft class. Use xml2rfc code from ietf.submit
as a model for the parsing.

This leaves some mismatch between the PlaintextDraft and the Draft
class spec for the get_author_list() method to be resolved.

* feat: add api_upload endpoint and beginnings of async processing

This adds an api_upload() that behaves analogously to the api_submit()
endpoint. Celery tasks to handle asynchronous processing are added but
are not yet functional enough to be useful.

* perf: index Submission table on submission_date

This substantially speeds up submission rate threshold checks.

* feat: remove existing files when accepting a new submission

After checking that a submission is not in progress, remove any files
in staging that have the same name/rev with any extension. This should
guard against stale files confusing the submission process if the
usual cleanup fails or is skipped for some reason.

* refactor: make clear that deduce_group() uses only the draft name

* refactor: extract only draft name/revision in clean() method

Minimizing the amount of validation done when accepting a file. The
data extraction will be moved to asynchronous processing.

* refactor: minimize checks and data extraction in api_upload() view

* ci: fix dockerfiles to match sandbox testing

* ci: tweak celery container docker-compose settings

* refactor: clean up Draft parsing API and usage

  * remove get_draftname() from Draft api; set filename during init
  * further XMLDraft work
    - remember xml_version after parsing
    - extract filename/revision during init
    - comment out long broken get_abstract() method
  * adjust form clean() method to use changed API

* feat: flesh out async submission processing

First basically working pass!

* feat: add state name for submission being validated asynchronously

* feat: cancel submissions that async processing can't handle

* refactor: simplify/consolidate async tasks and improve error handling

* feat: add api_submission_status endpoint

* refactor: return JSON from submission api endpoints

* refactor: reuse cancel_submission method

* refactor: clean up error reporting a bit

* feat: guard against cancellation of a submission while validating

Not bulletproof but should prevent

* feat: indicate that a submission is still being validated

* fix: do not delete submission files after creating them

* chore: remove debug statement

* test: add tests of the api_upload and api_submission_status endpoints

* test: add tests and stubs for async side of submission handling

* fix: gracefully handle (ignore) invalid IDs in async submit task

* test: test process_uploaded_submission method

* fix: fix failures of new tests

* refactor: fix type checker complaints

* test: test submission_status view of submission in "validating" state

* fix: fix up migrations

* fix: use the streamlined SubmissionBaseUploadForm for api_upload

* feat: show submission history event timestamp as mouse-over text

* fix: remove 'manual' as next state for 'validating' submission state

* refactor: share SubmissionBaseUploadForm code with Deprecated version

* fix: validate text submission title, update a couple comments

* chore: disable requirements updating when celery dev container starts

* feat: log traceback on unexpected error during submission processing

* feat: allow secretariat to cancel "validating" submission

* feat: indicate time since submission on the status page

* perf: check submission rate thresholds earlier when possible

No sense parsing details of a draft that is going to be dropped regardless
of those details!

* fix: create Submission before saving to reduce race condition window

* fix: call deduce_group() with filename

* refactor: remove code lint

* refactor: change the api_upload URL to api/submission

* docs: update submission API documentation

* test: add tests of api_submission's text draft consistency checks

* refactor: rename api_upload to api_submission to agree with new URL

* test: test API documentation and submission thresholds

* fix: fix a couple api_submission view renames missed in templates

* chore: use base image + add arm64 support

* ci: try to fix workflow_dispatch for celery worker

* ci: another attempt to fix workflow_dispatch

* ci: build celery image for submit-async branch

* ci: fix typo

* ci: publish celery worker to ghcr.io/painless-security

* ci: install python requirements in celery image

* ci: fix up requirements install on celery image

* chore: remove XML_LIBRARY references that crept back in

* feat: accept 'replaces' field in api_submission

* docs: update api_submission documentation

* fix: remove unused import

* test: test "replaces" validation for submission API

* test: test that "replaces" is set by api_submission

* feat: trap TERM to gracefully stop celery container

* chore: tweak celery/mq settings

* docs: update installation instructions

* ci: adjust paths that trigger celery worker image build

* ci: fix branches/repo names left over from dev

* ci: run manage.py check when initializing celery container

Driver here is applying the patches. Starting the celery workers
also invokes the check task, but this should cause a clearer failure
if something fails.

* docs: revise INSTALL instructions

* ci: pass filename to pip update in celery container

* docs: update INSTALL to include freezing pip versions

Will be used to coordinate package versions with the celery
container in production.

* docs: add explanation of frozen-requirements.txt

* ci: build image for sandbox deployment

* ci: add additional build trigger path

* docs: tweak INSTALL

* fix: change INSTALL process to stop datatracker before running migrations

* chore: use ietf.settings for manage.py check in celery container

* chore: set uid/gid for celery worker

* chore: create user/group in celery container if needed

* chore: tweak docker compose/init so celery container works in dev

* ci: build mq docker image

* fix: move rabbitmq.pid to writeable location

* fix: clear password when CELERY_PASSWORD is empty

Setting to an empty password is really not a good plan!

* chore: add shutdown debugging option to celery image

* chore: add django-celery-beat package

* chore: run "celery beat" in datatracker-celery image

* chore: fix docker image name

* feat: add task to cancel stale submissions

* test: test the cancel_stale_submissions task

* chore: make f-string with no interpolation a plain string

Co-authored-by: Nicolas Giard <github@ngpixel.com>
Co-authored-by: Robert Sparks <rjsparks@nostrum.com>
2022-08-22 13:29:31 -05:00
Lars Eggert 1ba87890ba
feat: Render the document shepherd writeup templates at two new URLs (#4225)
* feat: Render the document shepherd writeup templates at two new URLs.

Those being `/doc/shepherdwriteuptemplate/group` and
`/doc/shepherdwriteuptemplate/individual`.

* Address review comments from @jennifer-richards

* Fixes

* Remove debug statement

* Make bleach sanitizer not strip the `start` attribute of `ol` tags

Also rearrange the code a bit

* Don't sanitize the `python_markdown` output, it destroys wanted formatting

* Restore bleach

* Don't bleach tag `id`s.
2022-07-22 13:43:02 -05:00
Lars Eggert fd087d4e16
fix: Avoid crashes in urlize_ietf_docs (#4161)
* fix: Don't crash when urlreverse fails as part of urlize_ietf_docs

Also fix an HTMLization nit.

* Fix more corner cases found during test-crawl

* Handle "I-D.*" reference-style matches

* Refactor use of bleach. Better Markdown linkification and formatting.

* Address review comment from @rjsparks
2022-07-07 12:27:30 -05:00
Lars Eggert 5f8d4ed718
More HTML nitfixing (#3934)
* Unicode messages are triggered by both db content and tests

* Make ids unique

* Avoid "No value found" message on page

* Strip HTML from history entries, it's often broken

* Check HTML sources for occurrences of "** No value found for" and fix them

* Fix another occurrence of "** No value found for"

* Fix more occurrences of "** No value found for"

* Fix document revision stripping

* Force breaks of long (garbage) words

* Check URLs for validity before urlizing them

* Handle some additional corner cases

* Linkify action items

* Don't create profile/email links for System

* Handle headings with HTML elements in them better

* Fix comment

* Fix another occurrence of "** No value found for"

* Better I-D URLization that handles more edge cases. Also, test for them.

* Remove print

* Handle charters better

* Cache for one day
2022-05-10 12:37:14 -05:00
Lars Eggert 5598762608
fix: add more HTML validation & fixes (#3891)
* Update vnu.jar

* Fix py2 -> py3 issue

* Run pyupgrade

* test: Add default-jdk to images

* test: Add option to also validate HTML with vnu.jar

Since it's already installed in bin. Don't do this by default, since it
increases the time needed for tests by ~50%.

* fix: Stop the urlizer from urlizing in linkified mailto: text

* More HTML fixes

* More HTML validation fixes

* And more HTML fixes

* Fix floating badge

* Ignore unicode errors

* Only URLize docs that are existing

* Final fixes

* Don't URLize everything during test-crawl

* Feed HTML into vnu using python rather than Java to speed things up

* Allow test-crawl to start vnu on a different port

* Increase retry count to vnu. Restore batch size to 30.

* More HTML validation fixes

* Use urllib3 to make requests to vnu, since overriding requests_mock is tricky

* Undo commit of unmodified file

* Also urlize ftp links

* Fix matching of file name

* More HTML fixes

* Add `is_valid_url` filter

* weekday -> data-weekday

* urlencode URLs

* Add and use vnu_fmt_message. Bump vnu max buffer.

* Simplify doc_exists

* Don't add tab link to mail archive if the URL is invalid

* Run urlize_ietf_docs before linkify

Reduces the possibility of generating incorrect HTML

* Undo superfluous change

* Runner fixes

* Consolidate vnu message filtering into vnu_filter_message

* Correctly handle multiple persons with same name

* Minimize diff

* Fix HTML nits

* Print source snippet in vnu_fmt_message

* Only escape if there is something to escape

* Fix snippet

* Skip crufty old IPR declarations

* Only include modal when needed. Add handles.

* Fix wordwrap+linkification

* Update ietf/doc/templatetags/ietf_filters.py

* Update ietf/doc/templatetags/tests_ietf_filters.py

* Don't right-align second column
2022-05-03 13:55:48 -05:00
Lars Eggert 9db1d48258
fix: Correctly linkify all current TLDs (#3868)
* fix: Correctly linkify all current TLDs

* Pass a list to the build_*_re functions, not a string

* Need to sort TLDs by length to force longer ones to match first

* chore: silence incorrect mypy complaint.

Co-authored-by: Robert Sparks <rjsparks@nostrum.com>
Co-authored-by: Nicolas Giard <github@ngpixel.com>
2022-04-26 12:25:18 -05:00
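The "sort TLDs by length" step in #3868 follows from how Python's `re` handles alternation: alternatives are tried left to right and the first match wins. A minimal sketch of the effect, assuming a hypothetical `build_tld_re()` helper (the real `build_*_re` functions and their signatures are not shown in this log):

```python
import re


def build_tld_re(tlds):
    """Hypothetical helper: match a dot followed by any TLD in the list `tlds`.

    Python's `re` tries alternatives left to right and accepts the first one
    that matches, so longer TLDs are put first; otherwise "co" would match the
    start of "com" and the rest of the name would be left behind.
    """
    # Passing a string here instead of a list would iterate over its characters.
    ordered = sorted(tlds, key=len, reverse=True)
    return re.compile(r"\.(" + "|".join(re.escape(t) for t in ordered) + r")")


# Unsorted alternation: "co" wins even though "com" is available.
naive = re.compile(r"\.(co|com)")
print(naive.search("example.com").group(1))                        # co
print(build_tld_re(["co", "com"]).search("example.com").group(1))  # com
```

A real linkifier pattern also needs word boundaries and host-name syntax around the TLD; this only isolates the ordering issue.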
Lars Eggert cf629a42ad And more fixes.
- Legacy-Id: 19877
2022-01-25 10:14:25 +00:00
Kesara Rathnayake 0a645fd486 Parse RFC2047 formatted text properly in submission form. Fixes #2465. Commit ready for merge.
- Legacy-Id: 19120
2021-06-14 10:46:35 +00:00
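RFC 2047 encoded-words (`=?charset?encoding?text?=`) can be decoded with the standard library's `email.header` module. A minimal sketch, assuming a hypothetical `decode_rfc2047()` wrapper; whether the submission form uses exactly these calls is not stated in the commit:

```python
from email.header import decode_header, make_header


def decode_rfc2047(value: str) -> str:
    """Turn RFC 2047 encoded-words such as "=?utf-8?q?Bj=C3=B6rn?=" into text.

    decode_header() splits the value into (bytes-or-str, charset) chunks and
    make_header() reassembles them into a Header whose str() is plain Unicode.
    """
    return str(make_header(decode_header(value)))


print(decode_rfc2047("=?utf-8?q?Bj=C3=B6rn?="))  # Björn
print(decode_rfc2047("Plain ASCII title"))       # Plain ASCII title
```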
Henrik Levkowetz 726fcbf27d Removed all __future__ imports.
- Legacy-Id: 17391
2020-03-05 23:53:42 +00:00
Henrik Levkowetz 1c808bf63b Removed further six usage.
- Legacy-Id: 17387
2020-03-05 15:54:32 +00:00
Henrik Levkowetz e9a37d8ac8 Removed six.text_type(), changed six.moves.urllib to plain urllib, and removed now unused six imports.
- Legacy-Id: 17385
2020-03-05 14:41:41 +00:00
Henrik Levkowetz fcb6806d17 Merged in work from sasha@dashcare.nl on Review Queue Management:
This abstracts queue management, making it possible to implement different
policies for each team. It provides two concrete policies:
RotateAlphabeticallyReviewerQueuePolicy, which rotates an alphabetically
ordered reviewer list with consideration for skip indications, and is the
default policy; and LeastRecentlyUsedReviewerQueuePolicy, a simple
least-recently-used policy.  Also see issues #2721 and #2656.
- Legacy-Id: 17121
2019-12-04 23:02:52 +00:00
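The two queue policies named above can be illustrated with plain lists. This is a simplified sketch with hypothetical function names; the actual `RotateAlphabeticallyReviewerQueuePolicy` and `LeastRecentlyUsedReviewerQueuePolicy` classes work on database models and handle skip indications in more detail:

```python
from datetime import datetime
from typing import Dict, List, Optional, Set


def rotate_alphabetically(reviewers: List[str], last_assigned: Optional[str],
                          skip: Set[str]) -> List[str]:
    """Alphabetical rotation: start just after the last assigned reviewer,
    wrap around, and leave out anyone with a pending skip indication."""
    ordered = sorted(reviewers)
    if last_assigned in ordered:
        start = ordered.index(last_assigned) + 1
        ordered = ordered[start:] + ordered[:start]
    return [r for r in ordered if r not in skip]


def least_recently_used(last_assigned_at: Dict[str, Optional[datetime]]) -> List[str]:
    """LRU order: reviewers never assigned first, then oldest assignment first."""
    return sorted(last_assigned_at,
                  key=lambda r: (last_assigned_at[r] or datetime.min, r))


print(rotate_alphabetically(["carol", "alice", "bob", "dave"], "bob", {"dave"}))
# ['carol', 'alice', 'bob']
print(least_recently_used({"alice": datetime(2019, 11, 1), "bob": None,
                           "carol": datetime(2019, 10, 1)}))
# ['bob', 'carol', 'alice']
```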
Henrik Levkowetz ac6b664fa5 Added normalization of draft title extracted from submitted XML.
- Legacy-Id: 17119
2019-12-02 16:24:51 +00:00
Henrik Levkowetz c233f07b5d Added a management command to generate draft bibxml files, and also a trial version of datatracker draft bibxml pages.
- Legacy-Id: 16962
2019-11-05 18:10:29 +00:00
Henrik Levkowetz 24ede9a1ae In wordwrap(), consider lines consisting entirely of some non-alphanumeric characters like ---- or === to be block (paragraph) separators. Fixes issue #2806.
- Legacy-Id: 16790
2019-10-01 11:08:41 +00:00
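Treating rule lines such as `----` or `===` as block separators means they are never joined into the surrounding paragraph when it is re-wrapped. A sketch of the splitting step, assuming a hypothetical `split_blocks()` helper and a character set for rule lines that is a guess, not the one used in `ietf.utils.text`:

```python
import re

# Assumed pattern: a line made up only of one repeated rule character such as
# "----" or "====" (optionally surrounded by whitespace) is a separator.
SEPARATOR_RE = re.compile(r"^\s*([-=_*~#])\1+\s*$")


def split_blocks(text):
    """Split text into blocks at blank lines and rule-like separator lines.

    Separator lines become blocks of their own, so a wrapper that re-flows
    each block never merges them with neighbouring text."""
    blocks, current = [], []
    for line in text.splitlines():
        if not line.strip() or SEPARATOR_RE.match(line):
            if current:
                blocks.append(current)
                current = []
            if line.strip():
                blocks.append([line])  # keep the separator line itself
        else:
            current.append(line)
    if current:
        blocks.append(current)
    return blocks


print(split_blocks("Title\n=====\nSome text\nmore text"))
# [['Title'], ['====='], ['Some text', 'more text']]
```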
Henrik Levkowetz 8c6eb3a30a Python2/3 compatibility: Changed the use of open() and StringIO to io.open() etc.
- Legacy-Id: 16458
2019-07-15 19:14:04 +00:00
Henrik Levkowetz f481f5c3e6 Replaced use of six with the equivalent pure python3 constructs.
- Legacy-Id: 16428
2019-07-08 10:43:47 +00:00
Henrik Levkowetz 0679eaa8d4 Removed unused imports.
- Legacy-Id: 16402
2019-07-04 21:06:57 +00:00
Henrik Levkowetz f480799af9 Undid unintentional bulk commit
- Legacy-Id: 16401
2019-07-04 21:04:46 +00:00
Henrik Levkowetz fc09a59950 Added decode() of command pipe output.
- Legacy-Id: 16400
2019-07-04 21:01:39 +00:00
Henrik Levkowetz 318bd0d5ea Changed regex strings to r strings.
- Legacy-Id: 16320
2019-06-28 13:32:50 +00:00
Henrik Levkowetz e39ac52071 Removed 2to3-generated list() around .items() iterator in for loops.
- Legacy-Id: 16315
2019-06-27 18:11:17 +00:00
Henrik Levkowetz d7f5c84182 Initial 2to3 patch with added copyright statement updates.
- Legacy-Id: 16309
2019-06-27 14:40:54 +00:00
Henrik Levkowetz 849a3dcc97 Added another exception class to a catch instance in a function, triggered by a new usage case.
- Legacy-Id: 15990
2019-03-04 20:12:30 +00:00
Henrik Levkowetz e53318084d Added a tiny utility function unwrap() to unwrap wrapped text for matching expected strings in tests.
- Legacy-Id: 15396
2018-07-19 15:53:05 +00:00
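The point of an `unwrap()` helper in tests is that wrapped output can be compared with an unwrapped expected string. The body of the real `unwrap()` is not shown in this log; the following is a hypothetical re-implementation that joins lines within each paragraph and keeps blank-line paragraph breaks:

```python
import re
import textwrap


def unwrap(text):
    """Hypothetical: collapse hard-wrapped lines in each paragraph into one
    line, keeping blank lines between paragraphs."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)


expected = "The quick brown fox jumps over the lazy dog."
wrapped = textwrap.fill(expected, 20)   # what a wrapping view might emit
print(unwrap(wrapped) == expected)      # True
```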
Henrik Levkowetz 0800304b67 Added TeX escaping utility functions and template filters. Removed
html escaping and added TeX escaping for relevant parts of the bibtex
template.  Fixes issue #2459.
- Legacy-Id: 14711
2018-02-27 18:15:21 +00:00
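HTML escaping is wrong for BibTeX output because `&amp;` would appear literally in the citation; TeX needs its own escapes. A sketch with a hypothetical `tex_escape()` and an assumed mapping (the actual filters added in this commit may cover different characters):

```python
# Assumed mapping of TeX special characters to their escaped forms.
TEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
    "$": r"\$",
    "&": r"\&",
    "#": r"\#",
    "%": r"\%",
    "_": r"\_",
    "^": r"\textasciicircum{}",
    "~": r"\textasciitilde{}",
}


def tex_escape(text):
    """Escape characters with special meaning in TeX/BibTeX.

    Mapping one character at a time avoids re-escaping the backslashes that
    earlier replacements introduce, which chained str.replace() calls would."""
    return "".join(TEX_ESCAPES.get(ch, ch) for ch in text)


print(tex_escape("R&D 100% of $5_a"))  # R\&D 100\% of \$5\_a
```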
Henrik Levkowetz 9ffe1e425a Reverted unintentional commit
- Legacy-Id: 14709
2018-02-27 17:58:25 +00:00
Henrik Levkowetz a5db4d00de Updated PLAN
- Legacy-Id: 14708
2018-02-27 17:55:43 +00:00
Henrik Levkowetz 1ed8e967e7 Merged in ^/personal/henrik/6.72.1-django-1.11@14676: Upgrade to Django 1.11
- Legacy-Id: 14695
2018-02-25 19:55:16 +00:00
Henrik Levkowetz 71a9ffafc5 Changed allow_lazy to the @keep_lazy decorator.
- Legacy-Id: 14674
2018-02-22 00:13:32 +00:00
Henrik Levkowetz 5638cf3da3 Changed all usage of ForeignKey and OneToOneField in models.py files to the compatibility versions from ietf.utils.models.
- Legacy-Id: 14661
2018-02-20 15:36:05 +00:00
Henrik Levkowetz dd7853c7a3 Check line length before assuming there's a first character.
- Legacy-Id: 14619
2018-02-06 15:18:11 +00:00
Henrik Levkowetz 717868cae2 Rewrote text_to_dict() and dict_to_text() to support unicode without RFC2822 encoding issues. Added initial values in IPR update forms, from the original disclosure, in order to make updates easier. Addresses issue #2413.
- Legacy-Id: 14531
2018-01-17 00:21:34 +00:00
Henrik Levkowetz 660c81c272 Tweaked the file content read refactoring in [14406] to try latin-1 conversion if unicode doesn't work.
- Legacy-Id: 14410
Note: SVN reference [14406] has been migrated to Git commit 967ece7e7d
2017-12-10 17:48:09 +00:00
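The latin-1 fallback is safe as a last resort because Latin-1 maps every one of the 256 byte values to a code point, so decoding can never fail. A minimal sketch with a hypothetical `decode_document()` function:

```python
def decode_document(raw: bytes) -> str:
    """Decode document bytes as UTF-8, falling back to Latin-1.

    Latin-1 assigns a character to every byte value, so the fallback never
    raises; it may mis-render text in other 8-bit encodings, however."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


print(decode_document("Björn".encode("utf-8")))    # Björn
print(decode_document("Björn".encode("latin-1")))  # Björn
```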
Henrik Levkowetz 967ece7e7d Started refactoring of reading text from document files (drafts, charters, etc.) in order to normalise on one way of doing this, and making that return unicode rather than undecoded bytes. This is the first step of two, in order to gauge the possible issues and report on discrepancies.
- Legacy-Id: 14406
2017-12-08 21:51:11 +00:00
Henrik Levkowetz 2c1438c240 Moved unidecode_name from utils.text to person.name.
Modified UserFactory to use a new locale for each new user, instead of the
same locale for a whole test run.  This (almost) ensures the exercise of
code to deal with non-ascii names, something which would not happen if a
locale with ascii names was chosen at the start of a run.

Modified name.initials() to not use non-word characters as initials.

Modified unidecode_name() to do more normalization, to conform to the
conventions used in internet-drafts.

Added saving of the factory-boy random state in order to be able to re-run
a test suite with the same pseudo-random sequence as in a previous failed
run.

Fixed an issue with email formatting in test_api_submit_ok().

Modified the draft author extraction code to deal better with names with
embedded apostrophes.
- Legacy-Id: 14141
2017-09-20 15:36:30 +00:00
Henrik Levkowetz 33b275b04f Added ietf.utils.text.unidecode_name() and replaced various uses of unidecode() with it, in order to normalize the generation of ascii versions of names, to avoid different practices in space stripping and space normalization in different parts of the code.
- Legacy-Id: 14128
2017-09-17 15:12:18 +00:00
Henrik Levkowetz 34a2352288 Make sure wordwrap() and friends work as intended if they are used as template filters and given string arguments.
- Legacy-Id: 13653
2017-06-16 13:15:02 +00:00
Henrik Levkowetz a34d078428 Commented out again a function that was commented in by mistake in the committed code.
- Legacy-Id: 13543
2017-06-06 16:30:53 +00:00
Henrik Levkowetz 5ca4309691 Fixed a bug in wordwrap() where a URL (or any word) longer than width could prevent line breaking in following text.
- Legacy-Id: 13541
2017-06-06 14:03:02 +00:00
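The intended behaviour after this fix is that an over-long word gets a line to itself and wrapping resumes normally afterwards. A greedy sketch of that behaviour, using a hypothetical `greedy_wrap()` rather than the actual `wordwrap()` code:

```python
from typing import List


def greedy_wrap(text: str, width: int) -> List[str]:
    """Greedy word wrap: a word longer than `width` occupies its own line,
    and the words after it are still broken at `width` as usual."""
    lines: List[str] = []
    current = ""
    for word in text.split():
        if not current:
            current = word
        elif len(current) + 1 + len(word) <= width:
            current += " " + word
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


url = "https://example.com/a/very/long/path"
print(greedy_wrap(url + " and some more words", 10))
# ['https://example.com/a/very/long/path', 'and some', 'more words']
```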
Henrik Levkowetz 5b2087f910 Eliminated several variations on word wrapping, keeping only what used to be wrap_text(), but renamed as ietf.utils.text.wordwrap(). This performs better than django.utils.text.wrap() when there are indented text parts. Replaced django's default wordwrap filter with one calling ietf.utils.text.wordwrap in templates. Changed to triggered wrapping in some cases, with the maybewordwrap filter, which triggers on lines longer than 100 characters. This fixes the issue with undesired wrapping of reviews.
- Legacy-Id: 13505
2017-06-02 23:13:22 +00:00
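The triggered wrapping described above leaves text alone unless some line exceeds 100 characters, which is what keeps already-formatted reviews intact. A sketch using `textwrap` with a hypothetical `maybe_wordwrap()`; the output width of 80 is an assumption, and the real filter calls `ietf.utils.text.wordwrap` instead:

```python
import textwrap


def maybe_wordwrap(text: str, width: int = 80, trigger: int = 100) -> str:
    """Hypothetical: return text unchanged unless some line is longer than
    `trigger` characters; only then re-wrap each line to `width`."""
    lines = text.splitlines()
    if not any(len(line) > trigger for line in lines):
        return text
    # textwrap.fill("") returns "", so blank lines survive as paragraph breaks.
    return "\n".join(textwrap.fill(line, width) for line in lines)


review = "Already wrapped review text\nwith short lines.\n"
print(maybe_wordwrap(review) == review)  # True: nothing exceeds the trigger
```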
Henrik Levkowetz 16d129cacf Added examples of how our different text wrapper functions work to ietf.utils.text. Run 'python ietf/utils/text.py | less' to see the results.
- Legacy-Id: 13498
2017-06-02 17:59:26 +00:00
Henrik Levkowetz cf4a4b02a7 Reworked the email address handling in order to be able to support non-ascii names as part of email address fields. Reworked the generation of user names in the test suite to generate names from multiple non-ascii locales. Fixes issue #2080.
- Legacy-Id: 12872
2017-02-18 21:50:18 +00:00
Henrik Levkowetz 2c27d5c611 Moved optional text wrapping before html escaping in markup_unicode(), used by get_unicode_document_content(). Fixes a problem with lines being wrapped when they should not be.
- Legacy-Id: 12480
2016-12-08 16:27:05 +00:00
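Why the order matters: escaping first turns each `&` into the five-character `&amp;`, so the wrapper measures a longer line than the reader sees. A small demonstration with a hypothetical `wrap_then_escape()` standing in for the relevant part of `markup_unicode()`:

```python
import html
import textwrap


def wrap_then_escape(text: str, width: int = 72) -> str:
    """Wrap first, then escape, so entity expansion does not count toward
    the line width."""
    return html.escape(textwrap.fill(text, width))


text = ("Q&A " * 18).strip()                  # 71 characters: fits in 72 columns
wrong = textwrap.fill(html.escape(text), 72)  # escaping first inflates it to 143
print("\n" in wrap_then_escape(text))  # False
print("\n" in wrong)                   # True
```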
Ole Laursen 74a02be9bf Create new branch from trunk@r11921, and merge review-tracker-r11360 into it
- Legacy-Id: 11923
2016-09-06 10:17:12 +00:00