Commit graph

82 commits

Author SHA1 Message Date
Robert Sparks b926178e62
fix: quicker calculation of status from draft text (#8111)
* fix: quicker calculation of status from draft text

* chore: remove unused import

* fix: only read a small prefix of draft text when needed
2024-10-29 11:18:31 -05:00
Jennifer Richards d33a6f3c0c
fix: Handle missing date fields in XML submissions (#5744)
* refactor: Eliminate _construct_creation_date helper

* fix: Use xml2rfc method for filling in missing date fields

* fix: Set options.date for xml2rfc writers

* test: Test handling of missing date element/fields
2023-06-02 14:40:52 -05:00
Jennifer Richards 5a2708283b
feat: Extract document creation date from XML draft (#5733)
* fix: Extract document creation date from XML draft

* test: Fix test
2023-06-01 09:58:55 -05:00
Jennifer Richards 8d4780d304
fix: Ignore failure to extract text draft title unless it is needed (#5730)
* fix: Accept a Path as source for a PlaintextDraft

* fix: Guard against failure to extract PlaintextDraft title

* fix: Ignore failure to extract text draft title unless it is needed
2023-06-01 09:39:59 -05:00
Lars Eggert f8b48f4c43
fix: use Internet-Draft more consistently across the UI (#5104)
* s/Internet Draft/Internet-Draft/i

* s/draft/Internet-Draft/i or s/draft/I-D/i

* s/ID/I-D/

* Fix tests

* a -> an

* Undo case-change to ASCII

* Address code review comments

* Add migrations

* Add merged migration

* fix: straighten out migrations

* fix: finish straightening out migrations

---------

Co-authored-by: Robert Sparks <rjsparks@nostrum.com>
2023-02-11 10:09:28 -06:00
Lars Eggert 220be21998
chore: Use codespell to fix typos in code. (#4797)
* chore: Use codespell to fix typos in code.

Second part of replacement of #4651

@rjsparks, I probably need to revert some things here, and I also
still need to add that new migration - how do I do that?

* Revert migrations

* Migrate "Whitelisted" to "Allowlisted"

* TEST_COVERAGE_MASTER_FILE -> TEST_COVERAGE_MAIN_FILE

* Fix permissions

* Add suggestions from @jennifer-richards
2022-12-07 15:10:35 -06:00
Lars Eggert d59c64943d
fix: Fix spurious author extraction errors (#4799)
* Handle single-word author names

* Some i18n names, e.g., "शिला के.सी." have a dot at the end that is
also part of the ASCII, e.g., "Shilaa Kesii." That trailing dot breaks
extract_authors(). Avoid this issue by stripping the dot from the
ASCII.

* Honorifics need to be part of the extracted ASCII name (e.g., "Lady Garcia")

* feat: stop supporting pre-tzaware migration database dumps. (#4782)

* feat: stop supporting pre-tzaware migration database dumps.

* chore: remove unnecessary env variable

* chore: Use `codespell` to fix typos in comments. (#4794)

First part of replacement of #4651

* feat: Only show IPR search form when not showing search results (#4793)

* feat: Only show IPR search form when not showing search results

Put it into a collapsible that is only expanded by default when not
showing search results.

Fixes #4569

* Don't use example target name

* fix: Don't show reorder UI fixtures unless user can reorder (#4785)

Fixes #4773

Co-authored-by: Robert Sparks <rjsparks@nostrum.com>

* chore: Update deps and fix resulting HTML validation issues (#4790)

* ci: add missing build matrix config for test-playwright-legacy step

* Single-letter last names exist (e.g., "Carolina de la O")

* Align regex with others

* Fix extraction of very long author names

* Need to be more general

* Add comment

* Also handle i18n names with trailing semicolons

* Name suffixes need to be part of the extracted author names

* Handle i18n names with embedded commas

Co-authored-by: Robert Sparks <rjsparks@nostrum.com>
Co-authored-by: Nicolas Giard <github@ngpixel.com>
2022-12-02 15:41:21 -06:00
Jennifer Richards 3220bf3c40
chore: replace last few datetime.date.today() calls with date_today()
2022-10-18 12:45:47 -03:00
Jennifer Richards 3705bedfcd
feat: Celery support and asynchronous draft submission API (#4037)
* ci: add Dockerfile and action to build celery worker image

* ci: build celery worker on push to jennifer/celery branch

* ci: also build celery worker for main branch

* ci: Add comment to celery Dockerfile

* chore: first stab at a celery/rabbitmq docker-compose

* feat: add celery configuration and test task / endpoint

* chore: run mq/celery containers for dev work

* chore: point to ghcr.io image for celery worker

* refactor: move XML parsing duties into XMLDraft

Move some PlaintextDraft methods into the Draft base class and
implement for the XMLDraft class. Use xml2rfc code from ietf.submit
as a model for the parsing.

This leaves some mismatch between the PlaintextDraft and the Draft
class spec for the get_author_list() method to be resolved.

* feat: add api_upload endpoint and beginnings of async processing

This adds an api_upload() that behaves analogously to the api_submit()
endpoint. Celery tasks to handle asynchronous processing are added but
are not yet functional enough to be useful.

* perf: index Submission table on submission_date

This substantially speeds up submission rate threshold checks.

* feat: remove existing files when accepting a new submission

After checking that a submission is not in progress, remove any files
in staging that have the same name/rev with any extension. This should
guard against stale files confusing the submission process if the
usual cleanup fails or is skipped for some reason.

* refactor: make clear that deduce_group() uses only the draft name

* refactor: extract only draft name/revision in clean() method

Minimizing the amount of validation done when accepting a file. The
data extraction will be moved to asynchronous processing.

* refactor: minimize checks and data extraction in api_upload() view

* ci: fix dockerfiles to match sandbox testing

* ci: tweak celery container docker-compose settings

* refactor: clean up Draft parsing API and usage

  * remove get_draftname() from Draft api; set filename during init
  * further XMLDraft work
    - remember xml_version after parsing
    - extract filename/revision during init
    - comment out long broken get_abstract() method
  * adjust form clean() method to use changed API

* feat: flesh out async submission processing

First basically working pass!

* feat: add state name for submission being validated asynchronously

* feat: cancel submissions that async processing can't handle

* refactor: simplify/consolidate async tasks and improve error handling

* feat: add api_submission_status endpoint

* refactor: return JSON from submission api endpoints

* refactor: reuse cancel_submission method

* refactor: clean up error reporting a bit

* feat: guard against cancellation of a submission while validating

Not bulletproof but should prevent

* feat: indicate that a submission is still being validated

* fix: do not delete submission files after creating them

* chore: remove debug statement

* test: add tests of the api_upload and api_submission_status endpoints

* test: add tests and stubs for async side of submission handling

* fix: gracefully handle (ignore) invalid IDs in async submit task

* test: test process_uploaded_submission method

* fix: fix failures of new tests

* refactor: fix type checker complaints

* test: test submission_status view of submission in "validating" state

* fix: fix up migrations

* fix: use the streamlined SubmissionBaseUploadForm for api_upload

* feat: show submission history event timestamp as mouse-over text

* fix: remove 'manual' as next state for 'validating' submission state

* refactor: share SubmissionBaseUploadForm code with Deprecated version

* fix: validate text submission title, update a couple comments

* chore: disable requirements updating when celery dev container starts

* feat: log traceback on unexpected error during submission processing

* feat: allow secretariat to cancel "validating" submission

* feat: indicate time since submission on the status page

* perf: check submission rate thresholds earlier when possible

No sense parsing details of a draft that is going to be dropped regardless
of those details!

* fix: create Submission before saving to reduce race condition window

* fix: call deduce_group() with filename

* refactor: remove code lint

* refactor: change the api_upload URL to api/submission

* docs: update submission API documentation

* test: add tests of api_submission's text draft consistency checks

* refactor: rename api_upload to api_submission to agree with new URL

* test: test API documentation and submission thresholds

* fix: fix a couple api_submission view renames missed in templates

* chore: use base image + add arm64 support

* ci: try to fix workflow_dispatch for celery worker

* ci: another attempt to fix workflow_dispatch

* ci: build celery image for submit-async branch

* ci: fix typo

* ci: publish celery worker to ghcr.io/painless-security

* ci: install python requirements in celery image

* ci: fix up requirements install on celery image

* chore: remove XML_LIBRARY references that crept back in

* feat: accept 'replaces' field in api_submission

* docs: update api_submission documentation

* fix: remove unused import

* test: test "replaces" validation for submission API

* test: test that "replaces" is set by api_submission

* feat: trap TERM to gracefully stop celery container

* chore: tweak celery/mq settings

* docs: update installation instructions

* ci: adjust paths that trigger celery worker image build

* ci: fix branches/repo names left over from dev

* ci: run manage.py check when initializing celery container

Driver here is applying the patches. Starting the celery workers
also invokes the check task, but this should cause a clearer failure
if something fails.

* docs: revise INSTALL instructions

* ci: pass filename to pip update in celery container

* docs: update INSTALL to include freezing pip versions

Will be used to coordinate package versions with the celery
container in production.

* docs: add explanation of frozen-requirements.txt

* ci: build image for sandbox deployment

* ci: add additional build trigger path

* docs: tweak INSTALL

* fix: change INSTALL process to stop datatracker before running migrations

* chore: use ietf.settings for manage.py check in celery container

* chore: set uid/gid for celery worker

* chore: create user/group in celery container if needed

* chore: tweak docker compose/init so celery container works in dev

* ci: build mq docker image

* fix: move rabbitmq.pid to writeable location

* fix: clear password when CELERY_PASSWORD is empty

Setting to an empty password is really not a good plan!

* chore: add shutdown debugging option to celery image

* chore: add django-celery-beat package

* chore: run "celery beat" in datatracker-celery image

* chore: fix docker image name

* feat: add task to cancel stale submissions

* test: test the cancel_stale_submissions task

* chore: make f-string with no interpolation a plain string

Co-authored-by: Nicolas Giard <github@ngpixel.com>
Co-authored-by: Robert Sparks <rjsparks@nostrum.com>
2022-08-22 13:29:31 -05:00
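The staging cleanup described in the commit above ("remove any files in staging that have the same name/rev with any extension") could be sketched roughly as follows. This is an illustration only: `remove_staged_files`, its parameters, and the file names are hypothetical and are not taken from the datatracker code, and the sketch assumes draft names contain no glob metacharacters.

```python
from pathlib import Path
import tempfile


def remove_staged_files(staging_dir, name, rev):
    """Delete every file named <name>-<rev>.<any extension> in staging_dir.

    Hypothetical helper: the real datatracker function and its signature
    are not shown in this log.
    """
    removed = []
    # Materialize the matches first so we never delete while still scanning.
    for path in list(Path(staging_dir).glob(f"{name}-{rev}.*")):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return sorted(removed)


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    staging = Path(tmp)
    for fname in ("draft-example-00.txt", "draft-example-00.xml", "draft-example-01.txt"):
        (staging / fname).write_text("placeholder")
    removed = remove_staged_files(staging, "draft-example", "00")
    remaining = sorted(p.name for p in staging.iterdir())
```

Only the two `-00` files are removed here; the `-01` revision is left alone because the pattern pins both name and revision.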
Robert Sparks 10396d6f01
chore: remove more tools.ietf.org server only related things. (#4103)
* chore: remove more tools.ietf.org server only related things.

* chore: remove use of tools.ietf.org floorplans

The data will move into the FloorPlan models instead.
2022-06-22 14:11:46 -05:00
Jennifer Richards cf62b46093 Find references from submitted XML instead of rendering to text and parsing. Fixes #3342. Commit ready for merge.
- Legacy-Id: 19825
2022-01-07 17:53:23 +00:00
Robert Sparks 50a1e6e66b Tune text draft reference extractor. Fixes #3404. Commit ready for merge.
- Legacy-Id: 19363
2021-09-14 16:44:30 +00:00
Robert Sparks 3697180cc1 Reverted merge of timezone-aware migration efforts.
- Legacy-Id: 18792
2021-01-12 16:54:20 +00:00
Henrik Levkowetz 774e752a54 Snapshot of timezone-aware datatracker code. Tests pass, and the test-crawler shows only expected differences. Trunk changes merged in up to r18768.
- Legacy-Id: 18770
2020-12-16 23:53:37 +00:00
Henrik Levkowetz 726fcbf27d Removed all __future__ imports.
- Legacy-Id: 17391
2020-03-05 23:53:42 +00:00
Henrik Levkowetz 1c808bf63b Removed further six usage.
- Legacy-Id: 17387
2020-03-05 15:54:32 +00:00
Henrik Levkowetz e9a37d8ac8 Removed six.text_type(), changed six.moves.urllib to plain urllib, and removed now unused six imports.
- Legacy-Id: 17385
2020-03-05 14:41:41 +00:00
Henrik Levkowetz 33e8733b91 Fixed up mypy issues or added type:ignore comments as needed for a clean mypy run.
- Legacy-Id: 16772
2019-09-30 15:42:18 +00:00
Henrik Levkowetz 8c6eb3a30a Python2/3 compatibility: Changed the use of open() and StringIO to io.open() etc.
- Legacy-Id: 16458
2019-07-15 19:14:04 +00:00
Henrik Levkowetz f481f5c3e6 Replaced use of six with the equivalent pure python3 constructs.
- Legacy-Id: 16428
2019-07-08 10:43:47 +00:00
Henrik Levkowetz 0589d0b313 Changed a bunch of regexes to use r strings; also miscellaneous smaller fixes.
- Legacy-Id: 16376
2019-07-04 15:51:05 +00:00
Henrik Levkowetz 3ec7e864be Converted leading tabs to spaces in ietf/**/*.py
- Legacy-Id: 16310
2019-06-27 14:51:02 +00:00
Henrik Levkowetz d7f5c84182 Initial 2to3 patch with added copyright statement updates.
- Legacy-Id: 16309
2019-06-27 14:40:54 +00:00
Henrik Levkowetz d19228110c Applied a patch from dkg@fifthhorseman.net: py3 compatibility: fix another instance of integer division
- Legacy-Id: 15896
2019-01-15 17:50:33 +00:00
Henrik Levkowetz 98a74bd7f3 Moved __future__ imports down so as not to obscure the module docstring. Fixes inability to run '$ ietf/utils/draft.py -h'.
- Legacy-Id: 15894
2019-01-14 22:28:52 +00:00
Henrik Levkowetz 910d3d7723 Applied a patch from dkg@fifthhorseman.net: py3 compatibility: Use a list of dictionary keys
In python3, dict.keys() produces a dict_keys object, not a list.
Since this code treats it as a list, we'll just be explicit about
that.
- Legacy-Id: 15893
2019-01-14 21:06:08 +00:00
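To illustrate the point made in the commit message above: in Python 3, `dict.keys()` returns a view that cannot be indexed, so code that treats it as a list needs an explicit `list()` call. The dictionary below is made up for illustration and is not taken from draft.py.

```python
# Hypothetical data; not from ietf/utils/draft.py.
authors = {"Garcia": "Example Corp", "Kesii": "Example Org"}

keys_view = authors.keys()   # a dict_keys view in Python 3, not a list

try:
    keys_view[0]             # views do not support indexing
    indexing_works = True
except TypeError:
    indexing_works = False

# Being explicit gives a real list that can be indexed or sliced.
key_list = list(authors.keys())
first_key = key_list[0]
```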
Henrik Levkowetz c8f98e125c Applied a patch from dkg@fifthhorseman.net: Fix regex manipulation for word characters.
in python 3.7, re.sub() started treating unknown escape sequences in
as errors. Fix this by sending an escaped \ where we mean to
pass it through raw.

https://docs.python.org/3/library/re.html#re.sub
- Legacy-Id: 15892
2019-01-14 21:03:28 +00:00
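A small sketch of the behaviour that commit works around, assuming Python 3.7 or later. The pattern and input string are invented for illustration and are not the regex used in draft.py.

```python
import re

text = "foo bar"

# In Python 3.7+, an unknown escape made of a backslash plus an ASCII letter
# in the replacement string is rejected.
try:
    re.sub(r"o", r"\w", text)
    raised = False
except re.error:
    raised = True

# Escaping the backslash passes it through as a literal character.
result = re.sub(r"o", r"\\w", text)
```

With the escaped form, each match is replaced by a literal backslash followed by `w`.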
Henrik Levkowetz e39358312b Applied a patch from dkg@fifthhorseman.net: py3 compatibility: Use // for explicit integer division
Without this fix, in modern versions of python, the changed line
produces:

TypeError: 'float' object cannot be interpreted as an integer
- Legacy-Id: 15891
2019-01-14 21:02:01 +00:00
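The error quoted in the commit above can be reproduced with a few lines. The variable names and values are hypothetical; the actual changed line in draft.py is not shown in this log.

```python
width = 7   # hypothetical value

try:
    range(width / 2)     # true division yields 3.5, a float
    accepted = True
except TypeError:
    accepted = False

half = width // 2        # floor division yields the int 3
indices = list(range(half))
```

In Python 3, `/` always produces a float, so any value passed to `range()`, used as an index, or used as a repeat count needs `//` instead.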
Henrik Levkowetz e718272e71 Applied a patch from dkg@fifthhorseman.net: py3 compatibility: Use modern form of exception handling
- Legacy-Id: 15890
2019-01-14 21:00:50 +00:00
Henrik Levkowetz 8840efaef4 Applied a patch from dkg@fifthhorseman.net: py3 compatibility: use print function.
- Legacy-Id: 15889
2019-01-14 20:56:59 +00:00
Henrik Levkowetz a485c74314 Merged in [14880] from rjsparks@nostrum.com:
Added a Draft test suite.
 - Legacy-Id: 14901
Note: SVN reference [14880] has been migrated to Git commit e09a28cad2
2018-03-22 16:34:10 +00:00
Russ Housley 565b10e00e Improve parser for references in Internet-Drafts. Fixes #2360
- Legacy-Id: 14851
2018-03-17 18:25:31 +00:00
Henrik Levkowetz 48fe02d58c Permit tildes in romanization of draft author names when looking for draft authors. Can be used in romanization of arabic names.
- Legacy-Id: 14256
2017-11-01 11:51:24 +00:00
Henrik Levkowetz 0e00adc5ee Another tweak to the draft author extraction code, to handle some name transliterations using multiple leading grave accents.
- Legacy-Id: 14149
2017-09-21 09:28:18 +00:00
Henrik Levkowetz 2c1438c240 Moved unidecode_name from utils.text to person.name.
Modified UserFactory to use a new locale for each new user, instead of the
same locale for a whole test run.  This (almost) ensures the exercise of
code to deal with non-ascii names, something which would not happen if a
locale with ascii names was chosen at the start of a run.

Modified name.initials() to not use non-word characters as initials.

Modified unidecode_name() to do more normalization, to conform to the
conventions used in internet-drafts.

Added saving of the factory-boy random state in order to be able to re-run
a test suite with the same pseudo-random sequence as in a previous failed
run.

Fixed an issue with email formatting in test_api_submit_ok().

Modified the draft author extraction code to deal better with names with
embedded apostrophes.
 - Legacy-Id: 14141
2017-09-20 15:36:30 +00:00
Henrik Levkowetz aafd6290a6 Added an option to ietf.utils.draft.Draft to pull document name from the source file name.
- Legacy-Id: 14089
2017-08-31 14:48:43 +00:00
Henrik Levkowetz b42f1cbeb5 Replaced the use of unaccent.asciify(), which has similar functionality to unidecode.unidecode(). Changed the draft parser to work exclusively with unicode text, which both makes the removal of unaccent easier, and takes us closer to Py35 compatibility. Adjusted callers of the draft parser to send in unicode.
- Legacy-Id: 13673
2017-06-18 18:23:18 +00:00
Henrik Levkowetz 76628be3fd Merged in ^/branch/iola/author-stats-r13145 from olau@iola.dk, and fixed some tests in code which moved after the latest merge with trunk. The test suite passes, but the migrations are _not_ ready to run, because of numbering conflicts (again due to code changes on trunk since the latest sync).
- Legacy-Id: 13479
2017-05-31 20:59:26 +00:00
Henrik Levkowetz 38bfdb4095 Fixed a bug in the earlier author extraction bugfix.
- Legacy-Id: 13295
2017-05-10 12:21:17 +00:00
Henrik Levkowetz fb70e9a4ff Fixed an issue with the author extraction code.
- Legacy-Id: 13288
2017-05-09 19:19:55 +00:00
Ole Laursen ef4d55f0c9 Apply patch from Henrik Levkowetz to fix some problems of author parse
errors where the affiliation is mistakenly thought to be an extra
author (some of these still remain)
 - Legacy-Id: 13142
2017-03-27 08:33:49 +00:00
Ole Laursen d2e85a3aa3 Apply draft parser patch from Henrik to improve the patch on trunk to
combine paragraphs across page splits - this makes the country part of
the parser find more countries
 - Legacy-Id: 12848
2017-02-15 19:10:59 +00:00
Ole Laursen b2ff10b0f2 Add support for extracting the country line from the author addresses
to the draft parser (incorporating patch from trunk), store the
extracted country instead of trying to turn it into an ISO country
code, add country and continent name models and add initial data for
those, add helper function for cleaning the countries, add author
country and continent charts, move the affiliation models to
stats/models.py, fix a bunch of bugs.
 - Legacy-Id: 12846
2017-02-15 18:43:57 +00:00
Henrik Levkowetz 44ad914fba Tweaked the company name extraction code in class Draft.
- Legacy-Id: 12842
2017-02-15 14:09:54 +00:00
Henrik Levkowetz bb5e5b97ba Another tweak to handle page break paragraph joins better in class Draft.
- Legacy-Id: 12840
2017-02-14 17:41:30 +00:00
Henrik Levkowetz 6158221fa8 Tweaked the author extraction to recognize short lines as paragraph ends, not only lines ending in '.' or ':'
- Legacy-Id: 12837
2017-02-14 14:23:15 +00:00
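The heuristic described in the commit above (treat a short line as the end of a paragraph, in addition to lines ending in '.' or ':') might look something like the sketch below. The function name and the 50-column threshold are assumptions made for illustration; the values used in class Draft are not given in this log.

```python
def ends_paragraph(line, short_width=50):
    """Return True if `line` looks like the last line of a paragraph.

    Hypothetical heuristic: the threshold of 50 columns is an assumption,
    not the value used in class Draft.
    """
    stripped = line.rstrip()
    if not stripped:
        return False                       # blank lines are not judged here
    if stripped.endswith((".", ":")):
        return True                        # the earlier rule
    return len(stripped) < short_width     # the added rule: a short line


short_line = ends_paragraph("Example Corp")
full_line = ends_paragraph("x" * 72)
sentence = ends_paragraph("This sentence ends here.")
```

A full-width line with no terminal punctuation is assumed to continue on the next line, while a short one such as an affiliation in an author block is treated as complete.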
Ole Laursen aebfe44f9e Add simple detection of formal languages used in draft, partially
based on the code in getauthors by Jari Arkko
 - Legacy-Id: 12657
2017-01-16 16:08:56 +00:00
Ole Laursen 34a9f36534 Add helper for getting word count from draft
- Legacy-Id: 12655
2017-01-16 11:35:48 +00:00
Henrik Levkowetz 887455c1d5 Make sure to not include draft name in the title extracted from draft text.
- Legacy-Id: 12176
2016-10-19 12:18:59 +00:00
Henrik Levkowetz f5ca3a12bc Fixed a bug in the header/footer stripping done before abstract extraction when a draft is submitted.
- Legacy-Id: 10519
2015-11-24 20:01:31 +00:00