* ci: add Dockerfile and action to build celery worker image
* ci: build celery worker on push to jennifer/celery branch
* ci: also build celery worker for main branch
* ci: Add comment to celery Dockerfile
* chore: first stab at a celery/rabbitmq docker-compose
* feat: add celery configuration and test task / endpoint
* chore: run mq/celery containers for dev work
* chore: point to ghcr.io image for celery worker
* refactor: move XML parsing duties into XMLDraft

  Move some PlaintextDraft methods into the Draft base class and implement for the XMLDraft class. Use xml2rfc code from ietf.submit as a model for the parsing. This leaves some mismatch between the PlaintextDraft and the Draft class spec for the get_author_list() method to be resolved.
* feat: add api_upload endpoint and beginnings of async processing

  This adds an api_upload() that behaves analogously to the api_submit() endpoint. Celery tasks to handle asynchronous processing are added but are not yet functional enough to be useful.
* perf: index Submission table on submission_date

  This substantially speeds up submission rate threshold checks.
* feat: remove existing files when accepting a new submission

  After checking that a submission is not in progress, remove any files in staging that have the same name/rev with any extension. This should guard against stale files confusing the submission process if the usual cleanup fails or is skipped for some reason.
* refactor: make clear that deduce_group() uses only the draft name
* refactor: extract only draft name/revision in clean() method

  Minimizing the amount of validation done when accepting a file. The data extraction will be moved to asynchronous processing.
* refactor: minimize checks and data extraction in api_upload() view
* ci: fix dockerfiles to match sandbox testing
* ci: tweak celery container docker-compose settings
* refactor: clean up Draft parsing API and usage

  * remove get_draftname() from Draft api; set filename during init
  * further XMLDraft work
    - remember xml_version after parsing
    - extract filename/revision during init
    - comment out long broken get_abstract() method
  * adjust form clean() method to use changed API
* feat: flesh out async submission processing

  First basically working pass!
* feat: add state name for submission being validated asynchronously
* feat: cancel submissions that async processing can't handle
* refactor: simplify/consolidate async tasks and improve error handling
* feat: add api_submission_status endpoint
* refactor: return JSON from submission api endpoints
* refactor: reuse cancel_submission method
* refactor: clean up error reporting a bit
* feat: guard against cancellation of a submission while validating

  Not bulletproof but should prevent
* feat: indicate that a submission is still being validated
* fix: do not delete submission files after creating them
* chore: remove debug statement
* test: add tests of the api_upload and api_submission_status endpoints
* test: add tests and stubs for async side of submission handling
* fix: gracefully handle (ignore) invalid IDs in async submit task
* test: test process_uploaded_submission method
* fix: fix failures of new tests
* refactor: fix type checker complaints
* test: test submission_status view of submission in "validating" state
* fix: fix up migrations
* fix: use the streamlined SubmissionBaseUploadForm for api_upload
* feat: show submission history event timestamp as mouse-over text
* fix: remove 'manual' as next state for 'validating' submission state
* refactor: share SubmissionBaseUploadForm code with Deprecated version
* fix: validate text submission title, update a couple comments
* chore: disable requirements updating when celery dev container starts
* feat: log traceback on unexpected error during submission processing
* feat: allow secretariat to cancel "validating" submission
* feat: indicate time since submission on the status page
* perf: check submission rate thresholds earlier when possible

  No sense parsing details of a draft that is going to be dropped regardless of those details!
* fix: create Submission before saving to reduce race condition window
* fix: call deduce_group() with filename
* refactor: remove code lint
* refactor: change the api_upload URL to api/submission
* docs: update submission API documentation
* test: add tests of api_submission's text draft consistency checks
* refactor: rename api_upload to api_submission to agree with new URL
* test: test API documentation and submission thresholds
* fix: fix a couple api_submission view renames missed in templates
* chore: use base image + add arm64 support
* ci: try to fix workflow_dispatch for celery worker
* ci: another attempt to fix workflow_dispatch
* ci: build celery image for submit-async branch
* ci: fix typo
* ci: publish celery worker to ghcr.io/painless-security
* ci: install python requirements in celery image
* ci: fix up requirements install on celery image
* chore: remove XML_LIBRARY references that crept back in
* feat: accept 'replaces' field in api_submission
* docs: update api_submission documentation
* fix: remove unused import
* test: test "replaces" validation for submission API
* test: test that "replaces" is set by api_submission
* feat: trap TERM to gracefully stop celery container
* chore: tweak celery/mq settings
* docs: update installation instructions
* ci: adjust paths that trigger celery worker image build
* ci: fix branches/repo names left over from dev
* ci: run manage.py check when initializing celery container

  Driver here is applying the patches. Starting the celery workers also invokes the check task, but this should cause a clearer failure if something fails.
* docs: revise INSTALL instructions
* ci: pass filename to pip update in celery container
* docs: update INSTALL to include freezing pip versions

  Will be used to coordinate package versions with the celery container in production.
* docs: add explanation of frozen-requirements.txt
* ci: build image for sandbox deployment
* ci: add additional build trigger path
* docs: tweak INSTALL
* fix: change INSTALL process to stop datatracker before running migrations
* chore: use ietf.settings for manage.py check in celery container
* chore: set uid/gid for celery worker
* chore: create user/group in celery container if needed
* chore: tweak docker compose/init so celery container works in dev
* ci: build mq docker image
* fix: move rabbitmq.pid to writeable location
* fix: clear password when CELERY_PASSWORD is empty

  Setting to an empty password is really not a good plan!
* chore: add shutdown debugging option to celery image
* chore: add django-celery-beat package
* chore: run "celery beat" in datatracker-celery image
* chore: fix docker image name
* feat: add task to cancel stale submissions
* test: test the cancel_stale_submissions task
* chore: make f-string with no interpolation a plain string

Co-authored-by: Nicolas Giard <github@ngpixel.com>
Co-authored-by: Robert Sparks <rjsparks@nostrum.com>
173 lines
6.3 KiB
Python
# Copyright The IETF Trust 2022, All Rights Reserved
# -*- coding: utf-8 -*-
import io
import re
import xml2rfc

import debug    # pyflakes: ignore

from contextlib import ExitStack

from .draft import Draft


class XMLDraft(Draft):
    """Draft from XML source

    Not all methods from the superclass are implemented yet.
    """
    def __init__(self, xml_file):
        """Initialize XMLDraft instance

        :parameter xml_file: path to file containing XML source
        """
        super().__init__()
        # cast xml_file to str so, e.g., this will work with a Path
        self.xmltree, self.xml_version = self.parse_xml(str(xml_file))
        self.xmlroot = self.xmltree.getroot()
        self.filename, self.revision = self._parse_docname()

    @staticmethod
    def parse_xml(filename):
        """Parse XML draft

        Converts to xml2rfc v3 schema, then returns the parsed v3 tree and the
        original XML version.
        """
        # xml2rfc writes its messages through module-level streams; capture them
        # while parsing and restore the originals afterwards.
        orig_write_out = xml2rfc.log.write_out
        orig_write_err = xml2rfc.log.write_err
        parser_out = io.StringIO()
        parser_err = io.StringIO()

        with ExitStack() as stack:
            @stack.callback
            def cleanup():  # called when context exited, even if there's an exception
                xml2rfc.log.write_out = orig_write_out
                xml2rfc.log.write_err = orig_write_err

            xml2rfc.log.write_out = parser_out
            xml2rfc.log.write_err = parser_err

            parser = xml2rfc.XmlRfcParser(filename, quiet=True)
            try:
                tree = parser.parse()
            except Exception as e:
                raise XMLParseError(parser_out.getvalue(), parser_err.getvalue()) from e

            # a missing version attribute is treated as the v2 schema
            xml_version = tree.getroot().get('version', '2')
            if xml_version == '2':
                v2v3 = xml2rfc.V2v3XmlWriter(tree)
                tree.tree = v2v3.convert2to3()
        return tree, xml_version

    def _document_name(self, anchor):
        """Guess document name from reference anchor

        Looks for series numbers and removes leading 0s from the number.
        """
        anchor = anchor.lower()  # always give back lowercase
        label = anchor.rstrip('0123456789')  # remove trailing digits
        if label in ['rfc', 'bcp', 'fyi', 'std']:
            number = int(anchor[len(label):])
            return f'{label}{number}'
        return anchor

    def _reference_section_type(self, section_name):
        """Determine reference type from name of references section"""
        if section_name:
            section_name = section_name.lower()
            if 'normative' in section_name:
                return self.REF_TYPE_NORMATIVE
            elif 'informative' in section_name:
                return self.REF_TYPE_INFORMATIVE
        return self.REF_TYPE_UNKNOWN

    def _reference_section_name(self, section_elt):
        """Get the name of a references section, or None if it has none"""
        section_name = section_elt.findtext('name')
        if section_name is None and 'title' in section_elt.keys():
            section_name = section_elt.get('title')  # fall back to title if we have it
        return section_name

    def _parse_docname(self):
        """Split the docName attribute into (filename, revision)"""
        docname = self.xmlroot.attrib.get('docName')
        if docname is None:
            # re.match() would raise a TypeError on None; report it like other parse failures
            raise ValueError('Missing docName attribute')
        revmatch = re.match(
            r'^(?P<filename>.+?)(?:-(?P<rev>[0-9][0-9]))?$',
            docname,
        )
        if revmatch is None:
            raise ValueError('Unable to parse docName')
        # If a group had no match it is None
        return revmatch.group('filename'), revmatch.group('rev')

    def get_title(self):
        return self.xmlroot.findtext('front/title').strip()

    # todo fix the implementation of XMLDraft.get_abstract()
    #
    # This code was pulled from ietf.submit.forms where it existed for some time.
    # It does not work, at least with modern xml2rfc. This assumes that the abstract
    # is simply text in the front/abstract node, but the XML schema wraps the actual
    # abstract text in <t> elements (and allows <dl>, <ol>, and <ul> as well). As a
    # result, this method normally returns an empty string, which is later replaced by
    # the abstract parsed from the rendered text. For now, I am commenting this out
    # and making it explicit that the abstract always comes from the text format.
    #
    # def get_abstract(self):
    #     """Extract the abstract"""
    #     abstract = self.xmlroot.findtext('front/abstract')
    #     return abstract.strip() if abstract else ''

    def get_author_list(self):
        """Get detailed author list

        Returns a list of dicts with the following keys:
            name, first_name, middle_initial, last_name,
            name_suffix, email, country, affiliation
        Values will be None if not available
        """
        result = []
        empty_author = {
            k: None for k in [
                'name', 'first_name', 'middle_initial', 'last_name',
                'name_suffix', 'email', 'country', 'affiliation',
            ]
        }

        for author in self.xmlroot.findall('front/author'):
            info = {
                'name': author.attrib.get('fullname'),
                'email': author.findtext('address/email'),
                'affiliation': author.findtext('organization'),
            }
            elem = author.find('address/postal/country')
            if elem is not None:
                # prefer the ascii attribute over the element text when it is set
                ascii_country = elem.get('ascii', None)
                info['country'] = ascii_country if ascii_country else elem.text
            for item in info:
                if info[item]:
                    info[item] = info[item].strip()
            result.append(empty_author | info)  # merge, preferring info
        return result

    def get_refs(self):
        """Extract references from the draft"""
        refs = {}
        # accept nested <references> sections
        for section in self.xmlroot.findall('back//references'):
            ref_type = self._reference_section_type(self._reference_section_name(section))
            for ref in (section.findall('./reference') + section.findall('./referencegroup')):
                refs[self._document_name(ref.get('anchor'))] = ref_type
        return refs


class XMLParseError(Exception):
    """An error occurred while parsing"""
    def __init__(self, out: str, err: str, *args):
        super().__init__(*args)
        self._out = out
        self._err = err

    def parser_msgs(self):
        """Return the captured parser output and error messages as a list of lines"""
        return self._out.splitlines() + self._err.splitlines()
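The two pure-string helpers above, `_parse_docname()` and `_document_name()`, can be tried without xml2rfc or an XML file. The following is a minimal standalone sketch: the function names `split_docname` and `normalize_anchor` are hypothetical and exist only for this illustration, while the regular expression and the series labels are copied from the class.

```python
import re

# Same pattern as XMLDraft._parse_docname(): a lazy name followed by an
# optional two-digit revision suffix.
DOCNAME_RE = re.compile(r'^(?P<filename>.+?)(?:-(?P<rev>[0-9][0-9]))?$')


def split_docname(docname):
    """Hypothetical standalone version of XMLDraft._parse_docname()."""
    match = DOCNAME_RE.match(docname)
    if match is None:
        raise ValueError('Unable to parse docName')
    # The 'rev' group is None when the docName carries no revision suffix.
    return match.group('filename'), match.group('rev')


def normalize_anchor(anchor):
    """Hypothetical standalone version of XMLDraft._document_name()."""
    anchor = anchor.lower()
    label = anchor.rstrip('0123456789')  # strip the trailing series number
    if label in ['rfc', 'bcp', 'fyi', 'std']:
        # int() drops leading zeros, so 'rfc0791' becomes 'rfc791'
        return f'{label}{int(anchor[len(label):])}'
    return anchor


if __name__ == '__main__':
    print(split_docname('draft-ietf-example-thing-03'))  # ('draft-ietf-example-thing', '03')
    print(split_docname('draft-ietf-example-thing'))     # ('draft-ietf-example-thing', None)
    print(normalize_anchor('RFC0791'))                   # rfc791
    print(normalize_anchor('I-D.ietf-example-thing'))    # i-d.ietf-example-thing
```

Because the name group is lazy and the revision group is optional, a docName without a two-digit suffix still matches and yields `None` for the revision, which is why `XMLDraft.revision` can be `None`.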