datatracker/ietf/submit/models.py
Jennifer Richards 3705bedfcd
feat: Celery support and asynchronous draft submission API (#4037)
* ci: add Dockerfile and action to build celery worker image

* ci: build celery worker on push to jennifer/celery branch

* ci: also build celery worker for main branch

* ci: Add comment to celery Dockerfile

* chore: first stab at a celery/rabbitmq docker-compose

* feat: add celery configuration and test task / endpoint

* chore: run mq/celery containers for dev work

* chore: point to ghcr.io image for celery worker

* refactor: move XML parsing duties into XMLDraft

Move some PlaintextDraft methods into the Draft base class and
implement for the XMLDraft class. Use xml2rfc code from ietf.submit
as a model for the parsing.

This leaves some mismatch between the PlaintextDraft and the Draft
class spec for the get_author_list() method to be resolved.

* feat: add api_upload endpoint and beginnings of async processing

This adds an api_upload() that behaves analogously to the api_submit()
endpoint. Celery tasks to handle asynchronous processing are added but
are not yet functional enough to be useful.

* perf: index Submission table on submission_date

This substantially speeds up submission rate threshold checks.

* feat: remove existing files when accepting a new submission

After checking that a submission is not in progress, remove any files
in staging that have the same name/rev with any extension. This should
guard against stale files confusing the submission process if the
usual cleanup fails or is skipped for some reason.

* refactor: make clear that deduce_group() uses only the draft name

* refactor: extract only draft name/revision in clean() method

Minimizing the amount of validation done when accepting a file. The
data extraction will be moved to asynchronous processing.

* refactor: minimize checks and data extraction in api_upload() view

* ci: fix dockerfiles to match sandbox testing

* ci: tweak celery container docker-compose settings

* refactor: clean up Draft parsing API and usage

  * remove get_draftname() from Draft api; set filename during init
  * further XMLDraft work
    - remember xml_version after parsing
    - extract filename/revision during init
    - comment out long broken get_abstract() method
  * adjust form clean() method to use changed API

* feat: flesh out async submission processing

First basically working pass!

* feat: add state name for submission being validated asynchronously

* feat: cancel submissions that async processing can't handle

* refactor: simplify/consolidate async tasks and improve error handling

* feat: add api_submission_status endpoint

* refactor: return JSON from submission api endpoints

* refactor: reuse cancel_submission method

* refactor: clean up error reporting a bit

* feat: guard against cancellation of a submission while validating

Not bulletproof but should prevent

* feat: indicate that a submission is still being validated

* fix: do not delete submission files after creating them

* chore: remove debug statement

* test: add tests of the api_upload and api_submission_status endpoints

* test: add tests and stubs for async side of submission handling

* fix: gracefully handle (ignore) invalid IDs in async submit task

* test: test process_uploaded_submission method

* fix: fix failures of new tests

* refactor: fix type checker complaints

* test: test submission_status view of submission in "validating" state

* fix: fix up migrations

* fix: use the streamlined SubmissionBaseUploadForm for api_upload

* feat: show submission history event timestamp as mouse-over text

* fix: remove 'manual' as next state for 'validating' submission state

* refactor: share SubmissionBaseUploadForm code with Deprecated version

* fix: validate text submission title, update a couple comments

* chore: disable requirements updating when celery dev container starts

* feat: log traceback on unexpected error during submission processing

* feat: allow secretariat to cancel "validating" submission

* feat: indicate time since submission on the status page

* perf: check submission rate thresholds earlier when possible

No sense parsing details of a draft that is going to be dropped regardless
of those details!

* fix: create Submission before saving to reduce race condition window

* fix: call deduce_group() with filename

* refactor: remove code lint

* refactor: change the api_upload URL to api/submission

* docs: update submission API documentation

* test: add tests of api_submission's text draft consistency checks

* refactor: rename api_upload to api_submission to agree with new URL

* test: test API documentation and submission thresholds

* fix: fix a couple api_submission view renames missed in templates

* chore: use base image + add arm64 support

* ci: try to fix workflow_dispatch for celery worker

* ci: another attempt to fix workflow_dispatch

* ci: build celery image for submit-async branch

* ci: fix typo

* ci: publish celery worker to ghcr.io/painless-security

* ci: install python requirements in celery image

* ci: fix up requirements install on celery image

* chore: remove XML_LIBRARY references that crept back in

* feat: accept 'replaces' field in api_submission

* docs: update api_submission documentation

* fix: remove unused import

* test: test "replaces" validation for submission API

* test: test that "replaces" is set by api_submission

* feat: trap TERM to gracefully stop celery container

* chore: tweak celery/mq settings

* docs: update installation instructions

* ci: adjust paths that trigger celery worker image build

* ci: fix branches/repo names left over from dev

* ci: run manage.py check when initializing celery container

Driver here is applying the patches. Starting the celery workers
also invokes the check task, but this should cause a clearer failure
if something fails.

* docs: revise INSTALL instructions

* ci: pass filename to pip update in celery container

* docs: update INSTALL to include freezing pip versions

Will be used to coordinate package versions with the celery
container in production.

* docs: add explanation of frozen-requirements.txt

* ci: build image for sandbox deployment

* ci: add additional build trigger path

* docs: tweak INSTALL

* fix: change INSTALL process to stop datatracker before running migrations

* chore: use ietf.settings for manage.py check in celery container

* chore: set uid/gid for celery worker

* chore: create user/group in celery container if needed

* chore: tweak docker compose/init so celery container works in dev

* ci: build mq docker image

* fix: move rabbitmq.pid to writeable location

* fix: clear password when CELERY_PASSWORD is empty

Setting to an empty password is really not a good plan!

* chore: add shutdown debugging option to celery image

* chore: add django-celery-beat package

* chore: run "celery beat" in datatracker-celery image

* chore: fix docker image name

* feat: add task to cancel stale submissions

* test: test the cancel_stale_submissions task

* chore: make f-string with no interpolation a plain string

Co-authored-by: Nicolas Giard <github@ngpixel.com>
Co-authored-by: Robert Sparks <rjsparks@nostrum.com>
2022-08-22 13:29:31 -05:00
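
The staging cleanup described in the commit message above (remove any files in staging that share the new submission's name/rev, whatever their extension) could be sketched roughly as follows. This is a minimal illustration, not the datatracker's actual code: the function name `remove_staging_files`, the `staging_dir` argument, and the `<name>-<rev>.<ext>` file naming are assumptions made for the example.

```python
from pathlib import Path


def remove_staging_files(staging_dir, name, rev):
    """Delete every file called <name>-<rev>.<anything> in staging_dir.

    Hypothetical helper: the name, arguments, and file layout are assumed.
    Returns the sorted names of the files that were removed.
    """
    removed = []
    # The literal "." after the revision keeps "-00" from matching "-000".
    for path in Path(staging_dir).glob(f"{name}-{rev}.*"):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Files for other revisions of the same draft are left alone, so only leftovers that could be confused with the incoming submission are cleared.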

# Copyright The IETF Trust 2011-2020, All Rights Reserved
# -*- coding: utf-8 -*-


import datetime
import email.utils  # importing the submodule guarantees email.utils.parseaddr is available
import jsonfield

from django.db import models

import debug  # pyflakes:ignore

from ietf.doc.models import Document, ExtResource
from ietf.person.models import Person
from ietf.group.models import Group
from ietf.message.models import Message
from ietf.name.models import DraftSubmissionStateName, FormalLanguageName
from ietf.utils.accesstoken import generate_random_key, generate_access_token
from ietf.utils.text import parse_unicode
from ietf.utils.models import ForeignKey


def parse_email_line(line):
    """
    Split email address into name and email like
    email.utils.parseaddr() but return a dictionary
    """
    # A line without '@' is treated as a bare name with no address.
    name, addr = email.utils.parseaddr(line) if '@' in line else (line, '')
    return dict(name=parse_unicode(name), email=addr)


class Submission(models.Model):
    state = ForeignKey(DraftSubmissionStateName)
    remote_ip = models.CharField(max_length=100, blank=True)

    access_key = models.CharField(max_length=255, default=generate_random_key)
    auth_key = models.CharField(max_length=255, blank=True)

    # draft metadata
    name = models.CharField(max_length=255, db_index=True)
    group = ForeignKey(Group, null=True, blank=True)
    title = models.CharField(max_length=255, blank=True)
    abstract = models.TextField(blank=True)
    rev = models.CharField(max_length=3, blank=True)
    pages = models.IntegerField(null=True, blank=True)
    words = models.IntegerField(null=True, blank=True)
    formal_languages = models.ManyToManyField(FormalLanguageName, blank=True, help_text="Formal languages used in document")

    authors = jsonfield.JSONField(default=list, help_text="List of authors with name, email, affiliation and country.")
    note = models.TextField(blank=True)
    replaces = models.CharField(max_length=1000, blank=True)

    first_two_pages = models.TextField(blank=True)

    file_types = models.CharField(max_length=50, blank=True)
    file_size = models.IntegerField(null=True, blank=True)
    document_date = models.DateField(null=True, blank=True)
    submission_date = models.DateField(default=datetime.date.today)
    xml_version = models.CharField(null=True, max_length=4, default=None)

    submitter = models.CharField(max_length=255, blank=True, help_text="Name and email of submitter, e.g. \"John Doe &lt;john@example.org&gt;\".")

    draft = ForeignKey(Document, null=True, blank=True)

    def __str__(self):
        return "%s-%s" % (self.name, self.rev)

    class Meta:
        indexes = [
            models.Index(fields=['submission_date']),
        ]

    def submitter_parsed(self):
        return parse_email_line(self.submitter)

    def access_token(self):
        return generate_access_token(self.access_key)

    def existing_document(self):
        return Document.objects.filter(name=self.name).first()

    def latest_checks(self):
        checks = [
            self.checks.filter(checker=c).latest('time')
            for c in self.checks.values_list('checker', flat=True).distinct()
        ]
        return checks

    def has_yang(self):
        return any(
            c.checker == 'yang validation' and c.passed is not None
            for c in self.latest_checks()
        )

    @property
    def replaces_names(self):
        return self.replaces.split(',')

    @property
    def area(self):
        return self.group.area if self.group else None

    @property
    def is_individual(self):
        return self.group.is_individual if self.group else True

    @property
    def revises_wg_draft(self):
        return (
            self.rev != '00'
            and self.group
            and self.group.is_wg
        )

    @property
    def active_wg_drafts_replaced(self):
        return Document.objects.filter(
            docalias__name__in=self.replaces.split(','),
            group__in=Group.objects.active_wgs()
        )

    @property
    def closed_wg_drafts_replaced(self):
        return Document.objects.filter(
            docalias__name__in=self.replaces.split(','),
            group__in=Group.objects.closed_wgs()
        )


class SubmissionCheck(models.Model):
    time = models.DateTimeField(default=datetime.datetime.now)
    submission = ForeignKey(Submission, related_name='checks')
    checker = models.CharField(max_length=256, blank=True)
    passed = models.BooleanField(null=True, default=False)
    message = models.TextField(null=True, blank=True)
    errors = models.IntegerField(null=True, blank=True, default=None)
    warnings = models.IntegerField(null=True, blank=True, default=None)
    items = jsonfield.JSONField(null=True, blank=True, default='{}')
    symbol = models.CharField(max_length=64, default='')

    def __str__(self):
        # message is nullable, so fall back to an empty string before slicing
        return "%s submission check: %s: %s" % (
            self.checker,
            'Passed' if self.passed else 'Failed',
            (self.message or '')[:48] + '...',
        )

    def has_warnings(self):
        # warnings is an integer count that may be None
        return bool(self.warnings)

    def has_errors(self):
        # errors is an integer count that may be None
        return bool(self.errors)


class SubmissionEvent(models.Model):
    submission = ForeignKey(Submission)
    time = models.DateTimeField(default=datetime.datetime.now)
    by = ForeignKey(Person, null=True, blank=True)
    desc = models.TextField()

    def __str__(self):
        return "%s %s by %s at %s" % (self.submission.name, self.desc, self.by.plain_name() if self.by else "(unknown)", self.time)

    class Meta:
        ordering = ("-time", "-id")
        indexes = [
            models.Index(fields=['-time', '-id']),
        ]


class Preapproval(models.Model):
    """Pre-approved draft submission name."""
    name = models.CharField(max_length=255, db_index=True)
    by = ForeignKey(Person)
    time = models.DateTimeField(default=datetime.datetime.now)

    def __str__(self):
        return self.name


class SubmissionEmailEvent(SubmissionEvent):
    message = ForeignKey(Message, null=True, blank=True, related_name='manualevents')
    msgtype = models.CharField(max_length=25)
    in_reply_to = ForeignKey(Message, null=True, blank=True, related_name='irtomanual')

    def __str__(self):
        return "%s %s by %s at %s" % (self.submission.name, self.desc, self.by.plain_name() if self.by else "(unknown)", self.time)

    class Meta:
        ordering = ['-time', '-id']


class SubmissionExtResource(ExtResource):
    submission = ForeignKey(Submission, related_name='external_resources')
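
For reference, the behaviour of `parse_email_line()` (used by `Submission.submitter_parsed()`) can be tried outside Django with the standalone sketch below. It is a simplification under one stated assumption: the call to `ietf.utils.text.parse_unicode` is left out so the snippet depends only on the standard library, which means encoded display names are not decoded here.

```python
from email.utils import parseaddr


def parse_email_line(line):
    """Standalone variant of the helper above, without parse_unicode (assumption)."""
    # Only lines containing '@' are parsed as addresses; anything else is a bare name.
    name, addr = parseaddr(line) if '@' in line else (line, '')
    return dict(name=name, email=addr)


print(parse_email_line('John Doe <john@example.org>'))
# {'name': 'John Doe', 'email': 'john@example.org'}
print(parse_email_line('John Doe'))
# {'name': 'John Doe', 'email': ''}
```

The `'@' in line` guard matters because a submitter string without an address should keep the whole text as the name instead of being passed to `parseaddr()`.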