diff --git a/.github/workflows/build-celery-worker.yml b/.github/workflows/build-celery-worker.yml new file mode 100644 index 000000000..93f8c2a43 --- /dev/null +++ b/.github/workflows/build-celery-worker.yml @@ -0,0 +1,47 @@ +name: Build Celery Worker Docker Image + +on: + push: + branches: + - 'main' + - 'jennifer/submit-async' + paths: + - 'requirements.txt' + - 'dev/celery/**' + - '.github/workflows/build-celery-worker.yml' + + workflow_dispatch: + +jobs: + publish: + runs-on: ubuntu-latest + permissions: + contents: read + packages: write + + steps: + - uses: actions/checkout@v2 + + - name: Set up QEMU + uses: docker/setup-qemu-action@v2 + + - name: Set up Docker Buildx + uses: docker/setup-buildx-action@v2 + + - name: Login to GitHub Container Registry + uses: docker/login-action@v2 + with: + registry: ghcr.io + username: ${{ github.actor }} + password: ${{ secrets.GITHUB_TOKEN }} + + - name: Docker Build & Push + uses: docker/build-push-action@v3 + with: + context: . + file: dev/celery/Dockerfile + platforms: linux/amd64,linux/arm64 + push: true +# tags: ghcr.io/ietf-tools/datatracker-celery:latest + tags: ghcr.io/painless-security/datatracker-celery:latest + diff --git a/.github/workflows/build-mq-broker.yml b/.github/workflows/build-mq-broker.yml new file mode 100644 index 000000000..8b77d6dfe --- /dev/null +++ b/.github/workflows/build-mq-broker.yml @@ -0,0 +1,46 @@ +name: Build MQ Broker Docker Image + +on: + push: + branches: + - 'main' + - 'jennifer/submit-async' + paths: + - 'dev/mq/**' + - '.github/workflows/build-mq-worker.yml' + + workflow_dispatch: + +jobs: + publish: + runs-on: ubuntu-latest + permissions: + contents: read + packages: write + + steps: + - uses: actions/checkout@v2 + + - name: Set up QEMU + uses: docker/setup-qemu-action@v2 + + - name: Set up Docker Buildx + uses: docker/setup-buildx-action@v2 + + - name: Login to GitHub Container Registry + uses: docker/login-action@v2 + with: + registry: ghcr.io + username: ${{ github.actor }} + password: ${{ secrets.GITHUB_TOKEN }} + + - name: Docker Build & Push + uses: docker/build-push-action@v3 + with: + context: . + file: dev/mq/Dockerfile + platforms: linux/amd64,linux/arm64 + push: true +# tags: ghcr.io/ietf-tools/datatracker-mq:latest + tags: ghcr.io/painless-security/datatracker-mq:latest + diff --git a/dev/INSTALL b/dev/INSTALL index 132e607f5..f422af0f2 100644 --- a/dev/INSTALL +++ b/dev/INSTALL @@ -29,6 +29,11 @@ General Instructions for Deployment of a New Release python3.9 -mvenv env source env/bin/activate pip install -r requirements.txt + pip freeze > frozen-requirements.txt + + (The pip freeze command records the exact versions of the Python libraries that pip installed. + This is used by the celery docker container to ensure it uses the same library versions as + the datatracker service.) 5. Move static files into place for CDN (/a/www/www6s/lib/dt): @@ -36,31 +41,66 @@ General Instructions for Deployment of a New Release 6. Run system checks (which patches the just installed modules):: - ietf/manage.py check + ietf/manage.py check - 7. Run migrations: + 7. Switch to the docker directory and update images: + cd /a/docker/datatracker-cel + docker image tag ghcr.io/ietf-tools/datatracker-celery:latest datatracker-celery-fallback + docker image tag ghcr.io/ietf-tools/datatracker-mq:latest datatracker-mq-fallback + docker-compose pull + + 8. Stop and remove the async task containers: + Wait for this to finish cleanly. 
It may take up to about 10 minutes for the 'stop' command to + complete if a long-running task is in progress. + + docker-compose down + + 9. Stop the datatracker + (consider doing this with a second shell at ietfa to avoid the exit and shift back to wwwrun) + + exit + sudo systemctl stop datatracker.socket datatracker.service + sudo su - -s /bin/bash wwwrun + + 10. Return to the release directory and run migrations: + + cd /a/www/ietf-datatracker/${releasenumber} ietf/manage.py migrate Take note if any migrations were executed. - 8. Back out one directory level, then re-point the 'web' symlink:: + 11. Back out one directory level, then re-point the 'web' symlink:: cd .. rm ./web; ln -s ${releasenumber} web - 9. Reload the datatracker service (it is no longer necessary to restart apache) :: + 12. Start the datatracker service (it is no longer necessary to restart apache) :: exit # or CTRL-D, back to root level shell - systemctl restart datatracker + sudo systemctl start datatracker.service datatracker.socket - 10. Verify operation: + 13. Start async task worker and message broker: + + cd /a/docker/datatracker-cel + bash startcommand + + 14. Verify operation: http://datatracker.ietf.org/ - 11. If install failed and there were no migrations at step 7, revert web symlink and repeat the restart in step 9. - If there were migrations at step 7, they will need to be reversed before the restart at step 9. If it's not obvious - what to do to reverse the migrations, contact the dev team. + 15. If install failed and there were no migrations at step 9, revert web symlink and docker update and repeat the + restart in steps 11 and 12. To revert the docker update: + + cd /a/docker/datatracker-cel + docker-compose down + docker image rm ghcr.io/ietf-tools/datatracker-celery:latest ghcr.io/ietf-tools/datatracker-mq:latest + docker image tag datatracker-celery-fallback ghcr.io/ietf-tools/datatracker-celery:latest + docker image tag datatracker-mq-fallback ghcr.io/ietf-tools/datatracker-mq:latest + cd - + + If there were migrations at step 10, they will need to be reversed before the restart at step 12. + If it's not obvious what to do to reverse the migrations, contact the dev team. Patching a Production Release @@ -95,8 +135,17 @@ The following process should be used: 6. Edit ``.../ietf/__init__.py`` in the new patched release to indicate the patch version in the ``__patch__`` string. - 7. Change the 'web' symlink, reload etc. as described in + 7. Stop the async task container (this may take a few minutes if tasks are in progress): + + cd /a/docker/datatracker-cel + docker-compose stop celery + + 8. Change the 'web' symlink, reload etc. as described in `General Instructions for Deployment of a New Release`_. + 9. 
Start async task worker: + + cd /a/docker/datatracker-cel + bash startcommand diff --git a/dev/celery/Dockerfile b/dev/celery/Dockerfile new file mode 100644 index 000000000..91f2949a6 --- /dev/null +++ b/dev/celery/Dockerfile @@ -0,0 +1,20 @@ +# Dockerfile for celery worker +# +FROM ghcr.io/ietf-tools/datatracker-app-base:latest +LABEL maintainer="IETF Tools Team " + +ENV DEBIAN_FRONTEND=noninteractive + +RUN apt-get purge -y imagemagick imagemagick-6-common + +# Copy the startup file +COPY dev/celery/docker-init.sh /docker-init.sh +RUN sed -i 's/\r$//' /docker-init.sh && \ + chmod +x /docker-init.sh + +# Install current datatracker python dependencies +COPY requirements.txt /tmp/pip-tmp/ +RUN pip3 --disable-pip-version-check --no-cache-dir install -r /tmp/pip-tmp/requirements.txt +RUN rm -rf /tmp/pip-tmp + +ENTRYPOINT [ "/docker-init.sh" ] \ No newline at end of file diff --git a/dev/celery/docker-init.sh b/dev/celery/docker-init.sh new file mode 100755 index 000000000..9d00328ad --- /dev/null +++ b/dev/celery/docker-init.sh @@ -0,0 +1,75 @@ +#!/bin/bash +# +# Environment parameters: +# +# CELERY_APP - name of application to pass to celery (defaults to ietf) +# +# CELERY_ROLE - 'worker' or 'beat' (defaults to 'worker') +# +# CELERY_UID - numeric uid for the celery worker process +# +# CELERY_GID - numeric gid for the celery worker process +# +# UPDATES_REQUIREMENTS_FROM - path, relative to /workspace mount, to a pip requirements +# file that should be installed at container startup. Default is no package install/update. +# +# DEBUG_TERM_TIMING - if non-empty, writes debug messages during shutdown after a TERM signal +# +WORKSPACEDIR="/workspace" +CELERY_ROLE="${CELERY_ROLE:-worker}" + +cd "$WORKSPACEDIR" || exit 255 + +if [[ -n "${UPDATE_REQUIREMENTS_FROM}" ]]; then + reqs_file="${WORKSPACEDIR}/${UPDATE_REQUIREMENTS_FROM}" + echo "Updating requirements from ${reqs_file}..." + pip install --upgrade -r "${reqs_file}" +fi + +if [[ "${CELERY_ROLE}" == "worker" ]]; then + echo "Running initial checks..." + /usr/local/bin/python $WORKSPACEDIR/ietf/manage.py check +fi + +CELERY_OPTS=( "${CELERY_ROLE}" ) +if [[ -n "${CELERY_UID}" ]]; then + # ensure that some group with the necessary GID exists in container + if ! id "${CELERY_UID}" ; then + adduser --system --uid "${CELERY_UID}" --no-create-home --disabled-login "celery-user-${CELERY_UID}" + fi + CELERY_OPTS+=("--uid=${CELERY_UID}") +fi + +if [[ -n "${CELERY_GID}" ]]; then + # ensure that some group with the necessary GID exists in container + if ! getent group "${CELERY_GID}" ; then + addgroup --gid "${CELERY_GID}" "celery-group-${CELERY_GID}" + fi + CELERY_OPTS+=("--gid=${CELERY_GID}") +fi + +log_term_timing_msgs () { + # output periodic debug message + while true; do + echo "Waiting for celery worker shutdown ($(date --utc --iso-8601=ns))" + sleep 0.5s + done +} + +cleanup () { + # Cleanly terminate the celery app by sending it a TERM, then waiting for it to exit. + if [[ -n "${celery_pid}" ]]; then + echo "Gracefully terminating celery worker. This may take a few minutes if tasks are in progress..." + kill -TERM "${celery_pid}" + if [[ -n "${DEBUG_TERM_TIMING}" ]]; then + log_term_timing_msgs & + fi + wait "${celery_pid}" + fi +} + +trap 'trap "" TERM; cleanup' TERM +# start celery in the background so we can trap the TERM signal +celery --app="${CELERY_APP:-ietf}" "${CELERY_OPTS[@]}" "$@" & +celery_pid=$! 
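+# Block on the worker PID; the TERM trap above forwards the signal to celery and
+# waits for it, so in-flight tasks can finish before the container exits.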
+wait "${celery_pid}" diff --git a/dev/mq/Dockerfile b/dev/mq/Dockerfile new file mode 100644 index 000000000..e8871c30a --- /dev/null +++ b/dev/mq/Dockerfile @@ -0,0 +1,17 @@ +# Dockerfile for RabbitMQ worker +# +FROM rabbitmq:3-alpine +LABEL maintainer="IETF Tools Team " + +# Copy the startup file +COPY dev/mq/ietf-rabbitmq-server.bash /ietf-rabbitmq-server.bash +RUN sed -i 's/\r$//' /ietf-rabbitmq-server.bash && \ + chmod +x /ietf-rabbitmq-server.bash + +# Put the rabbitmq.conf in the conf.d so it runs after 10-defaults.conf. +# Can override this for an individual container by mounting additional +# config files in /etc/rabbitmq/conf.d. +COPY dev/mq/rabbitmq.conf /etc/rabbitmq/conf.d/20-ietf-config.conf +COPY dev/mq/definitions.json /definitions.json + +CMD ["/ietf-rabbitmq-server.bash"] diff --git a/dev/mq/definitions.json b/dev/mq/definitions.json new file mode 100644 index 000000000..60e4fdba0 --- /dev/null +++ b/dev/mq/definitions.json @@ -0,0 +1,30 @@ +{ + "permissions": [ + { + "configure": ".*", + "read": ".*", + "user": "datatracker", + "vhost": "dt", + "write": ".*" + } + ], + "users": [ + { + "hashing_algorithm": "rabbit_password_hashing_sha256", + "limits": {}, + "name": "datatracker", + "password_hash": "", + "tags": [] + } + ], + "vhosts": [ + { + "limits": [], + "metadata": { + "description": "", + "tags": [] + }, + "name": "dt" + } + ] +} diff --git a/dev/mq/ietf-rabbitmq-server.bash b/dev/mq/ietf-rabbitmq-server.bash new file mode 100755 index 000000000..56effba17 --- /dev/null +++ b/dev/mq/ietf-rabbitmq-server.bash @@ -0,0 +1,22 @@ +#!/bin/bash -x +# +# Environment parameters: +# +# CELERY_PASSWORD - password for the datatracker celery user +# +export RABBITMQ_PID_FILE=/tmp/rabbitmq.pid + +update_celery_password () { + rabbitmqctl wait "${RABBITMQ_PID_FILE}" --timeout 300 + rabbitmqctl await_startup --timeout 300 + if [[ -n "${CELERY_PASSWORD}" ]]; then + rabbitmqctl change_password datatracker <[0-9]+)/status/?', submit_views.api_submission_status), # Datatracker version url(r'^version/?$', api_views.version), # Application authentication API key diff --git a/ietf/celeryapp.py b/ietf/celeryapp.py new file mode 100644 index 000000000..cefde3a8d --- /dev/null +++ b/ietf/celeryapp.py @@ -0,0 +1,22 @@ +import os + +from celery import Celery + +# Set the default Django settings module for the 'celery' program +os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'ietf.settings') + +app = Celery('ietf') + +# Using a string here means the worker doesn't have to serialize +# the configuration object to child processes. +# - namespace='CELERY' means all celery-related configuration keys +# should have a `CELERY_` prefix. +app.config_from_object('django.conf:settings', namespace='CELERY') + +# Load task modules from all registered Django apps. 
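+# (autodiscover_tasks looks for a tasks.py module in each installed app,
+# e.g. ietf/submit/tasks.py)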
+app.autodiscover_tasks() + + +@app.task(bind=True) +def debug_task(self): + print(f'Request: {self.request!r}') diff --git a/ietf/name/fixtures/names.json b/ietf/name/fixtures/names.json index e8509628d..41f7fec62 100644 --- a/ietf/name/fixtures/names.json +++ b/ietf/name/fixtures/names.json @@ -10443,6 +10443,20 @@ "model": "name.draftsubmissionstatename", "pk": "uploaded" }, + { + "fields": { + "desc": "Running validation checks on received submission", + "name": "Validating Submitted Draft", + "next_states": [ + "uploaded", + "cancel" + ], + "order": 10, + "used": true + }, + "model": "name.draftsubmissionstatename", + "pk": "validating" + }, { "fields": { "desc": "", diff --git a/ietf/name/migrations/0044_validating_draftsubmissionstatename.py b/ietf/name/migrations/0044_validating_draftsubmissionstatename.py new file mode 100644 index 000000000..de82bbeef --- /dev/null +++ b/ietf/name/migrations/0044_validating_draftsubmissionstatename.py @@ -0,0 +1,40 @@ +# Generated by Django 2.2.28 on 2022-05-17 11:35 + +from django.db import migrations + + +def forward(apps, schema_editor): + DraftSubmissionStateName = apps.get_model('name', 'DraftSubmissionStateName') + new_state = DraftSubmissionStateName.objects.create( + slug='validating', + name='Validating Submitted Draft', + desc='Running validation checks on received submission', + used=True, + order=1 + DraftSubmissionStateName.objects.order_by('-order').first().order, + ) + new_state.next_states.set( + DraftSubmissionStateName.objects.filter( + slug__in=['cancel', 'uploaded'], + ) + ) + + +def reverse(apps, schema_editor): + Submission = apps.get_model('submit', 'Submission') + # Any submissions in the state we are about to delete would be deleted. + # Remove these manually if you really mean to do this. + assert Submission.objects.filter(state__slug='validating').count() == 0 + DraftSubmissionStateName = apps.get_model('name', 'DraftSubmissionStateName') + DraftSubmissionStateName.objects.filter(slug='validating').delete() + + +class Migration(migrations.Migration): + + dependencies = [ + ('name', '0043_editorial_stream_grouptype'), + ('submit', '0001_initial'), # ensure Submission model exists + ] + + operations = [ + migrations.RunPython(forward, reverse) + ] diff --git a/ietf/settings.py b/ietf/settings.py index 0edf20229..12975f061 100644 --- a/ietf/settings.py +++ b/ietf/settings.py @@ -437,6 +437,7 @@ INSTALLED_APPS = [ 'analytical', 'django_vite', 'django_bootstrap5', + 'django_celery_beat', 'corsheaders', 'django_markup', 'django_password_strength', @@ -843,6 +844,8 @@ IDSUBMIT_CHECKER_CLASSES = ( # "ietf.submit.checkers.DraftYangvalidatorChecker", ) +# Max time to allow for validation before a submission is subject to cancellation +IDSUBMIT_MAX_VALIDATION_TIME = datetime.timedelta(minutes=20) IDSUBMIT_MANUAL_STAGING_DIR = '/tmp/' @@ -1176,6 +1179,14 @@ qvNU+qRWi+YXrITsgn92/gVxX5AoK0n+s5Lx7fpjxkARVi66SF6zTJnX DEFAULT_REQUESTS_TIMEOUT = 20 # seconds +# Celery configuration +CELERY_TIMEZONE = 'UTC' +CELERY_BROKER_URL = 'amqp://mq/' +CELERY_BEAT_SCHEDULER = 'django_celery_beat.schedulers:DatabaseScheduler' +CELERY_BEAT_SYNC_EVERY = 1 # update DB after every event +assert not USE_TZ, 'Drop DJANGO_CELERY_BEAT_TZ_AWARE setting once USE_TZ is True!' 
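+# Keep django-celery-beat's schedule timestamps naive while USE_TZ is False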
+DJANGO_CELERY_BEAT_TZ_AWARE = False + # Meetecho API setup: Uncomment this and provide real credentials to enable # Meetecho conference creation for interim session requests # diff --git a/ietf/submit/forms.py b/ietf/submit/forms.py index f97894594..34f568d2e 100644 --- a/ietf/submit/forms.py +++ b/ietf/submit/forms.py @@ -15,10 +15,11 @@ from contextlib import ExitStack from email.utils import formataddr from unidecode import unidecode +from urllib.parse import urljoin from django import forms from django.conf import settings -from django.utils.html import mark_safe # type:ignore +from django.utils.html import mark_safe, format_html # type:ignore from django.urls import reverse as urlreverse from django.utils.encoding import force_str @@ -28,6 +29,7 @@ from ietf.doc.models import Document from ietf.group.models import Group from ietf.ietfauth.utils import has_role from ietf.doc.fields import SearchableDocAliasesField +from ietf.doc.models import DocAlias from ietf.ipr.mail import utc_from_string from ietf.meeting.models import Meeting from ietf.message.models import Message @@ -40,6 +42,8 @@ from ietf.submit.parsers.xml_parser import XMLParser from ietf.utils import log from ietf.utils.draft import PlaintextDraft from ietf.utils.text import normalize_text +from ietf.utils.xmldraft import XMLDraft, XMLParseError + class SubmissionBaseUploadForm(forms.Form): xml = forms.FileField(label='.xml format', required=True) @@ -129,12 +133,215 @@ class SubmissionBaseUploadForm(forms.Form): self.file_info[field_name] = parser_class(f).critical_parse() if self.file_info[field_name].errors: raise forms.ValidationError(self.file_info[field_name].errors) - return f def clean_xml(self): return self.clean_file("xml", XMLParser) + def clean(self): + def format_messages(where, e, log_msgs): + m = str(e) + if m: + m = [m] + else: + import traceback + typ, val, tb = sys.exc_info() + m = traceback.format_exception(typ, val, tb) + m = [ l.replace('\n ', ':\n ') for l in m ] + msgs = [s for s in (["Error from xml2rfc (%s):" % (where,)] + m + log_msgs) if s] + return msgs + + if self.shutdown and not has_role(self.request.user, "Secretariat"): + raise forms.ValidationError('The submission tool is currently shut down') + + # check general submission rate thresholds before doing any more work + today = datetime.date.today() + self.check_submissions_thresholds( + "for the same submitter", + dict(remote_ip=self.remote_ip, submission_date=today), + settings.IDSUBMIT_MAX_DAILY_SAME_SUBMITTER, settings.IDSUBMIT_MAX_DAILY_SAME_SUBMITTER_SIZE, + ) + self.check_submissions_thresholds( + "across all submitters", + dict(submission_date=today), + settings.IDSUBMIT_MAX_DAILY_SUBMISSIONS, settings.IDSUBMIT_MAX_DAILY_SUBMISSIONS_SIZE, + ) + + for ext in self.formats: + f = self.cleaned_data.get(ext, None) + if not f: + continue + self.file_types.append('.%s' % ext) + if not ('.txt' in self.file_types or '.xml' in self.file_types): + if not self.errors: + raise forms.ValidationError('Unexpected submission file types; found %s, but %s is required' % (', '.join(self.file_types), ' or '.join(self.base_formats))) + + # Determine the draft name and revision. Try XML first. + if self.cleaned_data.get('xml'): + xml_file = self.cleaned_data.get('xml') + tfn = None + with ExitStack() as stack: + @stack.callback + def cleanup(): # called when context exited, even in case of exception + if tfn is not None: + os.unlink(tfn) + + # We need to write the xml file to disk in order to hand it + # over to the xml parser. 
XXX FIXME: investigate updating + # xml2rfc to be able to work with file handles to in-memory + # files. + name, ext = os.path.splitext(os.path.basename(xml_file.name)) + with tempfile.NamedTemporaryFile(prefix=name+'-', + suffix='.xml', + mode='wb+', + delete=False) as tf: + tfn = tf.name + for chunk in xml_file.chunks(): + tf.write(chunk) + + try: + xml_draft = XMLDraft(tfn) + except XMLParseError as e: + msgs = format_messages('xml', e, e.parser_msgs()) + self.add_error('xml', msgs) + return + except Exception as e: + self.add_error('xml', f'Error parsing XML draft: {e}') + return + + self.filename = xml_draft.filename + self.revision = xml_draft.revision + elif self.cleaned_data.get('txt'): + # no XML available, extract from the text if we have it + # n.b., this code path is unused until a subclass with a 'txt' field is created. + txt_file = self.cleaned_data['txt'] + txt_file.seek(0) + bytes = txt_file.read() + try: + text = bytes.decode(self.file_info['txt'].charset) + self.parsed_draft = PlaintextDraft(text, txt_file.name) + self.filename = self.parsed_draft.filename + self.revision = self.parsed_draft.revision + except (UnicodeDecodeError, LookupError) as e: + self.add_error('txt', 'Failed decoding the uploaded file: "%s"' % str(e)) + + rev_error = validate_submission_rev(self.filename, self.revision) + if rev_error: + raise forms.ValidationError(rev_error) + + # The following errors are likely noise if we have previous field + # errors: + if self.errors: + raise forms.ValidationError('') + + if not self.filename: + raise forms.ValidationError("Could not extract a valid draft name from the upload. " + "To fix this in a text upload, please make sure that the full draft name including " + "revision number appears centered on its own line below the document title on the " + "first page. In an xml upload, please make sure that the top-level " + "element has a docName attribute which provides the full draft name including " + "revision number.") + + if not self.revision: + raise forms.ValidationError("Could not extract a valid draft revision from the upload. " + "To fix this in a text upload, please make sure that the full draft name including " + "revision number appears centered on its own line below the document title on the " + "first page. In an xml upload, please make sure that the top-level " + "element has a docName attribute which provides the full draft name including " + "revision number.") + + if self.cleaned_data.get('txt') or self.cleaned_data.get('xml'): + # check group + self.group = self.deduce_group(self.filename) + # check existing + existing = Submission.objects.filter(name=self.filename, rev=self.revision).exclude(state__in=("posted", "cancel", "waiting-for-draft")) + if existing: + raise forms.ValidationError( + format_html( + 'A submission with same name and revision is currently being processed. 
Check the status here.', + urljoin( + settings.IDTRACKER_BASE_URL, + urlreverse("ietf.submit.views.submission_status", kwargs={'submission_id': existing[0].pk}), + ) + ) + ) + + # cut-off + if self.revision == '00' and self.in_first_cut_off: + raise forms.ValidationError(mark_safe(self.cutoff_warning)) + # check thresholds that depend on the draft / group + self.check_submissions_thresholds( + "for the draft %s" % self.filename, + dict(name=self.filename, rev=self.revision, submission_date=today), + settings.IDSUBMIT_MAX_DAILY_SAME_DRAFT_NAME, settings.IDSUBMIT_MAX_DAILY_SAME_DRAFT_NAME_SIZE, + ) + if self.group: + self.check_submissions_thresholds( + "for the group \"%s\"" % (self.group.acronym), + dict(group=self.group, submission_date=today), + settings.IDSUBMIT_MAX_DAILY_SAME_GROUP, settings.IDSUBMIT_MAX_DAILY_SAME_GROUP_SIZE, + ) + return super().clean() + + @staticmethod + def check_submissions_thresholds(which, filter_kwargs, max_amount, max_size): + submissions = Submission.objects.filter(**filter_kwargs) + + if len(submissions) > max_amount: + raise forms.ValidationError("Max submissions %s has been reached for today (maximum is %s submissions)." % (which, max_amount)) + if sum(s.file_size for s in submissions if s.file_size) > max_size * 1024 * 1024: + raise forms.ValidationError("Max uploaded amount %s has been reached for today (maximum is %s MB)." % (which, max_size)) + + @staticmethod + def deduce_group(name): + """Figure out group from name or previously submitted draft, returns None if individual.""" + existing_draft = Document.objects.filter(name=name, type="draft") + if existing_draft: + group = existing_draft[0].group + if group and group.type_id not in ("individ", "area"): + return group + else: + return None + else: + name_parts = name.split("-") + if len(name_parts) < 3: + raise forms.ValidationError("The draft name \"%s\" is missing a third part, please rename it" % name) + + if name.startswith('draft-ietf-') or name.startswith("draft-irtf-"): + if name_parts[1] == "ietf": + group_type = "wg" + elif name_parts[1] == "irtf": + group_type = "rg" + else: + group_type = None + + # first check groups with dashes + for g in Group.objects.filter(acronym__contains="-", type=group_type): + if name.startswith('draft-%s-%s-' % (name_parts[1], g.acronym)): + return g + + try: + return Group.objects.get(acronym=name_parts[2], type=group_type) + except Group.DoesNotExist: + raise forms.ValidationError('There is no active group with acronym \'%s\', please rename your draft' % name_parts[2]) + + elif name.startswith("draft-rfc-"): + return Group.objects.get(acronym="iesg") + elif name.startswith("draft-rfc-editor-") or name.startswith("draft-rfced-") or name.startswith("draft-rfceditor-"): + return Group.objects.get(acronym="rfceditor") + else: + ntype = name_parts[1].lower() + # This covers group types iesg, iana, iab, ise, and others: + if GroupTypeName.objects.filter(slug=ntype).exists(): + group = Group.objects.filter(acronym=ntype).first() + if group: + return group + else: + raise forms.ValidationError('Draft names starting with draft-%s- are restricted, please pick a differen name' % ntype) + return None + + +class DeprecatedSubmissionBaseUploadForm(SubmissionBaseUploadForm): def clean(self): def format_messages(where, e, log): out = log.write_out.getvalue().splitlines() @@ -353,7 +560,7 @@ class SubmissionBaseUploadForm(forms.Form): if self.cleaned_data.get('txt') or self.cleaned_data.get('xml'): # check group - self.group = self.deduce_group() + self.group = 
self.deduce_group(self.filename) # check existing existing = Submission.objects.filter(name=self.filename, rev=self.revision).exclude(state__in=("posted", "cancel", "waiting-for-draft")) @@ -367,86 +574,32 @@ class SubmissionBaseUploadForm(forms.Form): # check thresholds today = datetime.date.today() - self.check_submissions_tresholds( + self.check_submissions_thresholds( "for the draft %s" % self.filename, dict(name=self.filename, rev=self.revision, submission_date=today), settings.IDSUBMIT_MAX_DAILY_SAME_DRAFT_NAME, settings.IDSUBMIT_MAX_DAILY_SAME_DRAFT_NAME_SIZE, ) - self.check_submissions_tresholds( + self.check_submissions_thresholds( "for the same submitter", dict(remote_ip=self.remote_ip, submission_date=today), settings.IDSUBMIT_MAX_DAILY_SAME_SUBMITTER, settings.IDSUBMIT_MAX_DAILY_SAME_SUBMITTER_SIZE, ) if self.group: - self.check_submissions_tresholds( + self.check_submissions_thresholds( "for the group \"%s\"" % (self.group.acronym), dict(group=self.group, submission_date=today), settings.IDSUBMIT_MAX_DAILY_SAME_GROUP, settings.IDSUBMIT_MAX_DAILY_SAME_GROUP_SIZE, ) - self.check_submissions_tresholds( + self.check_submissions_thresholds( "across all submitters", dict(submission_date=today), settings.IDSUBMIT_MAX_DAILY_SUBMISSIONS, settings.IDSUBMIT_MAX_DAILY_SUBMISSIONS_SIZE, ) - return super(SubmissionBaseUploadForm, self).clean() + return super().clean() - def check_submissions_tresholds(self, which, filter_kwargs, max_amount, max_size): - submissions = Submission.objects.filter(**filter_kwargs) - if len(submissions) > max_amount: - raise forms.ValidationError("Max submissions %s has been reached for today (maximum is %s submissions)." % (which, max_amount)) - if sum(s.file_size for s in submissions if s.file_size) > max_size * 1024 * 1024: - raise forms.ValidationError("Max uploaded amount %s has been reached for today (maximum is %s MB)." 
% (which, max_size)) - - def deduce_group(self): - """Figure out group from name or previously submitted draft, returns None if individual.""" - name = self.filename - existing_draft = Document.objects.filter(name=name, type="draft") - if existing_draft: - group = existing_draft[0].group - if group and group.type_id not in ("individ", "area"): - return group - else: - return None - else: - name_parts = name.split("-") - if len(name_parts) < 3: - raise forms.ValidationError("The draft name \"%s\" is missing a third part, please rename it" % name) - - if name.startswith('draft-ietf-') or name.startswith("draft-irtf-"): - - if name_parts[1] == "ietf": - group_type = "wg" - elif name_parts[1] == "irtf": - group_type = "rg" - - # first check groups with dashes - for g in Group.objects.filter(acronym__contains="-", type=group_type): - if name.startswith('draft-%s-%s-' % (name_parts[1], g.acronym)): - return g - - try: - return Group.objects.get(acronym=name_parts[2], type=group_type) - except Group.DoesNotExist: - raise forms.ValidationError('There is no active group with acronym \'%s\', please rename your draft' % name_parts[2]) - - elif name.startswith("draft-rfc-"): - return Group.objects.get(acronym="iesg") - elif name.startswith("draft-rfc-editor-") or name.startswith("draft-rfced-") or name.startswith("draft-rfceditor-"): - return Group.objects.get(acronym="rfceditor") - else: - ntype = name_parts[1].lower() - # This covers group types iesg, iana, iab, ise, and others: - if GroupTypeName.objects.filter(slug=ntype).exists(): - group = Group.objects.filter(acronym=ntype).first() - if group: - return group - else: - raise forms.ValidationError('Draft names starting with draft-%s- are restricted, please pick a differen name' % ntype) - return None - -class SubmissionManualUploadForm(SubmissionBaseUploadForm): +class SubmissionManualUploadForm(DeprecatedSubmissionBaseUploadForm): xml = forms.FileField(label='.xml format', required=False) # xml field with required=False instead of True txt = forms.FileField(label='.txt format', required=False) # We won't permit html upload until we can verify that the content @@ -466,14 +619,67 @@ class SubmissionManualUploadForm(SubmissionBaseUploadForm): def clean_pdf(self): return self.clean_file("pdf", PDFParser) -class SubmissionAutoUploadForm(SubmissionBaseUploadForm): +class DeprecatedSubmissionAutoUploadForm(DeprecatedSubmissionBaseUploadForm): + """Full-service upload form, replaced by the asynchronous version""" user = forms.EmailField(required=True) def __init__(self, request, *args, **kwargs): - super(SubmissionAutoUploadForm, self).__init__(request, *args, **kwargs) + super(DeprecatedSubmissionAutoUploadForm, self).__init__(request, *args, **kwargs) self.formats = ['xml', ] self.base_formats = ['xml', ] +class SubmissionAutoUploadForm(SubmissionBaseUploadForm): + user = forms.EmailField(required=True) + replaces = forms.CharField(required=False, max_length=1000, strip=True) + + def __init__(self, request, *args, **kwargs): + super().__init__(request, *args, **kwargs) + self.formats = ['xml', ] + self.base_formats = ['xml', ] + + def clean(self): + super().clean() + + # Clean the replaces field after the rest of the cleaning so we know the name of the + # uploaded draft via self.filename + if self.cleaned_data['replaces']: + names_replaced = [s.strip() for s in self.cleaned_data['replaces'].split(',')] + self.cleaned_data['replaces'] = ','.join(names_replaced) + aliases_replaced = DocAlias.objects.filter(name__in=names_replaced) + if 
len(names_replaced) != len(aliases_replaced): + known_names = aliases_replaced.values_list('name', flat=True) + unknown_names = [n for n in names_replaced if n not in known_names] + self.add_error( + 'replaces', + forms.ValidationError( + 'Unknown draft name(s): ' + ', '.join(unknown_names) + ), + ) + for alias in aliases_replaced: + if alias.document.name == self.filename: + self.add_error( + 'replaces', + forms.ValidationError("A draft cannot replace itself"), + ) + elif alias.document.type_id != "draft": + self.add_error( + 'replaces', + forms.ValidationError("A draft can only replace another draft"), + ) + elif alias.document.get_state_slug() == "rfc": + self.add_error( + 'replaces', + forms.ValidationError("A draft cannot replace an RFC"), + ) + elif alias.document.get_state_slug('draft-iesg') in ('approved', 'ann', 'rfcqueue'): + self.add_error( + 'replaces', + forms.ValidationError( + alias.name + " is approved by the IESG and cannot be replaced" + ), + ) + + class NameEmailForm(forms.Form): name = forms.CharField(required=True) email = forms.EmailField(label='Email address', required=True) diff --git a/ietf/submit/mail.py b/ietf/submit/mail.py index 35ea726e8..16d1f0973 100644 --- a/ietf/submit/mail.py +++ b/ietf/submit/mail.py @@ -307,9 +307,7 @@ def add_submission_email(request, remote_ip, name, rev, submission_pk, message, create_submission_event(request, submission, desc) - docevent_from_submission(request, - submission, - desc) + docevent_from_submission(submission, desc) except Exception as e: log("Exception: %s\n" % e) raise diff --git a/ietf/submit/migrations/0009_auto_20220427_1223.py b/ietf/submit/migrations/0009_auto_20220427_1223.py new file mode 100644 index 000000000..6f8e21425 --- /dev/null +++ b/ietf/submit/migrations/0009_auto_20220427_1223.py @@ -0,0 +1,17 @@ +# Generated by Django 2.2.28 on 2022-04-27 12:23 + +from django.db import migrations, models + + +class Migration(migrations.Migration): + + dependencies = [ + ('submit', '0008_submissionextresource'), + ] + + operations = [ + migrations.AddIndex( + model_name='submission', + index=models.Index(fields=['submission_date'], name='submit_subm_submiss_8e58a9_idx'), + ), + ] diff --git a/ietf/submit/models.py b/ietf/submit/models.py index 6043d4ab7..9f0e2fa33 100644 --- a/ietf/submit/models.py +++ b/ietf/submit/models.py @@ -63,6 +63,11 @@ class Submission(models.Model): def __str__(self): return "%s-%s" % (self.name, self.rev) + class Meta: + indexes = [ + models.Index(fields=['submission_date']), + ] + def submitter_parsed(self): return parse_email_line(self.submitter) diff --git a/ietf/submit/tasks.py b/ietf/submit/tasks.py new file mode 100644 index 000000000..21d4275b7 --- /dev/null +++ b/ietf/submit/tasks.py @@ -0,0 +1,45 @@ +# Copyright The IETF Trust 2022, All Rights Reserved +# +# Celery task definitions +# +from celery import shared_task + +from django.db.models import Min +from django.conf import settings +from django.utils import timezone + +from ietf.submit.models import Submission +from ietf.submit.utils import cancel_submission, create_submission_event, process_uploaded_submission +from ietf.utils import log + + +@shared_task +def process_uploaded_submission_task(submission_id): + try: + submission = Submission.objects.get(pk=submission_id) + except Submission.DoesNotExist: + log.log(f'process_uploaded_submission_task called for missing submission_id={submission_id}') + else: + process_uploaded_submission(submission) + + +@shared_task +def cancel_stale_submissions(): + now = timezone.now() + 
stale_submissions = Submission.objects.filter( + state_id='validating', + ).annotate( + submitted_at=Min('submissionevent__time'), + ).filter( + submitted_at__lt=now - settings.IDSUBMIT_MAX_VALIDATION_TIME, + ) + for subm in stale_submissions: + age = now - subm.submitted_at + log.log(f'Canceling stale submission (id={subm.id}, age={age})') + cancel_submission(subm) + create_submission_event(None, subm, 'Submission canceled: validation checks took too long') + + +@shared_task(bind=True) +def poke(self): + log.log(f'Poked {self.name}, request id {self.request.id}') diff --git a/ietf/submit/tests.py b/ietf/submit/tests.py index 827472c9b..70baff432 100644 --- a/ietf/submit/tests.py +++ b/ietf/submit/tests.py @@ -16,16 +16,21 @@ from pyquery import PyQuery from pathlib import Path from django.conf import settings +from django.core.files.uploadedfile import SimpleUploadedFile +from django.db import transaction +from django.forms import ValidationError from django.test import override_settings from django.test.client import RequestFactory from django.urls import reverse as urlreverse +from django.utils import timezone from django.utils.encoding import force_str, force_text - import debug # pyflakes:ignore from ietf.submit.utils import (expirable_submissions, expire_submission, find_submission_filenames, - post_submission, validate_submission_name, validate_submission_rev) -from ietf.doc.factories import DocumentFactory, WgDraftFactory, IndividualDraftFactory, IndividualRfcFactory + post_submission, validate_submission_name, validate_submission_rev, + process_uploaded_submission, SubmissionError, process_submission_text) +from ietf.doc.factories import (DocumentFactory, WgDraftFactory, IndividualDraftFactory, IndividualRfcFactory, + ReviewFactory, WgRfcFactory) from ietf.doc.models import ( Document, DocAlias, DocEvent, State, BallotPositionDocEvent, DocumentAuthor, SubmissionDocEvent ) from ietf.doc.utils import create_ballot_if_not_open, can_edit_docextresources, update_action_holders @@ -39,8 +44,10 @@ from ietf.name.models import FormalLanguageName from ietf.person.models import Person from ietf.person.factories import UserFactory, PersonFactory, EmailFactory from ietf.submit.factories import SubmissionFactory, SubmissionExtResourceFactory +from ietf.submit.forms import SubmissionBaseUploadForm, SubmissionAutoUploadForm from ietf.submit.models import Submission, Preapproval, SubmissionExtResource from ietf.submit.mail import add_submission_email, process_response_email +from ietf.submit.tasks import cancel_stale_submissions, process_uploaded_submission_task from ietf.utils.accesstoken import generate_access_token from ietf.utils.mail import outbox, empty_outbox, get_payload_text from ietf.utils.models import VersionInfo @@ -84,6 +91,7 @@ class BaseSubmitTestCase(TestCase): return settings.INTERNET_DRAFT_ARCHIVE_DIR def submission_file(name_in_doc, name_in_post, group, templatename, author=None, email=None, title=None, year=None, ascii=True): + _today = datetime.date.today() # construct appropriate text draft f = io.open(os.path.join(settings.BASE_DIR, "submit", templatename)) template = f.read() @@ -96,14 +104,14 @@ def submission_file(name_in_doc, name_in_post, group, templatename, author=None, if title is None: title = "Test Document" if year is None: - year = datetime.date.today().strftime("%Y") + year = _today.strftime("%Y") submission_text = template % dict( - date=datetime.date.today().strftime("%d %B %Y"), - expiration=(datetime.date.today() + 
datetime.timedelta(days=100)).strftime("%d %B, %Y"), + date=_today.strftime("%d %B %Y"), + expiration=(_today + datetime.timedelta(days=100)).strftime("%d %B, %Y"), year=year, - month=datetime.date.today().strftime("%B"), - day=datetime.date.today().strftime("%d"), + month=_today.strftime("%B"), + day=_today.strftime("%d"), name=name_in_doc, group=group or "", author=author.ascii if ascii else author.name, @@ -2722,6 +2730,660 @@ Subject: test return r + +# Transaction.on_commit() requires use of TransactionTestCase, but that has a performance penalty. Replace it +# with a no-op for testing purposes. +@mock.patch.object(transaction, 'on_commit', lambda x: x()) +@override_settings(IDTRACKER_BASE_URL='https://datatracker.example.com') +class ApiSubmissionTests(BaseSubmitTestCase): + def test_upload_draft(self): + """api_submission accepts a submission and queues it for processing""" + url = urlreverse('ietf.submit.views.api_submission') + xml, author = submission_file('draft-somebody-test-00', 'draft-somebody-test-00.xml', None, 'test_submission.xml') + data = { + 'xml': xml, + 'user': author.user.username, + } + with mock.patch('ietf.submit.views.process_uploaded_submission_task') as mock_task: + r = self.client.post(url, data) + self.assertEqual(r.status_code, 200) + response = r.json() + self.assertCountEqual( + response.keys(), + ['id', 'name', 'rev', 'status_url'], + ) + submission_id = int(response['id']) + self.assertEqual(response['name'], 'draft-somebody-test') + self.assertEqual(response['rev'], '00') + self.assertEqual( + response['status_url'], + 'https://datatracker.example.com' + urlreverse( + 'ietf.submit.views.api_submission_status', + kwargs={'submission_id': submission_id}, + ), + ) + self.assertEqual(mock_task.delay.call_count, 1) + self.assertEqual(mock_task.delay.call_args.args, (submission_id,)) + submission = Submission.objects.get(pk=submission_id) + self.assertEqual(submission.name, 'draft-somebody-test') + self.assertEqual(submission.rev, '00') + self.assertEqual(submission.submitter, author.formatted_email()) + self.assertEqual(submission.state_id, 'validating') + self.assertIn('Uploaded submission through API', submission.submissionevent_set.last().desc) + + def test_upload_draft_with_replaces(self): + """api_submission accepts a submission and queues it for processing""" + existing_draft = WgDraftFactory() + url = urlreverse('ietf.submit.views.api_submission') + xml, author = submission_file('draft-somebody-test-00', 'draft-somebody-test-00.xml', None, 'test_submission.xml') + data = { + 'xml': xml, + 'user': author.user.username, + 'replaces': existing_draft.name, + } + # mock out the task so we don't call to celery during testing! 
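+ # (without the patch, .delay() would try to publish to the broker at CELERY_BROKER_URL)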
+ with mock.patch('ietf.submit.views.process_uploaded_submission_task'): + r = self.client.post(url, data) + self.assertEqual(r.status_code, 200) + submission = Submission.objects.last() + self.assertEqual(submission.name, 'draft-somebody-test') + self.assertEqual(submission.replaces, existing_draft.name) + + def test_rejects_broken_upload(self): + """api_submission immediately rejects a submission with serious problems""" + orig_submission_count = Submission.objects.count() + url = urlreverse('ietf.submit.views.api_submission') + + # invalid submitter + xml, author = submission_file('draft-somebody-test-00', 'draft-somebody-test-00.xml', None, 'test_submission.xml') + data = { + 'xml': xml, + 'user': 'i.dont.exist@nowhere.example.com', + } + with mock.patch('ietf.submit.views.process_uploaded_submission_task') as mock_task: + r = self.client.post(url, data) + self.assertEqual(r.status_code, 400) + response = r.json() + self.assertIn('No such user: ', response['error']) + self.assertFalse(mock_task.delay.called) + self.assertEqual(Submission.objects.count(), orig_submission_count) + + # missing name + xml, _ = submission_file('', 'draft-somebody-test-00.xml', None, 'test_submission.xml', author=author) + data = { + 'xml': xml, + 'user': author.user.username, + } + with mock.patch('ietf.submit.views.process_uploaded_submission_task') as mock_task: + r = self.client.post(url, data) + self.assertEqual(r.status_code, 400) + response = r.json() + self.assertEqual(response['error'], 'Validation Error') + self.assertFalse(mock_task.delay.called) + self.assertEqual(Submission.objects.count(), orig_submission_count) + + # missing rev + xml, _ = submission_file('draft-somebody-test', 'draft-somebody-test-00.xml', None, 'test_submission.xml', author=author) + data = { + 'xml': xml, + 'user': author.user.username, + } + with mock.patch('ietf.submit.views.process_uploaded_submission_task') as mock_task: + r = self.client.post(url, data) + self.assertEqual(r.status_code, 400) + response = r.json() + self.assertEqual(response['error'], 'Validation Error') + self.assertFalse(mock_task.delay.called) + self.assertEqual(Submission.objects.count(), orig_submission_count) + + # in-process submission + SubmissionFactory(name='draft-somebody-test', rev='00') # create an already-in-progress submission + orig_submission_count += 1 # keep this up to date + xml, _ = submission_file('draft-somebody-test-00', 'draft-somebody-test-00.xml', None, 'test_submission.xml', author=author) + data = { + 'xml': xml, + 'user': author.user.username, + } + with mock.patch('ietf.submit.views.process_uploaded_submission_task') as mock_task: + r = self.client.post(url, data) + self.assertEqual(r.status_code, 400) + response = r.json() + self.assertEqual(response['error'], 'Validation Error') + self.assertFalse(mock_task.delay.called) + self.assertEqual(Submission.objects.count(), orig_submission_count) + + @override_settings(IDTRACKER_BASE_URL='http://baseurl.example.com') + def test_get_documentation(self): + """A GET to the submission endpoint retrieves documentation""" + r = self.client.get(urlreverse('ietf.submit.views.api_submission')) + self.assertTemplateUsed(r, 'submit/api_submission_info.html') + self.assertContains(r, 'http://baseurl.example.com', status_code=200) + + def test_submission_status(self): + s = SubmissionFactory(state_id='validating') + url = urlreverse('ietf.submit.views.api_submission_status', kwargs={'submission_id': s.pk}) + r = self.client.get(url) + self.assertEqual(r.status_code, 200) + self.assertEqual( 
+ r.json(), + {'id': str(s.pk), 'state': 'validating'}, + ) + + s.state_id = 'uploaded' + s.save() + r = self.client.get(url) + self.assertEqual(r.status_code, 200) + self.assertEqual( + r.json(), + {'id': str(s.pk), 'state': 'uploaded'}, + ) + + # try an invalid one + r = self.client.get(urlreverse('ietf.submit.views.api_submission_status', kwargs={'submission_id': '999999'})) + self.assertEqual(r.status_code, 404) + + +class SubmissionUploadFormTests(BaseSubmitTestCase): + def test_check_submission_thresholds(self): + today = datetime.date.today() + yesterday = today - datetime.timedelta(days=1) + (this_group, that_group) = GroupFactory.create_batch(2, type_id='wg') + this_ip = '10.0.0.1' + that_ip = '192.168.42.42' + one_mb = 1024 * 1024 + this_draft = 'draft-this-draft' + that_draft = 'draft-different-draft' + SubmissionFactory(group=this_group, name=this_draft, rev='00', submission_date=yesterday, remote_ip=this_ip, file_size=one_mb) + SubmissionFactory(group=this_group, name=that_draft, rev='00', submission_date=yesterday, remote_ip=this_ip, file_size=one_mb) + SubmissionFactory(group=this_group, name=this_draft, rev='00', submission_date=today, remote_ip=this_ip, file_size=one_mb) + SubmissionFactory(group=this_group, name=that_draft, rev='00', submission_date=today, remote_ip=this_ip, file_size=one_mb) + SubmissionFactory(group=that_group, name=this_draft, rev='00', submission_date=yesterday, remote_ip=that_ip, file_size=one_mb) + SubmissionFactory(group=that_group, name=that_draft, rev='00', submission_date=yesterday, remote_ip=that_ip, file_size=one_mb) + SubmissionFactory(group=that_group, name=this_draft, rev='00', submission_date=today, remote_ip=that_ip, file_size=one_mb) + SubmissionFactory(group=that_group, name=that_draft, rev='00', submission_date=today, remote_ip=that_ip, file_size=one_mb) + SubmissionFactory(group=that_group, name=that_draft, rev='01', submission_date=today, remote_ip=that_ip, file_size=one_mb) + + # Tests aim to cover the permutations of DB filters that are used by the clean() method + # - all IP addresses, today + SubmissionBaseUploadForm.check_submissions_thresholds( + 'valid today, all submitters', + dict(submission_date=today), + max_amount=5, + max_size=5, # megabytes + ) + with self.assertRaisesMessage(ValidationError, 'Max submissions'): + SubmissionBaseUploadForm.check_submissions_thresholds( + 'too many today, all submitters', + dict(submission_date=today), + max_amount=4, + max_size=5, # megabytes + ) + with self.assertRaisesMessage(ValidationError, 'Max uploaded amount'): + SubmissionBaseUploadForm.check_submissions_thresholds( + 'too much today, all submitters', + dict(submission_date=today), + max_amount=5, + max_size=4, # megabytes + ) + + # - one IP address, today + SubmissionBaseUploadForm.check_submissions_thresholds( + 'valid today, one submitter', + dict(remote_ip=this_ip, submission_date=today), + max_amount=2, + max_size=2, # megabytes + ) + with self.assertRaisesMessage(ValidationError, 'Max submissions'): + SubmissionBaseUploadForm.check_submissions_thresholds( + 'too many today, one submitter', + dict(remote_ip=this_ip, submission_date=today), + max_amount=1, + max_size=2, # megabytes + ) + with self.assertRaisesMessage(ValidationError, 'Max uploaded amount'): + SubmissionBaseUploadForm.check_submissions_thresholds( + 'too much today, one submitter', + dict(remote_ip=this_ip, submission_date=today), + max_amount=2, + max_size=1, # megabytes + ) + + # - single draft/rev, today + 
SubmissionBaseUploadForm.check_submissions_thresholds( + 'valid today, one draft', + dict(name=this_draft, rev='00', submission_date=today), + max_amount=2, + max_size=2, # megabytes + ) + with self.assertRaisesMessage(ValidationError, 'Max submissions'): + SubmissionBaseUploadForm.check_submissions_thresholds( + 'too many today, one draft', + dict(name=this_draft, rev='00', submission_date=today), + max_amount=1, + max_size=2, # megabytes + ) + with self.assertRaisesMessage(ValidationError, 'Max uploaded amount'): + SubmissionBaseUploadForm.check_submissions_thresholds( + 'too much today, one draft', + dict(name=this_draft, rev='00', submission_date=today), + max_amount=2, + max_size=1, # megabytes + ) + + # - one group, today + SubmissionBaseUploadForm.check_submissions_thresholds( + 'valid today, one group', + dict(group=this_group, submission_date=today), + max_amount=2, + max_size=2, # megabytes + ) + with self.assertRaisesMessage(ValidationError, 'Max submissions'): + SubmissionBaseUploadForm.check_submissions_thresholds( + 'too many today, one group', + dict(group=this_group, submission_date=today), + max_amount=1, + max_size=2, # megabytes + ) + with self.assertRaisesMessage(ValidationError, 'Max uploaded amount'): + SubmissionBaseUploadForm.check_submissions_thresholds( + 'too much today, one group', + dict(group=this_group, submission_date=today), + max_amount=2, + max_size=1, # megabytes + ) + + def test_replaces_field(self): + """test SubmissionAutoUploadForm replaces field""" + request_factory = RequestFactory() + WgDraftFactory(name='draft-somebody-test') + existing_drafts = WgDraftFactory.create_batch(2) + xml, auth = submission_file('draft-somebody-test-01', 'draft-somebody-test-01.xml', None, 'test_submission.xml') + files_dict = { + 'xml': SimpleUploadedFile('draft-somebody-test-01.xml', xml.read().encode('utf8'), + content_type='application/xml'), + } + + # no replaces + form = SubmissionAutoUploadForm( + request_factory.get('/some/url'), + data={'user': auth.user.username, 'replaces': ''}, + files=files_dict, + ) + self.assertTrue(form.is_valid()) + self.assertEqual(form.cleaned_data['replaces'], '') + + # whitespace + form = SubmissionAutoUploadForm( + request_factory.get('/some/url'), + data={'user': auth.user.username, 'replaces': ' '}, + files=files_dict, + ) + self.assertTrue(form.is_valid()) + self.assertEqual(form.cleaned_data['replaces'], '') + + # one replaces + form = SubmissionAutoUploadForm( + request_factory.get('/some/url'), + data={'user': auth.user.username, 'replaces': existing_drafts[0].name}, + files=files_dict, + ) + self.assertTrue(form.is_valid()) + self.assertEqual(form.cleaned_data['replaces'], existing_drafts[0].name) + + # two replaces + form = SubmissionAutoUploadForm( + request_factory.get('/some/url'), + data={'user': auth.user.username, 'replaces': f'{existing_drafts[0].name},{existing_drafts[1].name}'}, + files=files_dict, + ) + self.assertTrue(form.is_valid()) + self.assertEqual(form.cleaned_data['replaces'], f'{existing_drafts[0].name},{existing_drafts[1].name}') + + # two replaces, extra whitespace + form = SubmissionAutoUploadForm( + request_factory.get('/some/url'), + data={'user': auth.user.username, 'replaces': f' {existing_drafts[0].name} , {existing_drafts[1].name}'}, + files=files_dict, + ) + self.assertTrue(form.is_valid()) + self.assertEqual(form.cleaned_data['replaces'], f'{existing_drafts[0].name},{existing_drafts[1].name}') + + # can't replace self + form = SubmissionAutoUploadForm( + request_factory.get('/some/url'), + 
data={'user': auth.user.username, 'replaces': 'draft-somebody-test'}, + files=files_dict, + ) + self.assertFalse(form.is_valid()) + self.assertIn('A draft cannot replace itself', form.errors['replaces']) + + # can't replace non-draft + review = ReviewFactory() + form = SubmissionAutoUploadForm( + request_factory.get('/some/url'), + data={'user': auth.user.username, 'replaces': review.name}, + files=files_dict, + ) + self.assertFalse(form.is_valid()) + self.assertIn('A draft can only replace another draft', form.errors['replaces']) + + # can't replace RFC + rfc = WgRfcFactory() + form = SubmissionAutoUploadForm( + request_factory.get('/some/url'), + data={'user': auth.user.username, 'replaces': rfc.name}, + files=files_dict, + ) + self.assertFalse(form.is_valid()) + self.assertIn('A draft cannot replace an RFC', form.errors['replaces']) + + # can't replace draft approved by iesg + existing_drafts[0].set_state(State.objects.get(type='draft-iesg', slug='approved')) + form = SubmissionAutoUploadForm( + request_factory.get('/some/url'), + data={'user': auth.user.username, 'replaces': existing_drafts[0].name}, + files=files_dict, + ) + self.assertFalse(form.is_valid()) + self.assertIn(f'{existing_drafts[0].name} is approved by the IESG and cannot be replaced', + form.errors['replaces']) + + # unknown draft + form = SubmissionAutoUploadForm( + request_factory.get('/some/url'), + data={'user': auth.user.username, 'replaces': 'fake-name'}, + files=files_dict, + ) + self.assertFalse(form.is_valid()) + + +class AsyncSubmissionTests(BaseSubmitTestCase): + """Tests of async submission-related tasks""" + def test_process_uploaded_submission(self): + """process_uploaded_submission should properly process a submission""" + _today = datetime.date.today() + xml, author = submission_file('draft-somebody-test-00', 'draft-somebody-test-00.xml', None, 'test_submission.xml') + xml_data = xml.read() + xml.close() + + submission = SubmissionFactory( + name='draft-somebody-test', + rev='00', + file_types='.xml', + submitter=author.formatted_email(), + state_id='validating', + ) + xml_path = Path(settings.IDSUBMIT_STAGING_PATH) / 'draft-somebody-test-00.xml' + with xml_path.open('w') as f: + f.write(xml_data) + txt_path = xml_path.with_suffix('.txt') + self.assertFalse(txt_path.exists()) + html_path = xml_path.with_suffix('.html') + self.assertFalse(html_path.exists()) + process_uploaded_submission(submission) + + submission = Submission.objects.get(pk=submission.pk) # refresh + self.assertEqual(submission.state_id, 'auth', 'accepted submission should be in auth state') + self.assertEqual(submission.title, 'Test Document') + self.assertEqual(submission.xml_version, '3') + self.assertEqual(submission.document_date, _today) + self.assertEqual(submission.abstract.strip(), 'This document describes how to test tests.') + # don't worry about testing the details of these, just that they're set + self.assertIsNotNone(submission.pages) + self.assertIsNotNone(submission.words) + self.assertNotEqual(submission.first_two_pages.strip(), '') + # at least test that these were created + self.assertTrue(txt_path.exists()) + self.assertTrue(html_path.exists()) + self.assertEqual(submission.file_size, os.stat(txt_path).st_size) + self.assertIn('Completed submission validation checks', submission.submissionevent_set.last().desc) + + def test_process_uploaded_submission_invalid(self): + """process_uploaded_submission should properly process an invalid submission""" + xml, author = submission_file('draft-somebody-test-00', 
'draft-somebody-test-00.xml', None, 'test_submission.xml') + xml_data = xml.read() + xml.close() + txt, _ = submission_file('draft-somebody-test-00', 'draft-somebody-test-00.xml', None, + 'test_submission.txt', author=author) + txt_data = txt.read() + txt.close() + + # submitter is not an author + submitter = PersonFactory() + submission = SubmissionFactory( + name='draft-somebody-test', + rev='00', + file_types='.xml', + submitter=submitter.formatted_email(), + state_id='validating', + ) + xml_path = Path(settings.IDSUBMIT_STAGING_PATH) / 'draft-somebody-test-00.xml' + with xml_path.open('w') as f: + f.write(xml_data) + process_uploaded_submission(submission) + submission = Submission.objects.get(pk=submission.pk) # refresh + self.assertEqual(submission.state_id, 'cancel') + self.assertIn('not one of the document authors', submission.submissionevent_set.last().desc) + + # author has no email address in XML + submission = SubmissionFactory( + name='draft-somebody-test', + rev='00', + file_types='.xml', + submitter=author.formatted_email(), + state_id='validating', + ) + xml_path = Path(settings.IDSUBMIT_STAGING_PATH) / 'draft-somebody-test-00.xml' + with xml_path.open('w') as f: + f.write(re.sub(r'.*', '', xml_data)) + process_uploaded_submission(submission) + submission = Submission.objects.get(pk=submission.pk) # refresh + self.assertEqual(submission.state_id, 'cancel') + self.assertIn('Missing email address', submission.submissionevent_set.last().desc) + + # no title + submission = SubmissionFactory( + name='draft-somebody-test', + rev='00', + file_types='.xml', + submitter=author.formatted_email(), + state_id='validating', + ) + xml_path = Path(settings.IDSUBMIT_STAGING_PATH) / 'draft-somebody-test-00.xml' + with xml_path.open('w') as f: + f.write(re.sub(r'.*', '', xml_data)) + process_uploaded_submission(submission) + submission = Submission.objects.get(pk=submission.pk) # refresh + self.assertEqual(submission.state_id, 'cancel') + self.assertIn('Could not extract a valid title', submission.submissionevent_set.last().desc) + + # draft name mismatch + submission = SubmissionFactory( + name='draft-different-name', + rev='00', + file_types='.xml', + submitter=author.formatted_email(), + state_id='validating', + ) + xml_path = Path(settings.IDSUBMIT_STAGING_PATH) / 'draft-different-name-00.xml' + with xml_path.open('w') as f: + f.write(xml_data) + process_uploaded_submission(submission) + submission = Submission.objects.get(pk=submission.pk) # refresh + self.assertEqual(submission.state_id, 'cancel') + self.assertIn('draft filename disagrees', submission.submissionevent_set.last().desc) + + # rev mismatch + submission = SubmissionFactory( + name='draft-somebody-test', + rev='01', + file_types='.xml', + submitter=author.formatted_email(), + state_id='validating', + ) + xml_path = Path(settings.IDSUBMIT_STAGING_PATH) / 'draft-somebody-test-01.xml' + with xml_path.open('w') as f: + f.write(xml_data) + process_uploaded_submission(submission) + submission = Submission.objects.get(pk=submission.pk) # refresh + self.assertEqual(submission.state_id, 'cancel') + self.assertIn('revision disagrees', submission.submissionevent_set.last().desc) + + # not xml + submission = SubmissionFactory( + name='draft-somebody-test', + rev='00', + file_types='.txt', + submitter=author.formatted_email(), + state_id='validating', + ) + txt_path = Path(settings.IDSUBMIT_STAGING_PATH) / 'draft-somebody-test-00.txt' + with txt_path.open('w') as f: + f.write(txt_data) + process_uploaded_submission(submission) + 
submission = Submission.objects.get(pk=submission.pk) # refresh + self.assertEqual(submission.state_id, 'cancel') + self.assertIn('Only XML draft submissions', submission.submissionevent_set.last().desc) + + # wrong state + submission = SubmissionFactory( + name='draft-somebody-test', + rev='00', + file_types='.xml', + submitter=author.formatted_email(), + state_id='uploaded', + ) + xml_path = Path(settings.IDSUBMIT_STAGING_PATH) / 'draft-somebody-test-00.xml' + with xml_path.open('w') as f: + f.write(xml_data) + with mock.patch('ietf.submit.utils.process_submission_xml') as mock_proc_xml: + process_uploaded_submission(submission) + submission = Submission.objects.get(pk=submission.pk) # refresh + self.assertFalse(mock_proc_xml.called, 'Should not process submission not in "validating" state') + self.assertEqual(submission.state_id, 'uploaded', 'State should not be changed') + + # failed checker + submission = SubmissionFactory( + name='draft-somebody-test', + rev='00', + file_types='.xml', + submitter=author.formatted_email(), + state_id='validating', + ) + xml_path = Path(settings.IDSUBMIT_STAGING_PATH) / 'draft-somebody-test-00.xml' + with xml_path.open('w') as f: + f.write(xml_data) + with mock.patch( + 'ietf.submit.utils.apply_checkers', + side_effect = lambda _, __: submission.checks.create( + checker='faked', + passed=False, + message='fake failure', + errors=1, + warnings=0, + items={}, + symbol='x', + ) + ): + process_uploaded_submission(submission) + submission = Submission.objects.get(pk=submission.pk) # refresh + self.assertEqual(submission.state_id, 'cancel') + self.assertIn('fake failure', submission.submissionevent_set.last().desc) + + + @mock.patch('ietf.submit.tasks.process_uploaded_submission') + def test_process_uploaded_submission_task(self, mock_method): + """process_uploaded_submission_task task should properly call its method""" + s = SubmissionFactory() + process_uploaded_submission_task(s.pk) + self.assertEqual(mock_method.call_count, 1) + self.assertEqual(mock_method.call_args.args, (s,)) + + @mock.patch('ietf.submit.tasks.process_uploaded_submission') + def test_process_uploaded_submission_task_ignores_invalid_id(self, mock_method): + """process_uploaded_submission_task should ignore an invalid submission_id""" + SubmissionFactory() # be sure there is a Submission + bad_pk = 9876 + self.assertEqual(Submission.objects.filter(pk=bad_pk).count(), 0) + process_uploaded_submission_task(bad_pk) + self.assertEqual(mock_method.call_count, 0) + + def test_process_submission_text_consistency_checks(self): + """process_submission_text should check draft metadata against submission""" + submission = SubmissionFactory( + name='draft-somebody-test', + rev='00', + title='Correct Draft Title', + ) + txt_path = Path(settings.IDSUBMIT_STAGING_PATH) / 'draft-somebody-test-00.txt' + + # name mismatch + txt, _ = submission_file( + 'draft-somebody-wrong-name-00', # name that appears in the file + 'draft-somebody-test-00.xml', + None, + 'test_submission.txt', + title='Correct Draft Title', + ) + txt_path.open('w').write(txt.read()) + with self.assertRaisesMessage(SubmissionError, 'disagrees with submission filename'): + process_submission_text(submission) + + # rev mismatch + txt, _ = submission_file( + 'draft-somebody-test-01', # name that appears in the file + 'draft-somebody-test-00.xml', + None, + 'test_submission.txt', + title='Correct Draft Title', + ) + txt_path.open('w').write(txt.read()) + with self.assertRaisesMessage(SubmissionError, 'disagrees with submission revision'): + 
process_submission_text(submission) + + # title mismatch + txt, _ = submission_file( + 'draft-somebody-test-00', # name that appears in the file + 'draft-somebody-test-00.xml', + None, + 'test_submission.txt', + title='Not Correct Draft Title', + ) + txt_path.open('w').write(txt.read()) + with self.assertRaisesMessage(SubmissionError, 'disagrees with submission title'): + process_submission_text(submission) + + def test_status_of_validating_submission(self): + s = SubmissionFactory(state_id='validating') + url = urlreverse('ietf.submit.views.submission_status', kwargs={'submission_id': s.pk}) + r = self.client.get(url) + self.assertContains(r, s.name) + self.assertContains(r, 'still being processed and validated', status_code=200) + + @override_settings(IDSUBMIT_MAX_VALIDATION_TIME=datetime.timedelta(minutes=30)) + def test_cancel_stale_submissions(self): + fresh_submission = SubmissionFactory(state_id='validating') + fresh_submission.submissionevent_set.create( + desc='fake created event', + time=timezone.now() - datetime.timedelta(minutes=15), + ) + stale_submission = SubmissionFactory(state_id='validating') + stale_submission.submissionevent_set.create( + desc='fake created event', + time=timezone.now() - datetime.timedelta(minutes=30, seconds=1), + ) + + cancel_stale_submissions() + + fresh_submission = Submission.objects.get(pk=fresh_submission.pk) + self.assertEqual(fresh_submission.state_id, 'validating') + self.assertEqual(fresh_submission.submissionevent_set.count(), 1) + + stale_submission = Submission.objects.get(pk=stale_submission.pk) + self.assertEqual(stale_submission.state_id, 'cancel') + self.assertEqual(stale_submission.submissionevent_set.count(), 2) + + class ApiSubmitTests(BaseSubmitTestCase): def setUp(self): super().setUp() diff --git a/ietf/submit/urls.py b/ietf/submit/urls.py index 1d99f9407..2309ec55c 100644 --- a/ietf/submit/urls.py +++ b/ietf/submit/urls.py @@ -25,4 +25,7 @@ urlpatterns = [ url(r'^manualpost/email/(?P\d+)/(?P\d+)/(?P[a-f\d]*)/$', views.show_submission_email_message), url(r'^manualpost/replyemail/(?P\d+)/(?P\d+)/$', views.send_submission_email), url(r'^manualpost/sendemail/(?P\d+)/$', views.send_submission_email), + + # proof-of-concept for celery async tasks + url(r'^async-poke/?$', views.async_poke_test), ] \ No newline at end of file diff --git a/ietf/submit/utils.py b/ietf/submit/utils.py index c48e0eb3a..f0a78c5cf 100644 --- a/ietf/submit/utils.py +++ b/ietf/submit/utils.py @@ -1,5 +1,5 @@ -# Copyright The IETF Trust 2011-2020, All Rights Reserved # -*- coding: utf-8 -*- +# Copyright The IETF Trust 2011-2020, All Rights Reserved import datetime @@ -8,8 +8,11 @@ import os import pathlib import re import time +import traceback +import xml2rfc -from typing import Callable, Optional # pyflakes:ignore +from typing import Optional # pyflakes:ignore +from unidecode import unidecode from django.conf import settings from django.core.exceptions import ValidationError @@ -43,7 +46,8 @@ from ietf.utils import log from ietf.utils.accesstoken import generate_random_key from ietf.utils.draft import PlaintextDraft from ietf.utils.mail import is_valid_email -from ietf.utils.text import parse_unicode +from ietf.utils.text import parse_unicode, normalize_text +from ietf.utils.xmldraft import XMLDraft from ietf.person.name import unidecode_name @@ -199,7 +203,7 @@ def check_submission_revision_consistency(submission): return None -def create_submission_event(request, submission, desc): +def create_submission_event(request: Optional[HttpRequest], submission, 
desc): by = None if request and request.user.is_authenticated: try: @@ -209,8 +213,8 @@ def create_submission_event(request, submission, desc): SubmissionEvent.objects.create(submission=submission, by=by, desc=desc) -def docevent_from_submission(request, submission, desc, who=None): - # type: (HttpRequest, Submission, str, Optional[Person]) -> Optional[DocEvent] +def docevent_from_submission(submission, desc, who=None): + # type: (Submission, str, Optional[Person]) -> Optional[DocEvent] log.assertion('who is None or isinstance(who, Person)') try: @@ -638,7 +642,6 @@ def update_authors(draft, submission): def cancel_submission(submission): submission.state = DraftSubmissionStateName.objects.get(slug="cancel") submission.save() - remove_submission_files(submission) def rename_submission_files(submission, prev_rev, new_rev): @@ -660,11 +663,22 @@ def move_files_to_repository(submission): elif ext in submission.file_types.split(','): raise ValueError("Intended to move '%s' to '%s', but found source and destination missing.") + +def remove_staging_files(name, rev, exts=None): + """Remove staging files corresponding to a submission + + exts is a list of extensions to be removed. If None, defaults to settings.IDSUBMIT_FILE_TYPES. + """ + if exts is None: + exts = [f'.{ext}' for ext in settings.IDSUBMIT_FILE_TYPES] + basename = pathlib.Path(settings.IDSUBMIT_STAGING_PATH) / f'{name}-{rev}' + for ext in exts: + basename.with_suffix(ext).unlink(missing_ok=True) + + def remove_submission_files(submission): - for ext in submission.file_types.split(','): - source = os.path.join(settings.IDSUBMIT_STAGING_PATH, '%s-%s%s' % (submission.name, submission.rev, ext)) - if os.path.exists(source): - os.unlink(source) + remove_staging_files(submission.name, submission.rev, submission.file_types.split(',')) + def approvable_submissions_for_user(user): if not user.is_authenticated: @@ -727,6 +741,12 @@ def expire_submission(submission, by): SubmissionEvent.objects.create(submission=submission, by=by, desc="Cancelled expired submission") + +def clear_existing_files(form): + """Make sure there are no leftover files from a previous submission""" + remove_staging_files(form.filename, form.revision) + + def save_files(form): file_name = {} for ext in list(form.fields.keys()): @@ -810,6 +830,10 @@ def get_draft_meta(form, saved_files): def get_submission(form): + # See if there is a Submission in state waiting-for-draft + # for this revision. + # If so - we're going to update it otherwise we create a new object + submissions = Submission.objects.filter(name=form.filename, rev=form.revision, state_id = "waiting-for-draft").distinct() @@ -823,32 +847,28 @@ def get_submission(form): def fill_in_submission(form, submission, authors, abstract, file_size): - # See if there is a Submission in state waiting-for-draft - # for this revision. 
- # If so - we're going to update it otherwise we create a new object - submission.state = DraftSubmissionStateName.objects.get(slug="uploaded") submission.remote_ip = form.remote_ip submission.title = form.title submission.abstract = abstract - submission.pages = form.parsed_draft.get_pagecount() - submission.words = form.parsed_draft.get_wordcount() submission.authors = authors - submission.first_two_pages = ''.join(form.parsed_draft.pages[:2]) submission.file_size = file_size submission.file_types = ','.join(form.file_types) submission.xml_version = form.xml_version submission.submission_date = datetime.date.today() - submission.document_date = form.parsed_draft.get_creation_date() submission.replaces = "" - + if form.parsed_draft is not None: + submission.pages = form.parsed_draft.get_pagecount() + submission.words = form.parsed_draft.get_wordcount() + submission.first_two_pages = ''.join(form.parsed_draft.pages[:2]) + submission.document_date = form.parsed_draft.get_creation_date() submission.save() - submission.formal_languages.set(FormalLanguageName.objects.filter(slug__in=form.parsed_draft.get_formal_languages())) + if form.parsed_draft is not None: + submission.formal_languages.set(FormalLanguageName.objects.filter(slug__in=form.parsed_draft.get_formal_languages())) set_extresources_from_existing_draft(submission) -def apply_checkers(submission, file_name): - # run submission checkers +def apply_checker(checker, submission, file_name): def apply_check(submission, checker, method, fn): func = getattr(checker, method) passed, message, errors, warnings, info = func(fn) @@ -856,18 +876,21 @@ def apply_checkers(submission, file_name): message=message, errors=errors, warnings=warnings, items=info, symbol=checker.symbol) check.save() + # ordered list of methods to try + for method in ("check_fragment_xml", "check_file_xml", "check_fragment_txt", "check_file_txt", ): + ext = method[-3:] + if hasattr(checker, method) and ext in file_name: + apply_check(submission, checker, method, file_name[ext]) + break +def apply_checkers(submission, file_name): + # run submission checkers mark = time.time() for checker_path in settings.IDSUBMIT_CHECKER_CLASSES: lap = time.time() checker_class = import_string(checker_path) checker = checker_class() - # ordered list of methods to try - for method in ("check_fragment_xml", "check_file_xml", "check_fragment_txt", "check_file_txt", ): - ext = method[-3:] - if hasattr(checker, method) and ext in file_name: - apply_check(submission, checker, method, file_name[ext]) - break + apply_checker(checker, submission, file_name) tau = time.time() - lap log.log(f"ran {checker.__class__.__name__} ({tau:.3}s) for {file_name}") tau = time.time() - mark @@ -888,7 +911,76 @@ def accept_submission_requires_group_approval(submission): and not Preapproval.objects.filter(name=submission.name).exists() ) -def accept_submission(request, submission, autopost=False): + +class SubmissionError(Exception): + """Exception for errors during submission processing""" + pass + + +def staging_path(filename, revision, ext): + if len(ext) > 0 and ext[0] != '.': + ext = f'.{ext}' + return pathlib.Path(settings.IDSUBMIT_STAGING_PATH) / f'{filename}-{revision}{ext}' + + +def render_missing_formats(submission): + """Generate txt and html formats from xml draft + + If a txt file already exists, leaves it in place. Overwrites an existing html file + if there is one. 
+ """ + xml2rfc.log.write_out = io.StringIO() # open(os.devnull, "w") + xml2rfc.log.write_err = io.StringIO() # open(os.devnull, "w") + xml_path = staging_path(submission.name, submission.rev, '.xml') + parser = xml2rfc.XmlRfcParser(str(xml_path), quiet=True) + # --- Parse the xml --- + xmltree = parser.parse(remove_comments=False) + # If we have v2, run it through v2v3. Keep track of the submitted version, though. + xmlroot = xmltree.getroot() + xml_version = xmlroot.get('version', '2') + if xml_version == '2': + v2v3 = xml2rfc.V2v3XmlWriter(xmltree) + xmltree.tree = v2v3.convert2to3() + + # --- Prep the xml --- + prep = xml2rfc.PrepToolWriter(xmltree, quiet=True, liberal=True, keep_pis=[xml2rfc.V3_PI_TARGET]) + prep.options.accept_prepped = True + xmltree.tree = prep.prep() + if xmltree.tree == None: + raise SubmissionError(f'Error from xml2rfc (prep): {prep.errors}') + + # --- Convert to txt --- + txt_path = staging_path(submission.name, submission.rev, '.txt') + if not txt_path.exists(): + writer = xml2rfc.TextWriter(xmltree, quiet=True) + writer.options.accept_prepped = True + writer.write(txt_path) + log.log( + 'In %s: xml2rfc %s generated %s from %s (version %s)' % ( + str(xml_path.parent), + xml2rfc.__version__, + txt_path.name, + xml_path.name, + xml_version, + ) + ) + + # --- Convert to html --- + html_path = staging_path(submission.name, submission.rev, '.html') + writer = xml2rfc.HtmlWriter(xmltree, quiet=True) + writer.write(str(html_path)) + log.log( + 'In %s: xml2rfc %s generated %s from %s (version %s)' % ( + str(xml_path.parent), + xml2rfc.__version__, + html_path.name, + xml_path.name, + xml_version, + ) + ) + + +def accept_submission(submission: Submission, request: Optional[HttpRequest] = None, autopost=False): """Accept a submission and post or put in correct state to await approvals If autopost is True, will post draft if submitter is authorized to do so. @@ -898,18 +990,15 @@ def accept_submission(request, submission, autopost=False): curr_authors = [ get_person_from_name_email(author["name"], author.get("email")) for author in submission.authors ] # Is the user authenticated as an author who can approve this submission? 
- user_is_author = ( - request.user.is_authenticated - and request.user.person in (prev_authors if submission.rev != '00' else curr_authors) # type: ignore - ) + requester = None + requester_is_author = False + if request is not None and request.user.is_authenticated: + requester = request.user.person + requester_is_author = requester in (prev_authors if submission.rev != '00' else curr_authors) # If "who" is None, docevent_from_submission will pull it out of submission - docevent_from_submission( - request, - submission, - desc="Uploaded new revision", - who=request.user.person if user_is_author else None, - ) + docevent_from_submission(submission, desc="Uploaded new revision", + who=requester if requester_is_author else None) replaces = DocAlias.objects.filter(name__in=submission.replaces_names) pretty_replaces = '(none)' if not replaces else ( @@ -932,6 +1021,7 @@ def accept_submission(request, submission, autopost=False): # Partial message for submission event sub_event_desc = 'Set submitter to \"%s\", replaces to %s' % (parse_unicode(submission.submitter), pretty_replaces) + create_event = True # Indicates whether still need to create an event docevent_desc = None address_list = [] if requires_ad_approval or requires_prev_ad_approval: @@ -968,11 +1058,11 @@ def accept_submission(request, submission, autopost=False): sent_to = ', '.join(address_list) sub_event_desc += ' and sent approval email to group chairs: %s' % sent_to docevent_desc = "Request for posting approval emailed to group chairs: %s" % sent_to - elif user_is_author and autopost: + elif requester_is_author and autopost: # go directly to posting submission - sub_event_desc = "New version accepted (logged-in submitter: %s)" % request.user.person # type: ignore + sub_event_desc = f'New version accepted (logged-in submitter: {requester})' post_submission(request, submission, sub_event_desc, sub_event_desc) - sub_event_desc = None # do not create submission event below, post_submission() handles it + create_event = False # do not create submission event below, post_submission() handled it else: submission.auth_key = generate_random_key() if requires_prev_authors_approval: @@ -1001,10 +1091,10 @@ def accept_submission(request, submission, autopost=False): sub_event_desc += " and sent confirmation email to submitter and authors: %s" % sent_to docevent_desc = "Request for posting confirmation emailed to submitter and authors: %s" % sent_to - if sub_event_desc: + if create_event: create_submission_event(request, submission, sub_event_desc) if docevent_desc: - docevent_from_submission(request, submission, docevent_desc, who=Person.objects.get(name="(System)")) + docevent_from_submission(submission, docevent_desc, who=Person.objects.get(name="(System)")) return address_list @@ -1035,3 +1125,131 @@ def remote_ip(request): else: remote_ip = request.META.get('REMOTE_ADDR', None) return remote_ip + + +def _normalize_title(title): + if isinstance(title, str): + title = unidecode(title) # replace unicode with best-match ascii + return normalize_text(title) # normalize whitespace + + +def process_submission_xml(submission): + """Validate and extract info from an uploaded submission""" + xml_path = staging_path(submission.name, submission.rev, '.xml') + xml_draft = XMLDraft(xml_path) + + if submission.name != xml_draft.filename: + raise SubmissionError('XML draft filename disagrees with submission filename') + if submission.rev != xml_draft.revision: + raise SubmissionError('XML draft revision disagrees with submission revision') + + authors 
= xml_draft.get_author_list() + for a in authors: + if not a['email']: + raise SubmissionError(f'Missing email address for author {a}') + + author_emails = [a['email'].lower() for a in authors] + submitter = get_person_from_name_email(**submission.submitter_parsed()) # the ** expands dict into kwargs + if not any( + email.address.lower() in author_emails + for email in submitter.email_set.filter(active=True) + ): + raise SubmissionError(f'Submitter ({submitter}) is not one of the document authors') + + # Fill in the submission data + submission.title = _normalize_title(xml_draft.get_title()) + if not submission.title: + raise SubmissionError('Could not extract a valid title from the XML') + submission.authors = [ + {key: auth[key] for key in ('name', 'email', 'affiliation', 'country')} + for auth in authors + ] + submission.xml_version = xml_draft.xml_version + submission.save() + + +def process_submission_text(submission): + """Validate/extract data from the text version of a submitted draft + + This assumes the draft was uploaded as XML and extracts data that is not + currently available directly from the XML. Additional processing, e.g. from + get_draft_meta(), would need to be added in order to support direct text + draft uploads. + """ + text_path = staging_path(submission.name, submission.rev, '.txt') + text_draft = PlaintextDraft.from_file(text_path) + + if submission.name != text_draft.filename: + raise SubmissionError( + f'Text draft filename ({text_draft.filename}) disagrees with submission filename ({submission.name})' + ) + if submission.rev != text_draft.revision: + raise SubmissionError( + f'Text draft revision ({text_draft.revision}) disagrees with submission revision ({submission.rev})') + text_title = _normalize_title(text_draft.get_title()) + if not text_title: + raise SubmissionError('Could not extract a valid title from the text') + if text_title != submission.title: + raise SubmissionError( + f'Text draft title ({text_title}) disagrees with submission title ({submission.title})') + + submission.abstract = text_draft.get_abstract() + submission.document_date = text_draft.get_creation_date() + submission.pages = text_draft.get_pagecount() + submission.words = text_draft.get_wordcount() + submission.first_two_pages = ''.join(text_draft.pages[:2]) + submission.file_size = os.stat(text_path).st_size + submission.save() + + submission.formal_languages.set( + FormalLanguageName.objects.filter( + slug__in=text_draft.get_formal_languages() + ) + ) + + +def process_uploaded_submission(submission): + def abort_submission(error): + cancel_submission(submission) + create_submission_event(None, submission, f'Submission rejected: {error}') + + if submission.state_id != 'validating': + log.log(f'Submission {submission.pk} is not in "validating" state, skipping.') + return # do nothing + + if submission.file_types != '.xml': + abort_submission('Only XML draft submissions can be processed.') + + try: + process_submission_xml(submission) + if check_submission_revision_consistency(submission): + raise SubmissionError( + 'Document revision inconsistency error in the database. ' + 'Please contact the secretariat for assistance.' 
+ ) + render_missing_formats(submission) + process_submission_text(submission) + set_extresources_from_existing_draft(submission) + apply_checkers( + submission, + { + ext: staging_path(submission.name, submission.rev, ext) + for ext in ['xml', 'txt', 'html'] + } + ) + errors = [c.message for c in submission.checks.filter(passed__isnull=False) if not c.passed] + if len(errors) > 0: + raise SubmissionError('Checks failed: ' + ' / '.join(errors)) + except SubmissionError as err: + abort_submission(err) + except Exception: + log.log(f'Unexpected exception while processing submission {submission.pk}.') + log.log(traceback.format_exc()) + abort_submission('A system error occurred while processing the submission.') + + # if we get here and are still "validating", accept the draft + if submission.state_id == 'validating': + submission.state_id = 'uploaded' + submission.save() + create_submission_event(None, submission, desc="Completed submission validation checks") + accept_submission(submission) diff --git a/ietf/submit/views.py b/ietf/submit/views.py index 9eb303eb2..bab859702 100644 --- a/ietf/submit/views.py +++ b/ietf/submit/views.py @@ -7,14 +7,15 @@ import base64 import datetime from typing import Optional, cast # pyflakes:ignore +from urllib.parse import urljoin from django.conf import settings from django.contrib import messages from django.contrib.auth.models import User -from django.db import DataError +from django.db import DataError, transaction from django.urls import reverse as urlreverse from django.core.exceptions import ValidationError -from django.http import HttpResponseRedirect, Http404, HttpResponseForbidden, HttpResponse +from django.http import HttpResponseRedirect, Http404, HttpResponseForbidden, HttpResponse, JsonResponse from django.http import HttpRequest # pyflakes:ignore from django.shortcuts import get_object_or_404, redirect, render from django.views.decorators.csrf import csrf_exempt @@ -30,14 +31,16 @@ from ietf.mailtrigger.utils import gather_address_lists from ietf.message.models import Message, MessageAttachment from ietf.person.models import Email from ietf.submit.forms import ( SubmissionManualUploadForm, SubmissionAutoUploadForm, AuthorForm, - SubmitterForm, EditSubmissionForm, PreapprovalForm, ReplacesForm, SubmissionEmailForm, MessageModelForm ) + SubmitterForm, EditSubmissionForm, PreapprovalForm, ReplacesForm, SubmissionEmailForm, MessageModelForm, + DeprecatedSubmissionAutoUploadForm ) from ietf.submit.mail import send_full_url, send_manual_post_request, add_submission_email, get_reply_to from ietf.submit.models import (Submission, Preapproval, SubmissionExtResource, DraftSubmissionStateName, SubmissionEmailEvent ) +from ietf.submit.tasks import process_uploaded_submission_task, poke from ietf.submit.utils import ( approvable_submissions_for_user, preapprovals_for_user, recently_approved_by_user, validate_submission, create_submission_event, docevent_from_submission, post_submission, cancel_submission, rename_submission_files, remove_submission_files, get_draft_meta, - get_submission, fill_in_submission, apply_checkers, save_files, + get_submission, fill_in_submission, apply_checkers, save_files, clear_existing_files, check_submission_revision_consistency, accept_submission, accept_submission_requires_group_approval, accept_submission_requires_prev_auth_approval, update_submission_external_resources, remote_ip ) from ietf.stats.utils import clean_country_name @@ -106,6 +109,111 @@ def upload_submission(request): {'selected': 'index', 'form': form}) 
+@csrf_exempt +def api_submission(request): + def err(code, error, messages=None): + data = {'error': error} + if messages is not None: + data['messages'] = [messages] if isinstance(messages, str) else messages + return JsonResponse(data, status=code) + + if request.method == 'GET': + return render(request, 'submit/api_submission_info.html') + elif request.method == 'POST': + exception = None + submission = None + try: + form = SubmissionAutoUploadForm(request, data=request.POST, files=request.FILES) + if form.is_valid(): + log('got valid submission form for %s' % form.filename) + username = form.cleaned_data['user'] + user = User.objects.filter(username=username) + if user.count() == 0: + # See if a secondary login was being used + email = Email.objects.filter(address=username, active=True) + # The error messages don't talk about 'email', as the field we're + # looking at is still the 'username' field. + if email.count() == 0: + return err(400, "No such user: %s" % username) + elif email.count() > 1: + return err(500, "Multiple matching accounts for %s" % username) + email = email.first() + if not hasattr(email, 'person'): + return err(400, "No person matches %s" % username) + person = email.person + if not hasattr(person, 'user'): + return err(400, "No user matches: %s" % username) + user = person.user + elif user.count() > 1: + return err(500, "Multiple matching accounts for %s" % username) + else: + user = user.first() + if not hasattr(user, 'person'): + return err(400, "No person with username %s" % username) + + # There is a race condition here: creating the Submission with the name/rev + # of this draft is meant to prevent another submission from occurring. However, + # if two submissions occur at the same time, both may decide that they are the + # only submission in progress. This may result in a Submission being posted with + # the wrong files. The window for this is short, though, so it's probably + # tolerable risk. 
+ submission = get_submission(form) + submission.state = DraftSubmissionStateName.objects.get(slug="validating") + submission.remote_ip = form.remote_ip + submission.file_types = ','.join(form.file_types) + submission.submission_date = datetime.date.today() + submission.submitter = user.person.formatted_email() + submission.replaces = form.cleaned_data['replaces'] + submission.save() + clear_existing_files(form) + save_files(form) + create_submission_event(request, submission, desc="Uploaded submission through API") + + # Wrap in on_commit so the delayed task cannot start until the view is done with the DB + transaction.on_commit( + lambda: process_uploaded_submission_task.delay(submission.pk) + ) + return JsonResponse( + { + 'id': str(submission.pk), + 'name': submission.name, + 'rev': submission.rev, + 'status_url': urljoin( + settings.IDTRACKER_BASE_URL, + urlreverse(api_submission_status, kwargs={'submission_id': submission.pk}), + ), + } + ) + else: + raise ValidationError(form.errors) + except IOError as e: + exception = e + return err(500, 'IO Error', str(e)) + except ValidationError as e: + exception = e + return err(400, 'Validation Error', e.messages) + except Exception as e: + exception = e + raise + finally: + if exception and submission: + remove_submission_files(submission) + submission.delete() + else: + return err(405, "Method not allowed") + + +@csrf_exempt +def api_submission_status(request, submission_id): + submission = get_submission_or_404(submission_id) + return JsonResponse( + { + 'id': str(submission.pk), + 'state': submission.state.slug, + } + ) + + @csrf_exempt def api_submit(request): "Automated submission entrypoint" @@ -118,7 +226,7 @@ def api_submit(request): elif request.method == 'POST': exception = None try: - form = SubmissionAutoUploadForm(request, data=request.POST, files=request.FILES) + form = DeprecatedSubmissionAutoUploadForm(request, data=request.POST, files=request.FILES) if form.is_valid(): log('got valid submission form for %s' % form.filename) username = form.cleaned_data['user'] @@ -175,7 +283,7 @@ def api_submit(request): raise ValidationError('Submitter %s is not one of the document authors' % user.username) submission.submitter = user.person.formatted_email() - sent_to = accept_submission(request, submission) + sent_to = accept_submission(submission, request) return HttpResponse( "Upload of %s OK, confirmation requests sent to:\n %s" % (submission.name, ',\n '.join(sent_to)), @@ -244,7 +352,11 @@ def submission_status(request, submission_id, access_token=None): is_ad = area and area.has_role(request.user, "ad") can_edit = can_edit_submission(request.user, submission, access_token) and submission.state_id == "uploaded" - can_cancel = (key_matched or is_secretariat) and submission.state.next_states.filter(slug="cancel") + # disallow cancellation of 'validating' submissions except by secretariat until async process is safely abortable + can_cancel = ( + (is_secretariat or (key_matched and submission.state_id != 'validating')) + and submission.state.next_states.filter(slug="cancel") + ) can_group_approve = (is_secretariat or is_ad or is_chair) and submission.state_id == "grp-appr" can_ad_approve = (is_secretariat or is_ad) and submission.state_id == "ad-appr" @@ -365,13 +477,13 @@ def submission_status(request, submission_id, access_token=None): permission_denied(request, 'You do not have permission to perform this action') # go directly to posting submission - docevent_from_submission(request, submission, desc="Uploaded new revision") + 
docevent_from_submission(submission, desc="Uploaded new revision") desc = "Secretariat manually posting. Approvals already received" post_submission(request, submission, desc, desc) else: - accept_submission(request, submission, autopost=True) + accept_submission(submission, request, autopost=True) if access_token: return redirect("ietf.submit.views.submission_status", submission_id=submission.pk, access_token=access_token) @@ -698,9 +810,7 @@ def cancel_waiting_for_draft(request): create_submission_event(request, submission, "Cancelled submission") if (submission.rev != "00"): # Add a doc event - docevent_from_submission(request, - submission, - "Cancelled submission for rev {}".format(submission.rev)) + docevent_from_submission(submission, "Cancelled submission for rev {}".format(submission.rev)) return redirect("ietf.submit.views.manualpost") @@ -923,4 +1033,9 @@ def get_submission_or_404(submission_id, access_token=None): if access_token and not key_matched: raise Http404 - return submission \ No newline at end of file + return submission + + +def async_poke_test(request): + result = poke.delay() + return HttpResponse(f'Poked {result}', content_type='text/plain') diff --git a/ietf/templates/submit/api_submission_info.html b/ietf/templates/submit/api_submission_info.html new file mode 100644 index 000000000..da2b21d52 --- /dev/null +++ b/ietf/templates/submit/api_submission_info.html @@ -0,0 +1,107 @@ +{% extends "base.html" %} +{# Copyright The IETF Trust 2015-2022, All Rights Reserved #} +{% load origin ietf_filters %} +{% block title %}Draft submission API instructions{% endblock %} +{% block content %} + {% origin %} +

Draft submission API instructions

+

+ A simplified draft submission interface, intended for automation, + is available at {% url 'ietf.submit.views.api_submission' %}. +

+

+ The interface accepts only XML uploads that can be processed on the server, and + requires the user to have a datatracker account. A successful submit still requires + the same email confirmation round-trip as submissions done through the regular + submission tool. +

+

+ This interface does not provide all the options which the regular submission tool does. + Some limitations: +

+
    +
  • Only XML-only uploads are supported, not text or combined.
  • Document replacement information cannot be supplied.
  • The server expects multipart/form-data, which is supported by curl but not by wget.
+

+ It takes the following parameters: +

+
    +
  • user, which is the user login (required)
  • xml, which is the submitted file (required)
  • replaces, a comma-separated list of draft names replaced by this submission (optional)
+

+ When a draft is submitted, basic checks are performed immediately and an HTTP response is sent with an appropriate HTTP status code and JSON data describing the outcome.

+

+ On success, the JSON data format is +

+
+{
+  "id": "123",
+  "name": "draft-just-submitted",
+  "rev": "00",
+  "status_url": "{% absurl 'ietf.submit.views.api_submission_status' submission_id='123' %}"
+}
+

+ On error, the JSON data format is +

+
+{
+  "error": "Description of the error"
+}
+

+ If the basic checks passed and a successful response is sent, the draft is queued for further + processing. Its status can be monitored by issuing GET requests to the status_url + indicated in the JSON response. This URL will respond with JSON data in the format +

+
+{
+  "id": "123",
+  "state": "validating"
+}
+

+ The state validating indicates that the draft is being or waiting to be processed. + Any other state indicates that the draft completed validation. If the validation failed or if the + draft was canceled after validation, the state will be cancel. +
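+ For automation, a client can simply poll the status URL until the state is no longer validating.
+ The snippet below is an illustrative sketch only, not part of this change: it assumes the
+ third-party Python requests library, and the poll_status helper, its interval, and its timeout
+ are invented for the example.
+
+    import time
+    import requests
+
+    def poll_status(status_url, interval=10, timeout=600):
+        """Poll the submission status URL until processing finishes or the timeout expires."""
+        deadline = time.time() + timeout
+        while time.time() < deadline:
+            state = requests.get(status_url).json()["state"]
+            if state != "validating":
+                return state  # processing finished, e.g. "auth" or "cancel"
+            time.sleep(interval)
+        raise TimeoutError("submission is still validating")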

+

+ Human-readable details of the draft's status and history can be found at + {% absurl 'ietf.submit.views.submission_status' submission_id='123' %} + (replacing 123 with the id for the submission). +

+

+ Here is an example of submitting a draft and polling its status through the API: +

+
+$ curl -s -F "user=user.name@example.com" -F "xml=@~/draft-user-example.xml" -F "replaces=draft-user-replaced-draft"  {% absurl 'ietf.submit.views.api_submission' %} | jq
+{
+  "id": "126375",
+  "name": "draft-user-example",
+  "rev": "00",
+  "status_url": "{% absurl 'ietf.submit.views.api_submission_status' submission_id='126375' %}"
+}
+
+$ curl -s {% absurl 'ietf.submit.views.api_submission_status' submission_id='126375' %} | jq
+{
+  "id": "126375",
+  "state": "validating"
+}
+
+$ curl -s {% absurl 'ietf.submit.views.api_submission_status' submission_id='126375' %} | jq
+{
+  "id": "126375",
+  "state": "auth"
+}
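+ As an illustration, the same submit-and-poll flow might be scripted in Python roughly as follows.
+ This is a sketch under stated assumptions: it uses the requests library and the poll_status
+ helper sketched above, and the login, file name, and api_url value are placeholders (api_url
+ stands in for the URL rendered by {% url 'ietf.submit.views.api_submission' %}).
+
+    import requests
+
+    # placeholder: use the URL rendered by {% url 'ietf.submit.views.api_submission' %}
+    api_url = "https://datatracker.example/api/submission"
+
+    with open("draft-user-example.xml", "rb") as xml_file:
+        resp = requests.post(
+            api_url,
+            data={"user": "user.name@example.com"},  # optionally add a "replaces" field
+            files={"xml": xml_file},                 # sent as multipart/form-data, as required above
+        )
+    data = resp.json()
+    if resp.status_code != 200:
+        # error responses carry an "error" field; validation errors may also include "messages"
+        raise RuntimeError(data["error"])
+    print("queued", data["name"], "rev", data["rev"])
+    print("final state:", poll_status(data["status_url"]))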
+{% endblock %} \ No newline at end of file diff --git a/ietf/templates/submit/api_submit_info.html b/ietf/templates/submit/api_submit_info.html index a94f85520..114b97962 100644 --- a/ietf/templates/submit/api_submit_info.html +++ b/ietf/templates/submit/api_submit_info.html @@ -1,13 +1,19 @@ {% extends "base.html" %} -{# Copyright The IETF Trust 2015, All Rights Reserved #} -{% load origin %} +{# Copyright The IETF Trust 2015-2022, All Rights Reserved #} +{% load origin ietf_filters %} {% block title %}Draft submission API instructions{% endblock %} {% block content %} {% origin %}

Draft submission API instructions

+

+ Note: The API endpoint described here is known to have a slow response time or to fail + due to timeout for some draft submissions, particularly those with large file sizes. + It is recommended to use the new API endpoint + instead for increased reliability. +

A simplified draft submission interface, intended for automation, - is available at {% url 'ietf.submit.views.api_submit' %}. + is available at {% absurl 'ietf.submit.views.api_submit' %}.

The interface accepts only XML uploads that can be processed on the server, and @@ -44,7 +50,7 @@ Here is an example:

-$ curl -S -F "user=user.name@example.com" -F "xml=@~/draft-user-example.xml" {{ settings.IDTRACKER_BASE_URL }}{% url 'ietf.submit.views.api_submit' %}
+$ curl -S -F "user=user.name@example.com" -F "xml=@~/draft-user-example.xml" {% absurl 'ietf.submit.views.api_submit' %}
 Upload of draft-user-example OK, confirmation requests sent to:
 User Name <user.name@example.com>
{% endblock %} \ No newline at end of file diff --git a/ietf/templates/submit/submission_status.html b/ietf/templates/submit/submission_status.html index b5c47ac75..255bc02bf 100644 --- a/ietf/templates/submit/submission_status.html +++ b/ietf/templates/submit/submission_status.html @@ -35,75 +35,77 @@ Please fix errors in the form below.

{% endif %} -

Submission checks

-

- {% if passes_checks %} - Your draft has been verified to pass the submission checks. - {% else %} - Your draft has NOT been verified to pass the submission checks. - {% endif %} -

- {% if submission.authors|length > 5 %} -

- - This document has more than five authors listed, which is considered excessive - under normal circumstances. If you plan to request publication as an RFC, this - will require additional consideration by the stream manager (for example, the - IESG), and publication may be declined unless sufficient justification is - provided. See - RFC 7322, section 4.1.1 - for details. + {% if submission.state_id != 'validating' %} +

Submission checks

+

+ {% if passes_checks %} + Your draft has been verified to pass the submission checks. + {% else %} + Your draft has NOT been verified to pass the submission checks. + {% endif %}

- {% endif %} - {% for check in submission.latest_checks %} - {% if check.errors %} -

- The {{ check.checker }} returned {{ check.errors }} error{{ check.errors|pluralize }} - and {{ check.warnings }} warning{{ check.warnings|pluralize }}; click the button - below to see details. Please fix those, and resubmit. -

- {% elif check.warnings %} -

- The {{ check.checker }} returned {{ check.warnings }} warning{{ check.warnings|pluralize }}. + {% if submission.authors|length > 5 %} +

+ + This document has more than five authors listed, which is considered excessive + under normal circumstances. If you plan to request publication as an RFC, this + will require additional consideration by the stream manager (for example, the + IESG), and publication may be declined unless sufficient justification is + provided. See + RFC 7322, section 4.1.1 + for details.

{% endif %} - {% endfor %} - {% for check in submission.latest_checks %} - {% if check.passed != None %} - -