From 3705bedfcd40c7b8b1c8dde96879e513b27ce972 Mon Sep 17 00:00:00 2001 From: Jennifer Richards <jennifer@painless-security.com> Date: Mon, 22 Aug 2022 15:29:31 -0300 Subject: [PATCH] feat: Celery support and asynchronous draft submission API (#4037) * ci: add Dockerfile and action to build celery worker image * ci: build celery worker on push to jennifer/celery branch * ci: also build celery worker for main branch * ci: Add comment to celery Dockerfile * chore: first stab at a celery/rabbitmq docker-compose * feat: add celery configuration and test task / endpoint * chore: run mq/celery containers for dev work * chore: point to ghcr.io image for celery worker * refactor: move XML parsing duties into XMLDraft Move some PlaintextDraft methods into the Draft base class and implement for the XMLDraft class. Use xml2rfc code from ietf.submit as a model for the parsing. This leaves some mismatch between the PlaintextDraft and the Draft class spec for the get_author_list() method to be resolved. * feat: add api_upload endpoint and beginnings of async processing This adds an api_upload() that behaves analogously to the api_submit() endpoint. Celery tasks to handle asynchronous processing are added but are not yet functional enough to be useful. * perf: index Submission table on submission_date This substantially speeds up submission rate threshold checks. * feat: remove existing files when accepting a new submission After checking that a submission is not in progress, remove any files in staging that have the same name/rev with any extension. This should guard against stale files confusing the submission process if the usual cleanup fails or is skipped for some reason. * refactor: make clear that deduce_group() uses only the draft name * refactor: extract only draft name/revision in clean() method Minimizing the amount of validation done when accepting a file. The data extraction will be moved to asynchronous processing. * refactor: minimize checks and data extraction in api_upload() view * ci: fix dockerfiles to match sandbox testing * ci: tweak celery container docker-compose settings * refactor: clean up Draft parsing API and usage * remove get_draftname() from Draft api; set filename during init * further XMLDraft work - remember xml_version after parsing - extract filename/revision during init - comment out long broken get_abstract() method * adjust form clean() method to use changed API * feat: flesh out async submission processing First basically working pass! * feat: add state name for submission being validated asynchronously * feat: cancel submissions that async processing can't handle * refactor: simplify/consolidate async tasks and improve error handling * feat: add api_submission_status endpoint * refactor: return JSON from submission api endpoints * refactor: reuse cancel_submission method * refactor: clean up error reporting a bit * feat: guard against cancellation of a submission while validating Not bulletproof but should prevent * feat: indicate that a submission is still being validated * fix: do not delete submission files after creating them * chore: remove debug statement * test: add tests of the api_upload and api_submission_status endpoints * test: add tests and stubs for async side of submission handling * fix: gracefully handle (ignore) invalid IDs in async submit task * test: test process_uploaded_submission method * fix: fix failures of new tests * refactor: fix type checker complaints * test: test submission_status view of submission in "validating" state * fix: fix up migrations * fix: use the streamlined SubmissionBaseUploadForm for api_upload * feat: show submission history event timestamp as mouse-over text * fix: remove 'manual' as next state for 'validating' submission state * refactor: share SubmissionBaseUploadForm code with Deprecated version * fix: validate text submission title, update a couple comments * chore: disable requirements updating when celery dev container starts * feat: log traceback on unexpected error during submission processing * feat: allow secretariat to cancel "validating" submission * feat: indicate time since submission on the status page * perf: check submission rate thresholds earlier when possible No sense parsing details of a draft that is going to be dropped regardless of those details! * fix: create Submission before saving to reduce race condition window * fix: call deduce_group() with filename * refactor: remove code lint * refactor: change the api_upload URL to api/submission * docs: update submission API documentation * test: add tests of api_submission's text draft consistency checks * refactor: rename api_upload to api_submission to agree with new URL * test: test API documentation and submission thresholds * fix: fix a couple api_submission view renames missed in templates * chore: use base image + add arm64 support * ci: try to fix workflow_dispatch for celery worker * ci: another attempt to fix workflow_dispatch * ci: build celery image for submit-async branch * ci: fix typo * ci: publish celery worker to ghcr.io/painless-security * ci: install python requirements in celery image * ci: fix up requirements install on celery image * chore: remove XML_LIBRARY references that crept back in * feat: accept 'replaces' field in api_submission * docs: update api_submission documentation * fix: remove unused import * test: test "replaces" validation for submission API * test: test that "replaces" is set by api_submission * feat: trap TERM to gracefully stop celery container * chore: tweak celery/mq settings * docs: update installation instructions * ci: adjust paths that trigger celery worker image build * ci: fix branches/repo names left over from dev * ci: run manage.py check when initializing celery container Driver here is applying the patches. Starting the celery workers also invokes the check task, but this should cause a clearer failure if something fails. * docs: revise INSTALL instructions * ci: pass filename to pip update in celery container * docs: update INSTALL to include freezing pip versions Will be used to coordinate package versions with the celery container in production. * docs: add explanation of frozen-requirements.txt * ci: build image for sandbox deployment * ci: add additional build trigger path * docs: tweak INSTALL * fix: change INSTALL process to stop datatracker before running migrations * chore: use ietf.settings for manage.py check in celery container * chore: set uid/gid for celery worker * chore: create user/group in celery container if needed * chore: tweak docker compose/init so celery container works in dev * ci: build mq docker image * fix: move rabbitmq.pid to writeable location * fix: clear password when CELERY_PASSWORD is empty Setting to an empty password is really not a good plan! * chore: add shutdown debugging option to celery image * chore: add django-celery-beat package * chore: run "celery beat" in datatracker-celery image * chore: fix docker image name * feat: add task to cancel stale submissions * test: test the cancel_stale_submissions task * chore: make f-string with no interpolation a plain string Co-authored-by: Nicolas Giard <github@ngpixel.com> Co-authored-by: Robert Sparks <rjsparks@nostrum.com> --- .github/workflows/build-celery-worker.yml | 47 ++ .github/workflows/build-mq-broker.yml | 46 ++ dev/INSTALL | 69 +- dev/celery/Dockerfile | 20 + dev/celery/docker-init.sh | 75 ++ dev/mq/Dockerfile | 17 + dev/mq/definitions.json | 30 + dev/mq/ietf-rabbitmq-server.bash | 22 + dev/mq/rabbitmq.conf | 18 + docker-compose.yml | 36 + docker/docker-compose.celery.yml | 51 ++ docker/docker-compose.extend.yml | 4 + docker/rabbitmq.conf | 19 + ietf/__init__.py | 7 + ietf/api/urls.py | 4 + ietf/celeryapp.py | 22 + ietf/name/fixtures/names.json | 14 + ...044_validating_draftsubmissionstatename.py | 40 ++ ietf/settings.py | 11 + ietf/submit/forms.py | 336 +++++++-- ietf/submit/mail.py | 4 +- .../migrations/0009_auto_20220427_1223.py | 17 + ietf/submit/models.py | 5 + ietf/submit/tasks.py | 45 ++ ietf/submit/tests.py | 678 +++++++++++++++++- ietf/submit/urls.py | 3 + ietf/submit/utils.py | 308 ++++++-- ietf/submit/views.py | 141 +++- .../templates/submit/api_submission_info.html | 107 +++ ietf/templates/submit/api_submit_info.html | 14 +- ietf/templates/submit/submission_status.html | 435 +++++------ ietf/utils/draft.py | 18 + ietf/utils/mail.py | 5 +- ietf/utils/text.py | 1 + ietf/utils/xmldraft.py | 103 ++- requirements.txt | 2 + 36 files changed, 2407 insertions(+), 367 deletions(-) create mode 100644 .github/workflows/build-celery-worker.yml create mode 100644 .github/workflows/build-mq-broker.yml create mode 100644 dev/celery/Dockerfile create mode 100755 dev/celery/docker-init.sh create mode 100644 dev/mq/Dockerfile create mode 100644 dev/mq/definitions.json create mode 100755 dev/mq/ietf-rabbitmq-server.bash create mode 100644 dev/mq/rabbitmq.conf create mode 100644 docker/docker-compose.celery.yml create mode 100644 docker/rabbitmq.conf create mode 100644 ietf/celeryapp.py create mode 100644 ietf/name/migrations/0044_validating_draftsubmissionstatename.py create mode 100644 ietf/submit/migrations/0009_auto_20220427_1223.py create mode 100644 ietf/submit/tasks.py create mode 100644 ietf/templates/submit/api_submission_info.html diff --git a/.github/workflows/build-celery-worker.yml b/.github/workflows/build-celery-worker.yml new file mode 100644 index 000000000..93f8c2a43 --- /dev/null +++ b/.github/workflows/build-celery-worker.yml @@ -0,0 +1,47 @@ +name: Build Celery Worker Docker Image + +on: + push: + branches: + - 'main' + - 'jennifer/submit-async' + paths: + - 'requirements.txt' + - 'dev/celery/**' + - '.github/workflows/build-celery-worker.yml' + + workflow_dispatch: + +jobs: + publish: + runs-on: ubuntu-latest + permissions: + contents: read + packages: write + + steps: + - uses: actions/checkout@v2 + + - name: Set up QEMU + uses: docker/setup-qemu-action@v2 + + - name: Set up Docker Buildx + uses: docker/setup-buildx-action@v2 + + - name: Login to GitHub Container Registry + uses: docker/login-action@v2 + with: + registry: ghcr.io + username: ${{ github.actor }} + password: ${{ secrets.GITHUB_TOKEN }} + + - name: Docker Build & Push + uses: docker/build-push-action@v3 + with: + context: . + file: dev/celery/Dockerfile + platforms: linux/amd64,linux/arm64 + push: true +# tags: ghcr.io/ietf-tools/datatracker-celery:latest + tags: ghcr.io/painless-security/datatracker-celery:latest + diff --git a/.github/workflows/build-mq-broker.yml b/.github/workflows/build-mq-broker.yml new file mode 100644 index 000000000..8b77d6dfe --- /dev/null +++ b/.github/workflows/build-mq-broker.yml @@ -0,0 +1,46 @@ +name: Build MQ Broker Docker Image + +on: + push: + branches: + - 'main' + - 'jennifer/submit-async' + paths: + - 'dev/mq/**' + - '.github/workflows/build-mq-worker.yml' + + workflow_dispatch: + +jobs: + publish: + runs-on: ubuntu-latest + permissions: + contents: read + packages: write + + steps: + - uses: actions/checkout@v2 + + - name: Set up QEMU + uses: docker/setup-qemu-action@v2 + + - name: Set up Docker Buildx + uses: docker/setup-buildx-action@v2 + + - name: Login to GitHub Container Registry + uses: docker/login-action@v2 + with: + registry: ghcr.io + username: ${{ github.actor }} + password: ${{ secrets.GITHUB_TOKEN }} + + - name: Docker Build & Push + uses: docker/build-push-action@v3 + with: + context: . + file: dev/mq/Dockerfile + platforms: linux/amd64,linux/arm64 + push: true +# tags: ghcr.io/ietf-tools/datatracker-mq:latest + tags: ghcr.io/painless-security/datatracker-mq:latest + diff --git a/dev/INSTALL b/dev/INSTALL index 132e607f5..f422af0f2 100644 --- a/dev/INSTALL +++ b/dev/INSTALL @@ -29,6 +29,11 @@ General Instructions for Deployment of a New Release python3.9 -mvenv env source env/bin/activate pip install -r requirements.txt + pip freeze > frozen-requirements.txt + + (The pip freeze command records the exact versions of the Python libraries that pip installed. + This is used by the celery docker container to ensure it uses the same library versions as + the datatracker service.) 5. Move static files into place for CDN (/a/www/www6s/lib/dt): @@ -36,31 +41,66 @@ General Instructions for Deployment of a New Release 6. Run system checks (which patches the just installed modules):: - ietf/manage.py check + ietf/manage.py check - 7. Run migrations: + 7. Switch to the docker directory and update images: + cd /a/docker/datatracker-cel + docker image tag ghcr.io/ietf-tools/datatracker-celery:latest datatracker-celery-fallback + docker image tag ghcr.io/ietf-tools/datatracker-mq:latest datatracker-mq-fallback + docker-compose pull + + 8. Stop and remove the async task containers: + Wait for this to finish cleanly. It may take up to about 10 minutes for the 'stop' command to + complete if a long-running task is in progress. + + docker-compose down + + 9. Stop the datatracker + (consider doing this with a second shell at ietfa to avoid the exit and shift back to wwwrun) + + exit + sudo systemctl stop datatracker.socket datatracker.service + sudo su - -s /bin/bash wwwrun + + 10. Return to the release directory and run migrations: + + cd /a/www/ietf-datatracker/${releasenumber} ietf/manage.py migrate Take note if any migrations were executed. - 8. Back out one directory level, then re-point the 'web' symlink:: + 11. Back out one directory level, then re-point the 'web' symlink:: cd .. rm ./web; ln -s ${releasenumber} web - 9. Reload the datatracker service (it is no longer necessary to restart apache) :: + 12. Start the datatracker service (it is no longer necessary to restart apache) :: exit # or CTRL-D, back to root level shell - systemctl restart datatracker + sudo systemctl start datatracker.service datatracker.socket - 10. Verify operation: + 13. Start async task worker and message broker: + + cd /a/docker/datatracker-cel + bash startcommand + + 14. Verify operation: http://datatracker.ietf.org/ - 11. If install failed and there were no migrations at step 7, revert web symlink and repeat the restart in step 9. - If there were migrations at step 7, they will need to be reversed before the restart at step 9. If it's not obvious - what to do to reverse the migrations, contact the dev team. + 15. If install failed and there were no migrations at step 9, revert web symlink and docker update and repeat the + restart in steps 11 and 12. To revert the docker update: + + cd /a/docker/datatracker-cel + docker-compose down + docker image rm ghcr.io/ietf-tools/datatracker-celery:latest ghcr.io/ietf-tools/datatracker-mq:latest + docker image tag datatracker-celery-fallback ghcr.io/ietf-tools/datatracker-celery:latest + docker image tag datatracker-mq-fallback ghcr.io/ietf-tools/datatracker-mq:latest + cd - + + If there were migrations at step 10, they will need to be reversed before the restart at step 12. + If it's not obvious what to do to reverse the migrations, contact the dev team. Patching a Production Release @@ -95,8 +135,17 @@ The following process should be used: 6. Edit ``.../ietf/__init__.py`` in the new patched release to indicate the patch version in the ``__patch__`` string. - 7. Change the 'web' symlink, reload etc. as described in + 7. Stop the async task container (this may take a few minutes if tasks are in progress): + + cd /a/docker/datatracker-cel + docker-compose stop celery + + 8. Change the 'web' symlink, reload etc. as described in `General Instructions for Deployment of a New Release`_. + 9. Start async task worker: + + cd /a/docker/datatracker-cel + bash startcommand diff --git a/dev/celery/Dockerfile b/dev/celery/Dockerfile new file mode 100644 index 000000000..91f2949a6 --- /dev/null +++ b/dev/celery/Dockerfile @@ -0,0 +1,20 @@ +# Dockerfile for celery worker +# +FROM ghcr.io/ietf-tools/datatracker-app-base:latest +LABEL maintainer="IETF Tools Team <tools-discuss@ietf.org>" + +ENV DEBIAN_FRONTEND=noninteractive + +RUN apt-get purge -y imagemagick imagemagick-6-common + +# Copy the startup file +COPY dev/celery/docker-init.sh /docker-init.sh +RUN sed -i 's/\r$//' /docker-init.sh && \ + chmod +x /docker-init.sh + +# Install current datatracker python dependencies +COPY requirements.txt /tmp/pip-tmp/ +RUN pip3 --disable-pip-version-check --no-cache-dir install -r /tmp/pip-tmp/requirements.txt +RUN rm -rf /tmp/pip-tmp + +ENTRYPOINT [ "/docker-init.sh" ] \ No newline at end of file diff --git a/dev/celery/docker-init.sh b/dev/celery/docker-init.sh new file mode 100755 index 000000000..9d00328ad --- /dev/null +++ b/dev/celery/docker-init.sh @@ -0,0 +1,75 @@ +#!/bin/bash +# +# Environment parameters: +# +# CELERY_APP - name of application to pass to celery (defaults to ietf) +# +# CELERY_ROLE - 'worker' or 'beat' (defaults to 'worker') +# +# CELERY_UID - numeric uid for the celery worker process +# +# CELERY_GID - numeric gid for the celery worker process +# +# UPDATES_REQUIREMENTS_FROM - path, relative to /workspace mount, to a pip requirements +# file that should be installed at container startup. Default is no package install/update. +# +# DEBUG_TERM_TIMING - if non-empty, writes debug messages during shutdown after a TERM signal +# +WORKSPACEDIR="/workspace" +CELERY_ROLE="${CELERY_ROLE:-worker}" + +cd "$WORKSPACEDIR" || exit 255 + +if [[ -n "${UPDATE_REQUIREMENTS_FROM}" ]]; then + reqs_file="${WORKSPACEDIR}/${UPDATE_REQUIREMENTS_FROM}" + echo "Updating requirements from ${reqs_file}..." + pip install --upgrade -r "${reqs_file}" +fi + +if [[ "${CELERY_ROLE}" == "worker" ]]; then + echo "Running initial checks..." + /usr/local/bin/python $WORKSPACEDIR/ietf/manage.py check +fi + +CELERY_OPTS=( "${CELERY_ROLE}" ) +if [[ -n "${CELERY_UID}" ]]; then + # ensure that some group with the necessary GID exists in container + if ! id "${CELERY_UID}" ; then + adduser --system --uid "${CELERY_UID}" --no-create-home --disabled-login "celery-user-${CELERY_UID}" + fi + CELERY_OPTS+=("--uid=${CELERY_UID}") +fi + +if [[ -n "${CELERY_GID}" ]]; then + # ensure that some group with the necessary GID exists in container + if ! getent group "${CELERY_GID}" ; then + addgroup --gid "${CELERY_GID}" "celery-group-${CELERY_GID}" + fi + CELERY_OPTS+=("--gid=${CELERY_GID}") +fi + +log_term_timing_msgs () { + # output periodic debug message + while true; do + echo "Waiting for celery worker shutdown ($(date --utc --iso-8601=ns))" + sleep 0.5s + done +} + +cleanup () { + # Cleanly terminate the celery app by sending it a TERM, then waiting for it to exit. + if [[ -n "${celery_pid}" ]]; then + echo "Gracefully terminating celery worker. This may take a few minutes if tasks are in progress..." + kill -TERM "${celery_pid}" + if [[ -n "${DEBUG_TERM_TIMING}" ]]; then + log_term_timing_msgs & + fi + wait "${celery_pid}" + fi +} + +trap 'trap "" TERM; cleanup' TERM +# start celery in the background so we can trap the TERM signal +celery --app="${CELERY_APP:-ietf}" "${CELERY_OPTS[@]}" "$@" & +celery_pid=$! +wait "${celery_pid}" diff --git a/dev/mq/Dockerfile b/dev/mq/Dockerfile new file mode 100644 index 000000000..e8871c30a --- /dev/null +++ b/dev/mq/Dockerfile @@ -0,0 +1,17 @@ +# Dockerfile for RabbitMQ worker +# +FROM rabbitmq:3-alpine +LABEL maintainer="IETF Tools Team <tools-discuss@ietf.org>" + +# Copy the startup file +COPY dev/mq/ietf-rabbitmq-server.bash /ietf-rabbitmq-server.bash +RUN sed -i 's/\r$//' /ietf-rabbitmq-server.bash && \ + chmod +x /ietf-rabbitmq-server.bash + +# Put the rabbitmq.conf in the conf.d so it runs after 10-defaults.conf. +# Can override this for an individual container by mounting additional +# config files in /etc/rabbitmq/conf.d. +COPY dev/mq/rabbitmq.conf /etc/rabbitmq/conf.d/20-ietf-config.conf +COPY dev/mq/definitions.json /definitions.json + +CMD ["/ietf-rabbitmq-server.bash"] diff --git a/dev/mq/definitions.json b/dev/mq/definitions.json new file mode 100644 index 000000000..60e4fdba0 --- /dev/null +++ b/dev/mq/definitions.json @@ -0,0 +1,30 @@ +{ + "permissions": [ + { + "configure": ".*", + "read": ".*", + "user": "datatracker", + "vhost": "dt", + "write": ".*" + } + ], + "users": [ + { + "hashing_algorithm": "rabbit_password_hashing_sha256", + "limits": {}, + "name": "datatracker", + "password_hash": "", + "tags": [] + } + ], + "vhosts": [ + { + "limits": [], + "metadata": { + "description": "", + "tags": [] + }, + "name": "dt" + } + ] +} diff --git a/dev/mq/ietf-rabbitmq-server.bash b/dev/mq/ietf-rabbitmq-server.bash new file mode 100755 index 000000000..56effba17 --- /dev/null +++ b/dev/mq/ietf-rabbitmq-server.bash @@ -0,0 +1,22 @@ +#!/bin/bash -x +# +# Environment parameters: +# +# CELERY_PASSWORD - password for the datatracker celery user +# +export RABBITMQ_PID_FILE=/tmp/rabbitmq.pid + +update_celery_password () { + rabbitmqctl wait "${RABBITMQ_PID_FILE}" --timeout 300 + rabbitmqctl await_startup --timeout 300 + if [[ -n "${CELERY_PASSWORD}" ]]; then + rabbitmqctl change_password datatracker <<END +${CELERY_PASSWORD} +END + else + rabbitmqctl clear_password datatracker + fi +} + +update_celery_password & +exec rabbitmq-server "$@" diff --git a/dev/mq/rabbitmq.conf b/dev/mq/rabbitmq.conf new file mode 100644 index 000000000..ac6473b3e --- /dev/null +++ b/dev/mq/rabbitmq.conf @@ -0,0 +1,18 @@ +# prevent guest from logging in over tcp +loopback_users.guest = true + +# load saved definitions +load_definitions = /definitions.json + +# Ensure that enough disk is available to flush to disk. To do this, need to limit the +# memory available to the container to something reasonable. See +# https://www.rabbitmq.com/production-checklist.html#monitoring-and-resource-usage +# for recommendations. + +# 1-1.5 times the memory available to the container is adequate for disk limit +disk_free_limit.absolute = 6000MB + +# This should be ~40% of the memory available to the container. Use an +# absolute number because relative will be proprtional to the full machine +# memory. +vm_memory_high_watermark.absolute = 1600MB diff --git a/docker-compose.yml b/docker-compose.yml index 13f628bf4..983a0de98 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -16,6 +16,7 @@ services: depends_on: - db + - mq ipc: host @@ -58,6 +59,41 @@ services: # Add "forwardPorts": ["5432"] to **devcontainer.json** to forward PostgreSQL locally. # (Adding the "ports" property to this file will not forward from a Codespace.) + mq: + image: rabbitmq:3-alpine + restart: unless-stopped + + celery: + image: ghcr.io/ietf-tools/datatracker-celery:latest + environment: + CELERY_APP: ietf + CELERY_ROLE: worker + UPDATE_REQUIREMENTS_FROM: requirements.txt + command: + - '--loglevel=INFO' + depends_on: + - db + restart: unless-stopped + stop_grace_period: 1m + volumes: + - .:/workspace + - app-assets:/assets + + beat: + image: ghcr.io/ietf-tools/datatracker-celery:latest + environment: + CELERY_APP: ietf + CELERY_ROLE: beat + UPDATE_REQUIREMENTS_FROM: requirements.txt + command: + - '--loglevel=INFO' + depends_on: + - db + restart: unless-stopped + stop_grace_period: 1m + volumes: + - .:/workspace + volumes: mariadb-data: app-assets: diff --git a/docker/docker-compose.celery.yml b/docker/docker-compose.celery.yml new file mode 100644 index 000000000..dedae2d00 --- /dev/null +++ b/docker/docker-compose.celery.yml @@ -0,0 +1,51 @@ +version: '2.4' +# Use version 2.4 for mem_limit setting. Version 3+ uses deploy.resources.limits.memory +# instead, but that only works for swarm with docker-compose 1.25.1. + +services: + mq: + image: rabbitmq:3-alpine + user: '${RABBITMQ_UID:-499:499}' + hostname: datatracker-mq +# deploy: +# resources: +# limits: +# memory: 1gb # coordinate with settings in rabbitmq.conf +# reservations: +# memory: 512mb + mem_limit: 1gb # coordinate with settings in rabbitmq.conf + ports: + - '${MQ_PORT:-5672}:5672' + volumes: + - ./lib.rabbitmq:/var/lib/rabbitmq + - ./rabbitmq.conf:/etc/rabbitmq/conf.d/90-ietf.conf + - ./definitions.json:/ietf-conf/definitions.json + restart: unless-stopped + logging: + driver: "syslog" + options: + syslog-address: 'unixgram:///dev/log' + tag: 'docker/{{.Name}}' +# syslog-address: "tcp://ietfa.amsl.com:514" + + celery: + image: ghcr.io/ietf-tools/datatracker-celery:latest + environment: + CELERY_APP: ietf + # UPDATE_REQUIREMENTS: 1 # uncomment to update Python requirements on startup + command: + - '--loglevel=INFO' + user: '${CELERY_UID:-499:499}' + volumes: + - '${DATATRACKER_PATH:-..}:/workspace' + - '${MYSQL_SOCKET_PATH:-/run/mysql}:/run/mysql' + depends_on: + - mq + network_mode: 'service:mq' + restart: unless-stopped + logging: + driver: "syslog" + options: + syslog-address: 'unixgram:///dev/log' + tag: 'docker/{{.Name}}' +# syslog-address: "tcp://ietfa.amsl.com:514" diff --git a/docker/docker-compose.extend.yml b/docker/docker-compose.extend.yml index 30402394c..06e47bbb0 100644 --- a/docker/docker-compose.extend.yml +++ b/docker/docker-compose.extend.yml @@ -15,3 +15,7 @@ services: db: ports: - '3306' + celery: + volumes: + - .:/workspace + - app-assets:/assets diff --git a/docker/rabbitmq.conf b/docker/rabbitmq.conf new file mode 100644 index 000000000..8603d0a4c --- /dev/null +++ b/docker/rabbitmq.conf @@ -0,0 +1,19 @@ +# prevent guest from logging in over tcp +loopback_users.guest = true + +# load saved definitions +load_definitions = /ietf-conf/definitions.json + +# Ensure that enough disk is available to flush to disk. To do this, need to limit the +# memory available to the container to something reasonable. See +# https://www.rabbitmq.com/production-checklist.html#monitoring-and-resource-usage +# for recommendations. + +# 1-1.5 times the memory available to the container is adequate for disk limit +disk_free_limit.absolute = 1.5GB + +# This should be ~40% of the memory available to the container. Use an +# absolute number because relative will be proprtional to the full machine +# memory. +vm_memory_high_watermark.absolute = 400MB + diff --git a/ietf/__init__.py b/ietf/__init__.py index 133a5d5ab..2338d0428 100644 --- a/ietf/__init__.py +++ b/ietf/__init__.py @@ -16,3 +16,10 @@ __release_branch__ = '' # set this to ".p1", ".p2", etc. after patching __patch__ = "" + + +# This will make sure the app is always imported when +# Django starts so that shared_task will use this app. +from .celeryapp import app as celery_app + +__all__ = ('celery_app',) diff --git a/ietf/api/urls.py b/ietf/api/urls.py index bc3bfb019..ca84870e2 100644 --- a/ietf/api/urls.py +++ b/ietf/api/urls.py @@ -44,6 +44,10 @@ urlpatterns = [ url(r'^openid/', include('oidc_provider.urls', namespace='oidc_provider')), # Draft submission API url(r'^submit/?$', submit_views.api_submit), + # Draft upload API + url(r'^submission/?$', submit_views.api_submission), + # Draft submission state API + url(r'^submission/(?P<submission_id>[0-9]+)/status/?', submit_views.api_submission_status), # Datatracker version url(r'^version/?$', api_views.version), # Application authentication API key diff --git a/ietf/celeryapp.py b/ietf/celeryapp.py new file mode 100644 index 000000000..cefde3a8d --- /dev/null +++ b/ietf/celeryapp.py @@ -0,0 +1,22 @@ +import os + +from celery import Celery + +# Set the default Django settings module for the 'celery' program +os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'ietf.settings') + +app = Celery('ietf') + +# Using a string here means the worker doesn't have to serialize +# the configuration object to child processes. +# - namespace='CELERY' means all celery-related configuration keys +# should have a `CELERY_` prefix. +app.config_from_object('django.conf:settings', namespace='CELERY') + +# Load task modules from all registered Django apps. +app.autodiscover_tasks() + + +@app.task(bind=True) +def debug_task(self): + print(f'Request: {self.request!r}') diff --git a/ietf/name/fixtures/names.json b/ietf/name/fixtures/names.json index e8509628d..41f7fec62 100644 --- a/ietf/name/fixtures/names.json +++ b/ietf/name/fixtures/names.json @@ -10443,6 +10443,20 @@ "model": "name.draftsubmissionstatename", "pk": "uploaded" }, + { + "fields": { + "desc": "Running validation checks on received submission", + "name": "Validating Submitted Draft", + "next_states": [ + "uploaded", + "cancel" + ], + "order": 10, + "used": true + }, + "model": "name.draftsubmissionstatename", + "pk": "validating" + }, { "fields": { "desc": "", diff --git a/ietf/name/migrations/0044_validating_draftsubmissionstatename.py b/ietf/name/migrations/0044_validating_draftsubmissionstatename.py new file mode 100644 index 000000000..de82bbeef --- /dev/null +++ b/ietf/name/migrations/0044_validating_draftsubmissionstatename.py @@ -0,0 +1,40 @@ +# Generated by Django 2.2.28 on 2022-05-17 11:35 + +from django.db import migrations + + +def forward(apps, schema_editor): + DraftSubmissionStateName = apps.get_model('name', 'DraftSubmissionStateName') + new_state = DraftSubmissionStateName.objects.create( + slug='validating', + name='Validating Submitted Draft', + desc='Running validation checks on received submission', + used=True, + order=1 + DraftSubmissionStateName.objects.order_by('-order').first().order, + ) + new_state.next_states.set( + DraftSubmissionStateName.objects.filter( + slug__in=['cancel', 'uploaded'], + ) + ) + + +def reverse(apps, schema_editor): + Submission = apps.get_model('submit', 'Submission') + # Any submissions in the state we are about to delete would be deleted. + # Remove these manually if you really mean to do this. + assert Submission.objects.filter(state__slug='validating').count() == 0 + DraftSubmissionStateName = apps.get_model('name', 'DraftSubmissionStateName') + DraftSubmissionStateName.objects.filter(slug='validating').delete() + + +class Migration(migrations.Migration): + + dependencies = [ + ('name', '0043_editorial_stream_grouptype'), + ('submit', '0001_initial'), # ensure Submission model exists + ] + + operations = [ + migrations.RunPython(forward, reverse) + ] diff --git a/ietf/settings.py b/ietf/settings.py index 0edf20229..12975f061 100644 --- a/ietf/settings.py +++ b/ietf/settings.py @@ -437,6 +437,7 @@ INSTALLED_APPS = [ 'analytical', 'django_vite', 'django_bootstrap5', + 'django_celery_beat', 'corsheaders', 'django_markup', 'django_password_strength', @@ -843,6 +844,8 @@ IDSUBMIT_CHECKER_CLASSES = ( # "ietf.submit.checkers.DraftYangvalidatorChecker", ) +# Max time to allow for validation before a submission is subject to cancellation +IDSUBMIT_MAX_VALIDATION_TIME = datetime.timedelta(minutes=20) IDSUBMIT_MANUAL_STAGING_DIR = '/tmp/' @@ -1176,6 +1179,14 @@ qvNU+qRWi+YXrITsgn92/gVxX5AoK0n+s5Lx7fpjxkARVi66SF6zTJnX DEFAULT_REQUESTS_TIMEOUT = 20 # seconds +# Celery configuration +CELERY_TIMEZONE = 'UTC' +CELERY_BROKER_URL = 'amqp://mq/' +CELERY_BEAT_SCHEDULER = 'django_celery_beat.schedulers:DatabaseScheduler' +CELERY_BEAT_SYNC_EVERY = 1 # update DB after every event +assert not USE_TZ, 'Drop DJANGO_CELERY_BEAT_TZ_AWARE setting once USE_TZ is True!' +DJANGO_CELERY_BEAT_TZ_AWARE = False + # Meetecho API setup: Uncomment this and provide real credentials to enable # Meetecho conference creation for interim session requests # diff --git a/ietf/submit/forms.py b/ietf/submit/forms.py index f97894594..34f568d2e 100644 --- a/ietf/submit/forms.py +++ b/ietf/submit/forms.py @@ -15,10 +15,11 @@ from contextlib import ExitStack from email.utils import formataddr from unidecode import unidecode +from urllib.parse import urljoin from django import forms from django.conf import settings -from django.utils.html import mark_safe # type:ignore +from django.utils.html import mark_safe, format_html # type:ignore from django.urls import reverse as urlreverse from django.utils.encoding import force_str @@ -28,6 +29,7 @@ from ietf.doc.models import Document from ietf.group.models import Group from ietf.ietfauth.utils import has_role from ietf.doc.fields import SearchableDocAliasesField +from ietf.doc.models import DocAlias from ietf.ipr.mail import utc_from_string from ietf.meeting.models import Meeting from ietf.message.models import Message @@ -40,6 +42,8 @@ from ietf.submit.parsers.xml_parser import XMLParser from ietf.utils import log from ietf.utils.draft import PlaintextDraft from ietf.utils.text import normalize_text +from ietf.utils.xmldraft import XMLDraft, XMLParseError + class SubmissionBaseUploadForm(forms.Form): xml = forms.FileField(label='.xml format', required=True) @@ -129,12 +133,215 @@ class SubmissionBaseUploadForm(forms.Form): self.file_info[field_name] = parser_class(f).critical_parse() if self.file_info[field_name].errors: raise forms.ValidationError(self.file_info[field_name].errors) - return f def clean_xml(self): return self.clean_file("xml", XMLParser) + def clean(self): + def format_messages(where, e, log_msgs): + m = str(e) + if m: + m = [m] + else: + import traceback + typ, val, tb = sys.exc_info() + m = traceback.format_exception(typ, val, tb) + m = [ l.replace('\n ', ':\n ') for l in m ] + msgs = [s for s in (["Error from xml2rfc (%s):" % (where,)] + m + log_msgs) if s] + return msgs + + if self.shutdown and not has_role(self.request.user, "Secretariat"): + raise forms.ValidationError('The submission tool is currently shut down') + + # check general submission rate thresholds before doing any more work + today = datetime.date.today() + self.check_submissions_thresholds( + "for the same submitter", + dict(remote_ip=self.remote_ip, submission_date=today), + settings.IDSUBMIT_MAX_DAILY_SAME_SUBMITTER, settings.IDSUBMIT_MAX_DAILY_SAME_SUBMITTER_SIZE, + ) + self.check_submissions_thresholds( + "across all submitters", + dict(submission_date=today), + settings.IDSUBMIT_MAX_DAILY_SUBMISSIONS, settings.IDSUBMIT_MAX_DAILY_SUBMISSIONS_SIZE, + ) + + for ext in self.formats: + f = self.cleaned_data.get(ext, None) + if not f: + continue + self.file_types.append('.%s' % ext) + if not ('.txt' in self.file_types or '.xml' in self.file_types): + if not self.errors: + raise forms.ValidationError('Unexpected submission file types; found %s, but %s is required' % (', '.join(self.file_types), ' or '.join(self.base_formats))) + + # Determine the draft name and revision. Try XML first. + if self.cleaned_data.get('xml'): + xml_file = self.cleaned_data.get('xml') + tfn = None + with ExitStack() as stack: + @stack.callback + def cleanup(): # called when context exited, even in case of exception + if tfn is not None: + os.unlink(tfn) + + # We need to write the xml file to disk in order to hand it + # over to the xml parser. XXX FIXME: investigate updating + # xml2rfc to be able to work with file handles to in-memory + # files. + name, ext = os.path.splitext(os.path.basename(xml_file.name)) + with tempfile.NamedTemporaryFile(prefix=name+'-', + suffix='.xml', + mode='wb+', + delete=False) as tf: + tfn = tf.name + for chunk in xml_file.chunks(): + tf.write(chunk) + + try: + xml_draft = XMLDraft(tfn) + except XMLParseError as e: + msgs = format_messages('xml', e, e.parser_msgs()) + self.add_error('xml', msgs) + return + except Exception as e: + self.add_error('xml', f'Error parsing XML draft: {e}') + return + + self.filename = xml_draft.filename + self.revision = xml_draft.revision + elif self.cleaned_data.get('txt'): + # no XML available, extract from the text if we have it + # n.b., this code path is unused until a subclass with a 'txt' field is created. + txt_file = self.cleaned_data['txt'] + txt_file.seek(0) + bytes = txt_file.read() + try: + text = bytes.decode(self.file_info['txt'].charset) + self.parsed_draft = PlaintextDraft(text, txt_file.name) + self.filename = self.parsed_draft.filename + self.revision = self.parsed_draft.revision + except (UnicodeDecodeError, LookupError) as e: + self.add_error('txt', 'Failed decoding the uploaded file: "%s"' % str(e)) + + rev_error = validate_submission_rev(self.filename, self.revision) + if rev_error: + raise forms.ValidationError(rev_error) + + # The following errors are likely noise if we have previous field + # errors: + if self.errors: + raise forms.ValidationError('') + + if not self.filename: + raise forms.ValidationError("Could not extract a valid draft name from the upload. " + "To fix this in a text upload, please make sure that the full draft name including " + "revision number appears centered on its own line below the document title on the " + "first page. In an xml upload, please make sure that the top-level <rfc/> " + "element has a docName attribute which provides the full draft name including " + "revision number.") + + if not self.revision: + raise forms.ValidationError("Could not extract a valid draft revision from the upload. " + "To fix this in a text upload, please make sure that the full draft name including " + "revision number appears centered on its own line below the document title on the " + "first page. In an xml upload, please make sure that the top-level <rfc/> " + "element has a docName attribute which provides the full draft name including " + "revision number.") + + if self.cleaned_data.get('txt') or self.cleaned_data.get('xml'): + # check group + self.group = self.deduce_group(self.filename) + # check existing + existing = Submission.objects.filter(name=self.filename, rev=self.revision).exclude(state__in=("posted", "cancel", "waiting-for-draft")) + if existing: + raise forms.ValidationError( + format_html( + 'A submission with same name and revision is currently being processed. <a href="{}">Check the status here.</a>', + urljoin( + settings.IDTRACKER_BASE_URL, + urlreverse("ietf.submit.views.submission_status", kwargs={'submission_id': existing[0].pk}), + ) + ) + ) + + # cut-off + if self.revision == '00' and self.in_first_cut_off: + raise forms.ValidationError(mark_safe(self.cutoff_warning)) + # check thresholds that depend on the draft / group + self.check_submissions_thresholds( + "for the draft %s" % self.filename, + dict(name=self.filename, rev=self.revision, submission_date=today), + settings.IDSUBMIT_MAX_DAILY_SAME_DRAFT_NAME, settings.IDSUBMIT_MAX_DAILY_SAME_DRAFT_NAME_SIZE, + ) + if self.group: + self.check_submissions_thresholds( + "for the group \"%s\"" % (self.group.acronym), + dict(group=self.group, submission_date=today), + settings.IDSUBMIT_MAX_DAILY_SAME_GROUP, settings.IDSUBMIT_MAX_DAILY_SAME_GROUP_SIZE, + ) + return super().clean() + + @staticmethod + def check_submissions_thresholds(which, filter_kwargs, max_amount, max_size): + submissions = Submission.objects.filter(**filter_kwargs) + + if len(submissions) > max_amount: + raise forms.ValidationError("Max submissions %s has been reached for today (maximum is %s submissions)." % (which, max_amount)) + if sum(s.file_size for s in submissions if s.file_size) > max_size * 1024 * 1024: + raise forms.ValidationError("Max uploaded amount %s has been reached for today (maximum is %s MB)." % (which, max_size)) + + @staticmethod + def deduce_group(name): + """Figure out group from name or previously submitted draft, returns None if individual.""" + existing_draft = Document.objects.filter(name=name, type="draft") + if existing_draft: + group = existing_draft[0].group + if group and group.type_id not in ("individ", "area"): + return group + else: + return None + else: + name_parts = name.split("-") + if len(name_parts) < 3: + raise forms.ValidationError("The draft name \"%s\" is missing a third part, please rename it" % name) + + if name.startswith('draft-ietf-') or name.startswith("draft-irtf-"): + if name_parts[1] == "ietf": + group_type = "wg" + elif name_parts[1] == "irtf": + group_type = "rg" + else: + group_type = None + + # first check groups with dashes + for g in Group.objects.filter(acronym__contains="-", type=group_type): + if name.startswith('draft-%s-%s-' % (name_parts[1], g.acronym)): + return g + + try: + return Group.objects.get(acronym=name_parts[2], type=group_type) + except Group.DoesNotExist: + raise forms.ValidationError('There is no active group with acronym \'%s\', please rename your draft' % name_parts[2]) + + elif name.startswith("draft-rfc-"): + return Group.objects.get(acronym="iesg") + elif name.startswith("draft-rfc-editor-") or name.startswith("draft-rfced-") or name.startswith("draft-rfceditor-"): + return Group.objects.get(acronym="rfceditor") + else: + ntype = name_parts[1].lower() + # This covers group types iesg, iana, iab, ise, and others: + if GroupTypeName.objects.filter(slug=ntype).exists(): + group = Group.objects.filter(acronym=ntype).first() + if group: + return group + else: + raise forms.ValidationError('Draft names starting with draft-%s- are restricted, please pick a differen name' % ntype) + return None + + +class DeprecatedSubmissionBaseUploadForm(SubmissionBaseUploadForm): def clean(self): def format_messages(where, e, log): out = log.write_out.getvalue().splitlines() @@ -353,7 +560,7 @@ class SubmissionBaseUploadForm(forms.Form): if self.cleaned_data.get('txt') or self.cleaned_data.get('xml'): # check group - self.group = self.deduce_group() + self.group = self.deduce_group(self.filename) # check existing existing = Submission.objects.filter(name=self.filename, rev=self.revision).exclude(state__in=("posted", "cancel", "waiting-for-draft")) @@ -367,86 +574,32 @@ class SubmissionBaseUploadForm(forms.Form): # check thresholds today = datetime.date.today() - self.check_submissions_tresholds( + self.check_submissions_thresholds( "for the draft %s" % self.filename, dict(name=self.filename, rev=self.revision, submission_date=today), settings.IDSUBMIT_MAX_DAILY_SAME_DRAFT_NAME, settings.IDSUBMIT_MAX_DAILY_SAME_DRAFT_NAME_SIZE, ) - self.check_submissions_tresholds( + self.check_submissions_thresholds( "for the same submitter", dict(remote_ip=self.remote_ip, submission_date=today), settings.IDSUBMIT_MAX_DAILY_SAME_SUBMITTER, settings.IDSUBMIT_MAX_DAILY_SAME_SUBMITTER_SIZE, ) if self.group: - self.check_submissions_tresholds( + self.check_submissions_thresholds( "for the group \"%s\"" % (self.group.acronym), dict(group=self.group, submission_date=today), settings.IDSUBMIT_MAX_DAILY_SAME_GROUP, settings.IDSUBMIT_MAX_DAILY_SAME_GROUP_SIZE, ) - self.check_submissions_tresholds( + self.check_submissions_thresholds( "across all submitters", dict(submission_date=today), settings.IDSUBMIT_MAX_DAILY_SUBMISSIONS, settings.IDSUBMIT_MAX_DAILY_SUBMISSIONS_SIZE, ) - return super(SubmissionBaseUploadForm, self).clean() + return super().clean() - def check_submissions_tresholds(self, which, filter_kwargs, max_amount, max_size): - submissions = Submission.objects.filter(**filter_kwargs) - if len(submissions) > max_amount: - raise forms.ValidationError("Max submissions %s has been reached for today (maximum is %s submissions)." % (which, max_amount)) - if sum(s.file_size for s in submissions if s.file_size) > max_size * 1024 * 1024: - raise forms.ValidationError("Max uploaded amount %s has been reached for today (maximum is %s MB)." % (which, max_size)) - - def deduce_group(self): - """Figure out group from name or previously submitted draft, returns None if individual.""" - name = self.filename - existing_draft = Document.objects.filter(name=name, type="draft") - if existing_draft: - group = existing_draft[0].group - if group and group.type_id not in ("individ", "area"): - return group - else: - return None - else: - name_parts = name.split("-") - if len(name_parts) < 3: - raise forms.ValidationError("The draft name \"%s\" is missing a third part, please rename it" % name) - - if name.startswith('draft-ietf-') or name.startswith("draft-irtf-"): - - if name_parts[1] == "ietf": - group_type = "wg" - elif name_parts[1] == "irtf": - group_type = "rg" - - # first check groups with dashes - for g in Group.objects.filter(acronym__contains="-", type=group_type): - if name.startswith('draft-%s-%s-' % (name_parts[1], g.acronym)): - return g - - try: - return Group.objects.get(acronym=name_parts[2], type=group_type) - except Group.DoesNotExist: - raise forms.ValidationError('There is no active group with acronym \'%s\', please rename your draft' % name_parts[2]) - - elif name.startswith("draft-rfc-"): - return Group.objects.get(acronym="iesg") - elif name.startswith("draft-rfc-editor-") or name.startswith("draft-rfced-") or name.startswith("draft-rfceditor-"): - return Group.objects.get(acronym="rfceditor") - else: - ntype = name_parts[1].lower() - # This covers group types iesg, iana, iab, ise, and others: - if GroupTypeName.objects.filter(slug=ntype).exists(): - group = Group.objects.filter(acronym=ntype).first() - if group: - return group - else: - raise forms.ValidationError('Draft names starting with draft-%s- are restricted, please pick a differen name' % ntype) - return None - -class SubmissionManualUploadForm(SubmissionBaseUploadForm): +class SubmissionManualUploadForm(DeprecatedSubmissionBaseUploadForm): xml = forms.FileField(label='.xml format', required=False) # xml field with required=False instead of True txt = forms.FileField(label='.txt format', required=False) # We won't permit html upload until we can verify that the content @@ -466,14 +619,67 @@ class SubmissionManualUploadForm(SubmissionBaseUploadForm): def clean_pdf(self): return self.clean_file("pdf", PDFParser) -class SubmissionAutoUploadForm(SubmissionBaseUploadForm): +class DeprecatedSubmissionAutoUploadForm(DeprecatedSubmissionBaseUploadForm): + """Full-service upload form, replaced by the asynchronous version""" user = forms.EmailField(required=True) def __init__(self, request, *args, **kwargs): - super(SubmissionAutoUploadForm, self).__init__(request, *args, **kwargs) + super(DeprecatedSubmissionAutoUploadForm, self).__init__(request, *args, **kwargs) self.formats = ['xml', ] self.base_formats = ['xml', ] +class SubmissionAutoUploadForm(SubmissionBaseUploadForm): + user = forms.EmailField(required=True) + replaces = forms.CharField(required=False, max_length=1000, strip=True) + + def __init__(self, request, *args, **kwargs): + super().__init__(request, *args, **kwargs) + self.formats = ['xml', ] + self.base_formats = ['xml', ] + + def clean(self): + super().clean() + + # Clean the replaces field after the rest of the cleaning so we know the name of the + # uploaded draft via self.filename + if self.cleaned_data['replaces']: + names_replaced = [s.strip() for s in self.cleaned_data['replaces'].split(',')] + self.cleaned_data['replaces'] = ','.join(names_replaced) + aliases_replaced = DocAlias.objects.filter(name__in=names_replaced) + if len(names_replaced) != len(aliases_replaced): + known_names = aliases_replaced.values_list('name', flat=True) + unknown_names = [n for n in names_replaced if n not in known_names] + self.add_error( + 'replaces', + forms.ValidationError( + 'Unknown draft name(s): ' + ', '.join(unknown_names) + ), + ) + for alias in aliases_replaced: + if alias.document.name == self.filename: + self.add_error( + 'replaces', + forms.ValidationError("A draft cannot replace itself"), + ) + elif alias.document.type_id != "draft": + self.add_error( + 'replaces', + forms.ValidationError("A draft can only replace another draft"), + ) + elif alias.document.get_state_slug() == "rfc": + self.add_error( + 'replaces', + forms.ValidationError("A draft cannot replace an RFC"), + ) + elif alias.document.get_state_slug('draft-iesg') in ('approved', 'ann', 'rfcqueue'): + self.add_error( + 'replaces', + forms.ValidationError( + alias.name + " is approved by the IESG and cannot be replaced" + ), + ) + + class NameEmailForm(forms.Form): name = forms.CharField(required=True) email = forms.EmailField(label='Email address', required=True) diff --git a/ietf/submit/mail.py b/ietf/submit/mail.py index 35ea726e8..16d1f0973 100644 --- a/ietf/submit/mail.py +++ b/ietf/submit/mail.py @@ -307,9 +307,7 @@ def add_submission_email(request, remote_ip, name, rev, submission_pk, message, create_submission_event(request, submission, desc) - docevent_from_submission(request, - submission, - desc) + docevent_from_submission(submission, desc) except Exception as e: log("Exception: %s\n" % e) raise diff --git a/ietf/submit/migrations/0009_auto_20220427_1223.py b/ietf/submit/migrations/0009_auto_20220427_1223.py new file mode 100644 index 000000000..6f8e21425 --- /dev/null +++ b/ietf/submit/migrations/0009_auto_20220427_1223.py @@ -0,0 +1,17 @@ +# Generated by Django 2.2.28 on 2022-04-27 12:23 + +from django.db import migrations, models + + +class Migration(migrations.Migration): + + dependencies = [ + ('submit', '0008_submissionextresource'), + ] + + operations = [ + migrations.AddIndex( + model_name='submission', + index=models.Index(fields=['submission_date'], name='submit_subm_submiss_8e58a9_idx'), + ), + ] diff --git a/ietf/submit/models.py b/ietf/submit/models.py index 6043d4ab7..9f0e2fa33 100644 --- a/ietf/submit/models.py +++ b/ietf/submit/models.py @@ -63,6 +63,11 @@ class Submission(models.Model): def __str__(self): return "%s-%s" % (self.name, self.rev) + class Meta: + indexes = [ + models.Index(fields=['submission_date']), + ] + def submitter_parsed(self): return parse_email_line(self.submitter) diff --git a/ietf/submit/tasks.py b/ietf/submit/tasks.py new file mode 100644 index 000000000..21d4275b7 --- /dev/null +++ b/ietf/submit/tasks.py @@ -0,0 +1,45 @@ +# Copyright The IETF Trust 2022, All Rights Reserved +# +# Celery task definitions +# +from celery import shared_task + +from django.db.models import Min +from django.conf import settings +from django.utils import timezone + +from ietf.submit.models import Submission +from ietf.submit.utils import cancel_submission, create_submission_event, process_uploaded_submission +from ietf.utils import log + + +@shared_task +def process_uploaded_submission_task(submission_id): + try: + submission = Submission.objects.get(pk=submission_id) + except Submission.DoesNotExist: + log.log(f'process_uploaded_submission_task called for missing submission_id={submission_id}') + else: + process_uploaded_submission(submission) + + +@shared_task +def cancel_stale_submissions(): + now = timezone.now() + stale_submissions = Submission.objects.filter( + state_id='validating', + ).annotate( + submitted_at=Min('submissionevent__time'), + ).filter( + submitted_at__lt=now - settings.IDSUBMIT_MAX_VALIDATION_TIME, + ) + for subm in stale_submissions: + age = now - subm.submitted_at + log.log(f'Canceling stale submission (id={subm.id}, age={age})') + cancel_submission(subm) + create_submission_event(None, subm, 'Submission canceled: validation checks took too long') + + +@shared_task(bind=True) +def poke(self): + log.log(f'Poked {self.name}, request id {self.request.id}') diff --git a/ietf/submit/tests.py b/ietf/submit/tests.py index 827472c9b..70baff432 100644 --- a/ietf/submit/tests.py +++ b/ietf/submit/tests.py @@ -16,16 +16,21 @@ from pyquery import PyQuery from pathlib import Path from django.conf import settings +from django.core.files.uploadedfile import SimpleUploadedFile +from django.db import transaction +from django.forms import ValidationError from django.test import override_settings from django.test.client import RequestFactory from django.urls import reverse as urlreverse +from django.utils import timezone from django.utils.encoding import force_str, force_text - import debug # pyflakes:ignore from ietf.submit.utils import (expirable_submissions, expire_submission, find_submission_filenames, - post_submission, validate_submission_name, validate_submission_rev) -from ietf.doc.factories import DocumentFactory, WgDraftFactory, IndividualDraftFactory, IndividualRfcFactory + post_submission, validate_submission_name, validate_submission_rev, + process_uploaded_submission, SubmissionError, process_submission_text) +from ietf.doc.factories import (DocumentFactory, WgDraftFactory, IndividualDraftFactory, IndividualRfcFactory, + ReviewFactory, WgRfcFactory) from ietf.doc.models import ( Document, DocAlias, DocEvent, State, BallotPositionDocEvent, DocumentAuthor, SubmissionDocEvent ) from ietf.doc.utils import create_ballot_if_not_open, can_edit_docextresources, update_action_holders @@ -39,8 +44,10 @@ from ietf.name.models import FormalLanguageName from ietf.person.models import Person from ietf.person.factories import UserFactory, PersonFactory, EmailFactory from ietf.submit.factories import SubmissionFactory, SubmissionExtResourceFactory +from ietf.submit.forms import SubmissionBaseUploadForm, SubmissionAutoUploadForm from ietf.submit.models import Submission, Preapproval, SubmissionExtResource from ietf.submit.mail import add_submission_email, process_response_email +from ietf.submit.tasks import cancel_stale_submissions, process_uploaded_submission_task from ietf.utils.accesstoken import generate_access_token from ietf.utils.mail import outbox, empty_outbox, get_payload_text from ietf.utils.models import VersionInfo @@ -84,6 +91,7 @@ class BaseSubmitTestCase(TestCase): return settings.INTERNET_DRAFT_ARCHIVE_DIR def submission_file(name_in_doc, name_in_post, group, templatename, author=None, email=None, title=None, year=None, ascii=True): + _today = datetime.date.today() # construct appropriate text draft f = io.open(os.path.join(settings.BASE_DIR, "submit", templatename)) template = f.read() @@ -96,14 +104,14 @@ def submission_file(name_in_doc, name_in_post, group, templatename, author=None, if title is None: title = "Test Document" if year is None: - year = datetime.date.today().strftime("%Y") + year = _today.strftime("%Y") submission_text = template % dict( - date=datetime.date.today().strftime("%d %B %Y"), - expiration=(datetime.date.today() + datetime.timedelta(days=100)).strftime("%d %B, %Y"), + date=_today.strftime("%d %B %Y"), + expiration=(_today + datetime.timedelta(days=100)).strftime("%d %B, %Y"), year=year, - month=datetime.date.today().strftime("%B"), - day=datetime.date.today().strftime("%d"), + month=_today.strftime("%B"), + day=_today.strftime("%d"), name=name_in_doc, group=group or "", author=author.ascii if ascii else author.name, @@ -2722,6 +2730,660 @@ Subject: test return r + +# Transaction.on_commit() requires use of TransactionTestCase, but that has a performance penalty. Replace it +# with a no-op for testing purposes. +@mock.patch.object(transaction, 'on_commit', lambda x: x()) +@override_settings(IDTRACKER_BASE_URL='https://datatracker.example.com') +class ApiSubmissionTests(BaseSubmitTestCase): + def test_upload_draft(self): + """api_submission accepts a submission and queues it for processing""" + url = urlreverse('ietf.submit.views.api_submission') + xml, author = submission_file('draft-somebody-test-00', 'draft-somebody-test-00.xml', None, 'test_submission.xml') + data = { + 'xml': xml, + 'user': author.user.username, + } + with mock.patch('ietf.submit.views.process_uploaded_submission_task') as mock_task: + r = self.client.post(url, data) + self.assertEqual(r.status_code, 200) + response = r.json() + self.assertCountEqual( + response.keys(), + ['id', 'name', 'rev', 'status_url'], + ) + submission_id = int(response['id']) + self.assertEqual(response['name'], 'draft-somebody-test') + self.assertEqual(response['rev'], '00') + self.assertEqual( + response['status_url'], + 'https://datatracker.example.com' + urlreverse( + 'ietf.submit.views.api_submission_status', + kwargs={'submission_id': submission_id}, + ), + ) + self.assertEqual(mock_task.delay.call_count, 1) + self.assertEqual(mock_task.delay.call_args.args, (submission_id,)) + submission = Submission.objects.get(pk=submission_id) + self.assertEqual(submission.name, 'draft-somebody-test') + self.assertEqual(submission.rev, '00') + self.assertEqual(submission.submitter, author.formatted_email()) + self.assertEqual(submission.state_id, 'validating') + self.assertIn('Uploaded submission through API', submission.submissionevent_set.last().desc) + + def test_upload_draft_with_replaces(self): + """api_submission accepts a submission and queues it for processing""" + existing_draft = WgDraftFactory() + url = urlreverse('ietf.submit.views.api_submission') + xml, author = submission_file('draft-somebody-test-00', 'draft-somebody-test-00.xml', None, 'test_submission.xml') + data = { + 'xml': xml, + 'user': author.user.username, + 'replaces': existing_draft.name, + } + # mock out the task so we don't call to celery during testing! + with mock.patch('ietf.submit.views.process_uploaded_submission_task'): + r = self.client.post(url, data) + self.assertEqual(r.status_code, 200) + submission = Submission.objects.last() + self.assertEqual(submission.name, 'draft-somebody-test') + self.assertEqual(submission.replaces, existing_draft.name) + + def test_rejects_broken_upload(self): + """api_submission immediately rejects a submission with serious problems""" + orig_submission_count = Submission.objects.count() + url = urlreverse('ietf.submit.views.api_submission') + + # invalid submitter + xml, author = submission_file('draft-somebody-test-00', 'draft-somebody-test-00.xml', None, 'test_submission.xml') + data = { + 'xml': xml, + 'user': 'i.dont.exist@nowhere.example.com', + } + with mock.patch('ietf.submit.views.process_uploaded_submission_task') as mock_task: + r = self.client.post(url, data) + self.assertEqual(r.status_code, 400) + response = r.json() + self.assertIn('No such user: ', response['error']) + self.assertFalse(mock_task.delay.called) + self.assertEqual(Submission.objects.count(), orig_submission_count) + + # missing name + xml, _ = submission_file('', 'draft-somebody-test-00.xml', None, 'test_submission.xml', author=author) + data = { + 'xml': xml, + 'user': author.user.username, + } + with mock.patch('ietf.submit.views.process_uploaded_submission_task') as mock_task: + r = self.client.post(url, data) + self.assertEqual(r.status_code, 400) + response = r.json() + self.assertEqual(response['error'], 'Validation Error') + self.assertFalse(mock_task.delay.called) + self.assertEqual(Submission.objects.count(), orig_submission_count) + + # missing rev + xml, _ = submission_file('draft-somebody-test', 'draft-somebody-test-00.xml', None, 'test_submission.xml', author=author) + data = { + 'xml': xml, + 'user': author.user.username, + } + with mock.patch('ietf.submit.views.process_uploaded_submission_task') as mock_task: + r = self.client.post(url, data) + self.assertEqual(r.status_code, 400) + response = r.json() + self.assertEqual(response['error'], 'Validation Error') + self.assertFalse(mock_task.delay.called) + self.assertEqual(Submission.objects.count(), orig_submission_count) + + # in-process submission + SubmissionFactory(name='draft-somebody-test', rev='00') # create an already-in-progress submission + orig_submission_count += 1 # keep this up to date + xml, _ = submission_file('draft-somebody-test-00', 'draft-somebody-test-00.xml', None, 'test_submission.xml', author=author) + data = { + 'xml': xml, + 'user': author.user.username, + } + with mock.patch('ietf.submit.views.process_uploaded_submission_task') as mock_task: + r = self.client.post(url, data) + self.assertEqual(r.status_code, 400) + response = r.json() + self.assertEqual(response['error'], 'Validation Error') + self.assertFalse(mock_task.delay.called) + self.assertEqual(Submission.objects.count(), orig_submission_count) + + @override_settings(IDTRACKER_BASE_URL='http://baseurl.example.com') + def test_get_documentation(self): + """A GET to the submission endpoint retrieves documentation""" + r = self.client.get(urlreverse('ietf.submit.views.api_submission')) + self.assertTemplateUsed(r, 'submit/api_submission_info.html') + self.assertContains(r, 'http://baseurl.example.com', status_code=200) + + def test_submission_status(self): + s = SubmissionFactory(state_id='validating') + url = urlreverse('ietf.submit.views.api_submission_status', kwargs={'submission_id': s.pk}) + r = self.client.get(url) + self.assertEqual(r.status_code, 200) + self.assertEqual( + r.json(), + {'id': str(s.pk), 'state': 'validating'}, + ) + + s.state_id = 'uploaded' + s.save() + r = self.client.get(url) + self.assertEqual(r.status_code, 200) + self.assertEqual( + r.json(), + {'id': str(s.pk), 'state': 'uploaded'}, + ) + + # try an invalid one + r = self.client.get(urlreverse('ietf.submit.views.api_submission_status', kwargs={'submission_id': '999999'})) + self.assertEqual(r.status_code, 404) + + +class SubmissionUploadFormTests(BaseSubmitTestCase): + def test_check_submission_thresholds(self): + today = datetime.date.today() + yesterday = today - datetime.timedelta(days=1) + (this_group, that_group) = GroupFactory.create_batch(2, type_id='wg') + this_ip = '10.0.0.1' + that_ip = '192.168.42.42' + one_mb = 1024 * 1024 + this_draft = 'draft-this-draft' + that_draft = 'draft-different-draft' + SubmissionFactory(group=this_group, name=this_draft, rev='00', submission_date=yesterday, remote_ip=this_ip, file_size=one_mb) + SubmissionFactory(group=this_group, name=that_draft, rev='00', submission_date=yesterday, remote_ip=this_ip, file_size=one_mb) + SubmissionFactory(group=this_group, name=this_draft, rev='00', submission_date=today, remote_ip=this_ip, file_size=one_mb) + SubmissionFactory(group=this_group, name=that_draft, rev='00', submission_date=today, remote_ip=this_ip, file_size=one_mb) + SubmissionFactory(group=that_group, name=this_draft, rev='00', submission_date=yesterday, remote_ip=that_ip, file_size=one_mb) + SubmissionFactory(group=that_group, name=that_draft, rev='00', submission_date=yesterday, remote_ip=that_ip, file_size=one_mb) + SubmissionFactory(group=that_group, name=this_draft, rev='00', submission_date=today, remote_ip=that_ip, file_size=one_mb) + SubmissionFactory(group=that_group, name=that_draft, rev='00', submission_date=today, remote_ip=that_ip, file_size=one_mb) + SubmissionFactory(group=that_group, name=that_draft, rev='01', submission_date=today, remote_ip=that_ip, file_size=one_mb) + + # Tests aim to cover the permutations of DB filters that are used by the clean() method + # - all IP addresses, today + SubmissionBaseUploadForm.check_submissions_thresholds( + 'valid today, all submitters', + dict(submission_date=today), + max_amount=5, + max_size=5, # megabytes + ) + with self.assertRaisesMessage(ValidationError, 'Max submissions'): + SubmissionBaseUploadForm.check_submissions_thresholds( + 'too many today, all submitters', + dict(submission_date=today), + max_amount=4, + max_size=5, # megabytes + ) + with self.assertRaisesMessage(ValidationError, 'Max uploaded amount'): + SubmissionBaseUploadForm.check_submissions_thresholds( + 'too much today, all submitters', + dict(submission_date=today), + max_amount=5, + max_size=4, # megabytes + ) + + # - one IP address, today + SubmissionBaseUploadForm.check_submissions_thresholds( + 'valid today, one submitter', + dict(remote_ip=this_ip, submission_date=today), + max_amount=2, + max_size=2, # megabytes + ) + with self.assertRaisesMessage(ValidationError, 'Max submissions'): + SubmissionBaseUploadForm.check_submissions_thresholds( + 'too many today, one submitter', + dict(remote_ip=this_ip, submission_date=today), + max_amount=1, + max_size=2, # megabytes + ) + with self.assertRaisesMessage(ValidationError, 'Max uploaded amount'): + SubmissionBaseUploadForm.check_submissions_thresholds( + 'too much today, one submitter', + dict(remote_ip=this_ip, submission_date=today), + max_amount=2, + max_size=1, # megabytes + ) + + # - single draft/rev, today + SubmissionBaseUploadForm.check_submissions_thresholds( + 'valid today, one draft', + dict(name=this_draft, rev='00', submission_date=today), + max_amount=2, + max_size=2, # megabytes + ) + with self.assertRaisesMessage(ValidationError, 'Max submissions'): + SubmissionBaseUploadForm.check_submissions_thresholds( + 'too many today, one draft', + dict(name=this_draft, rev='00', submission_date=today), + max_amount=1, + max_size=2, # megabytes + ) + with self.assertRaisesMessage(ValidationError, 'Max uploaded amount'): + SubmissionBaseUploadForm.check_submissions_thresholds( + 'too much today, one draft', + dict(name=this_draft, rev='00', submission_date=today), + max_amount=2, + max_size=1, # megabytes + ) + + # - one group, today + SubmissionBaseUploadForm.check_submissions_thresholds( + 'valid today, one group', + dict(group=this_group, submission_date=today), + max_amount=2, + max_size=2, # megabytes + ) + with self.assertRaisesMessage(ValidationError, 'Max submissions'): + SubmissionBaseUploadForm.check_submissions_thresholds( + 'too many today, one group', + dict(group=this_group, submission_date=today), + max_amount=1, + max_size=2, # megabytes + ) + with self.assertRaisesMessage(ValidationError, 'Max uploaded amount'): + SubmissionBaseUploadForm.check_submissions_thresholds( + 'too much today, one group', + dict(group=this_group, submission_date=today), + max_amount=2, + max_size=1, # megabytes + ) + + def test_replaces_field(self): + """test SubmissionAutoUploadForm replaces field""" + request_factory = RequestFactory() + WgDraftFactory(name='draft-somebody-test') + existing_drafts = WgDraftFactory.create_batch(2) + xml, auth = submission_file('draft-somebody-test-01', 'draft-somebody-test-01.xml', None, 'test_submission.xml') + files_dict = { + 'xml': SimpleUploadedFile('draft-somebody-test-01.xml', xml.read().encode('utf8'), + content_type='application/xml'), + } + + # no replaces + form = SubmissionAutoUploadForm( + request_factory.get('/some/url'), + data={'user': auth.user.username, 'replaces': ''}, + files=files_dict, + ) + self.assertTrue(form.is_valid()) + self.assertEqual(form.cleaned_data['replaces'], '') + + # whitespace + form = SubmissionAutoUploadForm( + request_factory.get('/some/url'), + data={'user': auth.user.username, 'replaces': ' '}, + files=files_dict, + ) + self.assertTrue(form.is_valid()) + self.assertEqual(form.cleaned_data['replaces'], '') + + # one replaces + form = SubmissionAutoUploadForm( + request_factory.get('/some/url'), + data={'user': auth.user.username, 'replaces': existing_drafts[0].name}, + files=files_dict, + ) + self.assertTrue(form.is_valid()) + self.assertEqual(form.cleaned_data['replaces'], existing_drafts[0].name) + + # two replaces + form = SubmissionAutoUploadForm( + request_factory.get('/some/url'), + data={'user': auth.user.username, 'replaces': f'{existing_drafts[0].name},{existing_drafts[1].name}'}, + files=files_dict, + ) + self.assertTrue(form.is_valid()) + self.assertEqual(form.cleaned_data['replaces'], f'{existing_drafts[0].name},{existing_drafts[1].name}') + + # two replaces, extra whitespace + form = SubmissionAutoUploadForm( + request_factory.get('/some/url'), + data={'user': auth.user.username, 'replaces': f' {existing_drafts[0].name} , {existing_drafts[1].name}'}, + files=files_dict, + ) + self.assertTrue(form.is_valid()) + self.assertEqual(form.cleaned_data['replaces'], f'{existing_drafts[0].name},{existing_drafts[1].name}') + + # can't replace self + form = SubmissionAutoUploadForm( + request_factory.get('/some/url'), + data={'user': auth.user.username, 'replaces': 'draft-somebody-test'}, + files=files_dict, + ) + self.assertFalse(form.is_valid()) + self.assertIn('A draft cannot replace itself', form.errors['replaces']) + + # can't replace non-draft + review = ReviewFactory() + form = SubmissionAutoUploadForm( + request_factory.get('/some/url'), + data={'user': auth.user.username, 'replaces': review.name}, + files=files_dict, + ) + self.assertFalse(form.is_valid()) + self.assertIn('A draft can only replace another draft', form.errors['replaces']) + + # can't replace RFC + rfc = WgRfcFactory() + form = SubmissionAutoUploadForm( + request_factory.get('/some/url'), + data={'user': auth.user.username, 'replaces': rfc.name}, + files=files_dict, + ) + self.assertFalse(form.is_valid()) + self.assertIn('A draft cannot replace an RFC', form.errors['replaces']) + + # can't replace draft approved by iesg + existing_drafts[0].set_state(State.objects.get(type='draft-iesg', slug='approved')) + form = SubmissionAutoUploadForm( + request_factory.get('/some/url'), + data={'user': auth.user.username, 'replaces': existing_drafts[0].name}, + files=files_dict, + ) + self.assertFalse(form.is_valid()) + self.assertIn(f'{existing_drafts[0].name} is approved by the IESG and cannot be replaced', + form.errors['replaces']) + + # unknown draft + form = SubmissionAutoUploadForm( + request_factory.get('/some/url'), + data={'user': auth.user.username, 'replaces': 'fake-name'}, + files=files_dict, + ) + self.assertFalse(form.is_valid()) + + +class AsyncSubmissionTests(BaseSubmitTestCase): + """Tests of async submission-related tasks""" + def test_process_uploaded_submission(self): + """process_uploaded_submission should properly process a submission""" + _today = datetime.date.today() + xml, author = submission_file('draft-somebody-test-00', 'draft-somebody-test-00.xml', None, 'test_submission.xml') + xml_data = xml.read() + xml.close() + + submission = SubmissionFactory( + name='draft-somebody-test', + rev='00', + file_types='.xml', + submitter=author.formatted_email(), + state_id='validating', + ) + xml_path = Path(settings.IDSUBMIT_STAGING_PATH) / 'draft-somebody-test-00.xml' + with xml_path.open('w') as f: + f.write(xml_data) + txt_path = xml_path.with_suffix('.txt') + self.assertFalse(txt_path.exists()) + html_path = xml_path.with_suffix('.html') + self.assertFalse(html_path.exists()) + process_uploaded_submission(submission) + + submission = Submission.objects.get(pk=submission.pk) # refresh + self.assertEqual(submission.state_id, 'auth', 'accepted submission should be in auth state') + self.assertEqual(submission.title, 'Test Document') + self.assertEqual(submission.xml_version, '3') + self.assertEqual(submission.document_date, _today) + self.assertEqual(submission.abstract.strip(), 'This document describes how to test tests.') + # don't worry about testing the details of these, just that they're set + self.assertIsNotNone(submission.pages) + self.assertIsNotNone(submission.words) + self.assertNotEqual(submission.first_two_pages.strip(), '') + # at least test that these were created + self.assertTrue(txt_path.exists()) + self.assertTrue(html_path.exists()) + self.assertEqual(submission.file_size, os.stat(txt_path).st_size) + self.assertIn('Completed submission validation checks', submission.submissionevent_set.last().desc) + + def test_process_uploaded_submission_invalid(self): + """process_uploaded_submission should properly process an invalid submission""" + xml, author = submission_file('draft-somebody-test-00', 'draft-somebody-test-00.xml', None, 'test_submission.xml') + xml_data = xml.read() + xml.close() + txt, _ = submission_file('draft-somebody-test-00', 'draft-somebody-test-00.xml', None, + 'test_submission.txt', author=author) + txt_data = txt.read() + txt.close() + + # submitter is not an author + submitter = PersonFactory() + submission = SubmissionFactory( + name='draft-somebody-test', + rev='00', + file_types='.xml', + submitter=submitter.formatted_email(), + state_id='validating', + ) + xml_path = Path(settings.IDSUBMIT_STAGING_PATH) / 'draft-somebody-test-00.xml' + with xml_path.open('w') as f: + f.write(xml_data) + process_uploaded_submission(submission) + submission = Submission.objects.get(pk=submission.pk) # refresh + self.assertEqual(submission.state_id, 'cancel') + self.assertIn('not one of the document authors', submission.submissionevent_set.last().desc) + + # author has no email address in XML + submission = SubmissionFactory( + name='draft-somebody-test', + rev='00', + file_types='.xml', + submitter=author.formatted_email(), + state_id='validating', + ) + xml_path = Path(settings.IDSUBMIT_STAGING_PATH) / 'draft-somebody-test-00.xml' + with xml_path.open('w') as f: + f.write(re.sub(r'<email>.*</email>', '', xml_data)) + process_uploaded_submission(submission) + submission = Submission.objects.get(pk=submission.pk) # refresh + self.assertEqual(submission.state_id, 'cancel') + self.assertIn('Missing email address', submission.submissionevent_set.last().desc) + + # no title + submission = SubmissionFactory( + name='draft-somebody-test', + rev='00', + file_types='.xml', + submitter=author.formatted_email(), + state_id='validating', + ) + xml_path = Path(settings.IDSUBMIT_STAGING_PATH) / 'draft-somebody-test-00.xml' + with xml_path.open('w') as f: + f.write(re.sub(r'<title>.*</title>', '<title></title>', xml_data)) + process_uploaded_submission(submission) + submission = Submission.objects.get(pk=submission.pk) # refresh + self.assertEqual(submission.state_id, 'cancel') + self.assertIn('Could not extract a valid title', submission.submissionevent_set.last().desc) + + # draft name mismatch + submission = SubmissionFactory( + name='draft-different-name', + rev='00', + file_types='.xml', + submitter=author.formatted_email(), + state_id='validating', + ) + xml_path = Path(settings.IDSUBMIT_STAGING_PATH) / 'draft-different-name-00.xml' + with xml_path.open('w') as f: + f.write(xml_data) + process_uploaded_submission(submission) + submission = Submission.objects.get(pk=submission.pk) # refresh + self.assertEqual(submission.state_id, 'cancel') + self.assertIn('draft filename disagrees', submission.submissionevent_set.last().desc) + + # rev mismatch + submission = SubmissionFactory( + name='draft-somebody-test', + rev='01', + file_types='.xml', + submitter=author.formatted_email(), + state_id='validating', + ) + xml_path = Path(settings.IDSUBMIT_STAGING_PATH) / 'draft-somebody-test-01.xml' + with xml_path.open('w') as f: + f.write(xml_data) + process_uploaded_submission(submission) + submission = Submission.objects.get(pk=submission.pk) # refresh + self.assertEqual(submission.state_id, 'cancel') + self.assertIn('revision disagrees', submission.submissionevent_set.last().desc) + + # not xml + submission = SubmissionFactory( + name='draft-somebody-test', + rev='00', + file_types='.txt', + submitter=author.formatted_email(), + state_id='validating', + ) + txt_path = Path(settings.IDSUBMIT_STAGING_PATH) / 'draft-somebody-test-00.txt' + with txt_path.open('w') as f: + f.write(txt_data) + process_uploaded_submission(submission) + submission = Submission.objects.get(pk=submission.pk) # refresh + self.assertEqual(submission.state_id, 'cancel') + self.assertIn('Only XML draft submissions', submission.submissionevent_set.last().desc) + + # wrong state + submission = SubmissionFactory( + name='draft-somebody-test', + rev='00', + file_types='.xml', + submitter=author.formatted_email(), + state_id='uploaded', + ) + xml_path = Path(settings.IDSUBMIT_STAGING_PATH) / 'draft-somebody-test-00.xml' + with xml_path.open('w') as f: + f.write(xml_data) + with mock.patch('ietf.submit.utils.process_submission_xml') as mock_proc_xml: + process_uploaded_submission(submission) + submission = Submission.objects.get(pk=submission.pk) # refresh + self.assertFalse(mock_proc_xml.called, 'Should not process submission not in "validating" state') + self.assertEqual(submission.state_id, 'uploaded', 'State should not be changed') + + # failed checker + submission = SubmissionFactory( + name='draft-somebody-test', + rev='00', + file_types='.xml', + submitter=author.formatted_email(), + state_id='validating', + ) + xml_path = Path(settings.IDSUBMIT_STAGING_PATH) / 'draft-somebody-test-00.xml' + with xml_path.open('w') as f: + f.write(xml_data) + with mock.patch( + 'ietf.submit.utils.apply_checkers', + side_effect = lambda _, __: submission.checks.create( + checker='faked', + passed=False, + message='fake failure', + errors=1, + warnings=0, + items={}, + symbol='x', + ) + ): + process_uploaded_submission(submission) + submission = Submission.objects.get(pk=submission.pk) # refresh + self.assertEqual(submission.state_id, 'cancel') + self.assertIn('fake failure', submission.submissionevent_set.last().desc) + + + @mock.patch('ietf.submit.tasks.process_uploaded_submission') + def test_process_uploaded_submission_task(self, mock_method): + """process_uploaded_submission_task task should properly call its method""" + s = SubmissionFactory() + process_uploaded_submission_task(s.pk) + self.assertEqual(mock_method.call_count, 1) + self.assertEqual(mock_method.call_args.args, (s,)) + + @mock.patch('ietf.submit.tasks.process_uploaded_submission') + def test_process_uploaded_submission_task_ignores_invalid_id(self, mock_method): + """process_uploaded_submission_task should ignore an invalid submission_id""" + SubmissionFactory() # be sure there is a Submission + bad_pk = 9876 + self.assertEqual(Submission.objects.filter(pk=bad_pk).count(), 0) + process_uploaded_submission_task(bad_pk) + self.assertEqual(mock_method.call_count, 0) + + def test_process_submission_text_consistency_checks(self): + """process_submission_text should check draft metadata against submission""" + submission = SubmissionFactory( + name='draft-somebody-test', + rev='00', + title='Correct Draft Title', + ) + txt_path = Path(settings.IDSUBMIT_STAGING_PATH) / 'draft-somebody-test-00.txt' + + # name mismatch + txt, _ = submission_file( + 'draft-somebody-wrong-name-00', # name that appears in the file + 'draft-somebody-test-00.xml', + None, + 'test_submission.txt', + title='Correct Draft Title', + ) + txt_path.open('w').write(txt.read()) + with self.assertRaisesMessage(SubmissionError, 'disagrees with submission filename'): + process_submission_text(submission) + + # rev mismatch + txt, _ = submission_file( + 'draft-somebody-test-01', # name that appears in the file + 'draft-somebody-test-00.xml', + None, + 'test_submission.txt', + title='Correct Draft Title', + ) + txt_path.open('w').write(txt.read()) + with self.assertRaisesMessage(SubmissionError, 'disagrees with submission revision'): + process_submission_text(submission) + + # title mismatch + txt, _ = submission_file( + 'draft-somebody-test-00', # name that appears in the file + 'draft-somebody-test-00.xml', + None, + 'test_submission.txt', + title='Not Correct Draft Title', + ) + txt_path.open('w').write(txt.read()) + with self.assertRaisesMessage(SubmissionError, 'disagrees with submission title'): + process_submission_text(submission) + + def test_status_of_validating_submission(self): + s = SubmissionFactory(state_id='validating') + url = urlreverse('ietf.submit.views.submission_status', kwargs={'submission_id': s.pk}) + r = self.client.get(url) + self.assertContains(r, s.name) + self.assertContains(r, 'still being processed and validated', status_code=200) + + @override_settings(IDSUBMIT_MAX_VALIDATION_TIME=datetime.timedelta(minutes=30)) + def test_cancel_stale_submissions(self): + fresh_submission = SubmissionFactory(state_id='validating') + fresh_submission.submissionevent_set.create( + desc='fake created event', + time=timezone.now() - datetime.timedelta(minutes=15), + ) + stale_submission = SubmissionFactory(state_id='validating') + stale_submission.submissionevent_set.create( + desc='fake created event', + time=timezone.now() - datetime.timedelta(minutes=30, seconds=1), + ) + + cancel_stale_submissions() + + fresh_submission = Submission.objects.get(pk=fresh_submission.pk) + self.assertEqual(fresh_submission.state_id, 'validating') + self.assertEqual(fresh_submission.submissionevent_set.count(), 1) + + stale_submission = Submission.objects.get(pk=stale_submission.pk) + self.assertEqual(stale_submission.state_id, 'cancel') + self.assertEqual(stale_submission.submissionevent_set.count(), 2) + + class ApiSubmitTests(BaseSubmitTestCase): def setUp(self): super().setUp() diff --git a/ietf/submit/urls.py b/ietf/submit/urls.py index 1d99f9407..2309ec55c 100644 --- a/ietf/submit/urls.py +++ b/ietf/submit/urls.py @@ -25,4 +25,7 @@ urlpatterns = [ url(r'^manualpost/email/(?P<submission_id>\d+)/(?P<message_id>\d+)/(?P<access_token>[a-f\d]*)/$', views.show_submission_email_message), url(r'^manualpost/replyemail/(?P<submission_id>\d+)/(?P<message_id>\d+)/$', views.send_submission_email), url(r'^manualpost/sendemail/(?P<submission_id>\d+)/$', views.send_submission_email), + + # proof-of-concept for celery async tasks + url(r'^async-poke/?$', views.async_poke_test), ] \ No newline at end of file diff --git a/ietf/submit/utils.py b/ietf/submit/utils.py index c48e0eb3a..f0a78c5cf 100644 --- a/ietf/submit/utils.py +++ b/ietf/submit/utils.py @@ -1,5 +1,5 @@ -# Copyright The IETF Trust 2011-2020, All Rights Reserved # -*- coding: utf-8 -*- +# Copyright The IETF Trust 2011-2020, All Rights Reserved import datetime @@ -8,8 +8,11 @@ import os import pathlib import re import time +import traceback +import xml2rfc -from typing import Callable, Optional # pyflakes:ignore +from typing import Optional # pyflakes:ignore +from unidecode import unidecode from django.conf import settings from django.core.exceptions import ValidationError @@ -43,7 +46,8 @@ from ietf.utils import log from ietf.utils.accesstoken import generate_random_key from ietf.utils.draft import PlaintextDraft from ietf.utils.mail import is_valid_email -from ietf.utils.text import parse_unicode +from ietf.utils.text import parse_unicode, normalize_text +from ietf.utils.xmldraft import XMLDraft from ietf.person.name import unidecode_name @@ -199,7 +203,7 @@ def check_submission_revision_consistency(submission): return None -def create_submission_event(request, submission, desc): +def create_submission_event(request: Optional[HttpRequest], submission, desc): by = None if request and request.user.is_authenticated: try: @@ -209,8 +213,8 @@ def create_submission_event(request, submission, desc): SubmissionEvent.objects.create(submission=submission, by=by, desc=desc) -def docevent_from_submission(request, submission, desc, who=None): - # type: (HttpRequest, Submission, str, Optional[Person]) -> Optional[DocEvent] +def docevent_from_submission(submission, desc, who=None): + # type: (Submission, str, Optional[Person]) -> Optional[DocEvent] log.assertion('who is None or isinstance(who, Person)') try: @@ -638,7 +642,6 @@ def update_authors(draft, submission): def cancel_submission(submission): submission.state = DraftSubmissionStateName.objects.get(slug="cancel") submission.save() - remove_submission_files(submission) def rename_submission_files(submission, prev_rev, new_rev): @@ -660,11 +663,22 @@ def move_files_to_repository(submission): elif ext in submission.file_types.split(','): raise ValueError("Intended to move '%s' to '%s', but found source and destination missing.") + +def remove_staging_files(name, rev, exts=None): + """Remove staging files corresponding to a submission + + exts is a list of extensions to be removed. If None, defaults to settings.IDSUBMIT_FILE_TYPES. + """ + if exts is None: + exts = [f'.{ext}' for ext in settings.IDSUBMIT_FILE_TYPES] + basename = pathlib.Path(settings.IDSUBMIT_STAGING_PATH) / f'{name}-{rev}' + for ext in exts: + basename.with_suffix(ext).unlink(missing_ok=True) + + def remove_submission_files(submission): - for ext in submission.file_types.split(','): - source = os.path.join(settings.IDSUBMIT_STAGING_PATH, '%s-%s%s' % (submission.name, submission.rev, ext)) - if os.path.exists(source): - os.unlink(source) + remove_staging_files(submission.name, submission.rev, submission.file_types.split(',')) + def approvable_submissions_for_user(user): if not user.is_authenticated: @@ -727,6 +741,12 @@ def expire_submission(submission, by): SubmissionEvent.objects.create(submission=submission, by=by, desc="Cancelled expired submission") + +def clear_existing_files(form): + """Make sure there are no leftover files from a previous submission""" + remove_staging_files(form.filename, form.revision) + + def save_files(form): file_name = {} for ext in list(form.fields.keys()): @@ -810,6 +830,10 @@ def get_draft_meta(form, saved_files): def get_submission(form): + # See if there is a Submission in state waiting-for-draft + # for this revision. + # If so - we're going to update it otherwise we create a new object + submissions = Submission.objects.filter(name=form.filename, rev=form.revision, state_id = "waiting-for-draft").distinct() @@ -823,32 +847,28 @@ def get_submission(form): def fill_in_submission(form, submission, authors, abstract, file_size): - # See if there is a Submission in state waiting-for-draft - # for this revision. - # If so - we're going to update it otherwise we create a new object - submission.state = DraftSubmissionStateName.objects.get(slug="uploaded") submission.remote_ip = form.remote_ip submission.title = form.title submission.abstract = abstract - submission.pages = form.parsed_draft.get_pagecount() - submission.words = form.parsed_draft.get_wordcount() submission.authors = authors - submission.first_two_pages = ''.join(form.parsed_draft.pages[:2]) submission.file_size = file_size submission.file_types = ','.join(form.file_types) submission.xml_version = form.xml_version submission.submission_date = datetime.date.today() - submission.document_date = form.parsed_draft.get_creation_date() submission.replaces = "" - + if form.parsed_draft is not None: + submission.pages = form.parsed_draft.get_pagecount() + submission.words = form.parsed_draft.get_wordcount() + submission.first_two_pages = ''.join(form.parsed_draft.pages[:2]) + submission.document_date = form.parsed_draft.get_creation_date() submission.save() - submission.formal_languages.set(FormalLanguageName.objects.filter(slug__in=form.parsed_draft.get_formal_languages())) + if form.parsed_draft is not None: + submission.formal_languages.set(FormalLanguageName.objects.filter(slug__in=form.parsed_draft.get_formal_languages())) set_extresources_from_existing_draft(submission) -def apply_checkers(submission, file_name): - # run submission checkers +def apply_checker(checker, submission, file_name): def apply_check(submission, checker, method, fn): func = getattr(checker, method) passed, message, errors, warnings, info = func(fn) @@ -856,18 +876,21 @@ def apply_checkers(submission, file_name): message=message, errors=errors, warnings=warnings, items=info, symbol=checker.symbol) check.save() + # ordered list of methods to try + for method in ("check_fragment_xml", "check_file_xml", "check_fragment_txt", "check_file_txt", ): + ext = method[-3:] + if hasattr(checker, method) and ext in file_name: + apply_check(submission, checker, method, file_name[ext]) + break +def apply_checkers(submission, file_name): + # run submission checkers mark = time.time() for checker_path in settings.IDSUBMIT_CHECKER_CLASSES: lap = time.time() checker_class = import_string(checker_path) checker = checker_class() - # ordered list of methods to try - for method in ("check_fragment_xml", "check_file_xml", "check_fragment_txt", "check_file_txt", ): - ext = method[-3:] - if hasattr(checker, method) and ext in file_name: - apply_check(submission, checker, method, file_name[ext]) - break + apply_checker(checker, submission, file_name) tau = time.time() - lap log.log(f"ran {checker.__class__.__name__} ({tau:.3}s) for {file_name}") tau = time.time() - mark @@ -888,7 +911,76 @@ def accept_submission_requires_group_approval(submission): and not Preapproval.objects.filter(name=submission.name).exists() ) -def accept_submission(request, submission, autopost=False): + +class SubmissionError(Exception): + """Exception for errors during submission processing""" + pass + + +def staging_path(filename, revision, ext): + if len(ext) > 0 and ext[0] != '.': + ext = f'.{ext}' + return pathlib.Path(settings.IDSUBMIT_STAGING_PATH) / f'{filename}-{revision}{ext}' + + +def render_missing_formats(submission): + """Generate txt and html formats from xml draft + + If a txt file already exists, leaves it in place. Overwrites an existing html file + if there is one. + """ + xml2rfc.log.write_out = io.StringIO() # open(os.devnull, "w") + xml2rfc.log.write_err = io.StringIO() # open(os.devnull, "w") + xml_path = staging_path(submission.name, submission.rev, '.xml') + parser = xml2rfc.XmlRfcParser(str(xml_path), quiet=True) + # --- Parse the xml --- + xmltree = parser.parse(remove_comments=False) + # If we have v2, run it through v2v3. Keep track of the submitted version, though. + xmlroot = xmltree.getroot() + xml_version = xmlroot.get('version', '2') + if xml_version == '2': + v2v3 = xml2rfc.V2v3XmlWriter(xmltree) + xmltree.tree = v2v3.convert2to3() + + # --- Prep the xml --- + prep = xml2rfc.PrepToolWriter(xmltree, quiet=True, liberal=True, keep_pis=[xml2rfc.V3_PI_TARGET]) + prep.options.accept_prepped = True + xmltree.tree = prep.prep() + if xmltree.tree == None: + raise SubmissionError(f'Error from xml2rfc (prep): {prep.errors}') + + # --- Convert to txt --- + txt_path = staging_path(submission.name, submission.rev, '.txt') + if not txt_path.exists(): + writer = xml2rfc.TextWriter(xmltree, quiet=True) + writer.options.accept_prepped = True + writer.write(txt_path) + log.log( + 'In %s: xml2rfc %s generated %s from %s (version %s)' % ( + str(xml_path.parent), + xml2rfc.__version__, + txt_path.name, + xml_path.name, + xml_version, + ) + ) + + # --- Convert to html --- + html_path = staging_path(submission.name, submission.rev, '.html') + writer = xml2rfc.HtmlWriter(xmltree, quiet=True) + writer.write(str(html_path)) + log.log( + 'In %s: xml2rfc %s generated %s from %s (version %s)' % ( + str(xml_path.parent), + xml2rfc.__version__, + html_path.name, + xml_path.name, + xml_version, + ) + ) + + +def accept_submission(submission: Submission, request: Optional[HttpRequest] = None, autopost=False): """Accept a submission and post or put in correct state to await approvals If autopost is True, will post draft if submitter is authorized to do so. @@ -898,18 +990,15 @@ def accept_submission(request, submission, autopost=False): curr_authors = [ get_person_from_name_email(author["name"], author.get("email")) for author in submission.authors ] # Is the user authenticated as an author who can approve this submission? - user_is_author = ( - request.user.is_authenticated - and request.user.person in (prev_authors if submission.rev != '00' else curr_authors) # type: ignore - ) + requester = None + requester_is_author = False + if request is not None and request.user.is_authenticated: + requester = request.user.person + requester_is_author = requester in (prev_authors if submission.rev != '00' else curr_authors) # If "who" is None, docevent_from_submission will pull it out of submission - docevent_from_submission( - request, - submission, - desc="Uploaded new revision", - who=request.user.person if user_is_author else None, - ) + docevent_from_submission(submission, desc="Uploaded new revision", + who=requester if requester_is_author else None) replaces = DocAlias.objects.filter(name__in=submission.replaces_names) pretty_replaces = '(none)' if not replaces else ( @@ -932,6 +1021,7 @@ def accept_submission(request, submission, autopost=False): # Partial message for submission event sub_event_desc = 'Set submitter to \"%s\", replaces to %s' % (parse_unicode(submission.submitter), pretty_replaces) + create_event = True # Indicates whether still need to create an event docevent_desc = None address_list = [] if requires_ad_approval or requires_prev_ad_approval: @@ -968,11 +1058,11 @@ def accept_submission(request, submission, autopost=False): sent_to = ', '.join(address_list) sub_event_desc += ' and sent approval email to group chairs: %s' % sent_to docevent_desc = "Request for posting approval emailed to group chairs: %s" % sent_to - elif user_is_author and autopost: + elif requester_is_author and autopost: # go directly to posting submission - sub_event_desc = "New version accepted (logged-in submitter: %s)" % request.user.person # type: ignore + sub_event_desc = f'New version accepted (logged-in submitter: {requester})' post_submission(request, submission, sub_event_desc, sub_event_desc) - sub_event_desc = None # do not create submission event below, post_submission() handles it + create_event = False # do not create submission event below, post_submission() handled it else: submission.auth_key = generate_random_key() if requires_prev_authors_approval: @@ -1001,10 +1091,10 @@ def accept_submission(request, submission, autopost=False): sub_event_desc += " and sent confirmation email to submitter and authors: %s" % sent_to docevent_desc = "Request for posting confirmation emailed to submitter and authors: %s" % sent_to - if sub_event_desc: + if create_event: create_submission_event(request, submission, sub_event_desc) if docevent_desc: - docevent_from_submission(request, submission, docevent_desc, who=Person.objects.get(name="(System)")) + docevent_from_submission(submission, docevent_desc, who=Person.objects.get(name="(System)")) return address_list @@ -1035,3 +1125,131 @@ def remote_ip(request): else: remote_ip = request.META.get('REMOTE_ADDR', None) return remote_ip + + +def _normalize_title(title): + if isinstance(title, str): + title = unidecode(title) # replace unicode with best-match ascii + return normalize_text(title) # normalize whitespace + + +def process_submission_xml(submission): + """Validate and extract info from an uploaded submission""" + xml_path = staging_path(submission.name, submission.rev, '.xml') + xml_draft = XMLDraft(xml_path) + + if submission.name != xml_draft.filename: + raise SubmissionError('XML draft filename disagrees with submission filename') + if submission.rev != xml_draft.revision: + raise SubmissionError('XML draft revision disagrees with submission revision') + + authors = xml_draft.get_author_list() + for a in authors: + if not a['email']: + raise SubmissionError(f'Missing email address for author {a}') + + author_emails = [a['email'].lower() for a in authors] + submitter = get_person_from_name_email(**submission.submitter_parsed()) # the ** expands dict into kwargs + if not any( + email.address.lower() in author_emails + for email in submitter.email_set.filter(active=True) + ): + raise SubmissionError(f'Submitter ({submitter}) is not one of the document authors') + + # Fill in the submission data + submission.title = _normalize_title(xml_draft.get_title()) + if not submission.title: + raise SubmissionError('Could not extract a valid title from the XML') + submission.authors = [ + {key: auth[key] for key in ('name', 'email', 'affiliation', 'country')} + for auth in authors + ] + submission.xml_version = xml_draft.xml_version + submission.save() + + +def process_submission_text(submission): + """Validate/extract data from the text version of a submitted draft + + This assumes the draft was uploaded as XML and extracts data that is not + currently available directly from the XML. Additional processing, e.g. from + get_draft_meta(), would need to be added in order to support direct text + draft uploads. + """ + text_path = staging_path(submission.name, submission.rev, '.txt') + text_draft = PlaintextDraft.from_file(text_path) + + if submission.name != text_draft.filename: + raise SubmissionError( + f'Text draft filename ({text_draft.filename}) disagrees with submission filename ({submission.name})' + ) + if submission.rev != text_draft.revision: + raise SubmissionError( + f'Text draft revision ({text_draft.revision}) disagrees with submission revision ({submission.rev})') + text_title = _normalize_title(text_draft.get_title()) + if not text_title: + raise SubmissionError('Could not extract a valid title from the text') + if text_title != submission.title: + raise SubmissionError( + f'Text draft title ({text_title}) disagrees with submission title ({submission.title})') + + submission.abstract = text_draft.get_abstract() + submission.document_date = text_draft.get_creation_date() + submission.pages = text_draft.get_pagecount() + submission.words = text_draft.get_wordcount() + submission.first_two_pages = ''.join(text_draft.pages[:2]) + submission.file_size = os.stat(text_path).st_size + submission.save() + + submission.formal_languages.set( + FormalLanguageName.objects.filter( + slug__in=text_draft.get_formal_languages() + ) + ) + + +def process_uploaded_submission(submission): + def abort_submission(error): + cancel_submission(submission) + create_submission_event(None, submission, f'Submission rejected: {error}') + + if submission.state_id != 'validating': + log.log(f'Submission {submission.pk} is not in "validating" state, skipping.') + return # do nothing + + if submission.file_types != '.xml': + abort_submission('Only XML draft submissions can be processed.') + + try: + process_submission_xml(submission) + if check_submission_revision_consistency(submission): + raise SubmissionError( + 'Document revision inconsistency error in the database. ' + 'Please contact the secretariat for assistance.' + ) + render_missing_formats(submission) + process_submission_text(submission) + set_extresources_from_existing_draft(submission) + apply_checkers( + submission, + { + ext: staging_path(submission.name, submission.rev, ext) + for ext in ['xml', 'txt', 'html'] + } + ) + errors = [c.message for c in submission.checks.filter(passed__isnull=False) if not c.passed] + if len(errors) > 0: + raise SubmissionError('Checks failed: ' + ' / '.join(errors)) + except SubmissionError as err: + abort_submission(err) + except Exception: + log.log(f'Unexpected exception while processing submission {submission.pk}.') + log.log(traceback.format_exc()) + abort_submission('A system error occurred while processing the submission.') + + # if we get here and are still "validating", accept the draft + if submission.state_id == 'validating': + submission.state_id = 'uploaded' + submission.save() + create_submission_event(None, submission, desc="Completed submission validation checks") + accept_submission(submission) diff --git a/ietf/submit/views.py b/ietf/submit/views.py index 9eb303eb2..bab859702 100644 --- a/ietf/submit/views.py +++ b/ietf/submit/views.py @@ -7,14 +7,15 @@ import base64 import datetime from typing import Optional, cast # pyflakes:ignore +from urllib.parse import urljoin from django.conf import settings from django.contrib import messages from django.contrib.auth.models import User -from django.db import DataError +from django.db import DataError, transaction from django.urls import reverse as urlreverse from django.core.exceptions import ValidationError -from django.http import HttpResponseRedirect, Http404, HttpResponseForbidden, HttpResponse +from django.http import HttpResponseRedirect, Http404, HttpResponseForbidden, HttpResponse, JsonResponse from django.http import HttpRequest # pyflakes:ignore from django.shortcuts import get_object_or_404, redirect, render from django.views.decorators.csrf import csrf_exempt @@ -30,14 +31,16 @@ from ietf.mailtrigger.utils import gather_address_lists from ietf.message.models import Message, MessageAttachment from ietf.person.models import Email from ietf.submit.forms import ( SubmissionManualUploadForm, SubmissionAutoUploadForm, AuthorForm, - SubmitterForm, EditSubmissionForm, PreapprovalForm, ReplacesForm, SubmissionEmailForm, MessageModelForm ) + SubmitterForm, EditSubmissionForm, PreapprovalForm, ReplacesForm, SubmissionEmailForm, MessageModelForm, + DeprecatedSubmissionAutoUploadForm ) from ietf.submit.mail import send_full_url, send_manual_post_request, add_submission_email, get_reply_to from ietf.submit.models import (Submission, Preapproval, SubmissionExtResource, DraftSubmissionStateName, SubmissionEmailEvent ) +from ietf.submit.tasks import process_uploaded_submission_task, poke from ietf.submit.utils import ( approvable_submissions_for_user, preapprovals_for_user, recently_approved_by_user, validate_submission, create_submission_event, docevent_from_submission, post_submission, cancel_submission, rename_submission_files, remove_submission_files, get_draft_meta, - get_submission, fill_in_submission, apply_checkers, save_files, + get_submission, fill_in_submission, apply_checkers, save_files, clear_existing_files, check_submission_revision_consistency, accept_submission, accept_submission_requires_group_approval, accept_submission_requires_prev_auth_approval, update_submission_external_resources, remote_ip ) from ietf.stats.utils import clean_country_name @@ -106,6 +109,111 @@ def upload_submission(request): {'selected': 'index', 'form': form}) +@csrf_exempt +def api_submission(request): + def err(code, error, messages=None): + data = {'error': error} + if messages is not None: + data['messages'] = [messages] if isinstance(messages, str) else messages + return JsonResponse(data, status=code) + + if request.method == 'GET': + return render(request, 'submit/api_submission_info.html') + elif request.method == 'POST': + exception = None + submission = None + try: + form = SubmissionAutoUploadForm(request, data=request.POST, files=request.FILES) + if form.is_valid(): + log('got valid submission form for %s' % form.filename) + username = form.cleaned_data['user'] + user = User.objects.filter(username=username) + if user.count() == 0: + # See if a secondary login was being used + email = Email.objects.filter(address=username, active=True) + # The error messages don't talk about 'email', as the field we're + # looking at is still the 'username' field. + if email.count() == 0: + return err(400, "No such user: %s" % username) + elif email.count() > 1: + return err(500, "Multiple matching accounts for %s" % username) + email = email.first() + if not hasattr(email, 'person'): + return err(400, "No person matches %s" % username) + person = email.person + if not hasattr(person, 'user'): + return err(400, "No user matches: %s" % username) + user = person.user + elif user.count() > 1: + return err(500, "Multiple matching accounts for %s" % username) + else: + user = user.first() + if not hasattr(user, 'person'): + return err(400, "No person with username %s" % username) + + # There is a race condition here: creating the Submission with the name/rev + # of this draft is meant to prevent another submission from occurring. However, + # if two submissions occur at the same time, both may decide that they are the + # only submission in progress. This may result in a Submission being posted with + # the wrong files. The window for this is short, though, so it's probably + # tolerable risk. + submission = get_submission(form) + submission.state = DraftSubmissionStateName.objects.get(slug="validating") + submission.remote_ip = form.remote_ip + submission.file_types = ','.join(form.file_types) + submission.submission_date = datetime.date.today() + submission.submitter = user.person.formatted_email() + submission.replaces = form.cleaned_data['replaces'] + submission.save() + clear_existing_files(form) + save_files(form) + create_submission_event(request, submission, desc="Uploaded submission through API") + + # Wrap in on_commit so the delayed task cannot start until the view is done with the DB + transaction.on_commit( + lambda: process_uploaded_submission_task.delay(submission.pk) + ) + return JsonResponse( + { + 'id': str(submission.pk), + 'name': submission.name, + 'rev': submission.rev, + 'status_url': urljoin( + settings.IDTRACKER_BASE_URL, + urlreverse(api_submission_status, kwargs={'submission_id': submission.pk}), + ), + } + ) + else: + raise ValidationError(form.errors) + except IOError as e: + exception = e + return err(500, 'IO Error', str(e)) + except ValidationError as e: + exception = e + return err(400, 'Validation Error', e.messages) + except Exception as e: + exception = e + raise + finally: + if exception and submission: + remove_submission_files(submission) + submission.delete() + else: + return err(405, "Method not allowed") + + +@csrf_exempt +def api_submission_status(request, submission_id): + submission = get_submission_or_404(submission_id) + return JsonResponse( + { + 'id': str(submission.pk), + 'state': submission.state.slug, + } + ) + + @csrf_exempt def api_submit(request): "Automated submission entrypoint" @@ -118,7 +226,7 @@ def api_submit(request): elif request.method == 'POST': exception = None try: - form = SubmissionAutoUploadForm(request, data=request.POST, files=request.FILES) + form = DeprecatedSubmissionAutoUploadForm(request, data=request.POST, files=request.FILES) if form.is_valid(): log('got valid submission form for %s' % form.filename) username = form.cleaned_data['user'] @@ -175,7 +283,7 @@ def api_submit(request): raise ValidationError('Submitter %s is not one of the document authors' % user.username) submission.submitter = user.person.formatted_email() - sent_to = accept_submission(request, submission) + sent_to = accept_submission(submission, request) return HttpResponse( "Upload of %s OK, confirmation requests sent to:\n %s" % (submission.name, ',\n '.join(sent_to)), @@ -244,7 +352,11 @@ def submission_status(request, submission_id, access_token=None): is_ad = area and area.has_role(request.user, "ad") can_edit = can_edit_submission(request.user, submission, access_token) and submission.state_id == "uploaded" - can_cancel = (key_matched or is_secretariat) and submission.state.next_states.filter(slug="cancel") + # disallow cancellation of 'validating' submissions except by secretariat until async process is safely abortable + can_cancel = ( + (is_secretariat or (key_matched and submission.state_id != 'validating')) + and submission.state.next_states.filter(slug="cancel") + ) can_group_approve = (is_secretariat or is_ad or is_chair) and submission.state_id == "grp-appr" can_ad_approve = (is_secretariat or is_ad) and submission.state_id == "ad-appr" @@ -365,13 +477,13 @@ def submission_status(request, submission_id, access_token=None): permission_denied(request, 'You do not have permission to perform this action') # go directly to posting submission - docevent_from_submission(request, submission, desc="Uploaded new revision") + docevent_from_submission(submission, desc="Uploaded new revision") desc = "Secretariat manually posting. Approvals already received" post_submission(request, submission, desc, desc) else: - accept_submission(request, submission, autopost=True) + accept_submission(submission, request, autopost=True) if access_token: return redirect("ietf.submit.views.submission_status", submission_id=submission.pk, access_token=access_token) @@ -698,9 +810,7 @@ def cancel_waiting_for_draft(request): create_submission_event(request, submission, "Cancelled submission") if (submission.rev != "00"): # Add a doc event - docevent_from_submission(request, - submission, - "Cancelled submission for rev {}".format(submission.rev)) + docevent_from_submission(submission, "Cancelled submission for rev {}".format(submission.rev)) return redirect("ietf.submit.views.manualpost") @@ -923,4 +1033,9 @@ def get_submission_or_404(submission_id, access_token=None): if access_token and not key_matched: raise Http404 - return submission \ No newline at end of file + return submission + + +def async_poke_test(request): + result = poke.delay() + return HttpResponse(f'Poked {result}', content_type='text/plain') diff --git a/ietf/templates/submit/api_submission_info.html b/ietf/templates/submit/api_submission_info.html new file mode 100644 index 000000000..da2b21d52 --- /dev/null +++ b/ietf/templates/submit/api_submission_info.html @@ -0,0 +1,107 @@ +{% extends "base.html" %} +{# Copyright The IETF Trust 2015-2022, All Rights Reserved #} +{% load origin ietf_filters %} +{% block title %}Draft submission API instructions{% endblock %} +{% block content %} + {% origin %} + <h1 class="mb-3">Draft submission API instructions</h1> + <p> + A simplified draft submission interface, intended for automation, + is available at <code>{% url 'ietf.submit.views.api_submission' %}</code>. + </p> + <p> + The interface accepts only XML uploads that can be processed on the server, and + requires the user to have a datatracker account. A successful submit still requires + the same email confirmation round-trip as submissions done through the regular + <a href="{% url 'ietf.submit.views.upload_submission' %}">submission tool</a>. + </p> + <p> + This interface does not provide all the options which the regular submission tool does. + Some limitations: + </p> + <ul> + <li>Only XML-only uploads are supported, not text or combined.</li> + <li>Document replacement information cannot be supplied.</li> + <li> + The server expects <code>multipart/form-data</code>, supported by <code>curl</code> but <b>not</b> by <code>wget</code>. + </li> + </ul> + <p> + It takes the following parameters: + </p> + <ul> + <li> + <code>user</code> which is the user login (required) + </li> + <li> + <code>xml</code>, which is the submitted file (required) + </li> + <li> + <code>replaces</code>, a comma-separated list of draft names replaced by this submission (optional) + </li> + </ul> + <p> + When a draft is submitted, basic checks are performed immediately and an HTTP response + is sent including an appropriate http result code and JSON data describing the outcome. + </p> + <p> + On success, the JSON data format is + </p> + <pre class="border p-3 mb-3"> +{ + "id": "123", + "name": "draft-just-submitted", + "rev": "00", + "status_url": "{% absurl 'ietf.submit.views.api_submission_status' submission_id='123' %}" +}</pre> + <p> + On error, the JSON data format is + </p> + <pre class="border p-3 mb-3"> +{ + "error": "Description of the error" +}</pre> + <p> + If the basic checks passed and a successful response is sent, the draft is queued for further + processing. Its status can be monitored by issuing GET requests to the <code>status_url</code> + indicated in the JSON response. This URL will respond with JSON data in the format + </p> + <pre class="border p-3 mb-3"> +{ + "id": "123", + "state": "validating" +}</pre> + <p> + The state <code>validating</code> indicates that the draft is being or waiting to be processed. + Any other state indicates that the draft completed validation. If the validation failed or if the + draft was canceled after validation, the state will be <code>cancel</code>. + </p> + <p> + Human-readable details of the draft's status and history can be found at + {% absurl 'ietf.submit.views.submission_status' submission_id='123' %} + (replacing <code>123</code> with the <code>id</code> for the submission).) + </p> + <p> + Here is an example of submitting a draft and polling its status through the API: + </p> + <pre class="border p-3"> +$ curl -s -F "user=user.name@example.com" -F "xml=@~/draft-user-example.xml" -F "replaces=draft-user-replaced-draft" {% absurl 'ietf.submit.views.api_submission' %} | jq +{ + "id": "126375", + "name": "draft-user-example", + "rev": "00", + "status_url": "{% absurl 'ietf.submit.views.api_submission_status' submission_id='126375' %}" +} + +$ curl -s {% absurl 'ietf.submit.views.api_submission_status' submission_id='126375' %} | jq +{ + "id": "126375", + "state": "validating" +} + +$ curl -s {% absurl 'ietf.submit.views.api_submission_status' submission_id='126375' %} | jq +{ + "id": "126375", + "state": "auth" +}</pre> +{% endblock %} \ No newline at end of file diff --git a/ietf/templates/submit/api_submit_info.html b/ietf/templates/submit/api_submit_info.html index a94f85520..114b97962 100644 --- a/ietf/templates/submit/api_submit_info.html +++ b/ietf/templates/submit/api_submit_info.html @@ -1,13 +1,19 @@ {% extends "base.html" %} -{# Copyright The IETF Trust 2015, All Rights Reserved #} -{% load origin %} +{# Copyright The IETF Trust 2015-2022, All Rights Reserved #} +{% load origin ietf_filters %} {% block title %}Draft submission API instructions{% endblock %} {% block content %} {% origin %} <h1 class="mb-3">Draft submission API instructions</h1> + <p> + Note: API endpoint described here is known to have a slow response time or to fail + due to timeout for some draft submissions, particularly those with large file sizes. + It is recommended to use the <a href="{% url 'ietf.submit.views.api_submission' %}">new API endpoint</a> + instead for increased reliability. + </p> <p> A simplified draft submission interface, intended for automation, - is available at <code>{% url 'ietf.submit.views.api_submit' %}</code>. + is available at <code>{% absurl 'ietf.submit.views.api_submit' %}</code>. </p> <p> The interface accepts only XML uploads that can be processed on the server, and @@ -44,7 +50,7 @@ Here is an example: </p> <pre class="border p-3"> -$ curl -S -F "user=user.name@example.com" -F "xml=@~/draft-user-example.xml" {{ settings.IDTRACKER_BASE_URL }}{% url 'ietf.submit.views.api_submit' %} +$ curl -S -F "user=user.name@example.com" -F "xml=@~/draft-user-example.xml" {% absurl 'ietf.submit.views.api_submit' %} Upload of draft-user-example OK, confirmation requests sent to: User Name <user.name@example.com></pre> {% endblock %} \ No newline at end of file diff --git a/ietf/templates/submit/submission_status.html b/ietf/templates/submit/submission_status.html index b5c47ac75..255bc02bf 100644 --- a/ietf/templates/submit/submission_status.html +++ b/ietf/templates/submit/submission_status.html @@ -35,75 +35,77 @@ Please fix errors in the form below. </p> {% endif %} - <h2 class="mt-5">Submission checks</h2> - <p class="alert {% if passes_checks %}alert-success{% else %}alert-warning{% endif %} my-3"> - {% if passes_checks %} - Your draft has been verified to pass the submission checks. - {% else %} - Your draft has <b>NOT</b> been verified to pass the submission checks. - {% endif %} - </p> - {% if submission.authors|length > 5 %} - <p class="alert alert-danger my-3"> - <b> - This document has more than five authors listed, which is considered excessive - under normal circumstances.</b> If you plan to request publication as an RFC, this - will require additional consideration by the stream manager (for example, the - IESG), and publication may be declined unless sufficient justification is - provided. See - <a href="{% url 'ietf.doc.views_doc.document_html' name='rfc7322' %}">RFC 7322, section 4.1.1</a> - for details. + {% if submission.state_id != 'validating' %} + <h2 class="mt-5">Submission checks</h2> + <p class="alert {% if passes_checks %}alert-success{% else %}alert-warning{% endif %} my-3"> + {% if passes_checks %} + Your draft has been verified to pass the submission checks. + {% else %} + Your draft has <b>NOT</b> been verified to pass the submission checks. + {% endif %} </p> - {% endif %} - {% for check in submission.latest_checks %} - {% if check.errors %} - <p class="alert alert-warning my-3"> - The {{ check.checker }} returned {{ check.errors }} error{{ check.errors|pluralize }} - and {{ check.warnings }} warning{{ check.warnings|pluralize }}; click the button - below to see details. Please fix those, and resubmit. - </p> - {% elif check.warnings %} - <p class="alert alert-warning my-3"> - The {{ check.checker }} returned {{ check.warnings }} warning{{ check.warnings|pluralize }}. + {% if submission.authors|length > 5 %} + <p class="alert alert-danger my-3"> + <b> + This document has more than five authors listed, which is considered excessive + under normal circumstances.</b> If you plan to request publication as an RFC, this + will require additional consideration by the stream manager (for example, the + IESG), and publication may be declined unless sufficient justification is + provided. See + <a href="{% url 'ietf.doc.views_doc.document_html' name='rfc7322' %}">RFC 7322, section 4.1.1</a> + for details. </p> {% endif %} - {% endfor %} - {% for check in submission.latest_checks %} - {% if check.passed != None %} - <button class="btn btn-{% if check.passed %}{% if check.warnings %}warning{% elif check.errors %}danger{% else %}success{% endif %}{% else %}danger{% endif %}" - type="button" - data-bs-toggle="modal" - data-bs-target="#check-{{ check.pk }}"> - View {{ check.checker }} - </button> - <div class="modal fade" - id="check-{{ check.pk }}" - tabindex="-1" - role="dialog" - aria-labelledby="check-{{ check.pk }}" - aria-hidden="true"> - <div class="modal-dialog modal-dialog-scrollable modal-xl"> - <div class="modal-content"> - <div class="modal-header"> - <p class="h5 modal-title" id="{{ check.checker|slugify }}-label"> - {{ check.checker|title }} for {{ submission.name }}-{{ submission.rev }} - </p> - <button type="button" - class="btn-close" - data-bs-dismiss="modal" - aria-label="Close"></button> - </div> - <div class="modal-body" id="{{ check.checker|slugify }}-message"> + {% for check in submission.latest_checks %} + {% if check.errors %} + <p class="alert alert-warning my-3"> + The {{ check.checker }} returned {{ check.errors }} error{{ check.errors|pluralize }} + and {{ check.warnings }} warning{{ check.warnings|pluralize }}; click the button + below to see details. Please fix those, and resubmit. + </p> + {% elif check.warnings %} + <p class="alert alert-warning my-3"> + The {{ check.checker }} returned {{ check.warnings }} warning{{ check.warnings|pluralize }}. + </p> + {% endif %} + {% endfor %} + {% for check in submission.latest_checks %} + {% if check.passed != None %} + <button class="btn btn-{% if check.passed %}{% if check.warnings %}warning{% elif check.errors %}danger{% else %}success{% endif %}{% else %}danger{% endif %}" + type="button" + data-bs-toggle="modal" + data-bs-target="#check-{{ check.pk }}"> + View {{ check.checker }} + </button> + <div class="modal fade" + id="check-{{ check.pk }}" + tabindex="-1" + role="dialog" + aria-labelledby="check-{{ check.pk }}" + aria-hidden="true"> + <div class="modal-dialog modal-dialog-scrollable modal-xl"> + <div class="modal-content"> + <div class="modal-header"> + <p class="h5 modal-title" id="{{ check.checker|slugify }}-label"> + {{ check.checker|title }} for {{ submission.name }}-{{ submission.rev }} + </p> + <button type="button" + class="btn-close" + data-bs-dismiss="modal" + aria-label="Close"></button> + </div> + <div class="modal-body" id="{{ check.checker|slugify }}-message"> <pre>{{ check.message|urlize_ietf_docs|linkify }}</pre> - </div> - <div class="modal-footer"> - <button type="button" class="btn btn-secondary" data-bs-dismiss="modal">Close</button> + </div> + <div class="modal-footer"> + <button type="button" class="btn btn-secondary" data-bs-dismiss="modal">Close</button> + </div> </div> </div> </div> - </div> - {% endif %} - {% endfor %} + {% endif %} + {% endfor %} + {% endif %} <div class="modal fade" id="twopages" tabindex="-1" @@ -130,6 +132,17 @@ <p class="alert alert-warning my-3"> This submission is awaiting the first draft upload. </p> + {% elif submission.state_id == 'validating' %} + <p class="alert alert-warning my-3"> + This submission is still being processed and validated. This normally takes a few minutes after + submission. + {% with earliest_event=submission.submissionevent_set.last %} + {% if earliest_event %} + It has been {{ earliest_event.time|timesince }} since submission. + {% endif %} + {% endwith %} + Please contact the secretariat for assistance if it has been more than an hour. + </p> {% else %} <h2 class="mt-5">Meta-data from the submission</h2> {% if errors %} @@ -172,12 +185,14 @@ {% else %} {{ submission.name }} {% endif %} - <button class="btn btn-primary btn-sm float-end ms-1" - type="button" - data-bs-toggle="modal" - data-bs-target="#twopages"> - View first two pages - </button> + {% if submission.first_two_pages %} + <button class="btn btn-primary btn-sm float-end ms-1" + type="button" + data-bs-toggle="modal" + data-bs-target="#twopages"> + View first two pages + </button> + {% endif %} {% show_submission_files submission %} {% if errors.files %} <p class="mt-1 mb-0 text-danger"> @@ -215,159 +230,141 @@ {% endif %} </td> </tr> - <tr> - <th scope="row">Document date</th> - <td> - {{ submission.document_date }} - {% if errors.document_date %} - <p class="mt-1 mb-0 text-danger"> - {{ errors.document_date }} - </p> - {% endif %} - </td> - </tr> - <tr> - <th scope="row">Submission date</th> - <td>{{ submission.submission_date }}</td> - </tr> - <tr> - <th scope="row">Title</th> - <td> - {{ submission.title|default:"" }} - {% if errors.title %} - <p class="mt-1 mb-0 text-danger"> - {{ errors.title }} - </p> - {% endif %} - </td> - </tr> - <tr> - <th scope="row">Author count</th> - <td> - {{ submission.authors|length }} author{{ submission.authors|pluralize }} - {% if errors.authors %} - <p class="mt-1 mb-0 text-danger"> - {{ errors.authors|safe }} - </p> - {% endif %} - </td> - </tr> - {% for author in submission.authors %} + {% if submission.state_id != 'validating' %} <tr> - <th scope="row">Author {{ forloop.counter }}</th> + <th scope="row">Document date</th> <td> - {{ author.name }} - {% if author.email %}<{{ author.email|linkify }}>{% endif %} - <br> - {% if author.affiliation %} - {{ author.affiliation }} - {% else %} - <i>unknown affiliation</i> - {% endif %} - <br> - {% if author.country %} - {{ author.country }} - {% if author.cleaned_country and author.country != author.cleaned_country %} - <span class="text-muted">(understood to be {{ author.cleaned_country }})</span> - {% endif %} - {% else %} - <i>unknown country</i> - {% endif %} - {% if author.country and not author.cleaned_country %} - <br> - <span class="text-warning">Unrecognized country: "{{ author.country }}"</span>: See - <a href="{% url "ietf.stats.views.known_countries_list" %}"> - recognized country names - </a>. - {% endif %} - {% for auth_err in author.errors %} + {{ submission.document_date }} + {% if errors.document_date %} <p class="mt-1 mb-0 text-danger"> - {{ auth_err }} + {{ errors.document_date }} </p> - {% endfor %} + {% endif %} </td> </tr> - {% endfor %} - <tr> - <th scope="row"> - Abstract - </th> - <td> - {{ submission.abstract|urlize_ietf_docs|linkify|linebreaksbr }} - {% if errors.abstract %} - <p class="mt-1 mb-0 text-danger"> - {{ errors.abstract }} - </p> - {% endif %} - </td> - </tr> - <tr> - <th scope="row"> - Page count - </th> - <td> - {{ submission.pages }} - {% if errors.pages %} - <p class="mt-1 mb-0 text-danger"> - {{ errors.pages }} - </p> - {% endif %} - </td> - </tr> - <tr> - <th scope="row"> - File size - </th> - <td> - {{ submission.file_size|filesizeformat }} - </td> - </tr> - <tr> - <th scope="row"> - Formal languages used - </th> - <td> - {% for l in submission.formal_languages.all %} - {{ l.name }}{% if not forloop.last %},{% endif %} - {% empty %}None recognized - {% endfor %} - {% if errors.formal_languages %} - <p class="mt-1 mb-0 text-danger"> - {{ errors.formal_languages }} - </p> - {% endif %} - </td> - </tr> - <tr> - <th scope="row"> - Submission additional resources - </th> - <td> - {% for r in external_resources.current %} - {% with res=r.res added=r.added %} - <div> - {{ res.name.name }}: {{ res.value }} - {% if res.display_name %}(as "{{ res.display_name }}"){% endif %} - {% if external_resources.show_changes and added %}<span class="badge rounded-pill bg-success">New</span>{% endif %} - </div> - {% endwith %} - {% empty %} - None - {% endfor %} - </td> - </tr> - {% if external_resources.show_changes %} + <tr> + <th scope="row">Submission date</th> + <td>{{ submission.submission_date }}</td> + </tr> + <tr> + <th scope="row">Title</th> + <td> + {{ submission.title|default:"" }} + {% if errors.title %} + <p class="mt-1 mb-0 text-danger"> + {{ errors.title }} + </p> + {% endif %} + </td> + </tr> + <tr> + <th scope="row">Author count</th> + <td> + {{ submission.authors|length }} author{{ submission.authors|pluralize }} + {% if errors.authors %} + <p class="mt-1 mb-0 text-danger"> + {{ errors.authors|safe }} + </p> + {% endif %} + </td> + </tr> + {% for author in submission.authors %} + <tr> + <th scope="row">Author {{ forloop.counter }}</th> + <td> + {{ author.name }} + {% if author.email %}<{{ author.email|linkify }}>{% endif %} + <br> + {% if author.affiliation %} + {{ author.affiliation }} + {% else %} + <i>unknown affiliation</i> + {% endif %} + <br> + {% if author.country %} + {{ author.country }} + {% if author.cleaned_country and author.country != author.cleaned_country %} + <span class="text-muted">(understood to be {{ author.cleaned_country }})</span> + {% endif %} + {% else %} + <i>unknown country</i> + {% endif %} + {% if author.country and not author.cleaned_country %} + <br> + <span class="text-warning">Unrecognized country: "{{ author.country }}"</span>: See + <a href="{% url "ietf.stats.views.known_countries_list" %}"> + recognized country names + </a>. + {% endif %} + {% for auth_err in author.errors %} + <p class="mt-1 mb-0 text-danger"> + {{ auth_err }} + </p> + {% endfor %} + </td> + </tr> + {% endfor %} <tr> <th scope="row"> - Current document additional resources + Abstract </th> <td> - {% for r in external_resources.previous %} - {% with res=r.res removed=r.removed %} + {{ submission.abstract|urlize_ietf_docs|linkify|linebreaksbr }} + {% if errors.abstract %} + <p class="mt-1 mb-0 text-danger"> + {{ errors.abstract }} + </p> + {% endif %} + </td> + </tr> + <tr> + <th scope="row"> + Page count + </th> + <td> + {{ submission.pages }} + {% if errors.pages %} + <p class="mt-1 mb-0 text-danger"> + {{ errors.pages }} + </p> + {% endif %} + </td> + </tr> + <tr> + <th scope="row"> + File size + </th> + <td> + {{ submission.file_size|filesizeformat }} + </td> + </tr> + <tr> + <th scope="row"> + Formal languages used + </th> + <td> + {% for l in submission.formal_languages.all %} + {{ l.name }}{% if not forloop.last %},{% endif %} + {% empty %}None recognized + {% endfor %} + {% if errors.formal_languages %} + <p class="mt-1 mb-0 text-danger"> + {{ errors.formal_languages }} + </p> + {% endif %} + </td> + </tr> + <tr> + <th scope="row"> + Submission additional resources + </th> + <td> + {% for r in external_resources.current %} + {% with res=r.res added=r.added %} <div> {{ res.name.name }}: {{ res.value }} {% if res.display_name %}(as "{{ res.display_name }}"){% endif %} - {% if removed %}<span class="badge rounded-pill bg-warning">Removed</span>{% endif %} + {% if external_resources.show_changes and added %}<span class="badge rounded-pill bg-success">New</span>{% endif %} </div> {% endwith %} {% empty %} @@ -375,6 +372,26 @@ {% endfor %} </td> </tr> + {% if external_resources.show_changes %} + <tr> + <th scope="row"> + Current document additional resources + </th> + <td> + {% for r in external_resources.previous %} + {% with res=r.res removed=r.removed %} + <div> + {{ res.name.name }}: {{ res.value }} + {% if res.display_name %}(as "{{ res.display_name }}"){% endif %} + {% if removed %}<span class="badge rounded-pill bg-warning">Removed</span>{% endif %} + </div> + {% endwith %} + {% empty %} + None + {% endfor %} + </td> + </tr> + {% endif %} {% endif %} </tbody> </table> @@ -546,7 +563,7 @@ {% for e in submission.submissionevent_set.all %} <tr> <td class="text-nowrap"> - {{ e.time|date:"Y-m-d" }} + <span title="{{ e.time }}">{{ e.time|date:"Y-m-d" }}</span> </td> <td> {% if e.by %} diff --git a/ietf/utils/draft.py b/ietf/utils/draft.py index 64f5c6b05..12a614701 100755 --- a/ietf/utils/draft.py +++ b/ietf/utils/draft.py @@ -142,12 +142,28 @@ class Draft: raise NotImplementedError def get_author_list(self): + """Get detailed author list + + Returns a list of dicts with the following keys: + full_name, first_name, middle_initial, last_name, + name_suffix, email, country, company + Values will be None if not available + """ raise NotImplementedError def get_authors(self): + """Get simple author list + + Get as list of strings with author name and email within angle brackets + """ raise NotImplementedError def get_authors_with_firm(self): + """Get simple list of authors with firm (company) info + + Get as list of strings with author name and email within angle brackets and + company in parentheses + """ raise NotImplementedError def get_creation_date(self): @@ -558,6 +574,8 @@ class PlaintextDraft(Draft): def get_author_list(self): # () -> List[List[str, str, str, str, str, str, str]] """Returns a list of tuples, with each tuple containing (given_names, surname, email, company). Email will be None if unknown. + + Todo update to agree with superclass method signature """ if self._author_info == None: self.extract_authors() diff --git a/ietf/utils/mail.py b/ietf/utils/mail.py index 45a145b30..04c0ed916 100644 --- a/ietf/utils/mail.py +++ b/ietf/utils/mail.py @@ -192,7 +192,10 @@ def encode_message(txt): return MIMEText(txt.encode('utf-8'), 'plain', 'UTF-8') def send_mail_text(request, to, frm, subject, txt, cc=None, extra=None, toUser=False, bcc=None, copy=True, save=True): - """Send plain text message.""" + """Send plain text message. + + request can be None unless it is needed by the template + """ msg = encode_message(txt) return send_mail_mime(request, to, frm, subject, msg, cc, extra, toUser, bcc, copy=copy, save=save) diff --git a/ietf/utils/text.py b/ietf/utils/text.py index 39989d5e0..1f194a983 100644 --- a/ietf/utils/text.py +++ b/ietf/utils/text.py @@ -248,6 +248,7 @@ def unwrap(s): return s.replace('\n', ' ') def normalize_text(s): + """Normalize various unicode whitespaces to ordinary spaces""" return re.sub(r'[\s\n\r\u2028\u2029]+', ' ', s, flags=re.U).strip() def parse_unicode(text): diff --git a/ietf/utils/xmldraft.py b/ietf/utils/xmldraft.py index 8b637292d..15bf745cc 100644 --- a/ietf/utils/xmldraft.py +++ b/ietf/utils/xmldraft.py @@ -1,6 +1,7 @@ # Copyright The IETF Trust 2022, All Rights Reserved # -*- coding: utf-8 -*- -import os +import io +import re import xml2rfc import debug # pyflakes: ignore @@ -13,8 +14,7 @@ from .draft import Draft class XMLDraft(Draft): """Draft from XML source - Currently just a holding place for get_refs() for an XML file. Can eventually expand - to implement the other public methods of Draft as need arises. + Not all methods from the superclass are implemented yet. """ def __init__(self, xml_file): """Initialize XMLDraft instance @@ -23,30 +23,42 @@ class XMLDraft(Draft): """ super().__init__() # cast xml_file to str so, e.g., this will work with a Path - self.xmltree = self.parse_xml(str(xml_file)) + self.xmltree, self.xml_version = self.parse_xml(str(xml_file)) self.xmlroot = self.xmltree.getroot() + self.filename, self.revision = self._parse_docname() @staticmethod def parse_xml(filename): + """Parse XML draft + + Converts to xml2rfc v3 schema, then returns the root of the v3 tree and the original + xml version. + """ orig_write_out = xml2rfc.log.write_out orig_write_err = xml2rfc.log.write_err - tree = None + parser_out = io.StringIO() + parser_err = io.StringIO() + with ExitStack() as stack: @stack.callback def cleanup(): # called when context exited, even if there's an exception xml2rfc.log.write_out = orig_write_out xml2rfc.log.write_err = orig_write_err - xml2rfc.log.write_out = open(os.devnull, 'w') - xml2rfc.log.write_err = open(os.devnull, 'w') + xml2rfc.log.write_out = parser_out + xml2rfc.log.write_err = parser_err parser = xml2rfc.XmlRfcParser(filename, quiet=True) - tree = parser.parse() + try: + tree = parser.parse() + except Exception as e: + raise XMLParseError(parser_out.getvalue(), parser_err.getvalue()) from e + xml_version = tree.getroot().get('version', '2') if xml_version == '2': v2v3 = xml2rfc.V2v3XmlWriter(tree) tree.tree = v2v3.convert2to3() - return tree + return tree, xml_version def _document_name(self, anchor): """Guess document name from reference anchor @@ -76,6 +88,68 @@ class XMLDraft(Draft): section_name = section_elt.get('title') # fall back to title if we have it return section_name + def _parse_docname(self): + docname = self.xmlroot.attrib.get('docName') + revmatch = re.match( + r'^(?P<filename>.+?)(?:-(?P<rev>[0-9][0-9]))?$', + docname, + + ) + if revmatch is None: + raise ValueError('Unable to parse docName') + # If a group had no match it is None + return revmatch.group('filename'), revmatch.group('rev') + + def get_title(self): + return self.xmlroot.findtext('front/title').strip() + + # todo fix the implementation of XMLDraft.get_abstract() + # + # This code was pulled from ietf.submit.forms where it existed for some time. + # It does not work, at least with modern xml2rfc. This assumes that the abstract + # is simply text in the front/abstract node, but the XML schema wraps the actual + # abstract text in <t> elements (and allows <dl>, <ol>, and <ul> as well). As a + # result, this method normally returns an empty string, which is later replaced by + # the abstract parsed from the rendered text. For now, I a commenting this out + # and making it explicit that the abstract always comes from the text format. + # + # def get_abstract(self): + # """Extract the abstract""" + # abstract = self.xmlroot.findtext('front/abstract') + # return abstract.strip() if abstract else '' + + def get_author_list(self): + """Get detailed author list + + Returns a list of dicts with the following keys: + name, first_name, middle_initial, last_name, + name_suffix, email, country, affiliation + Values will be None if not available + """ + result = [] + empty_author = { + k: None for k in [ + 'name', 'first_name', 'middle_initial', 'last_name', + 'name_suffix', 'email', 'country', 'affiliation', + ] + } + + for author in self.xmlroot.findall('front/author'): + info = { + 'name': author.attrib.get('fullname'), + 'email': author.findtext('address/email'), + 'affiliation': author.findtext('organization'), + } + elem = author.find('address/postal/country') + if elem is not None: + ascii_country = elem.get('ascii', None) + info['country'] = ascii_country if ascii_country else elem.text + for item in info: + if info[item]: + info[item] = info[item].strip() + result.append(empty_author | info) # merge, preferring info + return result + def get_refs(self): """Extract references from the draft""" refs = {} @@ -85,3 +159,14 @@ class XMLDraft(Draft): for ref in (section.findall('./reference') + section.findall('./referencegroup')): refs[self._document_name(ref.get('anchor'))] = ref_type return refs + + +class XMLParseError(Exception): + """An error occurred while parsing""" + def __init__(self, out: str, err: str, *args): + super().__init__(*args) + self._out = out + self._err = err + + def parser_msgs(self): + return self._out.splitlines() + self._err.splitlines() diff --git a/requirements.txt b/requirements.txt index 4b1d22949..cd0d8a116 100644 --- a/requirements.txt +++ b/requirements.txt @@ -5,12 +5,14 @@ argon2-cffi>=21.3.0 # For the Argon2 password hasher option beautifulsoup4>=4.11.1 # Only used in tests bibtexparser>=1.2.0 # Only used in tests bleach>=5.0.0 +celery>=5.2.6 coverage>=4.5.4,<5.0 # Coverage 5.x moves from a json database to SQLite. Moving to 5.x will require substantial rewrites in ietf.utils.test_runner and ietf.release.views decorator>=5.1.1 defusedxml>=0.7.1 # for TastyPie when using xml; not a declared dependency Django>=2.2.28,<3.0 django-analytical>=3.1.0 django-bootstrap5>=21.3 +django-celery-beat>=2.3.0 django-csp>=3.7 django-cors-headers>=3.11.0 django-debug-toolbar>=3.2.4