72 commits

Author SHA1 Message Date
Jennifer Richards cf62b46093 Find references from submitted XML instead of rendering to text and parsing. Fixes #3342. Commit ready for merge.
- Legacy-Id: 19825
2022-01-07 17:53:23 +00:00
Robert Sparks 50a1e6e66b Tune text draft reference extractor. Fixes #3404. Commit ready for merge.
- Legacy-Id: 19363
2021-09-14 16:44:30 +00:00
Robert Sparks 3697180cc1 Reverted merge of timezone-aware migration efforts.
- Legacy-Id: 18792
2021-01-12 16:54:20 +00:00
Henrik Levkowetz 774e752a54 Snapshot of timezone-aware datatracker code. Tests pass, and the test-crawler shows only expected differences. Trunk changes merged in up to r18768.
- Legacy-Id: 18770
2020-12-16 23:53:37 +00:00
Henrik Levkowetz 726fcbf27d Removed all __future__ imports.
- Legacy-Id: 17391
2020-03-05 23:53:42 +00:00
Henrik Levkowetz 1c808bf63b Removed further six usage.
- Legacy-Id: 17387
2020-03-05 15:54:32 +00:00
Henrik Levkowetz e9a37d8ac8 Removed six.text_type(), changed six.moves.urllib to plain urllib, and removed now unused six imports.
- Legacy-Id: 17385
2020-03-05 14:41:41 +00:00
Henrik Levkowetz 33e8733b91 Fixed up mypy issues or added type:ignore comments as needed for a clean mypy run.
- Legacy-Id: 16772
2019-09-30 15:42:18 +00:00
Henrik Levkowetz 8c6eb3a30a Python2/3 compatibility: Changed the use of open() and StringIO to io.open() etc.
- Legacy-Id: 16458
2019-07-15 19:14:04 +00:00
Henrik Levkowetz f481f5c3e6 Replaced use of six with the equivalent pure python3 constructs.
- Legacy-Id: 16428
2019-07-08 10:43:47 +00:00
Henrik Levkowetz 0589d0b313 Changed a bunch of regexes to use r strings; also miscellaneous smaller fixes.
- Legacy-Id: 16376
2019-07-04 15:51:05 +00:00
Henrik Levkowetz 3ec7e864be Converted leading tabs to spaces in ietf/**/*.py
- Legacy-Id: 16310
2019-06-27 14:51:02 +00:00
Henrik Levkowetz d7f5c84182 Initial 2to3 patch with added copyright statement updates.
- Legacy-Id: 16309
2019-06-27 14:40:54 +00:00
Henrik Levkowetz d19228110c Applied a patch from dkg@fifthhorseman.net: py3 compatibility: fix another instance of integer division
- Legacy-Id: 15896
2019-01-15 17:50:33 +00:00
Henrik Levkowetz 98a74bd7f3 Moved __future__ imports down so as not to obscure the module docstring. Fixes inability to run '$ ietf/utils/draft.py -h'.
- Legacy-Id: 15894
2019-01-14 22:28:52 +00:00
Henrik Levkowetz 910d3d7723 Applied a patch from dkg@fifthhorseman.net: py3 compatibility: Use a list of dictionary keys
In python3, dict.keys() produces a dict_keys object, not a list.
Since this code treats it as a list, we'll just be explicit about
that.
- Legacy-Id: 15893
2019-01-14 21:06:08 +00:00
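The dict.keys() behaviour described in the commit above can be illustrated with a small sketch. The dictionary `counts` and its contents are made up for illustration and are not taken from ietf/utils/draft.py.

```python
# Hypothetical data, used only to show the Python 3 behaviour.
counts = {"rfc": 3, "draft": 5}

keys_view = counts.keys()   # a dict_keys view in Python 3, not a list

# A view cannot be indexed, so list-style code fails on it.
try:
    keys_view[0]
    indexable = True
except TypeError:
    indexable = False

# Being explicit about wanting a list restores list behaviour.
keys_list = list(counts.keys())
keys_list.sort()
first = keys_list[0]
```

Wrapping the call in list() costs one copy of the keys, which is usually negligible and keeps the code working on both Python 2 and Python 3.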
Henrik Levkowetz c8f98e125c Applied a patch from dkg@fifthhorseman.net: Fix regex manipulation for word characters.
in python 3.7, re.sub() started treating unknown escape sequences in
     as errors.  Fix this by sending an escaped \ where we mean to
    pass it through raw.
    
    https://docs.python.org/3/library/re.html#re.sub
 - Legacy-Id: 15892
2019-01-14 21:03:28 +00:00
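A minimal sketch of the re.sub() issue described in the commit above, assuming Python 3.7 or later. The pattern and input string are invented for illustration and do not reproduce the actual expression in draft.py.

```python
import re

text = "a WORD b"   # hypothetical input

# In a replacement string, a backslash followed by an unknown ASCII
# letter (here \w) is rejected with re.error on Python 3.7+.
try:
    re.sub("WORD", r"\w", text)
    raised = False
except re.error:
    raised = True

# Escaping the backslash passes a literal backslash through instead.
fixed = re.sub("WORD", r"\\w", text)
```

Here `fixed` contains a literal backslash followed by `w` in place of `WORD`. A callable replacement (for example `lambda m: r"\w"`) is another way to avoid escape processing altogether.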
Henrik Levkowetz e39358312b Applied a patch from dkg@fifthhorseman.net: py3 compatibility: Use // for explicit integer division
Without this fix, in modern versions of python, the changed line
produces:

    TypeError: 'float' object cannot be interpreted as an integer
- Legacy-Id: 15891
2019-01-14 21:02:01 +00:00
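The integer division problem in the commit above can be reproduced with a short sketch. The variable names are hypothetical, and the use of range() is only one of several places where a float is rejected.

```python
width = 7   # hypothetical value

half_float = width / 2    # true division: 3.5 in Python 3
half_int = width // 2     # floor division: 3

# Functions that need an integer reject the float result.
try:
    list(range(half_float))
    message = ""
except TypeError as exc:
    message = str(exc)

# The floor-division result works wherever an int is required.
padding = " " * half_int
```

In Python 2, `/` between two ints already returned an int, which is why the original line only broke after the move to Python 3.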
Henrik Levkowetz e718272e71 Applied a patch from dkg@fifthhorseman.net: py3 compatibility: Use modern form of exception handling
- Legacy-Id: 15890
2019-01-14 21:00:50 +00:00
Henrik Levkowetz 8840efaef4 Applied a patch from dkg@fifthhorseman.net: py3 compatibility: use print function.
- Legacy-Id: 15889
2019-01-14 20:56:59 +00:00
Henrik Levkowetz a485c74314 Merged in [14880] from rjsparks@nostrum.com:
Added a Draft test suite.
 - Legacy-Id: 14901
Note: SVN reference [14880] has been migrated to Git commit e09a28cad2
2018-03-22 16:34:10 +00:00
Russ Housley 565b10e00e Improve parser for references in Internet-Drafts. Fixes #2360
- Legacy-Id: 14851
2018-03-17 18:25:31 +00:00
Henrik Levkowetz 48fe02d58c Permit tildes in romanization of draft author names when looking for draft authors. Can be used in romanization of arabic names.
- Legacy-Id: 14256
2017-11-01 11:51:24 +00:00
Henrik Levkowetz 0e00adc5ee Another tweak to the draft author extraction code, to handle some name transliterations using multiple leading grave accents.
- Legacy-Id: 14149
2017-09-21 09:28:18 +00:00
Henrik Levkowetz 2c1438c240 Moved unidecode_name from utils.text to person.name.
Modified UserFactory to use a new locale for each new user, instead of the
same locale for a whole test run.  This (almost) ensures the exercise of
code to deal with non-ascii names, something which would not happen if a
locale with ascii names was chosen at the start of a run.

Modified name.initials() to not use non-word characters as initials.

Modified unidecode_name() to do more normalization, to conform to the
conventions used in internet-drafts.

Added saving of the factory-boy random state in order to be able to re-run
a test suite with the same pseudo-random sequence as in a previous failed
run.

Fixed an issue with email formatting in test_api_submit_ok().

Modified the draft author extraction code to deal better with names with
embedded apostrophes.
 - Legacy-Id: 14141
2017-09-20 15:36:30 +00:00
Henrik Levkowetz aafd6290a6 Added an option to ietf.utils.draft.Draft to pull document name from the source file name.
- Legacy-Id: 14089
2017-08-31 14:48:43 +00:00
Henrik Levkowetz b42f1cbeb5 Replaced the use of unaccent.asciify(), which has similar functionality to unidecode.unidecode(). Changed the draft parser to work exclusively with unicode text, which both makes the removal of unaccent easier, and takes us closer to Py35 compatibility. Adjusted callers of the draft parser to send in unicode.
- Legacy-Id: 13673
2017-06-18 18:23:18 +00:00
Henrik Levkowetz 76628be3fd Merged in ^/branch/iola/author-stats-r13145 from olau@iola.dk, and fixed some tests in code which moved after the latest merge with trunk. The test suite passes, but the migrations are _not_ ready to run, because of numbering conflicts (again due to code changes on trunk since the latest sync).
- Legacy-Id: 13479
2017-05-31 20:59:26 +00:00
Henrik Levkowetz 38bfdb4095 Fixed a bug in the earlier author extraction bugfix.
- Legacy-Id: 13295
2017-05-10 12:21:17 +00:00
Henrik Levkowetz fb70e9a4ff Fixed an issue with the author extraction code.
- Legacy-Id: 13288
2017-05-09 19:19:55 +00:00
Ole Laursen ef4d55f0c9 Apply patch from Henrik Levkowetz to fix some problems of author parse
errors where the affiliation is mistakenly thought to be an extra
author (some of these still remain)
 - Legacy-Id: 13142
2017-03-27 08:33:49 +00:00
Ole Laursen d2e85a3aa3 Apply draft parser patch from Henrik to improve the patch on trunk to
combine paragraphs across page splits - this makes the country part of
the parser find more countries
 - Legacy-Id: 12848
2017-02-15 19:10:59 +00:00
Ole Laursen b2ff10b0f2 Add support for extracting the country line from the author addresses
to the draft parser (incorporating patch from trunk), store the
extracted country instead of trying to turn it into an ISO country
code, add country and continent name models and add initial data for
those, add helper function for cleaning the countries, add author
country and continent charts, move the affiliation models to
stats/models.py, fix a bunch of bugs.
 - Legacy-Id: 12846
2017-02-15 18:43:57 +00:00
Henrik Levkowetz 44ad914fba Tweaked the company name extraction code in class Draft.
- Legacy-Id: 12842
2017-02-15 14:09:54 +00:00
Henrik Levkowetz bb5e5b97ba Another tweak to handle page break paragraph joins better in class Draft.
- Legacy-Id: 12840
2017-02-14 17:41:30 +00:00
Henrik Levkowetz 6158221fa8 Tweaked the author extraction to recognize short lines as paragraph ends, not only lines ending in '.' or ':'
- Legacy-Id: 12837
2017-02-14 14:23:15 +00:00
Ole Laursen aebfe44f9e Add simple detection of formal languages used in draft, partially
based on the code in getauthors by Jari Arkko
 - Legacy-Id: 12657
2017-01-16 16:08:56 +00:00
Ole Laursen 34a9f36534 Add helper for getting word count from draft
- Legacy-Id: 12655
2017-01-16 11:35:48 +00:00
Henrik Levkowetz 887455c1d5 Make sure to not include draft name in the title extracted from draft text.
- Legacy-Id: 12176
2016-10-19 12:18:59 +00:00
Henrik Levkowetz f5ca3a12bc Fixed a bug in the header/footer stripping done before abstract extraction when a draft is submitted.
- Legacy-Id: 10519
2015-11-24 20:01:31 +00:00
Henrik Levkowetz 1bf4356002 Improved regex for the Dr.-Ing. honorific fix.
- Legacy-Id: 8509
2014-10-29 06:53:34 +00:00
Henrik Levkowetz 770f79e601 Added 'Dr.-Ing.' to the recognised honorifics in the author extraction code.
- Legacy-Id: 8508
2014-10-29 06:24:41 +00:00
Henrik Levkowetz 46cb5cbdca Did a number of changes to the author extraction method of class Draft in order to make it able to match up names with double-word family names on the first page (A. Foo Bar) with (familyname, given-name) ordering (Foo Bar Any) in the Authors' Addresses section. Regression tested against 200+ known good author extraction results. A number of stronger restrictions in regular expressions had to be introduced to avoid regression, which is probably all to the good.
- Legacy-Id: 8507
2014-10-28 15:45:47 +00:00
Henrik Levkowetz e3077c6e50 Fixed a bug in the new ISO-date code for draft metadata extraction.
- Legacy-Id: 8502
2014-10-27 17:01:16 +00:00
Henrik Levkowetz 4dddf14be0 Added support for ISO-format dates (or RFC 3339 dates, if you will) to the date parsing done for the submission tool. Also refined the regexes a bit to avoid false matches on for instance things like 'Juniper 2014'.
- Legacy-Id: 8501
2014-10-27 16:51:19 +00:00
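The kind of false match mentioned in the commit above can be sketched as follows. These regular expressions are assumptions written for illustration, not the ones used by the submission tool: a loose month pattern accepts 'Juniper 2014', a stricter one does not, and a separate pattern handles ISO-format dates.

```python
import re
from datetime import date

# Hypothetical loose pattern: any word starting with a month abbreviation.
LOOSE_MONTH_YEAR = re.compile(
    r"\b(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\w*\s+(\d{4})\b"
)

# Hypothetical stricter pattern: only real month names or abbreviations.
STRICT_MONTH_YEAR = re.compile(
    r"\b(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|"
    r"Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|"
    r"Dec(?:ember)?)\.?\s+(\d{4})\b"
)

# ISO-format (RFC 3339 style) calendar date, e.g. 2014-10-27.
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def parse_iso_date(text):
    """Return the first ISO-format date found in text, or None."""
    match = ISO_DATE.search(text)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


loose_hit = LOOSE_MONTH_YEAR.search("Juniper 2014")
strict_hit = STRICT_MONTH_YEAR.search("Juniper 2014")
real_month = STRICT_MONTH_YEAR.search("June 2014")
iso = parse_iso_date("Expires: 2014-10-27")
```

The stricter pattern trades some tolerance for unusual abbreviations against fewer false positives on company names and similar text.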
Henrik Levkowetz 9d5a9c143e Reverted changes in ietf/utils/draft.py which should not have been part of [8499].
- Legacy-Id: 8500
Note: SVN reference [8499] has been migrated to Git commit a8ddac15e2
2014-10-27 16:35:50 +00:00
Henrik Levkowetz a8ddac15e2 Merged in [8498] from rjsparks@nostrum.com:
Reworked logic flow for editing shepherds. Added message to inform the user when the shepherd is not changed. Fixes bug #1508.
- Legacy-Id: 8499
Note: SVN reference [8498] has been migrated to Git commit 055202dee4
2014-10-27 16:01:51 +00:00
Henrik Levkowetz 8c42989d5d Pyflakes cleanup compliant with pyflakes 0.8.1, which seems to find things 0.8.0 didn't find.
- Legacy-Id: 7558
2014-04-01 16:25:18 +00:00
Henrik Levkowetz 49edc7404e Made ietf/utils pyflakes-clean.
- Legacy-Id: 7496
2014-03-16 07:26:03 +00:00
Henrik Levkowetz 258ac770b3 Better handling of draft name extraction when there's no extension given.
- Legacy-Id: 6675
2013-11-06 22:18:51 +00:00