26 commits

Author SHA1 Message Date
Jennifer Richards e91bda7e5e
feat: consolidate HTML sanitizing (#8471)
* refactor: isolate bleach code

* refactor: move html fns to html.py

* refactor: lose the bleach.py module; refactor

* refactor: sanitize_document -> clean_html

Drops <meta charset="utf-8"> addition after cleaning.

* fix: disambiguate import

* feat: restore <meta charset="utf-8"> tag

* chore: comments

* chore(deps): drop lxml_html_clean package

* refactor: on second thought, no meta charset

* refactor: sanitize_fragment -> clean_html

* test: remove check for charset

* chore: fix lint
2025-01-28 11:28:19 -06:00
Lars Eggert 0038151bf9
fix: Fix removetags (#4226)
I don't think this ever worked.
2022-07-18 09:39:11 -05:00
Lars Eggert fd087d4e16
fix: Avoid crashes in urlize_ietf_docs (#4161)
* fix: Don't crash when urlreverse fails as part of urlize_ietf_docs

Also fix an HTMLization nit.

* Fix more corner cases found during test-crawl

* Handle "I-D.*" reference-style matches

* Refactor use of bleach. Better Markdown linkification and formatting.

* Address review comment from @rjsparks
2022-07-07 12:27:30 -05:00
Lars Eggert a5cbf5307e More fixes
- Legacy-Id: 19835
2022-01-12 11:54:00 +00:00
Henrik Levkowetz 1e1f056053 Changed the subclass of lxml.html.clean.Cleaner() to adapt to changes in the superclass in v4.5.2
- Legacy-Id: 18146
2020-07-11 20:20:50 +00:00
Henrik Levkowetz 726fcbf27d Removed all __future__ imports.
- Legacy-Id: 17391
2020-03-05 23:53:42 +00:00
Henrik Levkowetz fa9427769a Added cleaning of the session request form's 'comments' field, to convert any html entered to text. Related to [17322].
- Legacy-Id: 17324
Note: SVN reference [17322] has been migrated to Git commit eb88abc394
2020-02-21 21:36:18 +00:00
Henrik Levkowetz 8c6eb3a30a Python2/3 compatibility: Changed the use of open() and StringIO to io.open() etc.
- Legacy-Id: 16458
2019-07-15 19:14:04 +00:00
Henrik Levkowetz f481f5c3e6 Replaced use of six with the equivalent pure python3 constructs.
- Legacy-Id: 16428
2019-07-08 10:43:47 +00:00
Henrik Levkowetz d7f5c84182 Initial 2to3 patch with added copyright statement updates.
- Legacy-Id: 16309
2019-06-27 14:40:54 +00:00
Henrik Levkowetz dbfdb94c34 Merged in [15267] from rcross@amsl.com:
Fix issue with decorator on utils.html.remove_tags().
- Legacy-Id: 15270
Note: SVN reference [15267] has been migrated to Git commit 0d255f7d0874f01163f292568e76fa9d830a54e2
2018-06-19 21:23:35 +00:00
Henrik Levkowetz 9341f96832 Tweaked the document sanitizer to insert a charset meta tag after sanitization.
- Legacy-Id: 14832
2018-03-16 11:13:03 +00:00
Henrik Levkowetz 428c451692 Added a missing tag to the sanitizer whitelist (telling lxml's Cleaner to not clean style with style=False is apparently not always enough). Fixes issue #2470.
- Legacy-Id: 14794
2018-03-14 18:52:11 +00:00
Henrik Levkowetz 2b52919c5e Added sanitize_document() and replaced sanitize_html() with sanitize_fragment() in utils.html
- Legacy-Id: 14776
2018-03-13 13:21:41 +00:00
Henrik Levkowetz 724f1ceccc Added xmpp as an acceptable protocol in links when sanitizing.
- Legacy-Id: 14766
2018-03-12 13:17:41 +00:00
Henrik Levkowetz 2fd344f810 Tweaks to handle text types better and make set operation clearer.
- Legacy-Id: 14745
2018-03-07 21:10:47 +00:00
Henrik Levkowetz 802f201d81 Modified the sanitizer and upload handler to also strip the content of some tags, and to produce valid files (if the content is otherwise valid).
- Legacy-Id: 14744
2018-03-07 19:00:24 +00:00
Henrik Levkowetz 5964cdd880 Removed unused data.
- Legacy-Id: 14741
2018-03-07 08:24:43 +00:00
Henrik Levkowetz 2828683cee Replaced html sanitization code that called html5lib directly with calls to bleach, and upgraded the requirements to let us use the latest html5lib and bleach.
- Legacy-Id: 14739
2018-03-06 18:35:34 +00:00
Henrik Levkowetz b92ad2f992 Added sanitization of uploaded html content for session agendas and minutes, and did some refactoring of the upload form classes.
- Legacy-Id: 14738
2018-03-06 15:55:30 +00:00
Henrik Levkowetz 5638cf3da3 Changed all usage of ForeignKey and OneToOneField in model.py files to the compatibility versions from ietf.utils.models.
- Legacy-Id: 14661
2018-02-20 15:36:05 +00:00
Henrik Levkowetz 8930d29a8e Merged in Django-1.10 upgrade work from ^/personal/henrik/6.43.1-django-1.10
- Legacy-Id: 12881
2017-02-19 18:18:00 +00:00
Henrik Levkowetz c344a18bdf Fixed an issue with the test-crawler which could cause false positives for urls containing apostrophe.
- Legacy-Id: 12851
2017-02-16 09:58:34 +00:00
Henrik Levkowetz aa5e61d958 Updated all urlpatterns to use ietf.utils.urls.url() instead of django's,
in order to autogenerate dotted path url pattern names.  Updated a number
of url reverses to use dotted path, and removed explicit url pattern names
as needed.

Changed some imports to prevent import of ietf.urls before django
initialization was complete.

Changed 3 cases of form classes being curried to functions; django 1.10
didn't accept that.

Started converting old-style middleware classes to new-style middleware
functions (incomplete).

Tweaked a nomcom decorator to preserve function names and attributes, like
a good decorator should.

Replaced the removed django templatetag 'removetags' with our own version
which uses bleach, and does sanitizing in addition to removing explicitly
mentioned html tags.

Rewrote the filename argument handling in a management command which had
broken with the upgrade.
- Legacy-Id: 12818
2017-02-11 14:43:01 +00:00
Henrik Levkowetz 6055215ab2 Removed local copy of html5lib, added html5lib to requirements.txt, and updated utils/html.py to work with html5lib 0.999.
- Legacy-Id: 9547
2015-04-24 18:07:26 +00:00
Henrik Levkowetz 266b7820d0 Merged from log:branch/2.00@2363: Current release branch head to trunk.
- Legacy-Id: 2365
2010-07-21 12:48:05 +00:00