Jennifer Richards
e91bda7e5e
feat: consolidate HTML sanitizing ( #8471 )
...
* refactor: isolate bleach code
* refactor: move html fns to html.py
* refactor: lose the bleach.py module; refactor
* refactor: sanitize_document -> clean_html
Drops <meta charset="utf-8"> addition after cleaning.
* fix: disambiguate import
* feat: restore <meta charset="utf-8"> tag
* chore: comments
* chore(deps): drop lxml_html_clean package
* refactor: on second thought, no meta charset
* refactor: sanitize_fragment -> clean_html
* test: remove check for charset
* chore: fix lint
2025-01-28 11:28:19 -06:00
Lars Eggert
0038151bf9
fix: Fix removetags ( #4226 )
...
I don't think this ever worked.
2022-07-18 09:39:11 -05:00
Lars Eggert
fd087d4e16
fix: Avoid crashes in urlize_ietf_docs ( #4161 )
...
* fix: Don't crash when urlreverse fails as part of urlize_ietf_docs
Also fix an HTMLization nit.
* Fix more corner cases found during test-crawl
* Handle "I-D.*" reference-style matches
* Refactor use of bleach. Better Markdown linkification and formatting.
* Address review comment from @rjsparks
2022-07-07 12:27:30 -05:00
Lars Eggert
a5cbf5307e
More fixes
...
- Legacy-Id: 19835
2022-01-12 11:54:00 +00:00
Henrik Levkowetz
1e1f056053
Changed the subclass of lxml.html.clean.Cleaner() to adapt to changes in the superclass in v4.5.2
...
- Legacy-Id: 18146
2020-07-11 20:20:50 +00:00
Henrik Levkowetz
726fcbf27d
Removed all __future__ imports.
...
- Legacy-Id: 17391
2020-03-05 23:53:42 +00:00
Henrik Levkowetz
fa9427769a
Added cleaning of the session request form's 'comments' field, to convert any html entered to text. Related to [17322].
...
- Legacy-Id: 17324
Note: SVN reference [17322] has been migrated to Git commit eb88abc394
2020-02-21 21:36:18 +00:00
Henrik Levkowetz
8c6eb3a30a
Python2/3 compatibility: Changed the use of open() and StringIO to io.open() etc.
...
- Legacy-Id: 16458
2019-07-15 19:14:04 +00:00
Henrik Levkowetz
f481f5c3e6
Replaced use of six with the equivalent pure python3 constructs.
...
- Legacy-Id: 16428
2019-07-08 10:43:47 +00:00
Henrik Levkowetz
d7f5c84182
Initial 2to3 patch with added copyright statement updates.
...
- Legacy-Id: 16309
2019-06-27 14:40:54 +00:00
Henrik Levkowetz
dbfdb94c34
Merged in [15267] from rcross@amsl.com:
...
Fix issue with decorator on utils.html.remove_tags().
- Legacy-Id: 15270
Note: SVN reference [15267] has been migrated to Git commit 0d255f7d0874f01163f292568e76fa9d830a54e2
2018-06-19 21:23:35 +00:00
Henrik Levkowetz
9341f96832
Tweaked the document sanitizer to insert a charset meta tag after sanitization.
...
- Legacy-Id: 14832
2018-03-16 11:13:03 +00:00
Henrik Levkowetz
428c451692
Added a missing tag to the sanitizer whitelist (telling lxml's Cleaner to not clean style with style=False is apparently not always enough). Fixes issue #2470.
...
- Legacy-Id: 14794
2018-03-14 18:52:11 +00:00
Henrik Levkowetz
2b52919c5e
Added sanitize_document() and replaced sanitize_html() with sanitize_fragment() in utils.html
...
- Legacy-Id: 14776
2018-03-13 13:21:41 +00:00
Henrik Levkowetz
724f1ceccc
Added xmpp as an acceptable protocol in links when sanitizing.
...
- Legacy-Id: 14766
2018-03-12 13:17:41 +00:00
Henrik Levkowetz
2fd344f810
Tweaks to handle text types better and make set operation clearer.
...
- Legacy-Id: 14745
2018-03-07 21:10:47 +00:00
Henrik Levkowetz
802f201d81
Modified the sanitizer and upload handler to also strip the content of some tags, and to produce valid files (if the content is otherwise valid).
...
- Legacy-Id: 14744
2018-03-07 19:00:24 +00:00
Henrik Levkowetz
5964cdd880
Removed unused data.
...
- Legacy-Id: 14741
2018-03-07 08:24:43 +00:00
Henrik Levkowetz
2828683cee
Replaced html sanitization code that called html5lib directly with calls to bleach, and upgraded the requirements to let us use the latest html5lib and bleach.
...
- Legacy-Id: 14739
2018-03-06 18:35:34 +00:00
Henrik Levkowetz
b92ad2f992
Added sanitization of uploaded html content for session agendas and minutes, and did some refactoring of the upload form classes.
...
- Legacy-Id: 14738
2018-03-06 15:55:30 +00:00
Henrik Levkowetz
5638cf3da3
Changed all usage of ForeignKey and OneToOneField in model.py files to the compatibility versions from ietf.utils.models.
...
- Legacy-Id: 14661
2018-02-20 15:36:05 +00:00
Henrik Levkowetz
8930d29a8e
Merged in Django-1.10 upgrade work from ^/personal/henrik/6.43.1-django-1.10
...
- Legacy-Id: 12881
2017-02-19 18:18:00 +00:00
Henrik Levkowetz
c344a18bdf
Fixed an issue with the test-crawler which could cause false positives for urls containing apostrophe.
...
- Legacy-Id: 12851
2017-02-16 09:58:34 +00:00
Henrik Levkowetz
aa5e61d958
Updated all urlpatterns to use ietf.utils.urls.url() instead of django's,
...
in order to autogenerate dotted path url pattern names. Updated a number
of url reverses to use dotted path, and removed explicit url pattern names
as needed.
Changed some imports to prevent import of ietf.urls before django
initialization was complete.
Changed 3 cases of form classes being curried to functions; django 1.10
didn't accept that.
Started converting old-style middleware classes to new-style middleware
functions (incomplete).
Tweaked a nomcom decorator to preserve function names and attributes, like
a good decorator should.
Replaced the removed django templatetag 'removetags' with our own version
which uses bleach, and does sanitizing in addition to removing explicitly
mentioned html tags.
Rewrote the filename argument handling in a management command which had
broken with the upgrade.
- Legacy-Id: 12818
2017-02-11 14:43:01 +00:00
Henrik Levkowetz
6055215ab2
Removed local copy of html5lib, added html5lib to requirements.txt, and updated utils/html.py to work with html5lib 0.999.
...
- Legacy-Id: 9547
2015-04-24 18:07:26 +00:00
Henrik Levkowetz
266b7820d0
Merged from log:branch/2.00@2363: Current release branch head to trunk.
...
- Legacy-Id: 2365
2010-07-21 12:48:05 +00:00