Commit graph

24 commits

Author SHA1 Message Date
Henrik Levkowetz cd030d3b43 Adding copyright notices to all python files
- Legacy-Id: 716
2007-06-27 21:16:34 +00:00
Henrik Levkowetz de9a7ddbc4 Added the ability to give fill and pre(formatted) switches to the soup2text command
- Legacy-Id: 403
2007-06-15 13:28:12 +00:00
Henrik Levkowetz 754ba193ca A small script to run a diff against the master for one single django URL specified in any of the testurl.list files. Uses environment variable DJANGO_SERVER if set, or http://merlot.tools.ietf.org:31415/ otherwise.
- Legacy-Id: 375
2007-06-13 17:26:04 +00:00
Henrik Levkowetz e2db0d869d Compact spaces after \n conversion in soup2html.
- Legacy-Id: 351
2007-06-12 22:46:30 +00:00
Henrik Levkowetz aa68d30e85 Tweaking the paragraph filling code some more
- Legacy-Id: 346
2007-06-12 20:31:28 +00:00
Henrik Levkowetz 712cd8aa17 Tweak to again avoid space at the beginning of a paragraph.
- Legacy-Id: 345
2007-06-12 20:23:09 +00:00
Henrik Levkowetz 890b8a1ada Fix potential exception in soup2html again.
- Legacy-Id: 341
2007-06-12 18:34:26 +00:00
Henrik Levkowetz 6b7137994a Fix potential exception in soup2html.
- Legacy-Id: 340
2007-06-12 18:12:19 +00:00
Henrik Levkowetz dd37257c0c Only print the first 100 lines of a long diff. New soup2html code for spacing associated with certain tags.
- Legacy-Id: 337
2007-06-12 17:52:07 +00:00
Henrik Levkowetz aba06af322 Another soup2html() tweak to better avoid indentation at paragraph start.
- Legacy-Id: 330
2007-06-12 01:32:05 +00:00
Henrik Levkowetz 541b041cdc soup2html() tweak to better avoid indentation at paragraph start.
- Legacy-Id: 329
2007-06-12 00:55:41 +00:00
Henrik Levkowetz 67eb998901 soup2html() tweak to handle html comments.
- Legacy-Id: 328
2007-06-12 00:37:16 +00:00
Henrik Levkowetz b15c02c830 soup2html() tweak to handle table cells.
- Legacy-Id: 326
2007-06-12 00:25:45 +00:00
Henrik Levkowetz bfcb0e6c78 Two soup2text tweaks.
- Legacy-Id: 324
2007-06-11 23:52:51 +00:00
Henrik Levkowetz 1cafcf3e9d Changed approach to space normalization in soup2text(). Plain whitespace stripping followed by reassembly caused too large information loss. Accompanying changes in generic diff files.
- Legacy-Id: 321
2007-06-11 20:28:19 +00:00
Henrik Levkowetz 8e8c3ff5e2 * ietf/tests.py: Remove filetime() again -- not using it.
* ietf/utils/soup2text.py: Do line ending normalization.
 - Legacy-Id: 315
2007-06-11 17:26:59 +00:00
Henrik Levkowetz 7f512b4889 make soup2text convert numeric character codes (e.g., "&#39;") too.
- Legacy-Id: 306
2007-06-11 07:47:56 +00:00
Henrik Levkowetz 0452fca7d2 * ietf/tests.py, in reduce(): add ad-hoc fix for pathologic case of not
   closing <li> tags.  BeautifulSoup can handle it, but the recursive text
   rendering code in soup2text recurses too deeply with a sufficiently long
   list...
 * ietf/tests.py, in setUp(): grab the right tuple element when extracting
   the URLs from the url test tuples
 * ietf/tests.py, in read_testurls(): close opened file
 * ietf/tests.py, in doUrlsTest(): narrower try/except clause, and a new one
 * soup2text.py, in para(): undo previous change
 - Legacy-Id: 304
2007-06-11 06:13:29 +00:00
Henrik Levkowetz b42e0728c8 Accept both testurl.list and testurls.list as test url list file names. Output status for good compares, too.
- Legacy-Id: 303
2007-06-11 04:43:22 +00:00
Henrik Levkowetz 9b78963547 Fix occasional bad sentence end merges in ietf/utils/soup2text.py.
Remove some now unneeded exceptions from ietf/testurl.list
 - Legacy-Id: 302
2007-06-11 04:22:29 +00:00
Henrik Levkowetz a7a6d956af Adding a fix in soup2text for a common pathological case: <br><br> used instead
of <p /> to indicate paragraph breaks.

This changes the failed diff for /iesg/telechat/detail/354/ to show only three
differences, where two are whitespace differences and one shows a difference
between '@ietf.org. The' and '@ietf.org . The' and is an artifact of the text
extraction.  Will look at fixing that next.
 - Legacy-Id: 300
2007-06-11 03:36:08 +00:00
Henrik Levkowetz 7c60b321cd Add BeautifulSoup.py to the ietf/contrib/ directory so it doesn't have to be installed separately
- Legacy-Id: 289
2007-06-10 14:02:11 +00:00
Henrik Levkowetz 06eae09af4 Removing unused imports from ietf/tests.py. Using the right Exception type in soup2html.
- Legacy-Id: 283
2007-06-10 11:43:19 +00:00
Henrik Levkowetz 10ce0e07dd 'soup2text' is a html-to-text converter which uses the BeautifulSoup.py module. It converts html to plain paragraph-filled readable text.
- Legacy-Id: 277
2007-06-10 11:27:02 +00:00