author | Benjamin Franzke <benjaminfranzke@googlemail.com> | 2012-06-28 10:09:28 +0200 |
---|---|---|
committer | Benjamin Franzke <benjaminfranzke@googlemail.com> | 2012-06-28 10:09:28 +0200 |
commit | 36c0f2f4d8aef72776a337eef05bec8cd0360e83 (patch) | |
tree | 0751342b278da2a5a67226b7450f046b50b279de /convert.sed | |
Add scripts to download elberfelder from die-bibel.de
The download is done by shell scripts using curl; books
and chapters are parsed with sed. The HTML is then prepared with sed
so that it can be converted to Zefania XML using an XSL stylesheet.
Diffstat (limited to 'convert.sed')
-rwxr-xr-x | convert.sed | 31 |
1 files changed, 31 insertions, 0 deletions
diff --git a/convert.sed b/convert.sed
new file mode 100755
index 0000000..2449d1f
--- /dev/null
+++ b/convert.sed
@@ -0,0 +1,31 @@
+#!/bin/sed -f
+
+/data-href/s/&/&amp;/g
+
+# xsltproc --html doesnt understand html5
+s/section/div/g
+s/header/h1/g
+s/<nav/<div/g
+s:</nav:</div:g
+s/footer/div/g
+s/article/div/g
+
+# Fix incorrect < and > inside p tags, that is by allowing only
+# known tag be surrounded by < and >.
+ta
+:a
+s/<p>\(.*\)<\/p>/\1/
+tfix
+b
+
+:fix
+s/</\&lt;/g
+s/>/\&gt;/g
+
+s/&lt;span\([^;]*\)&gt;/<span\1>/g
+s/&lt;\/span&gt;/<\/span>/g
+
+s/\&lt;em\&gt;/<em>/g
+s/\&lt;\/em\&gt;/<\/em>/g
+
+s:.*:<p>&</p>:
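
The paragraph-fixing part of convert.sed (the `:fix` branch) can be tried in isolation. The following is a minimal sketch, not the committed script: it assumes each `<p>…</p>` sits on a single line, the function name `fix_paragraph` and the sample input are hypothetical, and the HTML5 tag renaming and the `data-href` rule are left out. It unwraps the paragraph, escapes every `<` and `>`, restores only `span` and `em` tags, and wraps the result in `<p>` again.

```shell
#!/bin/sh
# Sketch of the escape-then-restore idea from convert.sed.
# fix_paragraph is a hypothetical helper name, not part of the commit.
fix_paragraph() {
    sed -e 's/<p>\(.*\)<\/p>/\1/' \
        -e 't fix' \
        -e 'b' \
        -e ':fix' \
        -e 's/</\&lt;/g' \
        -e 's/>/\&gt;/g' \
        -e 's/&lt;span\([^;]*\)&gt;/<span\1>/g' \
        -e 's/&lt;\/span&gt;/<\/span>/g' \
        -e 's/&lt;em&gt;/<em>/g' \
        -e 's/&lt;\/em&gt;/<\/em>/g' \
        -e 's/.*/<p>&<\/p>/'
}

# Made-up sample line, not real die-bibel.de markup.
printf '%s\n' '<p>1 < 2 says <em>he</em> to <span class="v">her</span></p>' | fix_paragraph
# prints: <p>1 &lt; 2 says <em>he</em> to <span class="v">her</span></p>
```

Lines without a `<p>…</p>` pair take the bare `b` branch and pass through unchanged, so the escaping only ever touches paragraph content.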