author: Benjamin Franzke <benjaminfranzke@googlemail.com> 2012-06-28 10:09:28 +0200
committer: Benjamin Franzke <benjaminfranzke@googlemail.com> 2012-06-28 10:09:28 +0200
commit: 36c0f2f4d8aef72776a337eef05bec8cd0360e83
tree: 0751342b278da2a5a67226b7450f046b50b279de /convert.sed
Add scripts to download Elberfelder from die-bibel.de
The download is done by shell scripts that use curl and parse books and chapters with sed. The HTML is then prepared with sed so it can be converted to Zefania XML using an XSL stylesheet.
Diffstat (limited to 'convert.sed')
-rwxr-xr-x convert.sed 31
1 files changed, 31 insertions, 0 deletions
diff --git a/convert.sed b/convert.sed
new file mode 100755
index 0000000..2449d1f
--- /dev/null
+++ b/convert.sed
@@ -0,0 +1,31 @@
+#!/bin/sed -f
+
+/data-href/s/&/&amp;/g
+
+# xsltproc --html doesn't understand HTML5
+s/section/div/g
+s/header/h1/g
+s/<nav/<div/g
+s:</nav:</div:g
+s/footer/div/g
+s/article/div/g
+
+# Fix incorrect < and > inside p tags by allowing only
+# known tags to be surrounded by < and >.
+ta
+:a
+s/<p>\(.*\)<\/p>/\1/
+tfix
+b
+
+:fix
+s/</\&lt;/g
+s/>/\&gt;/g
+
+s/&lt;span\([^;]*\)&gt;/<span\1>/g
+s/&lt;\/span&gt;/<\/span>/g
+
+s/\&lt;em\&gt;/<em>/g
+s/\&lt;\/em\&gt;/<\/em>/g
+
+s:.*:<p>&</p>:
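
The escape-and-restore part of the script can be tried on a single line as a minimal sketch. The function name `escape_paragraph` and the sample inputs are assumptions made for illustration and are not part of the commit; the sed commands are the ones from the second half of convert.sed, passed with `-e` instead of being read from the file.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): feeds one line through the
# paragraph-fixing commands of convert.sed.
escape_paragraph() {
    printf '%s\n' "$1" | sed \
        -e 't a' \
        -e ':a' \
        -e 's/<p>\(.*\)<\/p>/\1/' \
        -e 't fix' \
        -e 'b' \
        -e ':fix' \
        -e 's/</\&lt;/g' \
        -e 's/>/\&gt;/g' \
        -e 's/&lt;span\([^;]*\)&gt;/<span\1>/g' \
        -e 's/&lt;\/span&gt;/<\/span>/g' \
        -e 's/&lt;em&gt;/<em>/g' \
        -e 's/&lt;\/em&gt;/<\/em>/g' \
        -e 's:.*:<p>&</p>:'
}

# A paragraph with stray comparison signs and one known tag.
escape_paragraph '<p>a < b and <em>c</em> > d</p>'
# prints: <p>a &lt; b and <em>c</em> &gt; d</p>

# A line without a p element is passed through unchanged.
escape_paragraph '<div>x < y</div>'
# prints: <div>x < y</div>
```

The leading `t a` / `:a` pair only resets sed's substitution flag, so that `t fix` reacts to the `<p>` substitution alone and not to an earlier successful `s` command in the same cycle, such as the tag renames at the top of the script.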