git.immae.eu Git - github/wallabag/wallabag.git/commitdiff
Add support for *.about.com [754/head]
author Siôn Le Roux <sinisterstuf@gmail.com>
Thu, 10 Jul 2014 16:30:44 +0000 (18:30 +0200)
committer Siôn Le Roux <sinisterstuf@gmail.com>
Thu, 10 Jul 2014 22:04:24 +0000 (00:04 +0200)
Includes next_page_link for multi-page articles and strips pesky inline
'next' links from the article body. Also includes an XPath for author,
but I can't see where this is used in the wallabag UI.

The 'tidy' option is turned off because it messed up bulleted lists.

Tested with psychology.about.com and food.about.com.

inc/3rdparty/site_config/standard/.about.com.txt [new file with mode: 0644]

diff --git a/inc/3rdparty/site_config/standard/.about.com.txt b/inc/3rdparty/site_config/standard/.about.com.txt
new file mode 100644
index 0000000..e1ebaee
--- /dev/null
+++ b/inc/3rdparty/site_config/standard/.about.com.txt
@@ -0,0 +1,14 @@
+body: //div[@id='articlebody']
+title: //h1
+author: //p[@id='by']//a
+
+next_page_link: //span[@class='next']/a
+# Not the same as below!
+
+prune: yes
+tidy: no
+
+# Annoying 'next' links plainly inside the article body
+strip: //*[text()[contains(.,'Next: ')]]
+
+test_url: http://psychology.about.com/od/theoriesofpersonality/ss/defensemech.htm
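
For readers unfamiliar with site_config rules, here is a rough sketch of what the body, title, author, next_page_link and strip rules above select. It is not wallabag's actual extraction code. The sample markup, the `extract` function and the helper names are made up for illustration. Python's standard-library ElementTree supports only a subset of XPath, so the `contains()` predicate of the strip rule is approximated by a manual check of text nodes, and the strip is applied only inside the body element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for an about.com article page.
SAMPLE_PAGE = """
<html><body>
  <h1>Defense Mechanisms</h1>
  <p id="by">By <a href="/bio">Example Author</a></p>
  <div id="articlebody">
    <p>First paragraph of the article.</p>
    <ul><li>Denial</li><li>Repression</li></ul>
    <p>Next: Displacement</p>
  </div>
  <span class="next"><a href="defensemech_2.htm">Next</a></span>
</body></html>
"""


def has_text_node_containing(elem, needle):
    """Approximate the XPath predicate text()[contains(., needle)].

    In ElementTree an element's own text nodes are its .text plus the
    .tail of each direct child.
    """
    if elem.text and needle in elem.text:
        return True
    return any(child.tail and needle in child.tail for child in elem)


def extract(page):
    root = ET.fromstring(page)

    # title: //h1
    title = root.find(".//h1").text
    # author: //p[@id='by']//a
    author = root.find(".//p[@id='by']//a").text
    # next_page_link: //span[@class='next']/a
    next_page = root.find(".//span[@class='next']/a").get("href")
    # body: //div[@id='articlebody']
    body = root.find(".//div[@id='articlebody']")

    # strip: //*[text()[contains(.,'Next: ')]]
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in body.iter() for child in parent}
    doomed = [el for el in body.iter()
              if el is not body and has_text_node_containing(el, "Next: ")]
    for el in doomed:
        parents[el].remove(el)

    return {
        "title": title,
        "author": author,
        "next_page": next_page,
        "body_text": "".join(body.itertext()),
    }


result = extract(SAMPLE_PAGE)
print(result["title"], "|", result["author"], "|", result["next_page"])
```

In this sample the pagination link inside `span.next` reads "Next" with no colon, so the strip rule leaves it alone and removes only the inline "Next: ..." paragraph. That appears to be the distinction the "Not the same as below!" comment points at.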