From: Siôn Le Roux
Date: Thu, 10 Jul 2014 16:30:44 +0000 (+0200)
Subject: Add support for *.about.com
X-Git-Tag: 1.7.1^2~1^2
X-Git-Url: https://git.immae.eu/?a=commitdiff_plain;h=d59536deea443f4bdac2c5cf1bfeea690810a817;p=github%2Fwallabag%2Fwallabag.git

Add support for *.about.com

Includes next_page_link for multi-page articles and strips pesky in-line
'next' links from the article body. Also includes an Xpath for author but
I can't see where this is used in the wallabag UI. The 'tidy' option is
turned off because it messed up bulleted lists.

Tested with psychology.about.com and food.about.com.
---

diff --git a/inc/3rdparty/site_config/standard/.about.com.txt b/inc/3rdparty/site_config/standard/.about.com.txt
new file mode 100644
index 00000000..e1ebaee3
--- /dev/null
+++ b/inc/3rdparty/site_config/standard/.about.com.txt
@@ -0,0 +1,14 @@
+body: //div[@id='articlebody']
+title: //h1
+author: //p[@id='by']//a
+
+next_page_link: //span[@class='next']/a
+# Not the same as below!
+
+prune: yes
+tidy: no
+
+# Annoying 'next' links plainly inside the article body
+strip: //*[text()[contains(.,'Next: ')]]
+
+test_url: http://psychology.about.com/od/theoriesofpersonality/ss/defensemech.htm
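
The new site config is a plain list of `key: value` directives with `#` comments. As a rough illustration of how such a file could be read, here is a minimal Python sketch. It is not the parser wallabag itself uses; `parse_site_config` and `SITE_CONFIG` are hypothetical names introduced only for this example, and the sketch assumes that every directive sits on a single line and that a key may repeat.

```python
from collections import defaultdict

# Contents of the added .about.com.txt file, copied from the diff.
SITE_CONFIG = """\
body: //div[@id='articlebody']
title: //h1
author: //p[@id='by']//a

next_page_link: //span[@class='next']/a
# Not the same as below!

prune: yes
tidy: no

# Annoying 'next' links plainly inside the article body
strip: //*[text()[contains(.,'Next: ')]]

test_url: http://psychology.about.com/od/theoriesofpersonality/ss/defensemech.htm
"""


def parse_site_config(text):
    """Hypothetical helper: collect `key: value` directives into a dict of lists.

    Blank lines and lines starting with '#' are skipped. Values are kept as
    lists because this sketch assumes a directive may appear more than once.
    """
    directives = defaultdict(list)
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so URLs and XPath values stay intact.
        key, separator, value = line.partition(":")
        if not separator:
            continue  # not a directive; ignore it in this sketch
        directives[key.strip()].append(value.strip())
    return dict(directives)


config = parse_site_config(SITE_CONFIG)
for key, values in config.items():
    print(key, "=>", values)
```

Splitting on the first colon only is the one design choice that matters here: the `test_url` value contains `://`, so a naive split on every colon would truncate it.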