field | value | date |
---|---|---|
author | Siôn Le Roux <sinisterstuf@gmail.com> | 2014-07-10 18:30:44 +0200 |
committer | Siôn Le Roux <sinisterstuf@gmail.com> | 2014-07-11 00:04:24 +0200 |
commit | d59536deea443f4bdac2c5cf1bfeea690810a817 | |
tree | c53ec785f3e36bcca07c09f58fa1b496741b6304 /inc/3rdparty/site_config/standard | |
parent | 6400371ff93782d25cdbd50aa224c70145b3890a | |
Add support for *.about.com
Includes next_page_link for multi-page articles and strips pesky in-line
'next' links from the article body. Also includes an XPath for author,
but I can't see where this is used in the wallabag UI.
The 'tidy' option is turned off because it messed up bulleted lists.
Tested with psychology.about.com and food.about.com.
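
To illustrate what the strip rule in this commit is meant to do, here is a rough Python sketch of the selection `//*[text()[contains(.,'Next: ')]]`, i.e. every element that has a direct text node containing 'Next: '. It is not wallabag's implementation: the helper names and the sample markup are made up for the example, and because the standard library's ElementTree does not support `text()` or `contains()` in its XPath subset, the match is done by hand.

```python
import xml.etree.ElementTree as ET


def own_text_nodes(element):
    """Yield the text nodes that are direct children of element."""
    if element.text:
        yield element.text
    for child in element:
        # In ElementTree, text following a child is stored as that child's tail.
        if child.tail:
            yield child.tail


def remove_keeping_tail(parent, element):
    """Remove element from parent without losing the text that follows it."""
    siblings = list(parent)
    index = siblings.index(element)
    if element.tail:
        if index == 0:
            parent.text = (parent.text or "") + element.tail
        else:
            previous = siblings[index - 1]
            previous.tail = (previous.tail or "") + element.tail
    parent.remove(element)


def strip_next_links(root, marker="Next: "):
    """Drop every descendant with a direct text node containing marker.

    The root itself is never removed, although the XPath would also match it.
    """
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    matches = [
        element
        for element in root.iter()
        if element is not root
        and any(marker in text for text in own_text_nodes(element))
    ]
    for element in matches:
        remove_keeping_tail(parent_of[element], element)


# Hypothetical article fragment, not taken from about.com.
ARTICLE = (
    "<div id='articlebody'>"
    "<p>First paragraph.</p>"
    "<p><b>Next: Second page</b></p>"
    "<p>Closing paragraph.</p>"
    "</div>"
)

root = ET.fromstring(ARTICLE)
strip_next_links(root)
print(ET.tostring(root, encoding="unicode"))
```

Note that only the innermost element holding the matching text is removed (the `b` element here), so its now-empty parent `p` stays in the tree.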
Diffstat (limited to 'inc/3rdparty/site_config/standard')
-rw-r--r-- | inc/3rdparty/site_config/standard/.about.com.txt | 14 |
1 files changed, 14 insertions, 0 deletions
diff --git a/inc/3rdparty/site_config/standard/.about.com.txt b/inc/3rdparty/site_config/standard/.about.com.txt
new file mode 100644
index 00000000..e1ebaee3
--- /dev/null
+++ b/inc/3rdparty/site_config/standard/.about.com.txt
@@ -0,0 +1,14 @@
+body: //div[@id='articlebody']
+title: //h1
+author: //p[@id='by']//a
+
+next_page_link: //span[@class='next']/a
+# Not the same as below!
+
+prune: yes
+tidy: no
+
+# Annoying 'next' links plainly inside the article body
+strip: //*[text()[contains(.,'Next: ')]]
+
+test_url: http://psychology.about.com/od/theoriesofpersonality/ss/defensemech.htm
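
For readers unfamiliar with the site-config format, the following Python sketch shows one way such a file could be read. It assumes only what the file above suggests: one `key: value` directive per line, blank lines ignored, and `#` starting a comment line. The function name `parse_site_config` is hypothetical and this is not the parser wallabag actually uses.

```python
from collections import defaultdict


def parse_site_config(text):
    """Parse 'key: value' lines into a dict mapping each key to a list of values."""
    directives = defaultdict(list)
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        # Split on the first colon only, so URLs and XPath values keep theirs.
        key, separator, value = line.partition(":")
        if not separator:
            continue  # not a directive line
        directives[key.strip()].append(value.strip())
    return dict(directives)


# Contents of .about.com.txt as added by this commit.
ABOUT_COM = """\
body: //div[@id='articlebody']
title: //h1
author: //p[@id='by']//a

next_page_link: //span[@class='next']/a
# Not the same as below!

prune: yes
tidy: no

# Annoying 'next' links plainly inside the article body
strip: //*[text()[contains(.,'Next: ')]]

test_url: http://psychology.about.com/od/theoriesofpersonality/ss/defensemech.htm
"""

config = parse_site_config(ABOUT_COM)
for key, values in config.items():
    print(key, values)
```

Values are collected in lists because, in this sketch, a directive is allowed to appear on more than one line; whether a given key may repeat is an assumption, not something this commit states.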