author | Nicolas Lœuillet <nicolas.loeuillet@gmail.com> | 2013-12-06 10:13:03 +0100 |
---|---|---|
committer | Nicolas Lœuillet <nicolas.loeuillet@gmail.com> | 2013-12-06 10:13:03 +0100 |
commit | ac4d114214d820b20e18518a2dbc809337e39043 (patch) | |
tree | 27886128ef949b7f8dd174b0646b5a4d99883b44 /inc/3rdparty/site_config/standard/fnal.gov.txt | |
parent | d5501950e2470d52f6bf5954d2179010cdee0475 (diff) | |
[add] new specific configuration files
Diffstat (limited to 'inc/3rdparty/site_config/standard/fnal.gov.txt')
-rw-r--r-- | inc/3rdparty/site_config/standard/fnal.gov.txt | 15 |
1 files changed, 15 insertions, 0 deletions
diff --git a/inc/3rdparty/site_config/standard/fnal.gov.txt b/inc/3rdparty/site_config/standard/fnal.gov.txt
new file mode 100644
index 00000000..7faa6bfc
--- /dev/null
+++ b/inc/3rdparty/site_config/standard/fnal.gov.txt
@@ -0,0 +1,15 @@
+title: normalize(//h1)
+
+author: //td/p[position()=last()]/em
+
+# I swear, this is really the best way to do this
+date: normalize(//td[contains(@style, "color: #ffffff")])
+
+# my god, it's full of tables
+body: /table/tbody/tr[5]//table/tbody//table/tbody/tr/td
+strip: //h1
+
+# the following two lines strip the byline at the end of the article (the byline is a <p> that consists of an em dash and then some text in an <em>). I have no idea why I can't just strip //p[position()=last()], but trying to do so includes a bunch of other crap in the output.
+strip: //p[position()=last()]/em
+strip: //p[position()=last()]/child::text()
+test_url: http://www.fnal.gov/pub/today/archive_2011/today11-11-09_MuonDepartmentReadMore.html
\ No newline at end of file
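
The file added above is a list of `key: value` directives, with `#` comment lines and a `strip` key that repeats. As a rough illustration of how such a file could be read, here is a minimal Python sketch. The function name `parse_site_config` and the choice to collect every key into a list are assumptions made for this example; the loader bundled with wallabag is not shown in this commit and may behave differently.

```python
def parse_site_config(text):
    """Parse 'key: value' lines into a dict of lists (hypothetical helper).

    Blank lines and lines starting with '#' are skipped. Repeated keys,
    such as 'strip', accumulate in the order they appear.
    """
    config = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values such as URLs or
        # XPath strings containing ':' stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a directive; ignored in this sketch
        config.setdefault(key.strip(), []).append(value.strip())
    return config


# A few directives copied from the diff above.
SAMPLE = """\
title: normalize(//h1)

# my god, it's full of tables
body: /table/tbody/tr[5]//table/tbody//table/tbody/tr/td
strip: //h1
strip: //p[position()=last()]/em
strip: //p[position()=last()]/child::text()
date: normalize(//td[contains(@style, "color: #ffffff")])
"""

config = parse_site_config(SAMPLE)
for xpath in config["strip"]:
    print("strip:", xpath)
```

Splitting on the first colon only matters here: the `date` expression and the `test_url` value both contain further colons that belong to the value.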