# 2014-10-21 [Marmo] added stripping of inline ads and a matching test_url
# 2013-10-30 [rezor92] fixed single_page_link
# 2012-12-23 [carlo@...] fixed broken headlines in articles, removed inline author profiles, adjusted picture captions
# 2012-03-17 [dkless@...] Cut metadata at the beginning and end of the content block; removed picture copyright entries; fixed author (old entries kept, as it is unclear whether they are still valid); addressed odd problems with some pages (see the last section, which removes hidden sections)
# 2011-12-09 [carlo@...] Removed "related articles" block
# 2011-08-23 [carlo@...] changed single page link to use the print version: the page works better and is less ambiguous. Related cleanups and simplifications.
# 2011-08-20 [carlo@...] added author, fixed date

single_page_link: //a[@title='Auf einer Seite']
tidy: no

title: //title
date: substring-before( //li[@class="date"], " " )
author: //li[@class="author"]/a/text() | //li[@class="author first"]/a/text()
author: substring-after(//li[@class='source first '], 'Quelle: ')
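# Worked example for the date and author expressions above. The markup is
# hypothetical and not copied from zeit.de:
#   <li class="date">21.10.2014 - 14:05 Uhr</li>
#     substring-before(..., " ") yields "21.10.2014" (the text before the first space)
#   <li class="source first ">Quelle: dpa</li>
#     substring-after(..., 'Quelle: ') yields "dpa" ('Quelle' is German for 'source')
# Note that [@class='source first '] compares the whole attribute value,
# including the trailing space, so a class of "source first" would not match.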

strip_id_or_class: articleheader
strip: //div[@id="comments"] | //div[@class="pagination block"] | //p[@class="ressortbacklink"] | //div[@id="relatedArticles"] | //div[@class="inline portrait"]
# Remove inline ads
strip: //div[@class="innerad"]

# Remove author and date from the start
strip: //ul[@class="tools"]
# Remove copyright statements; they often appear, distractingly, as the first line of the article
strip: //p[@class="copyright"]
strip: //div[@class="copyright"]
# Remove pagination links at the end
strip: //div[@class="pagination"]

# Fix picture captions
wrap_in(small): //p[@class="caption"]/text()

# Fix sub-headlines
wrap_in(h2): //p/strong
dissolve: //h2/strong
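# Sketch of the intended effect of the two sub-headline rules. The markup is
# hypothetical. This assumes wrap_in(tag) wraps each match in a new <tag> element
# and dissolve removes the matched element while keeping its children:
#   before:          <p><strong>Sub-headline</strong></p>
#   after wrap_in:   <p><h2><strong>Sub-headline</strong></h2></p>
#   after dissolve:  <p><h2>Sub-headline</h2></p>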

# The print version sometimes embeds content that is not displayed on the web but does show up in mobilized versions, where it can even cause problems. These sections are removed here.
strip_id_or_class: "informatives"
strip_id_or_class: "bottom"
strip_id_or_class: "teasermosaic"
strip_id_or_class: "comments"
strip_id_or_class: "articlefooter af"
strip_id_or_class: "relateds"
strip_id_or_class: "pagination"
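# Note on matching. This is hedged and assumes the usual FiveFilters-style
# extractor behaviour. Each strip_id_or_class value is matched as a substring
# of the id or class attribute, roughly equivalent to:
#   strip: //*[contains(@class, 'pagination') or contains(@id, 'pagination')]
# Under that assumption "pagination" also catches class="pagination block",
# as well as any id or class that merely contains the word.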

footnotes: no
test_url: http://www.zeit.de/kultur/film/2012-12/Kurzfilmtag
test_url: http://www.zeit.de/wissen/2014-10/ebola-nigeria-who