# 2015.07.08 [Marvin Dickhaus] fixed single_page_link
# 2013.10.30 [rezor92] fixed single_page_link
# 2012-12-23 [carlo@...] fixed half-assed headlines in articles, removed inline author profiles, adjusted picture captions
# 2012-03-17 [dkless@...] Cut metadata at the beginning and end of the content block; removed copyright entries for pictures; fixed author (not sure whether the old entries are still valid, so I left them); addressed odd problems with some pages (see the last section, which removes hidden sections)
# 2011-12-09 [carlo@...] Removed "related articles" block
# 2011-08-23 [carlo@...] changed single page link to use print version: page works better, less ambiguity. Related cleanups and simplifications.
# 2011-08-20 [carlo@...] added author, fixed date

single_page_link: //a[contains(@href, 'komplettansicht')]
tidy: no

title: //title
date: substring-before( //li[@class="date"], " " )
author: //li[@class="author"]/a/text() | //li[@class="author first"]/a/text()
author: substring-after(//li[@class='source first '], 'Quelle: ')

strip_id_or_class: articleheader
strip: //div[@id="comments"] | //div[@class="pagination block"] | //p[@class="ressortbacklink"] | //div[@id="relatedArticles"] | //div[@class="inline portrait"]

#Removes author and date from the start
strip: //ul[@class="tools"]
#Removes copyright statement, which often disrupts the article by appearing as its first line
strip: //p[@class="copyright"]
strip: //div[@class="copyright"]
#Removes pagination links at the end
strip: //div[@class="pagination"]
#Removes link to main page at the bottom of some articles (Zur Startseite)
strip: //a[@href='http://www.zeit.de']

# Fix picture captions
wrap_in(small): //p[@class="caption"]/text()

# Fix sub-headlines
wrap_in(h2): //p/strong
dissolve: //h2/strong

#The print version sometimes embeds sections that are not displayed on the web but do appear in mobilized versions, where they can even cause problems. These sections are removed here.
strip_id_or_class:"informatives"
strip_id_or_class:"bottom"
strip_id_or_class:"teasermosaic"
strip_id_or_class:"comments"
strip_id_or_class:"articlefooter af"
strip_id_or_class:"relateds"
strip_id_or_class:"pagination"

footnotes: no
test_url: http://www.zeit.de/kultur/film/2012-12/Kurzfilmtag
test_url: http://www.zeit.de/kultur/2015-07/kapitalismuskritik-selbstberuhigung-armin-nassehi