# inc/3rdparty/site_config/standard/nytimes.com.txt (github/wallabag/wallabag.git)
title://h1[@class="articleHeadline"]
body://div[@id="article"]
body://*[@itemprop="articleBody"]
strip_id_or_class:articleTools
strip_id_or_class:readerscomment
#strip://div[contains(@class, "articleInline runaroundLeft")]
strip: //div[contains(@class, "doubleRule")]
# strip image credit - appears as a bold heading
strip: //div[contains(@class, "articleInline")]//h6
strip_id_or_class:enlargeThis
strip_id_or_class:pageLinks
strip_id_or_class:memberTools
strip_id_or_class:articleExtras
strip_id_or_class:singleAd
strip_id_or_class:byline
strip_id_or_class:dateline
strip_id_or_class:articleheadline
strip_id_or_class:articleBottomExtra
strip_id_or_class:shareTools
strip://a[contains(@href, 'nytimes.com/adx/')]
strip: //nyt_byline
strip: //span[contains(@class, 'slideshow') or contains(@class, 'video')]
strip: //p[@class='caption']//a[contains(., 'More Photos')]

prune: no
tidy: no

find_string: <script
replace_string: <div style="display:none"
find_string: </script>
replace_string: </div>

date: substring-after(//*[contains(@class, 'dateline')], 'Published:')

single_page_link: //link[contains(@href, 'pagewanted=all')]
single_page_link: //link[@rel='alternate' and contains(@href, 'mobile.nytimes.com')]/@href
single_page_link: concat(substring-before(//div[@id='pageLinks']//a[contains(@href, 'pagewanted=')]/@href, 'pagewanted='), 'pagewanted=all')
#single_page_link: //a[contains(@href, 'pagewanted=all') and not(contains(@href, 'login'))]

strip://ul[@id = 'toolsList']
strip://h6[@class = 'kicker']
author:substring-after(//h6[@class='byline'],'By ')

test_url: http://www.nytimes.com/2011/07/24/books/review/an-academic-authors-unintentional-masterpiece.html
test_contains: In this column I want to look at a not uncommon way of writing

test_url: http://www.nytimes.com/2012/06/10/arts/television/the-newsroom-aaron-sorkins-return-to-tv.html
test_contains: IF you’ve seen enough of Aaron Sorkin’s theater

test_url: http://www.nytimes.com/2013/03/25/world/middleeast/israeli-military-responds-after-patrols-come-under-fire-from-syria.html
test_url: http://www.nytimes.com/2013/08/15/nyregion/when-the-new-york-city-subway-ran-without-rails.html
test_url: http://www.nytimes.com/2004/02/29/weekinreview/correspondence-class-consciousness-china-s-wealthy-live-creed-hobbes-darwin-meet.html
test_url: http://www.nytimes.com/2014/06/19/opinion/gail-collins-romney-and-the-2016-contenders-huddle.html