git.immae.eu Git - github/wallabag/wallabag.git/blame_incremental - inc/3rdparty/site_config/standard/sportsillustrated.cnn.com.txt
updated specific configuration for parsing
# main sportsillustrated.com articles
#
body: //div[@id="cnnStoryContent"]
title: //div[@id="cnnStoryHeadline"]//h1
author: //div[@id="cnnSubBanner"]//strong
date: substring-after(//div[@id="cnnTimeStamp"], "Updated: ")
date: substring-after(//div[@id="cnnTimeStamp"], "Posted: ")

# kill ugly font buttons
strip: //div[@id="cnnSCFontButtons"]

# kill misc filler videos & etc
strip: //div[@class="cnnDivideContent"]
strip: //*[@class="cnnTMbox"]

# si vault articles
# -------------
body: //div[@class="siv_artPara"]
title: //div[@class="siv_artHeader"]//h1
author: //div[@class="byline"]
date: //div[@class="date"]

next_page_link: //div[@id='cnnStoryContinue']/a
strip_id_or_class: cnnstorypagination

test_url: http://sportsillustrated.cnn.com/2012/writers/peter_king/02/27/combine/index.html