author: //p[@class='mastname']

body: //div[@class='indivbody']
date: //div[@class='indivbody']/h2[1]

# Remove blog title. Specify the first occurrence in case h1 is also used in the article
strip: //div[@class='indivbody']/h1[1]

# Remove blog description (the first p element)
strip: //div[@class='indivbody']/p[1]

# Remove navigation (the second p element)
strip: //div[@class='indivbody']/p[2]

# Remove duplicate of the article title. Specify the first occurrence in case h3 is also used in the article
strip: //div[@class='indivbody']/h3[1]

# Remove publishing date; it is extracted by the date rule above
strip: //div[@class='indivbody']/h2[1]

# Remove the duplicate date at the end, and the newsletter signup
strip: //p[@class='posted']

# Leave date at top
test_url: http://www.schneier.com/blog/archives/2010/12/security_in_202.html