Commit | Line | Data |
---|---|---|
ac4d1142 NL | 1 | title: //div[@class="bodyText"]/h1/text() |
 | 2 | body: //div[@class="bodyText"] |
 | 3 | |
 | 4 | # author and date are separated by only a newline |
 | 5 | # can't figure out how to tokenize that yet |
 | 6 | author: //div[@class="bodyText"]/span[@class="info"]/text() |
 | 7 | date: //div[@class="bodyText"]/span[@class="info"]/text() |
 | 8 | |
 | 9 | # strip metadata from body text |
 | 10 | strip: //div[@class="bodyText"]/h1/text() |
 | 11 | strip: //div[@class="bodyText"]/span[@class="info"] |
 | 12 | strip: //div[@class="bodyText"]/span[@class="info"] |
 | 13 | test_url: http://www.wmnf.org/news_stories/light-rail-advocates-join-forces-to-combat-opposition-in-pinellas |
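
The comment on lines 4 and 5 of the config notes that author and date share one `span[@class="info"]` and are separated only by a newline. Outside the config syntax, one possible way to tokenize that text is to split it on line breaks. The following is a minimal Python sketch of that idea, not part of the config itself. The sample markup, the author name, the date format, and the helper name `split_info` are all hypothetical; the real page may be structured differently (for example, with a `<br>` between the two values).

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the real page; the actual
# structure of the span's contents is an assumption.
SAMPLE = """<html><body><div class="bodyText">
<h1>Example headline</h1>
<span class="info">Jane Doe
01/02/2013</span>
<p>Story text.</p>
</div></body></html>"""


def split_info(root):
    """Return (author, date) from the info span, splitting on newlines.

    Either value is None when the corresponding line is missing.
    """
    span = root.find(".//div[@class='bodyText']/span[@class='info']")
    if span is None:
        return None, None
    # Join all text nodes, then keep only the non-empty lines.
    text = "".join(span.itertext())
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    author = lines[0] if lines else None
    date = lines[1] if len(lines) > 1 else None
    return author, date


author, date = split_info(ET.fromstring(SAMPLE))
print(author)
print(date)
```

This assumes the author always comes first and the date second; if the order varies on the real site, the lines would need to be classified by pattern instead of by position.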