Commit | Line | Data |
---|---|---|
ac4d1142 NL |
1 | #host configuration should be http://www.neh.gov/news/humanities/ |
2 | | |
3 | | |
4 | #meta data | |
5 | title:substring-after(substring-after(//title,':'),':') | |
6 | author:substring-after(//h2[@class = 'subHead'],'By') | |
7 | date:substring-before(substring-after(//title,':'),':') | |
8 | | |
9 | #img and caption handling | |
10 | wrap_in(small)://div[@id = 'mainContent']/table/descendant::p/descendant::text() | |
11 | wrap_in(fieldset)://div[@id = 'mainContent']/table | |
12 | | |
13 | # clean up | |
14 | strip: //table[@class = 'marginpaddingTop'] | |
15 | strip: //h2[@class = 'subHead'] | |
16 | | |
17 | test_url: http://www.neh.gov/news/humanities/2011-11/IslamicScholar.html |
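
The `title`, `author`, and `date` rules above rely on the XPath 1.0 string functions `substring-after` and `substring-before`. The following is a minimal Python sketch of how those nested calls split a colon-delimited `<title>`. The sample title and byline strings are made up for illustration and are not taken from the actual page; the helper functions are hypothetical stand-ins that mimic the XPath semantics, not part of any extraction tool.

```python
def substring_after(s: str, sep: str) -> str:
    """Mimic XPath 1.0 substring-after: text after the first sep, '' if absent."""
    if sep == "":
        return s  # XPath returns the whole string for an empty separator
    _, found, tail = s.partition(sep)
    return tail if found else ""


def substring_before(s: str, sep: str) -> str:
    """Mimic XPath 1.0 substring-before: text before the first sep, '' if absent."""
    if sep == "":
        return ""  # XPath returns the empty string for an empty separator
    head, found, _ = s.partition(sep)
    return head if found else ""


# Hypothetical <title> and byline text, assumed to follow a
# "section: issue date: headline" pattern.
page_title = "Humanities: November/December 2011: An Example Headline"
byline = "By Jane Doe"

# title: substring-after(substring-after(//title,':'),':')
title = substring_after(substring_after(page_title, ":"), ":")

# date: substring-before(substring-after(//title,':'),':')
date = substring_before(substring_after(page_title, ":"), ":")

# author: substring-after(//h2[@class = 'subHead'],'By')
author = substring_after(byline, "By")

print(repr(title))   # ' An Example Headline'
print(repr(date))    # ' November/December 2011'
print(repr(author))  # ' Jane Doe'
```

Each result keeps the space that followed the separator, so whatever consumes these values would need to trim whitespace. If the title contained fewer than two colons, the `title` expression would evaluate to an empty string.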