title: /html/body/div/div[2]/div/div/div/h3
body: /html/body/div/div[2]/div/div/div/div[2]
strip: /html/body/div/div[2]/div/div/div/div[6]/div[3]/div/div/div
tidy: no
# any way to get rid of this word character garbage?
test_url: http://www.themillions.com/2010/07/at-the-movies-with-david-mitchell-the-thousand-autumns-of-jacob-de-zoet.html