Testing XPath in the shell provided by Scrapy

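In Scrapy, a Rule uses a LinkExtractor to pull URLs out of each HTML page. That extraction happens automatically and nothing is printed along the way, so when the XPath is wrong it is hard to see why links are missing. The method below lets you check whether a LinkExtractor's XPath is correct from inside the Scrapy shell.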

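1. scrapy shell 'url'
2. from scrapy.linkextractors import LinkExtractor
3. links = LinkExtractor(allow=('***',), restrict_xpaths=('***',)).extract_links(response)
4. for link in links:
       print(link.url, link.text)

Step 1 runs in a terminal and opens an interactive shell in which response already holds the downloaded page; steps 2 to 4 are typed inside that shell. Older Scrapy releases imported the class from scrapy.contrib.linkextractors; current releases use scrapy.linkextractors.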

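This prints the links extracted from response. Note that extract_links() returns a list of Link objects rather than a single value, so you have to loop over it.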