python - Scrapy exports invalid JSON
My parse method looks like this:
    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        titles = hxs.select("//tr/td")
        items = []
        for titles in titles:
            item = MyItem()
            item['title'] = titles.select('h3/a/text()').extract()
            items.append(item)
        return items

Why does the output JSON look like this:
[{"title": ["random title #1"]}, {"title": ["random title #2"]}]
titles.select('h3/a/text()').extract()
returns a list, so each title is exported as a list. Scrapy doesn't make any assumptions about your item's structure.
The quick fix is to take the first result:
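The effect can be reproduced without Scrapy. The following is only a minimal sketch: the `extracted` list is a stand-in for what `extract()` returns, and the dictionaries are stand-ins for items, not real Scrapy objects.

```python
import json

# Stand-in for the result of extract(): always a list of matched strings,
# even when the XPath matches exactly one node.
extracted = ["random title #1"]

# Storing the list as-is keeps the list in the serialized output.
as_list = json.dumps({"title": extracted})

# Storing a single element yields a plain string instead.
as_string = json.dumps({"title": extracted[0]})

print(as_list)
print(as_string)
```

The exported JSON is valid in both cases; the only difference is whether the field value is a list or a string.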
item['title'] = titles.select('h3/a/text()').extract()[0]
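One caveat with indexing `[0]`: if the XPath matches nothing, `extract()` returns an empty list and the index raises `IndexError`. A small guard like the one below may help. Note that `first_or_default` is a hypothetical helper name used for illustration, not part of Scrapy.

```python
def first_or_default(values, default=None):
    """Return the first element of values, or default if values is empty."""
    return values[0] if values else default


# Plain lists stand in for the result of extract().
matched = first_or_default(["random title #1"])
missing = first_or_default([])
print(matched, missing)
```

With this in place, a row without an `h3/a` link would produce `None` for the title rather than stopping the spider with an exception.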
A better solution is to use an item loader with TakeFirst() as the
output processor:
    from scrapy.contrib.loader import XPathItemLoader
    from scrapy.contrib.loader.processor import TakeFirst, MapCompose

    class YourItemLoader(XPathItemLoader):
        default_item_class = YourItemClass
        # Strip whitespace from every extracted value (Python 2 unicode strings).
        default_input_processor = MapCompose(unicode.strip)
        # Keep only the first value instead of the whole list.
        default_output_processor = TakeFirst()
        # Or set an input processor for a single field only:
        # title_in = MapCompose(unicode.strip)

And load the item this way:

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        for title in hxs.select("//tr/td"):
            loader = YourItemLoader(selector=title, response=response)
            loader.add_xpath('title', 'h3/a/text()')
            yield loader.load_item()
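To make the two processors less opaque, here is a rough pure-Python approximation of what they do to a field's values. This is a sketch for illustration only, not Scrapy's implementation: `take_first` and `strip_each` are hypothetical names, and the real `MapCompose` also chains several functions and flattens results.

```python
def strip_each(values):
    """Approximates MapCompose(unicode.strip): strip every extracted value."""
    return [value.strip() for value in values]


def take_first(values):
    """Approximates TakeFirst: first value that is neither None nor ''."""
    for value in values:
        if value is not None and value != "":
            return value
    return None


# Stand-in for the list collected by add_xpath('title', 'h3/a/text()').
raw = ["  random title #1  "]
title = take_first(strip_each(raw))
print(title)
```

The input processor runs on the values as they are added, and the output processor runs when `load_item()` is called, which is why the loaded item ends up with a single stripped string in `title`.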