python - Scrapy exports invalid JSON


My parse method looks like this:

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        titles = hxs.select("//tr/td")
        items = []
        for titles in titles:
            item = MyItem()
            item['title'] = titles.select('h3/a/text()').extract()
            items.append(item)
        return items

Why does it output JSON like this:

[{"title": ["random title #1"]}, {"title": ["random title #2"]}] 

titles.select('h3/a/text()').extract() returns a list, so each title field ends up as a list. Scrapy doesn't make any assumptions about your item's structure.
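To see why the field is exported as an array, here is a minimal sketch using only the standard json module. The `extracted` list is a stand-in for what `.extract()` returns and is not produced by Scrapy here:

```python
import json

# Stand-in for the result of .extract(): a list of every matching text node,
# even when only one node matches.
extracted = ["random title #1"]

as_list = json.dumps({"title": extracted})
as_string = json.dumps({"title": "random title #1"})

print(as_list)    # {"title": ["random title #1"]}
print(as_string)  # {"title": "random title #1"}
```

Both outputs are valid JSON. The difference is only whether the field holds a list or a single string.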

The quick fix is to take just the first result:

item['title'] = titles.select('h3/a/text()').extract()[0] 
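One caveat with indexing: `[0]` raises IndexError when the list is empty, which can happen here because "//tr/td" matches every cell, including cells without an h3/a. A small guard avoids that. The helper name `first_or_default` is hypothetical and not part of Scrapy:

```python
def first_or_default(values, default=None):
    """Return the first element of `values`, or `default` if it is empty."""
    return values[0] if values else default


print(first_or_default(["random title #1"]))  # random title #1
print(first_or_default([]))                   # None
```

It could then be used as item['title'] = first_or_default(titles.select('h3/a/text()').extract()).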

A better solution is to use an item loader with TakeFirst() as the output processor:

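    from scrapy.contrib.loader import XPathItemLoader
    from scrapy.contrib.loader.processor import TakeFirst, MapCompose


    class YourItemLoader(XPathItemLoader):
        default_item_class = YourItemClass

        # Strip whitespace from every extracted value (Python 2: unicode.strip)
        default_input_processor = MapCompose(unicode.strip)
        # Keep only the first non-empty value instead of the whole list
        default_output_processor = TakeFirst()

        # Or configure a single field instead of the defaults:
        # title_in = MapCompose(unicode.strip)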

And load the item this way:

    def parse(self, response):
        hxs = HtmlXPathSelector(response)

        for title in hxs.select("//tr/td"):
            loader = YourItemLoader(selector=title, response=response)
            loader.add_xpath('title', 'h3/a/text()')

            yield loader.load_item()
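To make the effect of the two processors concrete, here is a rough pure-Python approximation of what they do to one field. `take_first` and `map_each` are hypothetical stand-ins written for this sketch, not Scrapy's implementations. The real MapCompose also flattens iterable results, which is left out here. The sketch uses Python 3's `str.strip` where the loader above uses Python 2's `unicode.strip`:

```python
def take_first(values):
    """Approximation of TakeFirst: first value that is neither None nor ''."""
    for value in values:
        if value is not None and value != "":
            return value
    return None


def map_each(*functions):
    """Approximation of MapCompose: apply each function to every value in turn,
    dropping values for which a function returns None."""
    def process(values):
        for function in functions:
            values = [function(v) for v in values]
            values = [v for v in values if v is not None]
        return values
    return process


# Stand-in for a list returned by .extract()
raw = ["  random title #1 \n", "  second match "]

cleaned = map_each(str.strip)(raw)  # role of the input processor
title = take_first(cleaned)         # role of the output processor

print(cleaned)  # ['random title #1', 'second match']
print(title)    # random title #1
```

With an output processor like this, the exported field is a single string rather than a one-element list.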
