python - How to extract and download all images from a website using BeautifulSoup?
I am trying to extract and download the images from a URL. I wrote this script:
    import re
    import urllib2
    from os.path import basename
    from urlparse import urlsplit

    url = "http://filmygyan.in/katrina-kaifs-top-10-cutest-pics-gallery/"
    urlcontent = urllib2.urlopen(url).read()

    # html image tag: <img src="url" alt="some_text"/>
    imgurls = re.findall('img .*?src="(.*?)"', urlcontent)

    # download images
    for imgurl in imgurls:
        try:
            imgdata = urllib2.urlopen(imgurl).read()
            filename = basename(urlsplit(imgurl)[2])
            output = open(filename, 'wb')
            output.write(imgdata)
            output.close()
        except:
            pass
I don't want to extract just the image on this page (see this image: http://i.share.pho.to/1c9884b1_l.jpeg). I want to get all the images without clicking the "next" button. I can't work out how to get all the pictures within the "next" class. What changes should I make to findall?
If you only want the pictures, you can download them without scraping the web page. They all share the same URL pattern:
    http://filmygyan.in/wp-content/gallery/katrina-kaifs-top-10-cutest-pics-gallery/cute1.jpg
    http://filmygyan.in/wp-content/gallery/katrina-kaifs-top-10-cutest-pics-gallery/cute2.jpg
    ...
    http://filmygyan.in/wp-content/gallery/katrina-kaifs-top-10-cutest-pics-gallery/cute10.jpg
So simple code like this will give you all the images:
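    import os
    import urllib
    import urllib2

    baseurl = "http://filmygyan.in/wp-content/gallery/katrina-kaifs-top-10-"\
              "cutest-pics-gallery/cute%s.jpg"

    # the pictures are numbered cute1.jpg through cute10.jpg
    for i in range(1, 11):
        url = baseurl % i
        # save each picture under its own file name, e.g. cute1.jpg
        urllib.urlretrieve(url, os.path.basename(url))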
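The code above relies on Python 2 modules (`urllib.urlretrieve`, `urllib2`). On Python 3 the same idea could look roughly like the sketch below. It assumes the gallery files really are named `cute1.jpg` through `cute10.jpg`, as in the listing above. The helper names `build_image_urls` and `download_images` are made up for illustration.

```python
import os
from urllib.parse import urlsplit
from urllib.request import urlretrieve

# URL pattern taken from the listing above; %d is the picture number.
BASE_URL = ("http://filmygyan.in/wp-content/gallery/"
            "katrina-kaifs-top-10-cutest-pics-gallery/cute%d.jpg")


def build_image_urls(first=1, last=10):
    """Return the URLs for cute<first>.jpg through cute<last>.jpg (inclusive)."""
    return [BASE_URL % n for n in range(first, last + 1)]


def download_images(urls, target_dir="."):
    """Save each URL into target_dir, named after the last path segment."""
    for url in urls:
        filename = os.path.basename(urlsplit(url).path)
        urlretrieve(url, os.path.join(target_dir, filename))
```

Calling `download_images(build_image_urls())` would then fetch the ten files into the current directory, provided the site still serves them at those addresses.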
With BeautifulSoup you would have to click through, or go to the next page, to scrape the images. If you want to scrape each page individually, try scraping them using their class, shutterset_katrina-kaifs-top-10-cutest-pics-gallery.
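As a rough illustration of the class-based approach, here is a minimal Python 3 sketch using BeautifulSoup (the `bs4` package). It assumes the class sits on `<a>` tags whose `href` points at the full-size picture. That is a guess about the page markup, so check it against the real HTML. The function name `extract_gallery_links` is hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Class name quoted in the answer above.
GALLERY_CLASS = "shutterset_katrina-kaifs-top-10-cutest-pics-gallery"


def extract_gallery_links(html, page_url, css_class=GALLERY_CLASS):
    """Return absolute URLs from <a> tags that carry the gallery class.

    Assumption: the class is set on anchors linking to the full-size image.
    """
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", class_=css_class):
        href = anchor.get("href")
        if href:
            # Resolve relative links against the URL of the scraped page.
            links.append(urljoin(page_url, href))
    return links
```

You would pass in the HTML of each gallery page you fetched, together with that page's URL, and then download the returned links.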