How to scrape data from an infinite-scroll website using Scrapy

    from scrapy.spiders import Spider
    from ..items import QtItem


    class QuoteSpider(Spider):
        name = 'acres'
        start_urls = ['https://housing.com/in/buy/searches/AB1AC0M1P4hkd3fsj8fd9kanb']

        def parse(self, response):
            items = QtItem()

            all_div_names = response.xpath('//article')

            # Each XPath below starts with '//', so it searches the whole page on
            # every pass and each field ends up as a list covering all listings.
            for bks in all_div_names:
                name = bks.xpath('//span[@class="css-fwbz9r"]/text()').extract()
                price = bks.xpath('//h2[@class="css-yr18fa"]/text()').extract()
                sqft = bks.xpath('//*[contains(concat( " ", @class, " " ), concat( " ", "css-ebj250", " " )) and (((count(preceding-sibling::*) + 1) = 1) and parent::*)]//*[contains(concat( " ", @class, " " ), concat( " ", "css-1ty8tu4", " " ))]/text()').extract()
                bhk = bks.xpath('//a[@class="css-163eyf0"]/text()').extract()

            # One item is yielded per page, holding the four lists.
            items['ttname'] = name
            items['ttprice'] = price
            items['ttsqft'] = sqft
            items['ttbhk'] = bhk

            yield items

I am new to Scrapy. I am using the spider above to scrape data from the website given in the code. The site is dynamic and uses infinite scrolling. I want to scrape the whole site, but the spider only returns the first 20 values. Those 20 come out in the format I want, but I cannot get the remaining 4,000 or so because they only load as the page scrolls.

The output I get is:

    {'ttbhk': ['3 BHK Apartment',
           '2 BHK Apartment',
           '2 BHK Apartment',
           '4 BHK Apartment',
           '3 BHK Apartment',
           '3 BHK Apartment',
           '4 BHK Apartment',
           '2 BHK Apartment',
           '2 BHK Apartment',
           '4 BHK Apartment',
           '3 BHK Apartment',
           '3 BHK Apartment',
           '3 BHK Apartment',
           '3 BHK Apartment',
           '3 BHK Apartment',
           '3 BHK Apartment',
           '2 BHK Apartment',
           '2 BHK Apartment',
           '2 BHK Apartment',
           '3 BHK Apartment'],
     'ttname': ['Jodhpur Village, Jodhpur, Ahmedabad',
            'Shapers Swastik Platinum, Narolgam, Ahmedabad',
            'Gayatri Maitri Lake View, Zundal, Ahmedabad',
            'Aariyana Lakeside, Shilaj, Ahmedabad',
            'Maruti Zenobia, Bodakdev, Ahmedabad',
            'arjun greens, Naranpura, Ahmedabad',
            'Goyal Riviera Blues, Makarba, Ahmedabad',
            'Ganesh Malabar County II, Chharodi, Ahmedabad',
            'Jodhpur Village, Jodhpur, Ahmedabad',
            'Ratna Paradise, Khoraj, Ahmedabad',
            'Siddhi Aarohi Elysium, Bopal, Ahmedabad',
            'Pacifica La Habitat, Thaltej, Ahmedabad',
            'Sthapatya Pratham Lakeview, Science City, Ahmedabad',
            'Adi Heritage Skyz , Prahlad Nagar, Ahmedabad',
            'Maple Tree, Memnagar, Ahmedabad',
            'Goyal Plaza, Satellite, Ahmedabad',
            'C.P. Nagar-1, Ghatlodiya, Ahmedabad',
            'Arvind & Safal Parishkaar Apartments, Amraiwadi, Ahmedabad',
            'Ganesh Malabar County, Chharodi, Ahmedabad',
            'Binori Solitaire, Bopal, Ahmedabad'],
     'ttprice': ['₹95.0 L',
             '₹17.0 L',
             '₹28.75 L',
             '₹3.5 Cr',
             '₹1.35 Cr',
             '₹1.0 Cr',
             '₹1.9 Cr',
             '₹43.0 L',
             '₹47.5 L',
             '₹1.55 Cr',
             '₹64.0 L',
             '₹1.0 Cr',
             '₹1.09 Cr',
             '₹1.4 Cr',
             '₹1.55 Cr',
             '₹1.0 Cr',
             '₹40.0 L',
             '₹42.0 L',
             '₹47.0 L',
             '₹1.1 Cr'],
     'ttsqft': ['870 sq.ft',
            '1125 sq.ft',
            '4275 sq.ft',
            '1755 sq.ft',
            '1812 sq.ft',
            '2750 sq.ft',
            '1170 sq.ft',
            '1200 sq.ft',
            '3340 sq.ft',
            '1435 sq.ft',
            '1961 sq.ft',
            '1890 sq.ft',
            '2040 sq.ft',
            '2400 sq.ft',
            '1685 sq.ft',
            '900 sq.ft',
            '1108 sq.ft',
            '1168 sq.ft',
            '2214 sq.ft']}

Your help would be really appreciated.
Jul 17, 2020 in Python by anonymous

1 answer to this question.

Hello, @Detrod,

Scrapy only receives the HTML from the initial request, so the listings that load as you scroll never reach your spider. Getting them takes a few extra tricks, and I would suggest going through these two links:

https://medium.com/@harshvb7/scraping-from-a-website-with-infinite-scrolling-7e080ea8768e

https://www.accordbox.com/blog/how-crawl-infinite-scrolling-pages-using-python/
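One common trick for infinite-scroll pages is to skip the scrolling and call the paginated request the page itself makes when you reach the bottom. Below is a minimal sketch of that idea, not a tested solution for this site. The endpoint URL, the `page` and `size` parameters, and the `results` key are all hypothetical; open the browser's Network tab while scrolling to find the real request. The paging logic is written as plain functions so it can be tried without Scrapy, and `fake_fetch` is only a stand-in for a real HTTP call.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the request seen in the browser's Network tab.
API_URL = "https://example.com/api/listings"


def page_url(page, page_size=20):
    """Build the URL for one page of results (parameter names are assumptions)."""
    return f"{API_URL}?{urlencode({'page': page, 'size': page_size})}"


def parse_page(body):
    """Return the listings in one JSON response body.

    Assumes the payload looks like {"results": [{...}, ...]}.
    """
    return json.loads(body).get("results", [])


def crawl(fetch, max_pages=500):
    """Yield listings page by page until an empty page comes back.

    `fetch` is any callable that takes a URL and returns the response text.
    """
    for page in range(1, max_pages + 1):
        rows = parse_page(fetch(page_url(page)))
        if not rows:
            break  # an empty page means there is nothing left to load
        yield from rows


def fake_fetch(url):
    """Stand-in for a real HTTP call: two pages of data, then an empty page."""
    pages = {1: [{"ttname": "A"}, {"ttname": "B"}], 2: [{"ttname": "C"}]}
    page = int(url.split("page=")[1].split("&")[0])
    return json.dumps({"results": pages.get(page, [])})


listings = list(crawl(fake_fetch))
print(len(listings), "listings collected")
```

In a Scrapy spider the same loop would live in `parse`: yield each row from `response.json()`, and while the page is not empty, yield a new `scrapy.Request` for `page_url(page + 1)` with the page number passed along in `cb_kwargs`. If the site has no such endpoint, the other option is to render the page in a real browser and scroll it there.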

I hope this will help you.

answered Jul 17, 2020 by Rashmi
