Crawling after login in Python


I am learning web crawling with Python.

My goal is to download files.

Right now I'm working on logging in, and I'm finding it very difficult.

http://www.kif.re.kr/kif2/login/login.aspx?menuid=56

For example, I need to log in to download files from this site.

I have looked through various resources, including this question:

Login to website using python

But the site I'm targeting seems to work a bit differently.

I have been able to crawl most sites that don't require a login.

However, I can't crawl sites that do require one.

So that is the part I really want to learn.

My goal is to log in and then get the page's HTML so that I can crawl it.

Below is my code. Is this the right approach?

from requests import session

# example credentials: ID = abcd, PW = 1234

payload = {
    'ctl00$ContentPlaceHolder1$tbxLoginID': 'abcd',
    'ctl00$ContentPlaceHolder1$tbxLoginPW': '1234',
}

with session() as c:
    c.post('http://www.kif.re.kr/kif2/login/login.aspx', data=payload)
    response = c.get('What should I write here?')
    # response = c.get('http://example.com/protected_page.php')
    print(response.headers)
    print(response.text)

Sep 14, 2018 in Python by bug_seeker

1 answer to this question.


You are missing several of the login form's fields. Here is what the payload should look like:

payload = {
    # ASP.NET hidden fields: copy their current values from the login page's HTML
    '__LASTFOCUS': '',  # empty
    '__VIEWSTATE': 'get this value from the login page source',
    '__VIEWSTATEGENERATOR': 'get this value from the login page source',
    '__EVENTTARGET': '',  # empty
    '__EVENTARGUMENT': '',  # empty
    '__EVENTVALIDATION': 'get this value from the login page source',
    'ctl00$agentPlatform': '1',
    'ctl00$menu_nav1$tbxSearchWord': '',  # empty
    'ctl00$ContentPlaceHolder1$radiobutton': '0',
    'ctl00$ContentPlaceHolder1$tbxLoginID': 'abcd',
    'ctl00$ContentPlaceHolder1$tbxLoginPW': '1234',
    # The login button is an image button, so the browser also sends the click
    # coordinates as .x/.y; ASP.NET likely needs them to detect the click.
    'ctl00$ContentPlaceHolder1$ibtnLogin.x': '36',
    'ctl00$ContentPlaceHolder1$ibtnLogin.y': '25',
}
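The values marked "get this value from the login page source" are hidden `<input>` fields that can change between page loads, so copying them by hand tends to stop working. Below is a minimal sketch that reads them (plus the other fields a browser would send) from the login page with the standard library's `html.parser`, then adds the credentials. Assumptions: the page has a single form (usual for ASP.NET WebForms), `<select>`/`<textarea>` fields can be ignored, and the form posts back to the URL it was loaded from. The names `FormFieldParser`, `build_login_payload` and `login` are hypothetical helpers, and `login()` accepts any `requests.Session`-like object.

```python
from html.parser import HTMLParser

# Input types a browser does not submit as plain name=value pairs.
SKIPPED_TYPES = {"image", "submit", "button", "reset", "file"}

LOGIN_ID = "ctl00$ContentPlaceHolder1$tbxLoginID"
LOGIN_PW = "ctl00$ContentPlaceHolder1$tbxLoginPW"
LOGIN_BUTTON = "ctl00$ContentPlaceHolder1$ibtnLogin"


class FormFieldParser(HTMLParser):
    """Collect name/value pairs of <input> tags, roughly as a browser would.

    Simplified on purpose (assumption): one form per page, and <select>
    and <textarea> elements are ignored.
    """

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        input_type = (attributes.get("type") or "text").lower()
        if not name or input_type in SKIPPED_TYPES:
            return
        # Unchecked radio buttons and checkboxes are not submitted.
        if input_type in ("checkbox", "radio") and "checked" not in attributes:
            return
        self.fields[name] = attributes.get("value") or ""


def build_login_payload(login_html, user_id, password):
    """Return the POST data for the login form: page fields plus credentials."""
    parser = FormFieldParser()
    parser.feed(login_html)
    payload = dict(parser.fields)  # __VIEWSTATE, __EVENTVALIDATION, ...
    payload[LOGIN_ID] = user_id
    payload[LOGIN_PW] = password
    # Image-button click coordinates (same values as in the payload above).
    payload[LOGIN_BUTTON + ".x"] = "36"
    payload[LOGIN_BUTTON + ".y"] = "25"
    return payload


def login(session, login_url, user_id, password):
    """GET the login page, then POST the form back within the same session."""
    page = session.get(login_url)
    page.raise_for_status()
    payload = build_login_payload(page.text, user_id, password)
    return session.post(login_url, data=payload)


# Usage (needs the third-party `requests` package and network access):
#   import requests
#   with requests.Session() as c:
#       login(c, "http://www.kif.re.kr/kif2/login/login.aspx?menuid=56",
#             "abcd", "1234")
```

Fetching and posting with the same session object matters: the session keeps the cookies set by the login page, and the hidden fields are validated against them.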

response = c.get('What should I write here?')

Put the URL of the protected page there, i.e. a page you can only open after logging in. If that request returns the page's real content rather than the login form, you are logged in.
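A 200 response alone may not prove the login worked: if the session is not authenticated, the site may redirect to the login page, and since `requests` follows redirects for GET by default, you would still get a 200. A minimal sketch of a stricter check, assuming (not confirmed for this site) that an unauthenticated request ends up at `/kif2/login/login.aspx` or returns HTML containing the login ID field. `looks_logged_in` and `fetch_protected` are hypothetical helper names.

```python
from urllib.parse import urlparse

# Assumed markers of this site's login page (taken from the question).
LOGIN_PATH = "/kif2/login/login.aspx"
LOGIN_ID_FIELD = "ctl00$ContentPlaceHolder1$tbxLoginID"


def looks_logged_in(final_url, html):
    """Heuristic: True unless we landed on the login page or see its form."""
    if urlparse(final_url).path.lower() == LOGIN_PATH:
        return False
    return LOGIN_ID_FIELD not in html


def fetch_protected(session, url):
    """GET a page that requires login and fail loudly if we were bounced."""
    response = session.get(url)  # redirects are followed by default
    response.raise_for_status()
    # response.url is the final URL after any redirects.
    if not looks_logged_in(response.url, response.text):
        raise RuntimeError("got the login page back; the login probably failed")
    return response.text


# Usage, after logging in with the same session object `c`
# (the URL below is a placeholder, not a real page on the site):
#   html = fetch_protected(c, "http://www.kif.re.kr/kif2/<protected page>.aspx")
```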

answered Sep 14, 2018 by Priyaj

