I'm a beginner in Python and I want to scrape all the "Read more" links from the Amazon jobs page. For example, I want to scrape this page.

Below is the code I used.

#import the library used to query a website
import urllib2
#import the Beautiful soup functions to parse the data returned from the website
from bs4 import BeautifulSoup

#specify the url
url = ""

#Query the website and return the html to the variable 'page'
page = urllib2.urlopen(url)

#Parse the html in the 'page' variable, and store it in Beautiful Soup format
soup = BeautifulSoup(page, "lxml")
print soup.find_all("a")

This prints:

[<a class="icon home" href="/en">Home</a>,
 <a class="icon check-status" data-target="#icims-portal-selector" data-toggle="modal">Review application status</a>,
 <a class="icon working" href="/en/working/working-amazon">Amazon culture &amp; benefits</a>,
 <a class="icon locations" href="/en/locations">Locations</a>,
 <a class="icon teams" href="/en/business_categories">Teams</a>,
 <a class="icon job-categories" href="/en/job_categories">Job categories</a>,
 <a class="icon help" href="/en/faqs">Help</a>,
 <a class="icon language" data-animate="false" data-target="#locale-options" data-toggle="collapse" href="#locale-options" id="current-locale">English</a>,
 <a href="/en/privacy/us">Privacy and Data</a>,
 <a href="/en/impressum">Impressum</a>]

I am only getting links to the static elements of the page, i.e. the ones that are the same for any query, but I need the links to all 4896 jobs. Can anyone tell me what I am doing wrong?

As you've noticed, your request returns only static elements because the job links are generated by JavaScript. To get JavaScript-generated content you would need Selenium or a similar client that runs JavaScript.
However, if you inspect the HTTP traffic, you'll notice that the job data is loaded by an XHR request to the API endpoint /search.json, which returns JSON data.
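To make the shape of that JSON response concrete, here is a minimal sketch that parses a made-up payload. Only the key names ('hits', 'jobs', 'job_path', 'title') come from this answer; the values and paths are invented for illustration, and the real response contains many more fields.

```python
import json

# Hypothetical response body: the key names match the ones used in this
# answer, but every value here is invented for illustration.
sample_body = '''
{
  "hits": 2,
  "jobs": [
    {"title": "Example Engineer", "job_path": "/en/jobs/1/example-engineer"},
    {"title": "Example Analyst", "job_path": "/en/jobs/2/example-analyst"}
  ]
}
'''

# json.loads turns the JSON text into ordinary Python dicts and lists
data = json.loads(sample_body)

total_hits = data["hits"]                              # total number of matching jobs
job_paths = [job["job_path"] for job in data["jobs"]]  # one relative path per job

print(total_hits)
print(job_paths)
```

With a live request, the text passed to json.loads would be the body of the HTTP response instead of a hard-coded string.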

So, using urllib2 and json, we can first get the total number of results and then collect all the data:

import urllib2
import json

api_url = '[]=location&facets[]=business_category&facets[]=category&facets[]=schedule_type_id&facets[]=employee_class&facets[]=normalized_location&facets[]=job_function_id&offset=0&result_limit={results}&sort=relevant&loc_group_id=seattle-metro&latitude=&longitude=&loc_group_id=seattle-metro&loc_query={location}&base_query={query}&city=&country=&region=&county=&query_options=&'
query = ''
location = 'Greater Seattle Area, WA, United States'
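# First request: ask for only 10 results, just to read the total hit count
request = urllib2.urlopen(api_url.format(query=query, location=location, results=10))
results = json.loads(request.read())['hits']

# Second request: ask for all the results at once
request = urllib2.urlopen(api_url.format(query=query, location=location, results=results))
jobs = json.loads(request.read())['jobs']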
for i in jobs:
    i['job_path'] = '' + i['job_path']
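The code above asks for every hit in a single request by setting result_limit to the total. If the server turned out to cap result_limit (an assumption, I have not checked this), the offset parameter that is already in the query string could be used to fetch the results page by page. A rough sketch of the offset arithmetic, where page_offsets and the page size of 1000 are hypothetical choices of mine:

```python
def page_offsets(total_hits, page_size):
    """Return the starting offset of each page needed to cover total_hits results."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return list(range(0, total_hits, page_size))


# 4896 is the job count from the question; 1000 is an arbitrary page size.
offsets = page_offsets(4896, 1000)
print(offsets)  # [0, 1000, 2000, 3000, 4000]
```

Each offset would then go into the URL in place of offset=0, with result_limit set to the page size. That needs an extra {offset} placeholder in api_url, which the original string does not have.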

The jobs list holds a number of dictionaries with all the job information (title, state, city, etc). If you want to select a specific item - for example the links - just loop over the list and select that item.

links = [i['job_path'] for i in jobs]
print links
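If you are on Python 3, or want something a bit more robust than string concatenation for building the full links, urljoin can combine a base URL with each relative job_path. This is only a sketch: base_url is a placeholder, not the real site address, and the jobs list is invented sample data in the same shape as above.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2

# Placeholder base URL (assumption): replace it with the site's real address.
base_url = "https://example.com"

# Invented sample data shaped like the 'jobs' list from the API response.
jobs = [
    {"job_path": "/en/jobs/1/example-engineer"},
    {"job_path": "/en/jobs/2/example-analyst"},
]

# urljoin resolves each relative path against the base URL
links = [urljoin(base_url, job["job_path"]) for job in jobs]

for link in links:
    print(link)
```

Note that in Python 3 urllib2 no longer exists; urllib.request.urlopen is the replacement, and print is a function.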
answered Sep 28, 2018 by Priyaj