I want to download a file from the website by web scraping. Can anyone explain how to do this in jupyter lab (python) with an example?

0 votes
Apr 7 in Python by António
• 120 points

1 answer to this question.

0 votes


Web scraping is a technique for automatically accessing and extracting large amounts of information from a website. Here is how to do it with Python, step by step:

  1. If you are using Windows, install Python from the official website.
  2. Install the library we need, BeautifulSoup, using pip, the package management tool for Python.
  3. In the terminal, type (in a JupyterLab cell, use %pip install beautifulsoup4 instead):
python -m ensurepip --upgrade
pip install beautifulsoup4

  4. Before we jump into coding, you should know the basics of HTML.

  5. Inspect the page. As an example, take this website: https://www.bloomberg.com/quote/SPX:IND.

  6. First, right-click and open your browser’s inspector to inspect the web page.

  7. Once you click on inspect, the related HTML will be selected in the browser console.

  8. From the result, you can see that the price is nested inside a few levels of HTML tags:

<div class="basic-quote">

 → <div class="price-container up">

 → <div class="price">

  9. Similarly, if you click the name “S&P 500 Index”, you will see that it is inside:

<div class="basic-quote">

 → <h1 class="name">

  10. Now we know the location of the data with the help of the class attributes.

  11. Now that we know where the data is located, we can start coding the web scraper. Open your text editor.

  12. First, import the libraries we are going to use:

# import libraries
import urllib.request  # Python 3 replacement for the old urllib2 module
from bs4 import BeautifulSoup

  13. Then declare a variable for the URL of the page:

# specify the url (the page from step 5)
quote_page = 'https://www.bloomberg.com/quote/SPX:IND'

  14. Then use Python's urllib.request to get the HTML page at the URL we declared:

# query the website and return the html to the variable 'page'
page = urllib.request.urlopen(quote_page)

  15. Finally, parse the page into BeautifulSoup format so we can work on it with BeautifulSoup:

# parse the html using beautiful soup and store in variable `store`
store = BeautifulSoup(page, 'html.parser')

Now we have a variable, store, containing the HTML of the page. Now we can start coding the part that extracts the data.
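
Some sites reject requests that carry urllib's default User-Agent, and a connection that hangs can block a notebook cell for a long time. A minimal sketch of a more defensive fetch, assuming Python 3; the helper name fetch_html, the header value, and the timeout are illustrative choices, not part of the steps above:

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Return the body at `url` decoded as text (hypothetical helper)."""
    # Assumption: a browser-like User-Agent; the exact string is arbitrary.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Use the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

BeautifulSoup also accepts a string, so the parsing step could then read store = BeautifulSoup(fetch_html(quote_page), 'html.parser').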

  16. Here we can extract the content with find(). Since the HTML class name is unique on this page, we can simply query <h1 class="name">:

# Take out the <h1> of name and get its value
name_box = store.find('h1', attrs={'class': 'name'})

  17. Once we have the tag, we can get the data by reading its text:

name = name_box.text.strip() # strip() removes leading and trailing whitespace
print(name)

  18. Similarly, we can get the price:

# get the index price
price_box = store.find('div', attrs={'class': 'price'})
price = price_box.text
print(price)

Once you run the program, you will be able to see that it prints out the name and the current price of the S&P 500 Index.
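
Note that find() returns None when no tag matches, for instance after a site redesign or when the server returns a block page, and name_box.text would then raise an AttributeError. A small sketch of a guarded lookup, run here against an inline HTML sample that mirrors the structure from steps 8 and 9; the sample values and the helper name text_of are made up for illustration:

```python
from bs4 import BeautifulSoup

# Placeholder HTML mirroring the tag structure described above; values are invented.
SAMPLE_HTML = """
<div class="basic-quote">
  <h1 class="name"> S&amp;P 500 Index </h1>
  <div class="price-container up">
    <div class="price">1,234.56</div>
  </div>
</div>
"""


def text_of(soup, tag, css_class):
    """Return the stripped text of the first matching tag, or None if absent."""
    box = soup.find(tag, attrs={"class": css_class})
    return box.text.strip() if box is not None else None


store = BeautifulSoup(SAMPLE_HTML, "html.parser")
name = text_of(store, "h1", "name")
price_text = text_of(store, "div", "price")
missing = text_of(store, "div", "no-such-class")

# Convert the price to a number by dropping the thousands separator.
price = float(price_text.replace(",", "")) if price_text else None
print(name, price, missing)
```

Checking for None before using the result lets the script report a missing element instead of crashing halfway through.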

I hope this helps. To learn more about Jupyter, you can go through this cheat sheet: https://www.edureka.co/blog/cheatsheets/jupyter-notebook-cheat-sheet
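
Since the question asks specifically about downloading a file, here is a minimal standard-library sketch that streams a URL's contents to disk. The function name download_file and its timeout are assumptions for illustration; to find the file links on a page, you could first collect the href values of the <a> tags with BeautifulSoup.

```python
import shutil
import urllib.request


def download_file(url, dest_path, timeout=30):
    """Stream the resource at `url` into `dest_path` and return the path (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(dest_path, "wb") as out_file:
        # Copy in chunks so large files are not loaded into memory at once.
        shutil.copyfileobj(response, out_file)
    return dest_path
```

A call would look like download_file('https://example.com/report.csv', 'report.csv'), where both the URL and the file name are placeholders.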

answered Apr 7 by Gitika
• 29,170 points
