I want to download a file from a website by web scraping. Can anyone explain how to do this in Python in JupyterLab, with an example?

0 votes
Apr 7, 2020 in Python by António
• 120 points
643 views

1 answer to this question.

0 votes

Hey,

Web scraping is a technique for automatically accessing and extracting large amounts of information from a website. Let's see how to use Python as our web scraping language. Follow the steps below:

  1. If you are using Windows, please install Python from the official website.
  2. Install the libraries we need, here the BeautifulSoup library, using pip, a package management tool for Python.
  3. In the terminal, type the following (the first command only makes sure pip is available; recent Python installers already include it):
python -m ensurepip --upgrade
pip install beautifulsoup4

       4. Before we jump into coding, you should know the basics of HTML.

       5. Next, inspect the page. Let's take this website as an example: https://www.bloomberg.com/quote/SPX:IND.

       6. First, right-click and open your browser’s inspector to inspect the web page.

      7. Once you click Inspect, the related HTML is highlighted in the browser's inspector.

      8. In the result, you will see that the price sits inside a few levels of HTML tags:

<div class="basic-quote">

 → <div class="price-container up">

→ <div class="price">.

      9. Similarly, if you click the name “S&P 500 Index”, you will see that it is inside:

 <div class="basic-quote"> 

 <h1 class="name">.

    10. Now we know the location of the data from the class attributes of these tags.
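
To make the idea of locating data by class more concrete, here is a minimal sketch that runs BeautifulSoup on a small hand-written HTML snippet mirroring the nesting described in steps 8 and 9. The snippet, the sample values in it, and the helper name extract_quote are assumptions for illustration only; the real page may be laid out differently.

```python
from bs4 import BeautifulSoup

# Hand-written sample that mimics the nesting seen in the inspector.
# The name and price values are made up for illustration.
SAMPLE_HTML = """
<div class="basic-quote">
  <h1 class="name"> Sample Index </h1>
  <div class="price-container up">
    <div class="price">1,234.56</div>
  </div>
</div>
"""


def extract_quote(html):
    """Return (name, price) as strings; a missing part comes back as None."""
    soup = BeautifulSoup(html, "html.parser")
    name_box = soup.find("h1", attrs={"class": "name"})
    price_box = soup.find("div", attrs={"class": "price"})
    # find() returns None when nothing matches, so guard before reading text
    name = name_box.get_text(strip=True) if name_box else None
    price = price_box.get_text(strip=True) if price_box else None
    return name, price


print(extract_quote(SAMPLE_HTML))
```

Checking for None before reading the text is worth keeping in real scrapers, because a site can change its class names at any time.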

    11. Let's jump into the code. Now that we know where our data is located, we can start coding the web scraper. Open your text editor (or a new notebook in JupyterLab).

   12. First, we need to import the libraries that we are going to use. In Python 3, urllib2 has been replaced by urllib.request:

# import libraries
import urllib.request
from bs4 import BeautifulSoup

   13. Then we need to declare a variable for the URL of the page:

# specify the url
quote_page = 'paste the url'

   14. Then we use urllib.request to fetch the HTML page at the URL we declared:

# query the website and return the html to the variable 'page'
page = urllib.request.urlopen(quote_page)

    15. Finally, we parse the page with BeautifulSoup so that we can work with its contents.

# parse the html using beautiful soup and store in variable `store`
store = BeautifulSoup(page, 'html.parser')

Now we have a variable, store, containing the HTML of the page. Now we can start coding the part that extracts the data.
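
Some sites reject requests that lack a browser-like User-Agent header, and a request can also fail outright. The fetch in step 14 can be made a little more defensive; below is a minimal sketch, assuming Python 3. The helper name fetch_html, the header value, and the timeout are illustrative choices, not requirements of any particular site.

```python
import urllib.error
import urllib.request


def fetch_html(url, timeout=10):
    """Return the page body as text, or None if the request fails."""
    # Illustrative header value; some servers refuse the default Python one
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except urllib.error.URLError as err:  # HTTPError is a subclass of URLError
        print("Request failed:", err)
        return None
```

The returned string can be passed to BeautifulSoup(html, 'html.parser') in the same way as the page object.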

   16. Here we can extract the content with find(). Since the class name is unique on this page, we can simply query <h1 class="name">:

# Take out the <h1> of name and get its value
name_box = store.find('h1', attrs={'class': 'name'})

   17. Once we get the tag, we can get the data by getting its text.

name = name_box.text.strip() # strip() removes leading and trailing whitespace
print(name)

  18. Similarly, we can get the price also:

# get the index price
price_box = store.find('div', attrs={'class': 'price'})
price = price_box.text
print(price)

Once you run the program, you will be able to see that it prints out the current price of the S&P 500 Index.
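
The steps above print text from a page, while the original question asks about downloading a file. Once a file's URL is known (for example, taken from the href attribute of a link found with find()), the file can be saved with the standard library. Here is a minimal sketch, assuming Python 3; download_file, the example.com URL, and the file names are hypothetical placeholders.

```python
import shutil
import urllib.request
from urllib.parse import urljoin


def download_file(url, destination):
    """Stream the resource at `url` into the local file `destination`."""
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        # Copy in chunks so large files are not held in memory all at once
        shutil.copyfileobj(response, out)
    return destination


# Links in a page are often relative; urljoin resolves them against the page URL.
# Both values below are placeholders for illustration.
page_url = "https://example.com/reports/index.html"
file_url = urljoin(page_url, "data/report.csv")
print(file_url)

# download_file(file_url, "report.csv")  # uncomment to download for real
```

In JupyterLab this can be run in a notebook cell; a relative destination such as "report.csv" is saved relative to the kernel's current working directory.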

I hope this is helpful to you. To know more about Jupyter, you can go through this: https://www.edureka.co/blog/cheatsheets/jupyter-notebook-cheat-sheet

answered Apr 7, 2020 by Gitika
• 65,870 points
