I want to download a file from a website by web scraping. Can anyone explain how to do this in JupyterLab (Python) with an example?

Apr 7 in Python by António
• 120 points
332 views

1 answer to this question.

Hey,

Web scraping is a technique for automatically accessing a website and extracting large amounts of information from it. Let's see how to use Python as our web scraping language. Follow the steps below:

  1. If you are using Windows, install Python from the official website.
  2. Install the libraries we need, i.e., the BeautifulSoup library, using pip, the package management tool that ships with current Python installers.
  3. In the terminal, type:
python -m pip install --upgrade pip
python -m pip install beautifulsoup4

  4. Before we jump into coding, you should know the basics of HTML.

  5. Inspect the page. Let's take this website as an example: https://www.bloomberg.com/quote/SPX:IND.

  6. First, right-click the page and open your browser’s inspector to inspect it.

  7. Once you click Inspect, the related HTML is selected in the inspector panel.

  8. From the result, you can see that the price sits inside a few levels of HTML tags:

<div class="basic-quote">
  → <div class="price-container up">
    → <div class="price">

  9. Similarly, if you click the name “S&P 500 Index”, you will find that it is inside:

<div class="basic-quote">
  → <h1 class="name">

  10. Now we know the location of the data with the help of the class attributes.

  11. Now that we know where our data is located, we can start coding the web scraper. Open your text editor or a new JupyterLab notebook.

  12. First, import the libraries that we are going to use:

# import libraries
from urllib.request import urlopen  # Python 3 replacement for urllib2
from bs4 import BeautifulSoup

  13. Then declare a variable for the URL of the page:

# specify the url
quote_page = 'paste the url'

  14. Then use Python's urllib.request module (urllib2 in Python 2) to get the HTML page at the declared URL:

# query the website and return the html to the variable 'page'
page = urlopen(quote_page)

  15. Finally, parse the page into BeautifulSoup format so that we can use BeautifulSoup to work on it:

# parse the html using beautiful soup and store in variable `store`
store = BeautifulSoup(page, 'html.parser')

We now have a variable, store, containing the parsed HTML of the page, so we can start coding the part that extracts the data.
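
Some sites refuse requests that do not look like they come from a browser, so the urlopen call in step 14 may fail with an HTTP error. As a minimal sketch, assuming Python 3, the fetch can be wrapped in a helper that sends a User-Agent header and reports failures instead of raising. The helper name fetch_html, the header value, and the timeout are illustrative assumptions, not part of the original steps.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10):
    """Return the page at `url` as text, or None if the request fails.

    Hypothetical helper: the name, User-Agent value, and timeout are
    illustrative choices.
    """
    # Send a browser-like User-Agent; some sites reject the default one.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urlopen(request, timeout=timeout) as response:
            # Use the charset declared by the server, falling back to UTF-8.
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except HTTPError as err:
        print("The server returned an error:", err.code)
    except URLError as err:
        print("Could not reach the server:", err.reason)
    return None
```

The returned text can be passed to BeautifulSoup in the same way as the page object, for example BeautifulSoup(fetch_html(quote_page), 'html.parser'), after checking that the result is not None.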

  16. Here we can extract the content with find(). Since the HTML class name is unique on this page, we can simply query <h1 class="name">:

# Take out the <h1> of name and get its value
name_box = store.find('h1', attrs={'class': 'name'})

  17. Once we have the tag, we can get the data by reading its text:

name = name_box.text.strip()  # strip() removes leading and trailing whitespace
print(name)

  18. Similarly, we can get the price:

# get the index price
price_box = store.find('div', attrs={'class': 'price'})
price = price_box.text
print(price)
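
The extraction steps can be collected into one small function. The sketch below assumes Python 3 and BeautifulSoup 4, and it parses a hard-coded HTML string that mimics the nesting described in steps 8 and 9 rather than the live page, whose markup may have changed. The function name extract_quote and the sample values are made up for illustration.

```python
from bs4 import BeautifulSoup

# Made-up sample that mimics the nesting described in the steps; not real data.
SAMPLE_HTML = """
<div class="basic-quote">
  <h1 class="name"> S&amp;P 500 Index </h1>
  <div class="price-container up">
    <div class="price">1,234.56</div>
  </div>
</div>
"""


def extract_quote(html):
    """Return (name, price) as strings, or None for a part that is missing."""
    store = BeautifulSoup(html, "html.parser")
    name_box = store.find("h1", attrs={"class": "name"})
    price_box = store.find("div", attrs={"class": "price"})
    # find() returns None when nothing matches, so guard before reading text.
    name = name_box.text.strip() if name_box else None
    price = price_box.text.strip() if price_box else None
    return name, price


name, price = extract_quote(SAMPLE_HTML)
print(name)
print(price)
```

Guarding against None matters in practice: if the site changes its class names, find() returns None and reading .text on it would raise an AttributeError.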

Once you run the program, you will be able to see that it prints out the current price of the S&P 500 Index.
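
The question asks about downloading a file, which the steps so far do not cover. One possible approach, sketched here with assumed helper names (find_file_links, download_file) and an assumed .csv extension, is to scrape the page for links that point at files and then save each one with the standard library. Treat it as a starting point rather than a finished downloader.

```python
import os
import shutil
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

from bs4 import BeautifulSoup


def find_file_links(html, base_url, extensions=(".csv",)):
    """Return absolute URLs of <a href> links whose path ends with an extension."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        # Resolve relative links such as "data/report.csv" against the page URL.
        url = urljoin(base_url, anchor["href"])
        if urlparse(url).path.lower().endswith(tuple(extensions)):
            links.append(url)
    return links


def download_file(url, folder="."):
    """Save the resource at `url` into `folder` and return the local path."""
    filename = os.path.basename(urlparse(url).path) or "download"
    target = os.path.join(folder, filename)
    # Stream the response to disk instead of loading it all into memory.
    with urlopen(url, timeout=30) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

In a notebook cell, you could pass the page HTML and its URL to find_file_links and then call download_file on each returned link. Check the site's terms of use and robots.txt before downloading in bulk.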

I hope this helps. To learn more about Jupyter, you can go through this cheat sheet: https://www.edureka.co/blog/cheatsheets/jupyter-notebook-cheat-sheet

answered Apr 7 by Gitika
• 37,660 points
