I want to download a file from a website by web scraping. Can anyone explain how to do this in JupyterLab with Python, with an example?


Hey,

Web scraping is a technique for automatically accessing a website and extracting large amounts of information from it. Let's see how to use Python as our web scraping language. Follow the steps below:

1. If you are using Windows, install Python from the official website. Recent Python installers already include pip.
2. Install the libraries we need, i.e., the BeautifulSoup library, using pip, the package management tool for Python.
3. In the terminal, type:

python -m pip install beautifulsoup4

(In a JupyterLab notebook cell, you can run %pip install beautifulsoup4 instead.)

4. Before we jump into coding, you should know the basics of HTML.

5. Inspect the page. As an example, let's take this website: https://www.bloomberg.com/quote/SPX:IND.

6. First, right-click the web page and choose Inspect to open your browser's inspector.

7. Once you click Inspect, the related HTML element is highlighted in the browser's developer tools.

8. From the result, you can see that the price is nested inside a few levels of HTML tags:

<div class="basic-quote">
  → <div class="price-container up">
    → <div class="price">

9. Similarly, if you click the name “S&P 500 Index”, you can see that it is inside:

<div class="basic-quote">
  → <h1 class="name">
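
The nesting found in the inspector can also be written as a CSS selector. The sketch below runs against a small hand-written HTML snippet that only imitates the structure described above (it is not Bloomberg's real markup, and the price is a made-up placeholder), using BeautifulSoup's select_one():

```python
from bs4 import BeautifulSoup

# Hand-written sample that mimics the structure seen in the inspector.
# The price is a placeholder, not real quote data.
SAMPLE_HTML = """
<div class="basic-quote">
  <h1 class="name">S&amp;P 500 Index</h1>
  <div class="price-container up">
    <div class="price">1,234.56</div>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A space in the selector means "somewhere inside"; ".name" matches class="name".
# "div.price-container" still matches class="price-container up" (classes are a list).
name_tag = soup.select_one("div.basic-quote h1.name")
price_tag = soup.select_one("div.basic-quote div.price-container div.price")

print(name_tag.get_text(strip=True))   # S&P 500 Index
print(price_tag.get_text(strip=True))  # 1,234.56
```

Selectors are handy when a class name is not unique on the page, because the surrounding containers narrow the match.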

10. Now we know the location of the data from the class attributes.

11. Now that we know where the data is, we can start coding the web scraper. Open your text editor (or a notebook in JupyterLab).

12. First, import the libraries we are going to use:

# import libraries (urllib2 exists only in Python 2; Python 3 uses urllib.request)
from urllib.request import urlopen
from bs4 import BeautifulSoup

13. Then we need to declare a variable for the URL of the page:

# specify the url
quote_page = 'https://www.bloomberg.com/quote/SPX:IND'

14. Then we use urlopen() from urllib to fetch the HTML page at the declared URL:

# query the website and return the HTML to the variable 'page'
page = urlopen(quote_page)
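
By default, urllib identifies itself with a "Python-urllib/3.x" User-Agent, and some sites refuse such requests or serve a different page. A minimal sketch of a more defensive fetch, assuming the site accepts a browser-like User-Agent; the header value and the helper names build_request and fetch_html are hypothetical:

```python
from urllib.request import Request, urlopen

# Hypothetical User-Agent string; adjust it to whatever the site accepts.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; quote-scraper/0.1)"}


def build_request(url, headers=None):
    """Build a GET request that carries the given (or default) headers."""
    return Request(url, headers=headers or HEADERS)


def fetch_html(url, timeout=10):
    """Download a page and decode it with the charset the server declares."""
    with urlopen(build_request(url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Usage (needs network access):
# html = fetch_html("https://www.bloomberg.com/quote/SPX:IND")
```

The returned string can be passed to BeautifulSoup exactly like the page object. The timeout keeps a notebook cell from hanging if the server never answers.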

15. Finally, we parse the page with BeautifulSoup so we can work with it:

# parse the HTML using BeautifulSoup and store it in the variable `store`
store = BeautifulSoup(page, 'html.parser')

Now the variable store contains the parsed HTML of the page, and we can start coding the part that extracts the data.

16. We can extract the content with find(). Since the class name "name" is unique on this page, we can simply query <h1 class="name">:

# take out the <h1> with class "name"
name_box = store.find('h1', attrs={'class': 'name'})

17. Once we have the tag, we can get the data from its text:

name = name_box.text.strip()  # strip() removes leading and trailing whitespace
print(name)
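
find() returns None when nothing matches (for example, if the site renames its classes or serves a bot-check page), and name_box.text then fails with AttributeError. A small sketch of a helper that checks for that case; the name text_of is hypothetical:

```python
from bs4 import BeautifulSoup


def text_of(soup, tag, class_name):
    """Return the stripped text of the first <tag class=class_name>, or None."""
    found = soup.find(tag, attrs={"class": class_name})
    if found is None:
        return None  # nothing matched: markup changed or the page was blocked
    return found.get_text(strip=True)


store = BeautifulSoup('<h1 class="name">  S&amp;P 500 Index </h1>', "html.parser")
print(text_of(store, "h1", "name"))    # S&P 500 Index
print(text_of(store, "div", "price"))  # None
```

Returning None lets the calling code decide whether a missing element is an error or just means "try again later".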

18. Similarly, we can get the price:

# get the index price
price_box = store.find('div', attrs={'class': 'price'})
price = price_box.text.strip()
print(price)
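
price is still a string, typically with a thousands separator, so it has to be cleaned before any arithmetic. A sketch assuming the page uses a comma for thousands and a dot for decimals; parse_price is a hypothetical helper:

```python
def parse_price(text):
    """Convert text like ' 1,234.56 ' to a float, assuming comma thousands separators."""
    cleaned = text.strip().replace(",", "")
    return float(cleaned)  # raises ValueError if the text is not a number


print(parse_price(" 1,234.56 "))  # 1234.56
```

Letting float() raise ValueError on unexpected text makes a layout change on the page show up as an error instead of a silently wrong number.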

When you run the program, it prints the current price of the S&P 500 Index, provided the page still uses the class names found above.
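
The question asks about downloading a file rather than reading a value, and the same tools cover that: find the <a> links that point at the file, turn each relative href into an absolute URL with urljoin, and save the response bytes to disk. A minimal sketch, assuming the file is linked with an ordinary <a href> on the page; the helper names, the .csv/.pdf extensions and the example.com URLs are hypothetical:

```python
import os
import shutil
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

from bs4 import BeautifulSoup


def find_file_links(html, base_url, extensions=(".csv", ".pdf")):
    """Return absolute URLs of <a href> links whose path ends with one of the extensions."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for a in soup.find_all("a", href=True):
        url = urljoin(base_url, a["href"])  # resolve relative links against the page URL
        if urlparse(url).path.lower().endswith(extensions):
            links.append(url)
    return links


def download(url, dest_dir="."):
    """Stream the file at url into dest_dir, named after the last path segment."""
    filename = os.path.basename(urlparse(url).path) or "download"
    path = os.path.join(dest_dir, filename)
    with urlopen(url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)  # copies in chunks, so large files are fine
    return path


page_html = '<a href="/files/report.csv">CSV</a> <a href="about.html">About</a>'
print(find_file_links(page_html, "https://example.com/data/"))
# ['https://example.com/files/report.csv']
```

In a JupyterLab cell you would call download(link) for each link returned; the file lands in the notebook's working directory unless dest_dir says otherwise. Sites that reject urllib's default User-Agent may also need a browser-like header on these requests.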

I hope this helps. To learn more about Jupyter, see this cheat sheet: https://www.edureka.co/blog/cheatsheets/jupyter-notebook-cheat-sheet


