How to scrape specific text from kworb and export it as an Excel file

0 votes

I'm trying to scrape the positions, the artists and the songs from a ranking list on kworb. For example: https://kworb.net/spotify/country/us_weekly.html

I used the following script:

import requests
from bs4 import BeautifulSoup

response = requests.get("https://kworb.net/spotify/country/us_weekly.html")
soup = BeautifulSoup(response.content, 'html.parser')

print(soup.get_text())

And here is the output:

ITUNES
WORLDWIDE
ARTISTS
CHARTS
DON'T PRAY
RADIO
SPOTIFY
YOUTUBE
TRENDING
HOME


CountriesArtistsListenersCities




Spotify Weekly Chart - United States - 2023/02/09 | Totals


PosP+Artist and TitleWksPk(x?)StreamsStreams+Total

1
+1
SZA - Kill Bill
9
1(x5)
15,560,813
+247,052
148,792,089
2
-1
Miley Cyrus - Flowers
4
1(x3)
13,934,413
-4,506,662
75,009,251
3
+20
Morgan Wallen - Last Night
2
3(x1)
11,560,741
+6,984,649
16,136,833
...

How do I get only the positions, the artists and the songs as separate columns and save them to an Excel file?

Expected output:

Pos         Artist            Songs
1           SZA               Kill Bill
2           Miley Cyrus       Flowers
3           Morgan Wallen     Last Night
...
Feb 18, 2023 in Others by Kithuzzz
• 38,010 points
370 views

1 answer to this question.

0 votes

A convenient way to scrape HTML tables is pandas.read_html(). It fetches and parses the page for you (by default it tries lxml and falls back to BeautifulSoup with html5lib) and returns a list of DataFrames.

import pandas as pd

# find the table by its id; read_html returns a list of DataFrames, so take the first one
df = pd.read_html('https://kworb.net/spotify/country/us_weekly.html', attrs={'id':'spotifyweekly'})[0]

# split the combined column at the first ' - ' to create the expected columns
df[['Artist','Song']] = df['Artist and Title'].str.split(' - ', n=1, expand=True)

# pick the columns you need and export them to Excel (needs an Excel writer such as openpyxl)
df[['Pos','Artist','Song']].to_excel('yourfile.xlsx', index=False)
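
To see what the split step does without hitting the site, here is a minimal offline sketch. The sample rows are stand-ins for the scraped 'Artist and Title' column, and the third row is a made-up title that itself contains ' - ', included only to show why n=1 matters.

```python
import pandas as pd

# Hypothetical sample rows standing in for the scraped 'Artist and Title' column.
df = pd.DataFrame({
    "Pos": [1, 2, 3],
    "Artist and Title": [
        "SZA - Kill Bill",
        "Miley Cyrus - Flowers",
        "Some Artist - Part One - Live",  # made-up title that contains ' - '
    ],
})

# n=1 splits only at the first ' - ', so the rest of the title stays intact.
df[["Artist", "Song"]] = df["Artist and Title"].str.split(" - ", n=1, expand=True)

result = df[["Pos", "Artist", "Song"]]
print(result.to_string(index=False))
```

Without n=1 the third row would be split into three pieces and would no longer fit into two columns.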

An alternative that parses the table rows directly with BeautifulSoup:

import requests
from bs4 import BeautifulSoup
import pandas as pd

soup = BeautifulSoup(requests.get("https://kworb.net/spotify/country/us_weekly.html").content, 'html.parser')

data = []

for e in soup.select('#spotifyweekly tr:has(td)'):
    data.append({
        'Pos':e.td.text,
        'Artist':e.a.text,
        'Song':e.a.find_next_sibling('a').text
    })
pd.DataFrame(data).to_excel('yourfile.xlsx', index = False)

Output:

Pos  Artist         Song
1    SZA            Kill Bill
2    Miley Cyrus    Flowers
3    Morgan Wallen  Last Night
4    Metro Boomin   Creepin'
5    Lil Uzi Vert   Just Wanna Rock
6    Drake          Rich Flex
7    Metro Boomin   Superhero (Heroes & Villains) [with Future & Chris Brown]
8    Sam Smith      Unholy
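
The BeautifulSoup loop above assumes every matched row contains two links, the artist first and the song second. If a row lacks them, e.a or find_next_sibling('a') returns None and the .text access raises AttributeError. A slightly more defensive variant might look like the sketch below. The inline HTML is made-up markup that only imitates the assumed shape of the chart table, and it uses find_all('a') instead of find_next_sibling, so treat it as an illustration rather than a drop-in replacement.

```python
from bs4 import BeautifulSoup

# Made-up markup imitating the assumed table shape: a header row with <th>
# cells, then data rows whose third cell holds an artist link and a song link.
html = """
<table id="spotifyweekly">
  <tr><th>Pos</th><th>P+</th><th>Artist and Title</th></tr>
  <tr><td>1</td><td>+1</td><td><a href="#">SZA</a> - <a href="#">Kill Bill</a></td></tr>
  <tr><td>2</td><td>-1</td><td><a href="#">Miley Cyrus</a> - <a href="#">Flowers</a></td></tr>
  <tr><td>3</td><td>=</td><td>Row without links</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

data = []
for row in soup.select("#spotifyweekly tr:has(td)"):
    links = row.find_all("a")
    if len(links) < 2:
        # Skip rows that lack either the artist link or the song link.
        continue
    data.append({
        "Pos": row.td.get_text(strip=True),
        "Artist": links[0].get_text(strip=True),
        "Song": links[1].get_text(strip=True),
    })

print(data)
```

The resulting list can be passed to pd.DataFrame(data).to_excel(...) exactly as in the answer's loop.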
answered Feb 18, 2023 by narikkadan
• 63,420 points
