Web scraping Tripadvisor reviews

0 votes
I tried to scrape Tripadvisor's airline reviews:

https://www.tripadvisor.com/Airline_Review-d8729017-Reviews-Alaska-Airlines#REVIEWS

I tried to extract the rating:

rating <- page01 %>% html_node(".bubble_30 span") %>% html_text()

It returns NA.

I tried to extract the page numbers:

pageNum <- page01 %>% html_nodes(".pageNum") %>% html_text()

It returns character(0).

Please guide me.

Thanks
Sep 14 in Data Analytics by uwmarkyo
• 120 points
205 views

2 answers to this question.

0 votes

Hey, I don't know much about web scraping, but check out this blog, which covers a scenario very similar to yours.

The article below has an example of extracting a page from a website.

https://www.datacamp.com/community/tutorials/r-web-scraping-rvest

Hope it helps!

answered Sep 15 by anonymous
• 32,260 points

Hi, thanks.

I did check this site before I posted this thread.

The HTML part confuses me.

For example:

I would like to get the last page number on this site: https://www.tripadvisor.com/Airline_Review-d8729017-Reviews-Alaska-Airlines

<a class="pageNum " href="/Airline_Review-d8729017-Reviews-or14000-Alaska-Airlines">2801</a>

My code is 

url %>%

html_nodes(".pageNum")

but the node also contains an href, and after many attempts I still can't get the page number back.
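For what it's worth, the href should not get in the way: html_text() returns only the text inside the tag, and html_attr() reads the attribute separately. Below is a minimal sketch that parses the anchor from this comment as an inline string, so it runs without a network request. The first link (page 2) is a made-up sibling added only so there is more than one .pageNum node; it is not taken from the real page.

```r
library(rvest)

# Inline HTML: the second anchor is the one quoted above,
# the first anchor is a hypothetical sibling added for illustration.
html <- '<div>
  <a class="pageNum " href="/Airline_Review-d8729017-Reviews-or5-Alaska-Airlines">2</a>
  <a class="pageNum " href="/Airline_Review-d8729017-Reviews-or14000-Alaska-Airlines">2801</a>
</div>'
page <- read_html(html)

# All pagination links; the trailing space in class="pageNum " does not matter
links <- page %>% html_nodes(".pageNum")

# html_text() gives the visible text, ignoring the href attribute
page_numbers <- links %>% html_text() %>% as.numeric()
last_page <- max(page_numbers)

# html_attr() reads the href of the link with the highest page number
last_href <- links[which.max(page_numbers)] %>% html_attr("href")

last_page
last_href
```

If the same selector returns nothing on the live page, the markup is probably not present in the HTML that read_html() received, rather than the selector being wrong.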

Hey Markyo,

I tried it with the https://www.tripadvisor.com/Airline_Review-d8729017-Reviews-Alaska-Airlines page but couldn't find the problem.

So I tried scraping the Edureka community website with syntax similar to yours, and it worked fine.

Here is the code I used.

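library(rvest)
url <- 'https://www.edureka.co/community/'
page01 <- read_html(url)
# Collect the text of every pagination link
pageNum <- page01 %>% 
  html_nodes('.qa-page-link') %>% 
  html_text()
pageNum
# paste0() joins without the spaces that paste() would insert into the URL
htmlpage <- paste0(url, '?page=', pageNum[1])
data <- read_html(htmlpage)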

Hope it helps!
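As a side note on the two results in the question: NA from html_node() and character(0) from html_nodes() are what rvest returns when a selector matches nothing, so they are not errors in themselves. A small sketch of that behaviour, using a made-up inline page that deliberately lacks the .bubble_30 and .pageNum elements:

```r
library(rvest)

# Hypothetical page with none of the classes the selectors look for
page <- read_html("<html><body><p>No review markup here</p></body></html>")

# html_node() returns a "missing" node when nothing matches,
# and html_text() turns that into NA
rating <- page %>% html_node(".bubble_30 span") %>% html_text()

# html_nodes() returns an empty set when nothing matches,
# and html_text() turns that into character(0)
page_num <- page %>% html_nodes(".pageNum") %>% html_text()

is.na(rating)
length(page_num)
```

So a useful first check is whether the downloaded document contains those classes at all. One possible explanation, which is only an assumption here, is that the site builds that part of the page with JavaScript or serves different HTML to scripts than to a browser.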

0 votes

Hi, I used the code below to find the review.

Check it out!!

> tripadvsor = read_html('https://www.tripadvisor.com/Airline_Review-d8729017-Reviews-Alaska-Airlines#REVIEWS')
> tripadvsor %>% html_nodes('.flights-airline-review-page-overview-module-OverviewModule__review_num--2Ga7T') %>% html_text() %>% as.numeric()
[1] 4
answered Sep 16 by Cherukuri
• 32,260 points
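A possible refinement of the selector in this answer: the suffix --2Ga7T looks like a generated hash, and if it changes when the site is rebuilt (an assumption, not something confirmed in this thread), the exact class name will stop matching. A sketch using a CSS attribute-substring selector instead, tested against a made-up inline fragment that reuses the class name from the answer:

```r
library(rvest)

# Hypothetical fragment; only the class name comes from the answer above
html <- '<div>
  <span class="flights-airline-review-page-overview-module-OverviewModule__review_num--2Ga7T">4</span>
</div>'
page <- read_html(html)

# [class*="..."] matches any element whose class attribute contains the text,
# so the hashed suffix does not need to be known in advance
review_num <- page %>%
  html_nodes('[class*="OverviewModule__review_num"]') %>%
  html_text() %>%
  as.numeric()

review_num
```

The trade-off is that a substring match is looser, so it is worth checking that it selects only the intended element on the real page.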
