Web scraping Tripadvisor's reviews

I tried to scrape Tripadvisor's airline reviews.


I tried to extract the rating:

rating <- page01 %>% html_node(".bubble_30 span") %>% html_text()

It returns NA.
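An NA here means the selector matched nothing. One possible cause, assuming the page uses the bubble markup Tripadvisor used at the time (a single span with a class such as "ui_bubble_rating bubble_30", no inner span and no text), is that ".bubble_30 span" looks for a span inside the bubble element, and the rating lives in the class name rather than in the text. The sketch below runs on a hypothetical HTML fragment, not the live page, and reads the number out of the class attribute.

```r
library(rvest)

# Hypothetical fragment modelled on older Tripadvisor markup (an assumption,
# not copied from the live page): the rating is encoded in the class name.
html <- '<div class="review">
  <span class="ui_bubble_rating bubble_30"></span>
</div>'
page <- read_html(html)

# Select the span that carries the class itself (no descendant "span").
cls <- page %>% html_node("span.ui_bubble_rating") %>% html_attr("class")

# "bubble_30" -> 30 -> 3.0 bubbles (assumes the suffix is rating * 10)
rating <- as.numeric(sub(".*bubble_([0-9]+).*", "\\1", cls)) / 10
rating  # 3
```

If the live page uses different class names, the selector and the regular expression would need to change accordingly.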

I also tried to extract the page numbers:

pageNum <- page01 %>% html_nodes(".pageNum") %>% html_text()

It returns character(0).
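Both symptoms mean the same thing: the CSS selector matched no element in the HTML that rvest downloaded. The difference comes from html_node() versus html_nodes(). The sketch below reproduces both results on a small made-up page (a test input, not Tripadvisor), and shows that counting matches is a quick way to check a selector before extracting anything.

```r
library(rvest)

# A tiny page with no element matching ".pageNum" (made-up test input)
page <- read_html("<html><body><p>No pagination here</p></body></html>")

# html_node() returns a "missing" node when nothing matches,
# so html_text() gives NA ...
single <- page %>% html_node(".pageNum") %>% html_text()

# ... while html_nodes() returns an empty node set,
# so html_text() gives character(0).
several <- page %>% html_nodes(".pageNum") %>% html_text()

# Count matches to test a selector before extracting text.
n_matches <- length(html_nodes(page, ".pageNum"))

single     # NA
several    # character(0)
n_matches  # 0
```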

Please guide me.

Sep 14, 2019 in Data Analytics by uwmarkyo
Hey, I don't know much about web scraping, but check out this blog; it covers exactly your scenario.

The blog/article below has an example of extracting a page from a website.


Hope it helps!

answered Sep 15, 2019 by anonymous
Hi, thanks.

I had already checked that site before posting this thread.

The HTML part is what confuses me.

For example:

I would like to get the last page number on this page: https://www.tripadvisor.com/Airline_Review-d8729017-Reviews-Alaska-Airlines

<a class="pageNum " href="/Airline_Review-d8729017-Reviews-or14000-Alaska-Airlines">2801</a>

My code is:

url %>%


but the node also contains an href, and despite many attempts I still can't get the page number back.
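The href does not get in the way: html_text() returns only the link text ("2801"), and html_attr() reads the href separately. The sketch below wraps the quoted pagination link in a minimal page so it runs offline; the second link (page 2, offset or5) is hypothetical and added only so there is more than one number to take the maximum of. If the same selector still returns character(0) against the live page, the HTML rvest downloads may differ from what the browser shows (for example, if the pagination is built by JavaScript); that is a possibility, not something checked here.

```r
library(rvest)

# The pagination link quoted above plus one hypothetical link (page 2),
# wrapped in a minimal page. On the live site you would use read_html(url).
html <- '<div>
  <a class="pageNum " href="/Airline_Review-d8729017-Reviews-or5-Alaska-Airlines">2</a>
  <a class="pageNum " href="/Airline_Review-d8729017-Reviews-or14000-Alaska-Airlines">2801</a>
</div>'
page <- read_html(html)

# ".pageNum" still matches class="pageNum " (classes are space-separated).
links <- page %>% html_nodes("a.pageNum")

# The page number is the link text; the href is a separate attribute.
page_numbers <- as.numeric(html_text(links))
last_page <- max(page_numbers)

# Read the href of the last page if the URL itself is needed.
last_href <- html_attr(links[which.max(page_numbers)], "href")

last_page  # 2801
last_href
```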

Hey Markyo,

I tried it with the https://www.tripadvisor.com/Airline_Review-d8729017-Reviews-Alaska-Airlines page but couldn't find the problem.

So I tried scraping the edureka community website with syntax similar to yours, and it worked fine.

Here is the code I used:

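library(rvest)
url <- 'https://www.edureka.co/community/'
page01 <- read_html(url)
pageNum <- page01 %>%
  html_nodes('.qa-page-link') %>%
  html_text()                      # text of each pagination link
# paste0() avoids the spaces that paste() inserts by default
htmlpage <- paste0(url, '?page=', pageNum[1])
data <- read_html(htmlpage)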
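The same URL-building idea could be applied to the Tripadvisor page from the question. Judging from the quoted link, page 2801 uses the offset or14000, which fits 5 reviews per page ((2801 - 1) * 5 = 14000). The helper below, review_page_url, is hypothetical: it assumes that pattern holds for every page and that page 1 has no offset segment, and neither assumption has been checked against the live site.

```r
# Hypothetical helper: builds the URL of review page n, assuming the
# "-or<offset>-" pattern from the quoted link and 5 reviews per page
# ((2801 - 1) * 5 = 14000 matches "or14000").
review_page_url <- function(n, per_page = 5) {
  base <- "https://www.tripadvisor.com/Airline_Review-d8729017-Reviews"
  suffix <- "-Alaska-Airlines"
  if (n == 1) {
    # Assumption: the first page has no offset segment
    return(paste0(base, suffix))
  }
  offset <- (n - 1) * per_page
  # format(..., scientific = FALSE) keeps large offsets out of 1e+05 notation
  paste0(base, "-or", format(offset, scientific = FALSE), suffix)
}

review_page_url(1)
review_page_url(2801)
```

Each URL could then be passed to read_html() in the same way as htmlpage in the code above.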

Hope it helps!

Hi, I used the code below to find the review.

Check it out!

> tripadvsor = read_html('https://www.tripadvisor.com/Airline_Review-d8729017-Reviews-Alaska-Airlines#REVIEWS')
> tripadvsor %>% html_nodes('.flights-airline-review-page-overview-module-OverviewModule__review_num--2Ga7T') %>% html_text() %>% as.numeric()
[1] 4
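The class name in that selector ends in a generated-looking suffix (--2Ga7T), which may change when the site is redeployed; that is an assumption about how the suffix is produced, not something confirmed here. A CSS attribute selector that matches only the stable part of the class is one way to guard against it. The sketch uses a made-up fragment imitating that element so it runs offline.

```r
library(rvest)

# Made-up fragment imitating the element from the answer above
html <- '<div>
  <span class="flights-airline-review-page-overview-module-OverviewModule__review_num--2Ga7T">4</span>
</div>'
page <- read_html(html)

# [class*='...'] matches any element whose class attribute contains the
# substring, so a change to the "--2Ga7T" suffix would not break it.
value <- page %>%
  html_nodes("[class*='OverviewModule__review_num']") %>%
  html_text() %>%
  as.numeric()

value  # 4
```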
answered Sep 16, 2019 by Cherukuri
