Reviews for businesses and restaurants on Yelp are typically used by consumers to determine where to best spend their money. However, reviews also offer important insights for businesses to leverage. Reviews can show what a business is doing right and what it needs to improve upon. Naive approaches to discovering insights from reviews mean reading through an impractical amount of text, and the conclusions are often skewed by outlier reviews that would distract a business from implementing meaningful changes. To circumvent these issues, this project uses Latent Dirichlet Allocation (LDA) to pinpoint areas of improvement relevant to a business. Along with topic extraction, we incorporate sentiment analysis to generate recommendations for cafes.
Yelp currently has “tips” from users to help other users. However, there is useful information that restaurants can obtain from their customers’ reviews. More specifically, features or subtopics can be extracted from reviews to discover the distinct topics most relevant to the restaurant (i.e. what its customers are saying about it).
The reviews of a business on Yelp can yield key ideas for business improvements. The discovery of these core concepts can help a business improve its ratings. Without the ability to acquire actionable insights from review text, however, businesses are left guessing about their flaws. The goal of this project’s analysis is to provide business owners a clear target for the pursuit of improving their ratings.
The dataset that was utilized to address this problem was the Yelp Academic Dataset. This dataset includes 4.1 million reviews by users for businesses and 1.1 million business attributes. For this project, the businesses were filtered by the business category “Coffee & Tea”, or cafes. Businesses were narrowed down to cafes so that we could find meaningful subtleties between a particular category of restaurants, rather than extract generic topics across all restaurants (e.g. vague topics like “lunch” or “dinner”). The initial hypothesis was that inspecting the cafe category would reveal influential aspects specific to cafe businesses such as barista wait time, specialty drinks, and bakery items.
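The category filter described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the field names (`business_id`, `categories`, `text`) and the two possible shapes of the `categories` field are assumptions about the dataset schema, and the sample records are invented.

```python
import json

# Toy records shaped like lines of the dataset's business file.
# Field names are assumptions about the schema, not confirmed by the project.
SAMPLE_BUSINESS_LINES = [
    '{"business_id": "b1", "name": "Bean There", "categories": ["Coffee & Tea", "Bakeries"]}',
    '{"business_id": "b2", "name": "Taco Stop", "categories": ["Mexican", "Restaurants"]}',
    '{"business_id": "b3", "name": "Leaf House", "categories": "Coffee & Tea, Desserts"}',
    '{"business_id": "b4", "name": "No Tags", "categories": null}',
]

SAMPLE_REVIEW_LINES = [
    '{"business_id": "b1", "stars": 5, "text": "Best drip coffee around."}',
    '{"business_id": "b2", "stars": 3, "text": "Decent tacos."}',
    '{"business_id": "b3", "stars": 4, "text": "Lovely oolong."}',
]


def has_category(record, wanted):
    """Return True if the record lists `wanted` among its categories."""
    categories = record.get("categories") or []
    if isinstance(categories, str):
        # Assumed alternative shape: one comma-separated string.
        categories = [c.strip() for c in categories.split(",")]
    return wanted in categories


def cafe_ids(business_lines, wanted="Coffee & Tea"):
    """Collect the IDs of businesses tagged with the wanted category."""
    records = map(json.loads, business_lines)
    return {rec["business_id"] for rec in records if has_category(rec, wanted)}


def cafe_reviews(review_lines, ids):
    """Keep only the reviews written for the selected businesses."""
    return [rec for rec in map(json.loads, review_lines) if rec["business_id"] in ids]


ids = cafe_ids(SAMPLE_BUSINESS_LINES)
texts = [rec["text"] for rec in cafe_reviews(SAMPLE_REVIEW_LINES, ids)]
print(sorted(ids), texts)
```

On the real dataset the same two passes would run over the business file and then the review file, reading one JSON object per line.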
The model devised to help businesses to improve their service incorporates two main components:
Latent Dirichlet Allocation (LDA): LDA is used to reduce the dimensionality of the large body of review text and to extract its latent subtopics. The trained LDA model can then be used to determine the subtopics contained in input reviews.
Sentiment Treebank: The next step analyzes the sentiment of the reviews in order to assess how reviewers feel about the topics extracted from their reviews. Ranking the topics by the sentiment of the reviews can then be used to suggest improvements that businesses can make.
“Improving Restaurants by Extracting Subtopics from Yelp Reviews” highlights the applications of using an LDA model to extract subtopics from text data. That paper focused mainly on the correlation of individual topics with the overall review rating, and on how the ratings of topics correlated with each other. Our implementation strives to improve upon that model by identifying topics which a business can improve upon. Furthermore, we focus on a specific category, which allows our model to better identify subtle yet important topics.
“Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank” describes the building of the Sentiment Treebank in the Stanford NLP package. This model of sentiment analysis strives to give a better understanding of sentiment in longer phrases than traditional sentiment scores do. The sentiment scores from the trees are used to give recommendations on the review subtopics.
Latent Dirichlet Allocation was chosen as the method for extracting subtopics, rather than primary topics, from the text corpus. LDA is an unsupervised learning algorithm, so its output requires later interpretation. Training data was collected from the Yelp Academic Dataset, specifically the reviews under the category of “Coffee & Tea”. The reviews were stemmed, transformed into the format the library expects, and then fed into the GenSim Python library’s LDA function to create our topic model. After reviewing multiple iterations of the model, we settled on 25 as the number of topics. The model could then be used to predict the subtopics contained in a new review and to characterize the topic distribution of a cafe.
| Coffee - General | Atmosphere | Wait Time / Service | Baked Bread Items |
|---|---|---|---|
| coffee (20.4%) | place (8.9%) | time (4.5%) | 787 (8.6%) |
| shop (3.9%) | staff (5.2%) | service (3.1%) | pastry (6.4%) |
| bean (1.3%) | music (2.1%) | order (3.4%) | bread (2.3%) |
| espresso (%) | fun (1.3%) | wait (1.3%) | baguette (1.6%) |
For each subtopic, the LDA model returned a list of around 10 words that defined it. This output was then interpreted in order to come up with general topic labels. For example, the label ’Seating’ could have been interpreted from the words ’chair’, ’table’, and ’space’. Given an input review, the returned subtopics also carried weights indicating the percentage each contributed to the overall categorization. Using these weights, it was possible to determine which topics were most important to a given cafe, allowing our model to filter out topics that were of little value to the business.
“Went in here for the first time today to just grab a cup of coffee and do some studying. Great atmosphere. Not a lot of tables to sit down and people tend to stay there for a long time but that’s typical of most coffee shops especially when their coffee is as good as theirs. I actually just got their regular drip coffee and it was some of the best standard coffee around..”
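One way to turn per-review topic weights into a cafe-level picture is to average each topic's weight over all of a cafe's reviews and drop the topics that fall below a cutoff. The sketch below assumes that approach. The `min_weight` cutoff, the helper name `cafe_topic_profile`, and the sample weights are hypothetical; the project does not state how the weights were aggregated.

```python
from collections import defaultdict

# Labels taken from the table above; the IDs are arbitrary.
TOPIC_LABELS = {
    0: "Coffee - General",
    1: "Atmosphere",
    2: "Wait Time / Service",
    3: "Baked Bread Items",
}

# Invented topic mixtures for three reviews of one cafe, in the
# (topic_id, weight) form that an LDA model returns per document.
review_topics = [
    [(0, 0.60), (1, 0.30), (2, 0.10)],
    [(0, 0.50), (1, 0.40), (3, 0.10)],
    [(0, 0.40), (2, 0.50), (3, 0.10)],
]


def cafe_topic_profile(review_topics, min_weight=0.10):
    """Average topic weights over reviews and keep the prominent topics.

    A topic missing from a review counts as weight 0 for that review.
    Returns (topic_id, mean_weight) pairs, heaviest first.
    """
    totals = defaultdict(float)
    for distribution in review_topics:
        for topic_id, weight in distribution:
            totals[topic_id] += weight
    n_reviews = len(review_topics)
    means = {topic_id: total / n_reviews for topic_id, total in totals.items()}
    kept = [(t, w) for t, w in means.items() if w >= min_weight]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


profile = cafe_topic_profile(review_topics)
for topic_id, weight in profile:
    print(f"{TOPIC_LABELS[topic_id]}: {weight:.2f}")
```

In this toy data the bakery topic averages below the cutoff and is filtered out, which mirrors the idea of discarding topics of little value to the business.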
The Stanford NLP Sentiment Treebank was used on reviews as their subtopics were being extracted. This allowed our model to assign a Sentiment Score to topics. This Sentiment Score could then be used to rank the topics, giving insights into which topics were viewed favorably and which were viewed more negatively. Star Rating and Sentiment Score were combined to determine the recommendations to give a business.
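The ranking step can be illustrated with a small sketch. Everything specific here is an assumption: the per-review sentiment score is taken to lie on the 0 to 4 scale of the Stanford sentiment classes, each review's score is credited to its topics in proportion to the topic weights, and stars and sentiment are blended with a hypothetical mixing parameter `alpha`. The project does not say how the two signals were actually combined.

```python
from collections import defaultdict

# Invented reviews: star rating (1-5), sentiment score (assumed 0-4),
# and the review's topic mixture as (topic_id, weight) pairs.
reviews = [
    {"stars": 5, "sentiment": 3.5, "topics": [(0, 0.7), (1, 0.3)]},
    {"stars": 2, "sentiment": 1.0, "topics": [(2, 0.8), (0, 0.2)]},
    {"stars": 4, "sentiment": 3.0, "topics": [(1, 0.6), (2, 0.4)]},
]


def topic_scores(reviews, alpha=0.5):
    """Score each topic in [0, 1] from the reviews that mention it.

    alpha is a hypothetical knob: 1.0 uses sentiment only, 0.0 uses
    stars only. Each review contributes in proportion to topic weight.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for review in reviews:
        sentiment_norm = review["sentiment"] / 4.0   # 0-4 scale -> 0-1
        stars_norm = (review["stars"] - 1) / 4.0     # 1-5 scale -> 0-1
        blended = alpha * sentiment_norm + (1 - alpha) * stars_norm
        for topic_id, weight in review["topics"]:
            weighted_sum[topic_id] += weight * blended
            weight_total[topic_id] += weight
    return {t: weighted_sum[t] / weight_total[t] for t in weighted_sum}


def rank_topics(scores):
    """Order topics from least to most favorably viewed."""
    return sorted(scores.items(), key=lambda pair: pair[1])


scores = topic_scores(reviews)
ranked = rank_topics(scores)
print(ranked)
```

Topics at the head of the ranked list are the candidates for improvement, and those at the tail are the ones a business can be proud of.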
The Sentiment Treebank score could also be used to predict the star rating of a review, using the formula fitted by linear regression. The sentiment score correlated very strongly with the star rating that users gave on their reviews. This high correlation suggests that the Sentiment Treebank worked very well in assigning its score.
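A regression of this kind can be sketched with ordinary least squares on one predictor. The data points below are invented for illustration and the fitted coefficients are not the project's; the helper names `fit_line` and `predict_stars` are hypothetical.

```python
from math import sqrt


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept.

    Also returns the Pearson correlation r between xs and ys.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


def predict_stars(sentiment, slope, intercept):
    """Predict a star rating, clamped to the valid 1-5 range."""
    return min(5.0, max(1.0, slope * sentiment + intercept))


# Invented (sentiment score, star rating) pairs.
sentiments = [0.5, 1.0, 2.0, 3.0, 3.5]
stars = [1, 2, 3, 4, 5]

slope, intercept, r = fit_line(sentiments, stars)
print(f"stars = {slope:.2f} * sentiment + {intercept:.2f}, r = {r:.2f}")
print(predict_stars(2.5, slope, intercept))
```

The correlation `r` is the quantity the paragraph above refers to when it describes the relationship as strong.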
A major issue with generating the LDA model was finding substantial training data related to the focus of study. The plan at the outset of this project was to scrape Yelp and possibly other websites for reviews on the chosen category. While scraping Yelp, my computers were flagged as bots after approximately 20 pages of data were collected. I decided the best publicly available dataset was the Yelp Academic Dataset and used its reviews to train a model after filtering for coffee shops.
It was also difficult to determine an appropriate number of topics to use for LDA. Initially, 50 topics were tested, but that turned out to be too many to produce well-defined topic groups. Using 25 topics worked much better and produced clear topics in the testing data.
The subtopic groups returned by the LDA model are very subjective and often difficult to characterize. Because these topics were entirely based on interpretation, it was not possible to completely verify the labels assigned to subtopics.
Another issue was that the LDA model has a propensity to return infrequent subtopics. To compensate, both the weight of a subtopic and its frequency of occurrence were factored in. The least frequently produced topics were not considered as result options.
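A frequency filter of this sort might look like the sketch below. The two thresholds (`min_weight` for a topic to count as present in a review, `min_share` for the fraction of reviews it must appear in) and the helper name `frequent_topics` are hypothetical; the project does not give the actual cutoffs.

```python
from collections import Counter

# Invented topic mixtures for four reviews, as (topic_id, weight) pairs.
review_topics = [
    [(0, 0.60), (1, 0.40)],
    [(0, 0.50), (2, 0.50)],
    [(0, 0.70), (1, 0.20), (7, 0.10)],
    [(1, 0.55), (0, 0.45)],
]


def frequent_topics(review_topics, min_weight=0.15, min_share=0.3):
    """Return {topic_id: share of reviews} for topics seen often enough.

    A topic counts as present in a review only if its weight there is
    at least min_weight; topics present in fewer than min_share of the
    reviews are dropped.
    """
    counts = Counter()
    for distribution in review_topics:
        counts.update(t for t, weight in distribution if weight >= min_weight)
    n_reviews = len(review_topics)
    shares = {t: count / n_reviews for t, count in counts.items()}
    return {t: share for t, share in shares.items() if share >= min_share}


kept = frequent_topics(review_topics)
print(kept)
```

Here topic 2 appears strongly but in only one review, and topic 7 never clears the weight threshold, so both are excluded from the result options.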
I was unsure about how to prioritize ranking topics by stars as opposed to ranking by sentiment score. In the future, it would be interesting to see which plays a bigger role.
Although a few of the topic labels were assigned tentatively, many subtopics mapped to clear topics: Atmosphere, Service, Doughnuts, Seating, Bakery Items, Coffee, Tea and Specialty Drinks. These topics point to concrete items that a business can improve or be proud of. Based on individual inspection of the reviews and the returned topics, the model seems to work well for identifying subtopics.
Star rating and review sentiment are highly correlated across many different cafe reviews, and the combination of the two appears to be a good indicator of how reviewers feel overall about a topic.
Beyond Cafes: Creating multiple LDA models across different food and restaurant categories could be an improvement over this single LDA model. Multiple models across businesses could also lend insights into trends not obvious looking only at cafes.
Larger Training Set: Training on a larger dataset would improve the LDA model and prevent overfitting. Because a large portion of the businesses in the Yelp Academic Dataset were in the Las Vegas area, a subtopic emerged in our model for Hotels and even included the word ’Vegas’. This is obviously a consequence of our training data and could be avoided with a larger, more representative set of data.
Topic modeling using LDA, combined with the more nuanced sentiment analysis approach of a Sentiment Treebank, is a step in the right direction to help businesses realize their strengths and improve on their deficiencies. Although this model has some drawbacks in terms of breadth, it still allows a cafe to gain simple yet valuable insights from its review data. Lastly, it would be interesting if Yelp adopted a similar idea and added tools to help businesses understand how to become better through their consumers’ reviews.
The code for this project can be found here.