Coffee Shop Review Classification and Sentiment Analysis With ChatGPT 4.1-mini

Introduction

I have spent countless hours in coffee shops. Whether I was studying, chatting, or working, they have played a substantial role in my recreation and productivity. Over the years I have found a few reliable favorites with good desks, nice vibes, and decent coffee, but how do I find more?! A cursory search on Google for ‘coffee shops’ in Houston returns dozens of results, many with thousands of reviews, suggesting a thriving culture of coffee consumers.

The problem is: while those results are plentiful, they don’t provide much context beyond location, average rating, and average price. Sure, you can comb through reviews and search for keywords, but it still requires manual interpretation of both subject and sentiment. Even then, there’s no easy way to compare reviews with similar themes across multiple establishments.

The challenge: Can ChatGPT process Google reviews to create searchable context and sentiment scores for coffee shops?

A Little on ChatGPT

ChatGPT is built on a Generative Pre-trained Transformer (GPT), a large language model (LLM) designed to excel at ‘understanding’ and producing human-like text, hence the name! A core feature of transformers is self-attention, meaning that certain words are assigned more importance based on their perceived contribution to context. The transformer architecture constitutes the foundation upon which the LLM is built. Think of the LLM's training as providing the vast knowledge and the transformer as providing the wisdom to use said knowledge.

Unlike traditional NLP methods that often require hyperparameter tuning and validation sets, ChatGPT delivers highly capable sentiment analysis right out of the box. This allows for simple zero-shot prompting, meaning the prompt includes no examples of topic classification or sentiment scoring.

The Data

The review data was scraped from Google reviews using Outscraper. In total there are 14,620 reviews for 168 Houston coffee shops. The number of reviews per venue was capped at 100, and only the most recent reviews were scraped.

Review Distribution:

  • 14.3% of coffee shops have fewer than 25 reviews.
  • 22.0% of coffee shops have fewer than 50 reviews.
  • 29.8% of coffee shops have fewer than 75 reviews.
  • 34.5% of coffee shops have fewer than 100 reviews.
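The percentages above are cumulative shares. As a minimal sketch of how such figures could be computed, assuming the per-shop review counts are available as a plain list (the function name `share_below` and the toy counts are hypothetical, not the scraped data):

```python
def share_below(review_counts, thresholds=(25, 50, 75, 100)):
    """Return the percentage of shops with fewer than each threshold of reviews."""
    total = len(review_counts)
    return {
        t: round(100 * sum(count < t for count in review_counts) / total, 1)
        for t in thresholds
    }


# Toy data for illustration only (not the scraped Houston counts).
toy_counts = [10, 30, 60, 100]
print(share_below(toy_counts))
```

Because each threshold counts every shop below it, the shares can only grow or stay flat as the threshold rises.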

The Fun Stuff: Prompt and Model Building

While ChatGPT can independently assign context labels to reviews, I chose to limit the available options to reduce variation and support more consistent clustering and mapping projects later on. I included labels that were important to me, and I also asked a few friends what they value in a coffee shop.

The resulting list:

["Pricing", "Espresso", "Beans", "Coffee", "Iced Coffee", "Latte", "Cappuccino", "Iced Latte", "Work & Study", "Parking", "Noise", "Food", "Seating", "Service", "Chai Latte", "Iced Chai Latte", "Discounts", "Pastries", "Teas", "Ambience", "Menu & Variety", "Friendliness", "Pumpkin Spice", "Juice", "Dog Friendly", "WIFI", "Patio", "Employee Treatment", "Inclusivity & Accessibility"]

Prompt Design:

from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment


# The wrapper name analyze_review is illustrative; the request itself is unchanged.
async def analyze_review(index, review_id, review_text, topics, model):
    # One zero-shot request per review; temperature=0 keeps the output consistent.
    completion = await client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": """You are an AI that performs sentiment analysis on Google reviews. 
            Your output should be formatted as a JSON object in this exact format:
            {
                "index": <same as input>,
                "review_id": <same as input>,
                "topics": {
                    "topic1": sentiment_score [0:1],
                    "topic2": sentiment_score [0:1]
                }
             }
            Ensure that 'review_id' exactly matches the provided input review ID, and that 'index' matches the input index."""},
            {"role": "user", "content": f"""Analyze this review:
            Review Index: {index}
            Review ID: {review_id}
            Review Text: '{review_text}'

            - Sentiment scores range from Negative (0) to Positive (1).
            - Use **only** the most relevant topics from this list: {topics}.
            - Ensure that 'review_id' in the response **exactly** matches {review_id}.
            - Ensure that 'index' in the response **exactly** matches {index}.
            - Return only a **valid JSON object**, nothing else."""},
        ],
        # JSON mode: ask the API to return a JSON object.
        response_format={"type": "json_object"},
    )
    # Return the raw JSON text of the reply; parsing happens downstream.
    return completion.choices[0].message.content

This prompt design uses a “system role”, which essentially acts as a style constraint. Here it helps ensure that ChatGPT stays focused on the specified task of sentiment analysis. I also add a second line showing the AI the exact schema its output should follow. To help maintain clarity, f-strings are used to pass variables with minimal disruption to the prompt context.

I then give the AI a specific set of constraints: the range for sentiment scores, an instruction to use only topics from the list above, and a requirement to match the review_id and index of the review.
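Prompt constraints are requests rather than guarantees, so it can help to check each reply against them. The following is a minimal sketch of such a check, assuming the reply has already been parsed into a Python dict; the function name `validate_result` and the sample values are hypothetical.

```python
def validate_result(result, index, review_id, allowed_topics):
    """Return a list of problems found; an empty list means the result passed."""
    problems = []
    if result.get("index") != index:
        problems.append("index mismatch")
    if result.get("review_id") != review_id:
        problems.append("review_id mismatch")

    topics = result.get("topics")
    if not isinstance(topics, dict):
        problems.append("topics is not an object")
        return problems

    for topic, score in topics.items():
        if topic not in allowed_topics:
            problems.append(f"unknown topic: {topic}")
        # bool is a subclass of int in Python, so reject it explicitly.
        is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
        if not is_number or not 0 <= score <= 1:
            problems.append(f"score out of range for {topic}: {score!r}")
    return problems


# Hypothetical reply, for illustration only.
sample = {"index": 1, "review_id": "abc", "topics": {"Service": 0.9, "Vibes": 1.4}}
print(validate_result(sample, 1, "abc", ["Service", "Pricing"]))
```

Replies that fail the check could be logged or re-requested; which of those makes sense depends on how often failures occur.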

Finally, I like to end prompts with reiterations regarding formatting. Even with the parameter response_format={ "type": "json_object" }, the model will occasionally return a malformed JSON object. Reminding the model seems to help reduce these errors.

For consistency and hallucination control, the temperature of the AI is also set to zero.
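Since a malformed reply can still slip through occasionally, a defensive parsing step is worth having. This is a minimal sketch using only Python's standard `json` module; the helper name `parse_model_json` is hypothetical and not part of the original pipeline.

```python
import json


def parse_model_json(raw_text):
    """Parse the model's reply; return a dict, or None if it is not a valid JSON object."""
    try:
        parsed = json.loads(raw_text)
    except (json.JSONDecodeError, TypeError):
        # JSONDecodeError: malformed text. TypeError: raw_text was None or not a string.
        return None
    # Valid JSON that is not an object (e.g. a list) does not match the expected schema.
    return parsed if isinstance(parsed, dict) else None


print(parse_model_json('{"index": 1, "topics": {"Coffee": 1}}'))
print(parse_model_json('{"index": 1, "topics": {'))
```

Returning None instead of raising lets the caller decide whether to retry the review or set it aside.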

Running the Model

I decided not to push the reviews in large sets and instead used Python's async support to make asynchronous API calls, each with one review. It’s not the most efficient method, but I wanted to ensure that each review is processed independently, without any contamination or inferred patterns from other reviews. Furthermore, the additional token cost with GPT 4.1-mini is negligible.
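One way to run one-review-per-call requests concurrently is `asyncio.gather` with a semaphore to cap the number of requests in flight. This is a sketch under assumptions rather than the exact code used here: `run_all`, `max_concurrency`, and the stand-in `fake_analyze` coroutine are hypothetical, and a real run would pass in the coroutine that calls the API.

```python
import asyncio


async def run_all(reviews, analyze_one, max_concurrency=8):
    """Run analyze_one on every review concurrently, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(review):
        async with semaphore:
            return await analyze_one(review)

    # Results come back in input order; a failed call is returned as an
    # exception object instead of cancelling the whole batch.
    return await asyncio.gather(*(bounded(r) for r in reviews), return_exceptions=True)


async def fake_analyze(review):
    """Stand-in for the real API call, for illustration only."""
    await asyncio.sleep(0)
    return {"review_id": review["review_id"], "topics": {}}


if __name__ == "__main__":
    demo_reviews = [{"review_id": "a"}, {"review_id": "b"}]
    print(asyncio.run(run_all(demo_reviews, fake_analyze)))
```

Capping concurrency is a design choice to stay under API rate limits; the right cap depends on the account's limits.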

Test Examples

I chose a diverse set of reviews as showcase examples. The first review requires a substantial amount of contextual consideration.

review_text: "The staff there are wonderful and always incredibly accommodating. This is one of my go to spots, but I have to question the leadership there. They recently made a change in their kiosk options so that the tipping menu lists 5, 10, or 15% as opposed to the 15, ,20, 25%. I know this seems like a small slight, but worker rights are an important issue close to my heart, and to me this feels like diet wage theft. While this change might be so that we feel it is more affordable, this is at the cost of the baristas. The lack of ethics leaves a sour taste that the great coffee can't wash out. This issue is also present at both locations. This change happening during the holidays season is also not lost on me.\n\n Edit: a month later and this still has not been changed"

{
    "index": 1,
    "review_id": "ChdDSUhNMG9nS0VJQ0FnSURmNVlhUGpnRRAB",
    "topics": {
        "Service": 0.9,
        "Employee Treatment": 0.2,
        "Pricing": 0.3
    }
}

review_text: "I'm so glad they opened a location here. Great place to have a coffee and get some work done. There are lots of plugs and seating for a space this size. Ordering can be done by the counter or via app."

{
    "index": 4,
    "review_id": "ChZDSUhNMG9nS0VJQ0FnTURBaHRyLVJnEAE",
    "topics": {
        "Coffee": 1,
        "Work & Study": 1,
        "Seating": 1,
        "WIFI": 1
    }
}

review_text: 'The large iced latte has 3 shots of espresso and it tasted mostly like milk. But it’s a nice place'

{
    "index": 9,
    "review_id": "ChZDSUhNMG9nS0VJQ0FnTUNBcE9lNEZREAE",
    "topics": {
        "Iced Latte": 0.5,
        "Ambience": 0.7
    }
}

Interpretation of Results

The first example does well at discerning the described poor treatment of employees while also accounting for the good service. Pricing is more nuanced, and the AI effectively captures the dissatisfaction of the reviewer. However, from the perspective of a prospective customer, the lower tip percentages may be a positive consideration! The second and third examples are far more ordinary, and the AI effectively captures the topics and applies the associated sentiments. Interestingly, the results from GPT 4.1, GPT 4o, and GPT 4.1-mini have thus far been extremely consistent.

Conclusion

Addressing the challenge: Can ChatGPT process Google reviews to create searchable context and sentiment scores for coffee shops? The answer seems to be a resounding yes! GPT 4.1-mini is able to process reviews, discern applicable topics from the predefined list, assign those topics a sentiment score based on the context of the review, and finally create a structured and searchable output.
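To illustrate what "searchable" could mean in practice, here is a minimal sketch that averages topic scores per shop and ranks shops on a chosen topic. The function names, the (shop, topics) input shape, and the toy values are assumptions for illustration, not the consolidation actually used for the map.

```python
from collections import defaultdict


def average_topic_scores(results):
    """results: iterable of (shop_name, topics_dict) pairs -> {shop: {topic: mean score}}."""
    collected = defaultdict(lambda: defaultdict(list))
    for shop, topics in results:
        for topic, score in topics.items():
            collected[shop][topic].append(score)
    return {
        shop: {topic: sum(scores) / len(scores) for topic, scores in topics.items()}
        for shop, topics in collected.items()
    }


def rank_shops(averages, topic):
    """Return (shop, mean score) pairs for shops with reviews on the topic, best first."""
    scored = [(shop, scores[topic]) for shop, scores in averages.items() if topic in scores]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy results for illustration only.
toy = [
    ("Shop A", {"WIFI": 1, "Seating": 0.5}),
    ("Shop A", {"WIFI": 0.5}),
    ("Shop B", {"WIFI": 0.5}),
]
print(rank_shops(average_topic_scores(toy), "WIFI"))
```

A plain mean treats every review equally; weighting by recency or review count would be a reasonable variation.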

Next Steps

The next step is to consolidate the data and create a searchable map with Tableau Public. To explore the resulting map and data in more detail, see this post: Sentiment Analysis Results and Interactive Search