How To Build A Social Media Sentiment Analysis Pipeline

Sentiment analysis on social media is a useful way to monitor your brand, your competitors, or any other topic of interest. In this article, we show you how to build a system that listens to social media platforms such as Reddit, Hacker News, LinkedIn, and Twitter, and automatically performs sentiment analysis on the content using generative AI.

Combining Social Listening With Sentiment Analysis For Brand Sentiment Analysis

Social listening is the act of paying attention to and interpreting conversations about any topic on social media platforms, review sites, and other online channels.

Sentiment analysis, on the other hand, is the process of identifying and categorizing opinions expressed in a piece of text as positive, negative, or neutral. It involves using natural language processing, text analysis, and computational linguistics to systematically identify, extract, quantify, and study affective states and subjective information.

When you combine social listening and sentiment analysis, you can track and analyze the sentiment expressed in conversations related to your brand or your competitors. This is also known as "brand sentiment analysis". Brand sentiment analysis allows you to automatically understand how consumers feel about your brand or your competitors, identify areas for improvement, jump into the right conversation on social media to engage with potential customers, and make data-driven decisions to enhance your brand's reputation and customer loyalty.

Building a Social Listening Platform

Creating a social listening platform requires that you plug into a social media platform and retrieve every new post and comment that contains the keywords you want to monitor.

This is more easily achieved if the platform you are planning to monitor exposes an API. For example, Reddit exposes an API that you can easily consume. Here is a simple cURL request that retrieves the last 100 Reddit posts:

curl "https://www.reddit.com/r/all/new/.json?limit=100"

And here is a typical response returned by their API:

{
    "kind": "Listing",
    "data": {
        "after": "t3_1asad4n",
        "dist": 100,
        "modhash": "ne8fi0fr55b56b8a75f8075df95fa2f03951cb5812b0f9660d",
        "geo_filter": "",
        "children": [
            {
                "kind": "t3",
                "data": {
                    "approved_at_utc": null,
                    "subreddit": "GunAccessoriesForSale",
                    "selftext": "Morning gents. I\u2019m looking to snag up your forgotten factory yellow spring for the 509T. I need to source one for a buddy who lost his and I cannot find any available anywhere! \n\nIf one of you have the yellow spring laying around, looking to pay $50 shipped\u2026 \n\nTo my 509t owners, it\u2019s the \u201clight\u201d spring that comes in a plastic bag in the carrying case. \n\nThanks in advance  ",
                    "author_fullname": "t2_2ezh71n6",
                    "saved": false,
                    "mod_reason_title": null,
                    "gilded": 0,
                    "clicked": false,
                    "title": "[WTB] 509T yellow spring",
                    "link_flair_richtext": [],
                    "subreddit_name_prefixed": "r/GunAccessoriesForSale",
                    [...]
                    "contest_mode": false,
                    "mod_reports": [],
                    "author_patreon_flair": false,
                    "author_flair_text_color": "dark",
                    "permalink": "/r/GunAccessoriesForSale/comments/1asadbj/wtb_509t_yellow_spring/",
                    "parent_whitelist_status": null,
                    "stickied": false,
                    "url": "https://www.reddit.com/r/GunAccessoriesForSale/comments/1asadbj/wtb_509t_yellow_spring/",
                    "subreddit_subscribers": 182613,
                    "created_utc": 1708094934.0,
                    "num_crossposts": 0,
                    "media": null,
                    "is_video": false
                }
            },
            [...]
        ]
    }
}
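To make the structure of this response more concrete, here is a minimal sketch of how posts mentioning a keyword could be pulled out of such a listing once it has been parsed into a Python dictionary. The function name `find_matching_posts` and the plain substring matching are assumptions made for illustration; a real monitoring system would likely need smarter matching.

```python
from typing import Any


def find_matching_posts(listing: dict[str, Any], keywords: list[str]) -> list[dict[str, str]]:
    """Return the title, permalink and text of posts that mention any keyword.

    Matching is a simple case-insensitive substring search on title + selftext.
    """
    lowered = [keyword.lower() for keyword in keywords]
    matches = []
    for child in listing.get("data", {}).get("children", []):
        post = child.get("data", {})
        text = f"{post.get('title', '')}\n{post.get('selftext', '')}"
        if any(keyword in text.lower() for keyword in lowered):
            matches.append(
                {
                    "title": post.get("title", ""),
                    "permalink": post.get("permalink", ""),
                    "text": text,
                }
            )
    return matches


# A trimmed-down listing with the same shape as the response shown above
sample_listing = {
    "kind": "Listing",
    "data": {
        "children": [
            {
                "kind": "t3",
                "data": {
                    "title": "[WTB] 509T yellow spring",
                    "selftext": "Looking for a factory yellow spring.",
                    "permalink": "/r/GunAccessoriesForSale/comments/1asadbj/wtb_509t_yellow_spring/",
                },
            }
        ]
    },
}

for match in find_matching_posts(sample_listing, ["spring"]):
    print(match["permalink"])
```

Only the `title`, `selftext`, and `permalink` fields are used here because they appear in the response above; any other field would be read the same way.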

We made a dedicated tutorial showing how to monitor Reddit with a simple Go program. Read more here about how to monitor Reddit with Go.

Each social media platform has its own subtleties, which we unfortunately can't cover in this article. To easily monitor social media platforms (like Reddit, LinkedIn, X, Hacker News, and more), you might want to subscribe to a dedicated social listening platform like our KWatch.io service. Try KWatch.io for free here.

Add Keywords in Your KWatch.io Dashboard

The main challenges of social media listening are the high volume of data you have to handle, the risk of being blocked by the platform if you make too many requests, and the need to process the collected data efficiently.
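One common way to reduce the risk of being blocked is to wait longer and longer between retries when the platform starts rejecting requests. The sketch below computes such an exponential backoff schedule. The function name and the default values (2 seconds base, 5 minutes cap) are illustrative assumptions and not limits published by any platform.

```python
import random


def backoff_delays(base: float = 2.0, cap: float = 300.0, retries: int = 6, jitter: float = 0.0) -> list[float]:
    """Return the waiting time (in seconds) before each retry.

    The delay doubles at every attempt and is capped at `cap`.
    `jitter` adds up to that fraction of extra random delay on top of the
    capped value, so that many clients do not all retry at the same instant.
    """
    delays = []
    for attempt in range(retries):
        delay = min(cap, base * (2 ** attempt))
        delay += random.uniform(0, jitter * delay)
        delays.append(delay)
    return delays


print(backoff_delays())  # [2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
```

In a polling loop, you would sleep for the next delay in the schedule each time a request is rejected, and go back to the first delay after a successful request.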

In the next section, we will explain how to integrate the collected data into your system.

Integrating Social Media Data Into Your System

Once you have collected the data from social media platforms, you need to store it in a database or a data warehouse. This will allow you to analyze the data, perform sentiment analysis, and generate insights.

There are several ways to store social media data (which is basically pure text data), depending on your requirements and the volume of data you are dealing with. Some common options include:

• Using a relational database like MySQL or PostgreSQL
• Using a NoSQL database like MongoDB or Cassandra
• Using a data warehouse like Amazon Redshift or Google BigQuery
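As a rough illustration of the relational option, here is a minimal sketch that uses SQLite (from the Python standard library) as a stand-in for MySQL or PostgreSQL. The table name `mentions` and its columns (`platform`, `link`, `content`, `sentiment`) are assumptions made for this example; adapt them to the data you actually collect.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS mentions (
    link      TEXT PRIMARY KEY,  -- the URL doubles as a unique identifier
    platform  TEXT NOT NULL,
    content   TEXT NOT NULL,
    sentiment TEXT               -- filled in later, once the text is analyzed
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_mention(conn: sqlite3.Connection, mention: dict) -> bool:
    """Insert one mention. Return False if this link was already stored."""
    cursor = conn.execute(
        "INSERT OR IGNORE INTO mentions (link, platform, content) "
        "VALUES (:link, :platform, :content)",
        mention,
    )
    conn.commit()
    return cursor.rowcount == 1


conn = open_store()
mention = {"platform": "reddit", "link": "https://example.com/post/1", "content": "Some text"}
print(save_mention(conn, mention))  # True
print(save_mention(conn, mention))  # False, the link is already stored
```

Using the link as the primary key is a simple way to avoid storing the same post twice if it is delivered more than once.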

If you have subscribed to a social listening platform, you should check if they offer a way to transfer the data into your system.

Webhooks, often referred to as 'web callbacks' or 'HTTP push API,' serve as a means for applications to share real-time data with other applications. This is achieved by generating HTTP POST requests when specific events transpire, thus delivering information to other applications promptly.

For example on our platform, KWatch.io, you should go to the "notifications" section and set a webhook URL pointing to your system.

API Webhook on KWatch.io

Here is what the KWatch.io webhook looks like (it is a JSON payload):

{
    "platform": "reddit",
    "query": "Keywords: vllm",
    "datetime": "19 Jan 24 05:52 UTC",
    "link": "https://www.reddit.com/r/LocalLLaMA/comments/19934kd/sglang_new/kijvtk5/",
    "content": "sglang runtime has a different architecture on the higher-level part with vllm."
}
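The `datetime` field arrives as a plain string, so you may want to convert it into a proper timestamp before storing or sorting the data. Here is a minimal sketch, assuming every payload uses the same format as the sample above ("19 Jan 24 05:52 UTC"). That assumption is based on this single example, and the helper name `parse_webhook_datetime` is hypothetical.

```python
from datetime import datetime, timezone

UTC_SUFFIX = " UTC"


def parse_webhook_datetime(value: str) -> datetime:
    """Parse a string like '19 Jan 24 05:52 UTC' into a timezone-aware datetime."""
    if not value.endswith(UTC_SUFFIX):
        raise ValueError(f"unexpected timestamp format: {value!r}")
    # Parse the part before the suffix, then attach the UTC timezone explicitly
    naive = datetime.strptime(value[: -len(UTC_SUFFIX)], "%d %b %y %H:%M")
    return naive.replace(tzinfo=timezone.utc)


print(parse_webhook_datetime("19 Jan 24 05:52 UTC").isoformat())
# 2024-01-19T05:52:00+00:00
```

Note that `%b` relies on English month abbreviations, which is the default behavior unless the process locale has been changed.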

If you're new to this, you can effortlessly receive these webhooks in Python using FastAPI.

Install FastAPI with the Uvicorn server:

pip install fastapi uvicorn

Now create a new Python file and paste the following code (you might need to adapt this script):

# Import necessary modules
from fastapi import FastAPI
from pydantic import BaseModel

# Initialize your FastAPI app
app = FastAPI()

# Update the Pydantic model to properly type-check and validate the incoming data
class WebhookData(BaseModel):
    platform: str
    query: str
    datetime: str
    link: str
    content: str

# Define an endpoint to receive webhook data
@app.post("/kwatch-webhooks")
async def receive_webhook(webhook_data: WebhookData):
    # Process the incoming data
    # For demonstration, we're just printing it
    print("Received webhook data:", webhook_data.dict())
    
    # Return a response
    return {"message": "Webhook data received successfully"}

if __name__ == "__main__":
    # Run the server with Uvicorn
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)

Save the file as webhook_server.py and run the server with the following command:

uvicorn webhook_server:app --reload --host 0.0.0.0 --port 8000

Your server is now running and ready to receive webhooks from KWatch.io.
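Before pointing a real webhook at your server, you can check the endpoint locally by sending it a sample payload yourself. The sketch below does this with the Python standard library only. It assumes the server is listening on localhost port 8000 with the "/kwatch-webhooks" route used above; the helper names are hypothetical.

```python
import json
import urllib.request

# Same shape as the webhook payload shown earlier in this article
SAMPLE_PAYLOAD = {
    "platform": "reddit",
    "query": "Keywords: vllm",
    "datetime": "19 Jan 24 05:52 UTC",
    "link": "https://www.reddit.com/r/LocalLLaMA/comments/19934kd/sglang_new/kijvtk5/",
    "content": "sglang runtime has a different architecture on the higher-level part with vllm.",
}


def build_test_request(url: str = "http://localhost:8000/kwatch-webhooks") -> urllib.request.Request:
    """Build a JSON POST request carrying the sample payload."""
    body = json.dumps(SAMPLE_PAYLOAD).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_test_webhook(url: str = "http://localhost:8000/kwatch-webhooks") -> str:
    """Send the sample payload to the running server and return the response body."""
    with urllib.request.urlopen(build_test_request(url), timeout=10) as response:
        return response.read().decode("utf-8")
```

With the server running, calling `send_test_webhook()` from another terminal or a Python shell should return the JSON confirmation message, and the payload should be printed in the server logs.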

Performing Sentiment Analysis on The Data With Generative AI Models Like GPT-4 or LLaMA 3

Once you have collected and stored the social media data, you can perform sentiment analysis on it.

Today, the most accurate way to perform sentiment analysis on a piece of text about a specific keyword is to use generative AI models like GPT-4, LLaMA 3, ChatDolphin, etc. These LLMs are not necessarily fast and can be costly at scale, but they generally deliver state-of-the-art results. If you need to analyze very high volumes of keywords, you might want to lower costs by using smaller models or by fine-tuning your own model.

You could deploy your own AI model, or plug into an AI API like OpenAI or NLP Cloud. In this article we will plug into the NLP Cloud AI API.

You can register on NLP Cloud and retrieve your API key here.

Your request does not have to be too complex. For example here is a comment on Reddit, about OpenAI:

A Comment on Reddit About OpenAI

Let's use the ChatDolphin model on NLP Cloud in order to analyze the sentiment about OpenAI in this Reddit comment. First, install the NLP Cloud Python client:

pip install nlpcloud

Now you can analyze the sentiment of the Reddit comment with the following Python code:

import nlpcloud

brand = "OpenAI"
reddit_comment = "Wasn't it the same with all OpenAI products? Amazing and groundbreaking at first, soon ruined by excessive censorship and outpaced by the competitors"

client = nlpcloud.Client("chatdolphin", "your api token", gpu=True)
print(client.generation(f"What is the sentiment about {brand} in the following comment? Positive, negative, or neutral? Answer with 1 word only.\n\n{reddit_comment}"))

The response will be:

Negative
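Generative models do not always follow the "1 word only" instruction to the letter: the answer may come back with a trailing period, different casing, or a short sentence. If you plan to store or count the results, it can help to normalize the raw answer into a fixed label first. Here is a minimal sketch; the function name `normalize_sentiment` and the "unknown" fallback are assumptions made for illustration.

```python
import re

VALID_LABELS = ("positive", "negative", "neutral")


def normalize_sentiment(raw_answer: str) -> str:
    """Map a raw model answer to 'positive', 'negative', 'neutral' or 'unknown'."""
    words = re.findall(r"[a-z]+", raw_answer.lower())
    found = {word for word in words if word in VALID_LABELS}
    # Accept the answer only if exactly one label is mentioned
    if len(found) == 1:
        return found.pop()
    return "unknown"


print(normalize_sentiment(" Negative.\n"))            # negative
print(normalize_sentiment("Positive or negative?"))   # unknown
```

Answers that end up as "unknown" can be logged and reviewed manually, or sent to the model again with a stricter prompt.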

Now let's wrap up and write the final code that listens to the API webhook and performs sentiment analysis on the data:

from fastapi import FastAPI
from pydantic import BaseModel
import nlpcloud

client = nlpcloud.Client("chatdolphin", "your api token", gpu=True)

app = FastAPI()

class WebhookData(BaseModel):
    platform: str
    query: str
    datetime: str
    link: str
    content: str

@app.post("/kwatch-webhooks")
def receive_webhook(webhook_data: WebhookData):
    # Declared with "def" (not "async def") so that FastAPI runs this
    # blocking API call in a worker thread instead of the event loop
    brand = "OpenAI"
    prompt = (
        f"What is the sentiment about {brand} in the following comment? "
        "Positive, negative, or neutral? Answer with 1 word only.\n\n"
        f"{webhook_data.content}"
    )
    print(client.generation(prompt))

    return {"message": "Webhook data received successfully"}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)

Conclusion

As you can see it's possible to automate sentiment analysis on social media data with the help of modern generative AI models and efficient social listening tools. This approach can be applied in various social media monitoring scenarios. Here are some ideas:

• Tracking your brand's reputation
• Tracking a competitor's reputation
• Keeping an eye on sentiment surrounding a stock option
• Monitoring sentiment related to a specific technological trend, such as AI or crypto
• ...

Productionizing such a program can be challenging, though: social media platforms are not easy to monitor, and generative AI models can be costly to run on large volumes of data.

If you do not want to build and maintain such a system by yourself, we recommend that you use our KWatch.io platform instead, as we automatically monitor social media and perform sentiment analysis on the detected posts and comments: register on KWatch.io here.

Arthur
CTO at KWatch.io