Fake News Detection WhatsApp Bot

Ayush Basral
8 min read · Apr 16, 2021

If you like my work you can buy me a coffee here

What is fake news?

“Fake news” is a term that has come to mean different things to different people. At its core, we define “fake news” as news stories that are false: the story itself is fabricated, with no verifiable facts, sources, or quotes. Some of these stories are propaganda intentionally designed to mislead the reader; others are “clickbait” written for economic gain (the writer profits from the number of people who click on the story). In recent years, fake news stories have proliferated via social media, in part because they are so easily and quickly shared online.

According to 30seconds.org:

“Fake news” is a term used to refer to fabricated news. Fake news is an invention — a lie created out of nothing — that takes the appearance of real news with the aim of deceiving people. This is what is important to remember: the information is false, but it seems true.

According to Wikipedia:

“Fake news (also known as junk news, pseudo-news, or hoax news) is a form of news consisting of deliberate disinformation or hoaxes spread via traditional news media (print and broadcast) or online social media.”

The use of the web as a medium for consuming information is increasing daily. The amount of information posted on social media at any moment is enormous, which makes it hard to validate whether that information is true. The main motivation for this work is that, on average, 62% of US adults rely on social media as their main source of news. The quality of news generated on social media has dropped substantially over the years. The generation of fake news is intentional by the unknown sources which are trivial, and there are existing methodologies to individually validate the users’ trustworthiness, the truthfulness of the news and user

These are four common types of fake news:

  1. Targeted misinformation: Fictitious piece of information shared for self-serving interests. Targeted misinformation is often directed at groups that are most susceptible to receiving this type of information and easily accept and share polarizing content without verifying its authenticity.
  2. Fake headlines: Headlines depicting fictitious facts to generate attention. These are regularly employed by less credible publications such as tabloid newspapers. Readers often quickly realize that the content of the article does not match the headline. Their titles are referred to as “clickbait headlines.”
  3. Viral posts: There’s a plethora of news articles and content on social media networks. As a consequence, users often do not take the time to authenticate posts. Because large social networks favor shares, likes, and followers, popular posts are shown more often in a user’s feed, even if that content is fake news.
  4. Satire: Satirical news picks up on current affairs and news items and mixes them with fictitious, and often absurd, events. Satire is often employed to raise awareness of social issues or criticize political wrongdoing. But there’s always the danger that the humorous components go undetected and the pieces are taken to be true.

Fake news has become a huge issue in our digitally connected world. It is no longer limited to little squabbles: it spreads like wildfire and affects millions of people every day.

How do you deal with such a sensitive issue? Countless articles are churned out on the internet every day, so how do you tell real from fake? It’s not as easy as turning to a simple fact-checker, which typically works on a story-by-story basis. As developers, can we turn to machine learning?

In this article we follow a traditional supervised approach to detecting fake news: we train a model on labelled data, then use the Twilio WhatsApp API to send messages to the model and get its predictions back.

Requirements:

  1. A Twilio account
  2. A Twilio WhatsApp Sandbox
  3. Python 3
  4. Flask
  5. ngrok
  6. TensorFlow

Let’s build:

We are using the LIAR dataset by William Yang Wang, which he introduced in his research paper “‘Liar, Liar Pants on Fire’: A New Benchmark Dataset for Fake News Detection”. The original dataset comes with the following columns:

Column 1: the ID of the statement ([ID].json)

Column 2: the label 

Column 3: the statement 

Column 4: the subject(s) 

Column 5: the speaker 

Column 6: the speaker’s job title 

Column 7: the state info 

Column 8: the party affiliation

Columns 9–13: the total credit history count, including the current statement

9: barely true counts

10: false counts

11: half true counts

12: mostly true counts

13: pants on fire counts 

Column 14: the context (venue / location of the speech or statement)

For simplicity, we have converted it to a two-column format:

Column 1: Statement (News headline or text)

Column 2: Label (Label class contains: True, False)
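The conversion could look roughly like the sketch below. The article does not say how the six original LIAR labels were collapsed into True and False, so the mapping here is an assumption, and the two sample rows are made up for illustration.

```python
import csv
import io

# ASSUMPTION: one plausible way to collapse the six LIAR labels into two
# classes. The article does not state which mapping was actually used.
BINARY_LABEL = {
    "true": "True",
    "mostly-true": "True",
    "half-true": "True",
    "barely-true": "False",
    "false": "False",
    "pants-fire": "False",
}


def to_two_columns(tsv_text):
    """Reduce tab-separated LIAR rows to (statement, label) pairs."""
    rows = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for record in reader:
        if len(record) < 3:
            continue  # skip malformed lines
        label = record[1].strip().lower()
        statement = record[2].strip()
        if label in BINARY_LABEL and statement:
            rows.append((statement, BINARY_LABEL[label]))
    return rows


# Two invented rows in the LIAR layout: ID, label, statement, other columns.
sample = (
    "1.json\tfalse\tThe moon is made of cheese.\tscience\tjane-doe\n"
    "2.json\tmostly-true\tWater boils at 100 C at sea level.\tscience\tjohn-doe\n"
)
pairs = to_two_columns(sample)
for statement, label in pairs:
    print(label, "|", statement)
```

Reading a real file would only change where the text comes from; the column positions (label in column 2, statement in column 3) follow the layout listed above.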

Let’s start building a Machine Learning model:

Step 1: Preprocessing

Data preprocessing means preparing raw data so that it is suitable for a machine learning model. It is the first, and a crucial, step in building one. Real projects rarely start with clean, well-formatted data, so before doing anything else with the data we have to clean it and put it into a consistent format.

The file preprocessing.py contains all the preprocessing functions needed to process the input documents and texts. First, we read the train, test, and validation data files, then perform preprocessing steps such as tokenizing and stemming. We also perform some exploratory data analysis, such as looking at the distribution of the response variable, and data quality checks, such as looking for null or missing values.
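As a rough illustration of these steps, here is a minimal sketch that uses only the standard library. It is not the code from preprocessing.py: the function names are hypothetical, and the suffix stripper is a deliberately crude stand-in for a real stemmer such as Porter's.

```python
import re
from collections import Counter

# Crude stand-in for a real stemmer: strips at most one common suffix.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(token):
    """Strip the first matching suffix if at least 3 letters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize a statement and stem every token."""
    return [crude_stem(tok) for tok in tokenize(text)]


def quality_report(rows):
    """Return the number of empty statements and the label distribution."""
    missing = sum(1 for statement, _ in rows
                  if not statement or not statement.strip())
    labels = Counter(label for _, label in rows)
    return missing, labels


# Invented rows in the two-column (statement, label) format.
rows = [
    ("Senators voted on the spending bills", "True"),
    ("", "False"),
    ("Taxes are rising everywhere", "False"),
]
tokens = preprocess(rows[0][0])
missing, labels = quality_report(rows)
print(tokens)
print("missing statements:", missing, "label counts:", dict(labels))
```

The quality report is the kind of check that catches empty statements or a heavily skewed label distribution before any model is trained.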

Step 2: Feature Selection

For feature selection, we used simple bag-of-words and n-gram features, and then term-frequency weighting with tf-idf. We also looked at word2vec and POS tagging for feature extraction, though neither is used at this point in the project.
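A minimal sketch of these feature extractors, assuming scikit-learn's CountVectorizer and TfidfVectorizer (the article mentions sklearn only for the classifiers, so the exact vectorizer classes and settings are an assumption). The three statements are invented.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Invented example statements.
statements = [
    "the economy is growing fast",
    "the economy is shrinking",
    "taxes are growing fast",
]

# Bag-of-words: one column per word, values are raw counts.
bow = CountVectorizer()
bow_matrix = bow.fit_transform(statements)

# Unigrams and bigrams, weighted by tf-idf instead of raw counts.
tfidf = TfidfVectorizer(ngram_range=(1, 2))
tfidf_matrix = tfidf.fit_transform(statements)

print("bag-of-words shape:", bow_matrix.shape)
print("tf-idf shape:", tfidf_matrix.shape)
print("sample n-grams:", sorted(tfidf.vocabulary_)[:5])
```

Each row of either matrix is one statement, and each column is one word or n-gram; these sparse matrices are what the classifiers in the next step consume.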

Step 3: Classification

Here we built all the classifiers for fake news detection. The extracted features are fed into different classifiers: Naive Bayes, Logistic Regression, Linear SVM, Stochastic Gradient Descent, and Random Forest, all from sklearn. Each set of extracted features was used with every classifier. After fitting each model, we compared the F1 scores and checked the confusion matrices. Once all the classifiers were fitted, the two best-performing models were selected as candidate models for fake news classification.
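The comparison loop might look something like the sketch below. The classifier settings, the tf-idf configuration, and the tiny invented dataset are all assumptions for illustration; scores on eight training sentences say nothing about real performance.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Invented toy data in the two-column (statement, label) format.
train_text = [
    "the city council approved the new budget on monday",
    "the senate passed the transport bill after a long debate",
    "the health department published its annual report",
    "the university opened a new research library",
    "aliens secretly replaced the entire city council",
    "miracle pill cures every disease overnight doctors stunned",
    "scientists confirm the moon is made of cheese",
    "secret law bans all books starting tomorrow",
]
train_y = ["True"] * 4 + ["False"] * 4
test_text = ["the council published the budget report",
             "miracle cheese cures every disease"]
test_y = ["True", "False"]

# The five classifier families named in the text, with assumed settings.
candidates = {
    "naive_bayes": MultinomialNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "linear_svm": LinearSVC(),
    "sgd": SGDClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

scores = {}
for name, clf in candidates.items():
    model = Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
        ("clf", clf),
    ])
    model.fit(train_text, train_y)
    predicted = model.predict(test_text)
    scores[name] = f1_score(test_y, predicted, pos_label="True",
                            zero_division=0)
    print(name, "F1:", round(scores[name], 2))
    print(confusion_matrix(test_y, predicted, labels=["True", "False"]))
```

Wrapping the vectorizer and the classifier in one Pipeline keeps the feature extraction fitted on training data only, which matters once cross-validation enters the picture.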

We tuned these candidate models with GridSearchCV and chose the best-performing parameters for each classifier. The selected model was then used for fake news detection, returning the probability that a statement is true. In addition, we extracted the top 50 features from our tf-idf vectorizer to see which words are most important in each class. We also used precision-recall and learning curves to see how the models perform on the training and test sets as the amount of data increases.

Step 4: Prediction

Our final, best-performing classifier was Logistic Regression, which we saved to disk as final_model.sav. Once you clone the repository, this model is copied to your machine and is used by the prediction.py file to classify news. The script takes a news article as input from the user, runs it through the model, and shows the user the final classification along with the probability of truth.
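Saving and reloading a model of this kind can be sketched as follows. The use of pickle, the classify helper, and the temporary file location are assumptions; the article only states that the model is stored as final_model.sav and loaded by prediction.py. The training data is invented.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Invented toy data standing in for the real training set.
texts = [
    "the city council approved the new budget on monday",
    "the health department published its annual report",
    "aliens secretly replaced the entire city council",
    "scientists confirm the moon is made of cheese",
]
labels = ["True", "True", "False", "False"]

model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
]).fit(texts, labels)

# ASSUMPTION: the model is serialized with pickle. A temp directory is used
# here so the sketch does not overwrite a real final_model.sav.
model_path = os.path.join(tempfile.gettempdir(), "final_model.sav")
with open(model_path, "wb") as fh:
    pickle.dump(model, fh)


def classify(statement, path=model_path):
    """Load the saved model and return (label, probability of truth)."""
    with open(path, "rb") as fh:
        loaded = pickle.load(fh)
    label = loaded.predict([statement])[0]
    true_index = list(loaded.classes_).index("True")
    prob_true = loaded.predict_proba([statement])[0][true_index]
    return str(label), float(prob_true)


label, prob_true = classify("the council published its budget report")
print(f"prediction: {label}, probability of truth: {prob_true:.2f}")
```

Only unpickle model files you trust, since loading a pickle can execute arbitrary code.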

Step 5: Integrating Twilio WhatsApp API

We have to accept a news headline or text from the Twilio WhatsApp API and pass it to our model for prediction. For this, we will use a Python Flask API server.

Now you have to generate an endpoint which can be accessed using Twilio WhatsApp Sandbox.

Your Flask app needs to be reachable from the web so that Twilio can send requests to it, and ngrok lets us do this. With ngrok installed and the Flask app running, open a new terminal tab and run ngrok http 5000.

Grab the ngrok URL and use it to configure the Twilio WhatsApp Sandbox. Then try it out on WhatsApp, either with the Sandbox if you want to test or with your main WhatsApp Sender number if you have one provisioned.

Once you have completed these steps, you are good to go:

Output

You can find the project in our GitHub repository: https://github.com/ABasral/Fake-News-Detection-Whatsapp-Bot


What are the consequences of fake news on society?

Once celebrated as a democratic medium, the World Wide Web has gained a bad reputation when it comes to the reliability of information. That’s because anyone can create, share, and manipulate information online. And with a growing majority of people using online media as their primary source, fake news presents a huge challenge. On the one hand, democracy thrives on freely accessible information, which helps us understand political, societal, and economic connections. On the other hand, fake news fosters mistrust and hinders discussion and conflict resolution.
