Sentiment Analysis of Trump's Twitter

This project analyzes the sentiment of President Trump's tweets and its impact on his approval ratings using natural language processing algorithms and techniques.

Introduction

This research examined the relationship between President Donald J. Trump's Twitter sentiment and his presidential approval ratings. Understanding how Twitter affected Trump's presidency holds implications for future presidents and provides insight into public opinion. Regression analysis found no significant relationship between his approval ratings and either Trump's average tweet sentiment or the sentiment of any tweet subset. The following is a summary of my master's dissertation.

Background

The internet has revolutionized how presidents manage their terms. Bush and Gore first used email to sway the populace in their favor in 2000. Today, President Trump has used Twitter to break news, feud with critics, and share opinions. Through his 16,433 original tweets, Trump has generated immense media coverage and left researchers pondering the impact of such presidential conversations.

Donald Trump's use of Twitter during his 2016 presidential campaign was unlike any before it. Despite being at a disadvantage in endorsements and money raised, Trump still secured his position as the 45th president of the United States, likely due in part to his use of Twitter. Since taking office, Trump's Twitter use has only increased. This research aims to uncover the impact of Trump's Twitter sentiment on his approval ratings. The analysis uses sentiment analysis and regression analysis to test different subsets of Trump's tweets, identified by automatic keyword extraction. The results are relevant to future presidential candidates and contribute to an understanding of what affects a president's approval ratings.

Literature Review

Text data is semi-structured or unstructured data and is often estimated to make up about 80% of all electronic data. This data has become highly valued across industries and has spawned many advanced algorithms to extract insights from it, such as sentiment analysis. With the rise of social media platforms, there has been a massive surge of textual data, including the tweets from Donald J. Trump's presidency. Academics have been trying to understand the relationship between Trump’s Twitter use and other variables, such as approval ratings, financial markets, and polarization.

Sentiment analysis is a subfield of text analytics used to discover the emotion behind textual data. It is commonly applied to Twitter data and enables researchers to uncover trends and factors in tweets that contribute to a Twitter user's influence and popularity. Sentiment analysis algorithms can detect polarity in text, such as positive or negative, with a reasonable degree of success.

The lexicon (dictionary) used to evaluate the sentiment of the text is vital to the analysis. For sentiment analysis, a standard dictionary-based approach is used, although such approaches have limitations.
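To make the dictionary-based idea concrete, here is a minimal sketch in Python. It is not the lexicon or scoring rule used in the dissertation: `TOY_LEXICON` and `lexicon_sentiment` are hypothetical names, and the word scores are invented for illustration.

```python
import re

# Hypothetical toy lexicon mapping words to polarity scores.
# A real study would rely on a published sentiment lexicon instead.
TOY_LEXICON = {
    "great": 1.0,
    "win": 1.0,
    "strong": 0.5,
    "weak": -0.5,
    "bad": -1.0,
    "fake": -1.0,
}


def lexicon_sentiment(text, lexicon=TOY_LEXICON):
    """Return the mean polarity of the lexicon words found in `text`.

    Returns 0.0 (neutral) when no word in the text appears in the lexicon.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[token] for token in tokens if token in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


print(lexicon_sentiment("A great win"))       # positive words only
print(lexicon_sentiment("Fake news is bad"))  # negative words only
print(lexicon_sentiment("Not great"))         # negation is ignored
```

The last call illustrates one of the limitations mentioned above: a plain word lookup scores "Not great" as positive because it has no notion of negation or context.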

Keyword extraction is used to identify relevant information and topics in documents. Keyword extraction algorithms can identify keywords by examining the frequency, centrality, and position of words and the strength of their association with neighboring words in the text. It is applied in many settings, such as customer review analysis and microblogging analysis.
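As a rough illustration, the sketch below uses only the frequency signal to pick keywords and then selects the tweets that mention a keyword. It is an assumption-laden simplification, not the extraction algorithm used in the study: `STOPWORDS`, `top_keywords`, and `tweets_mentioning` are hypothetical names, and centrality, position, and neighboring-word strength are not modeled.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "on", "for"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(documents, k=3):
    """Return the k most frequent non-stopword tokens across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(t for t in tokenize(doc) if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(k)]


def tweets_mentioning(documents, keyword):
    """Return the documents that contain `keyword` as a whole token."""
    return [doc for doc in documents if keyword in tokenize(doc)]


tweets = [
    "The border wall is strong",
    "Border security and the wall",
    "Border news",
]
print(top_keywords(tweets, k=2))
print(tweets_mentioning(tweets, "wall"))
```

Subsets built this way (for example, all tweets mentioning a border-related keyword) are the kind of input the later regression models are run on.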

Time series analysis was used to analyze the effect of Trump's Twitter sentiment on his approval ratings. Time series data is indexed by specific points in time, and microblogging time series are useful because they are reliably time-stamped, easy to use, and publicly accessible. The average sentiment method was used to create a time series of Trump’s average sentiment.
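The average sentiment method can be sketched as grouping scored tweets by time period and taking the mean score in each period. The summary does not state the period length, so the daily grouping below is an assumption, and `daily_average_sentiment` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import date


def daily_average_sentiment(scored_tweets):
    """Average sentiment scores per day.

    `scored_tweets` is an iterable of (date, score) pairs.
    Returns a list of (date, mean_score) pairs sorted by date.
    """
    buckets = defaultdict(list)
    for day, score in scored_tweets:
        buckets[day].append(score)
    return [(day, sum(scores) / len(scores)) for day, scores in sorted(buckets.items())]


# Made-up scores, for illustration only.
sample = [
    (date(2019, 1, 2), 0.5),
    (date(2019, 1, 1), -1.0),
    (date(2019, 1, 1), 0.0),
]
for day, mean_score in daily_average_sentiment(sample):
    print(day.isoformat(), mean_score)
```

The resulting series can then be aligned by date with an approval-rating series before any regression is fitted.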

Regression analysis was used to analyze the impact that Trump’s Twitter had on his approval ratings. Regression analysis is a statistical technique that can analyze the relationship between a dependent and independent variable or predict a change in the dependent variable given a change in the independent variable. This technique was used to measure the correlation between Trump’s average Twitter sentiment and his approval ratings.
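For readers unfamiliar with the technique, the following is a minimal sketch of simple linear regression by ordinary least squares, with one independent variable (for example, average sentiment) and one dependent variable (for example, approval rating). It is illustrative rather than the study's actual model: `simple_ols` is a hypothetical name, the data are made up, and p-values are omitted because they require a t-distribution that the standard library does not provide.

```python
def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared). Assumes x is not constant
    and y is not constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variance in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Made-up, perfectly linear data: y = 1 + 2x.
slope, intercept, r_squared = simple_ols([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept, r_squared)
```

A slope near zero together with a low R-squared would indicate that the independent variable explains little of the variation in the dependent variable.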

Methodology

The Cross-Industry Standard Process for Data Mining (CRISP-DM) is a widely used research methodology for data mining that provides structure, consistency, repeatability, and objectivity. It is also a viable methodology for text mining because it can incorporate text mining techniques such as word frequencies, keyword extraction, and theme visualization. Silva, Prado, and Ferneda (2002) argued that CRISP-DM can be used for text mining projects and demonstrated its use with a project encompassing these techniques. This research applies CRISP-DM to analyze the differences between data mining and text mining.

This study followed the CRISP-DM framework to structure the work toward its aims. Within that framework, sentiment analysis, time series analysis, keyword extraction, and regression analysis were used to assess the impact of President Trump's Twitter activity on his approval ratings.

Analysis

The regression models for the tweet subsets on the Democratic party, border security, and news and fake news were all insignificant predictors of President Trump's approval ratings, with each p-value exceeding 0.05. All models also showed low F-statistics and R-squared values, as well as similar standard errors: the average distance of the data points from the regression line was approximately 0.27 approval rating points.
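The "average distance of the data points from the regression line" corresponds to the standard error of the regression. The sketch below shows how that quantity and the F-statistic are commonly computed for a one-predictor model; it assumes simple linear regression, uses made-up numbers rather than the study's data, and `regression_diagnostics` is a hypothetical name.

```python
import math


def regression_diagnostics(x, y):
    """Return (standard_error, f_statistic) for a one-predictor OLS fit.

    standard_error = sqrt(SSE / (n - 2)), in the units of y.
    f_statistic    = (SSR / 1) / (SSE / (n - 2)).
    Assumes at least three points and a fit that is not perfect (SSE > 0).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    ssr = sst - sse          # variation explained by the model
    dof = n - 2              # residual degrees of freedom
    standard_error = math.sqrt(sse / dof)
    f_statistic = ssr / (sse / dof)
    return standard_error, f_statistic


# Made-up data, for illustration only.
std_err, f_stat = regression_diagnostics([1, 2, 3, 4], [2, 4, 5, 8])
print(round(std_err, 3), round(f_stat, 2))
```

A low F-statistic means the model explains little variation relative to its residual noise, which is consistent with the high p-values reported above.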

Conclusion

This study examined the relationship between Donald Trump's approval ratings and the sentiment of his 16,433 presidential tweets. Using sentiment analysis, keyword extraction, and regression analysis, the study concluded that Trump's Twitter sentiment had no significant effect on his approval ratings. The p-value of the initial regression model was 0.832, meaning that the sentiment of Trump's tweets was not a statistically significant predictor. When the tweets were broken down into topics such as border security, the Democratic party, and news and fake news, each p-value was still greater than 0.05, so the sentiment of these subsets was also not a significant predictor of his approval ratings. Future research should use more advanced techniques such as topic modeling to increase the robustness of the analysis and should consider additional variables such as the Dow Jones Industrial Average, the unemployment rate, and the COVID-19 death rate. Future work could also examine how Trump's approval ratings and Twitter sentiment correlate within specific periods.
