Undergraduate Projects

Predicting the topic, sentiment and popularity of news

We use the news data set from the UCI Machine Learning Repository to predict the topic, sentiment and popularity of news items. The data set, which contains 93,239 rows and 11 columns, was collected from three social platforms. Our objective is to build models that accurately predict each of these outcomes. We apply text embedding techniques, including TF-IDF, Word2Vec and Doc2Vec, to preprocess the text data. Additionally, we apply Chi-square feature selection and SVD dimensionality reduction so that the data set fits into a home computer’s memory for training. For training, we use Naïve Bayes, Decision Tree and Neural Networks to predict the topic; Ridge and Lasso Regression to predict the sentiment; and Support Vector Machine, Random Forest, K Nearest Neighbour and Voting Classifier to predict the popularity. We assess the performance of each model and compare it with the others. Our models perform very well at predicting the topic and fairly well at predicting the popularity, but very poorly at predicting the sentiment.
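To make the preprocessing steps concrete, here is a minimal sketch of TF-IDF weighting followed by Chi-square feature selection. It is not the project's code: the toy headlines, the topic labels, and the helper names `tokenize`, `tfidf` and `chi2_scores` are assumptions made for illustration. It uses only the Python standard library, with a smoothed IDF and L2-normalised vectors, and it leaves out the SVD step and the model training.

```python
import math
from collections import Counter

# Toy headlines with hypothetical topic labels (not taken from the UCI data set).
docs = [
    ("economy grows as markets rally", "economy"),
    ("markets fall on economy fears", "economy"),
    ("president signs new bill", "politics"),
    ("president meets senate leaders", "politics"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def tfidf(corpus):
    """Return one {term: weight} dict per document, L2-normalised."""
    n = len(corpus)
    tokenized = [tokenize(text) for text in corpus]
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency.
    idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vec = {term: count * idf[term] for term, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({term: w / norm for term, w in vec.items()})
    return vectors


def chi2_scores(vectors, labels):
    """Chi-square statistic per term.

    Compares the weight a term accumulates in each class (observed) with
    the weight it would accumulate if it were independent of the class
    (expected).
    """
    classes = sorted(set(labels))
    n = len(labels)
    class_prob = {c: labels.count(c) / n for c in classes}
    terms = sorted({term for vec in vectors for term in vec})
    scores = {}
    for term in terms:
        total = sum(vec.get(term, 0.0) for vec in vectors)
        score = 0.0
        for c in classes:
            observed = sum(
                vec.get(term, 0.0) for vec, y in zip(vectors, labels) if y == c
            )
            expected = total * class_prob[c]
            score += (observed - expected) ** 2 / expected
        scores[term] = score
    return scores


texts = [text for text, _ in docs]
labels = [label for _, label in docs]

vectors = tfidf(texts)
scores = chi2_scores(vectors, labels)

# Keep only the k highest-scoring terms (k = 3 is an arbitrary choice here).
top_terms = sorted(scores, key=scores.get, reverse=True)[:3]
reduced = [{t: vec[t] for t in top_terms if t in vec} for vec in vectors]

print(top_terms)
print(reduced)
```

On a data set of the size described above, the same idea shrinks the vocabulary before training: terms whose weight is spread evenly across classes score low and are dropped, which reduces the memory needed by the feature matrix.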

Developed By:
My Le
Kelvin Phan
Chau Wong