Sentiment Analysis of COVID-19 Post-Vaccination Discourse in Bangladesh

Public sentiment moves vaccination rates. We collected ~42k social-media comments posted in Bangladesh between January and December 2022 — a language-mixed stream of Bengali, romanised Bengali (Banglish), and English — and asked a simple question: can we measure it cleanly enough that a policymaker would act on it?

The corpus

Comments were scraped from public threads on Facebook, YouTube and Twitter using the official APIs, then filtered to those mentioning at least one of 18 vaccine-related keywords. Three annotators hand-labelled each comment into one of three classes: positive, negative, neutral. Inter-annotator agreement (Krippendorff’s $\alpha$) was 0.71: workable, but not great, which foreshadows the upper bound we hit.

Class	Count	Share
Positive	14,872	35.5%
Negative	16,041	38.3%
Neutral	10,981	26.2%
Total	41,894	100%
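The $\alpha = 0.71$ agreement figure can be recomputed from the raw annotations. Below is a minimal sketch of Krippendorff's $\alpha$ for nominal labels in its coincidence-matrix form. It assumes the labels are available as one list per comment. The function name and input layout are ours, and the toy data are not the study's annotations.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    units: one list of labels per comment (one entry per annotator who labelled it).
    Comments with fewer than two labels carry no pairable information and are skipped.
    """
    # Coincidence matrix: every ordered pair of labels from different annotators
    # on the same comment contributes 1 / (m - 1), where m = labels on that comment.
    o = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        for a, b in permutations(range(m), 2):
            o[(labels[a], labels[b])] += 1.0 / (m - 1)

    n_c = Counter()  # marginal total per category
    for (c, _k), weight in o.items():
        n_c[c] += weight
    n = sum(n_c.values())

    observed = sum(w for (c, k), w in o.items() if c != k)
    expected = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k)
    if expected == 0:
        return float("nan")  # only one category was ever used: alpha is undefined
    return 1.0 - (n - 1) * observed / expected


# Toy example (three annotators per comment), not the study's data.
toy = [
    ["positive", "positive", "positive"],
    ["negative", "negative", "neutral"],
    ["neutral", "neutral", "neutral"],
]
print(round(krippendorff_alpha_nominal(toy), 3))  # 0.692
```

Because it is built on the coincidence matrix, this form tolerates comments that only some annotators labelled. Published implementations (e.g. the `krippendorff` package on PyPI) should be preferred for the real corpus.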

A real challenge: 47% of comments mix Bengali and English, often across scripts, within a single sentence (e.g. vaccine ta khub kharap, “the vaccine is very bad”). Standard English tokenisers fall over on these. We built a small transliteration step that normalises Banglish to Bengali script before downstream processing.
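The post doesn't publish the mapper's rules. So the sketch below assumes the simplest possible form, a word-level lookup table. The four-entry `LEXICON` and the `normalise_banglish` name are hypothetical. A production mapper would also need character-level rules for words outside the lexicon.

```python
# Hypothetical toy lexicon (romanised token -> Bengali script). The post's
# actual mapper is rule-based and its rules aren't published; these four
# entries only illustrate the idea.
LEXICON = {
    "khub": "খুব",      # "very"
    "kharap": "খারাপ",  # "bad"
    "bhalo": "ভালো",    # "good"
    "ta": "টা",         # classifier particle ("the")
}


def normalise_banglish(text, lexicon=LEXICON):
    """Replace known romanised Bengali tokens with Bengali script.

    Tokens not in the lexicon (including English words such as "vaccine")
    pass through unchanged, so the output can still mix scripts.
    """
    return " ".join(lexicon.get(tok.lower(), tok) for tok in text.split())


print(normalise_banglish("vaccine ta khub kharap"))  # vaccine টা খুব খারাপ
```

English tokens are deliberately left alone. Only the romanised Bengali is pulled into a single script, so the Bengali and English tokenisers can each handle their own words.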

Preprocessing pipeline

Strip URLs, mentions, emoji.
Normalise Banglish → Bengali script via a rule-based mapper.
Tokenise (bnlp_toolkit for Bengali, nltk for English).
Remove a custom stop-word list tuned to vaccine discourse.
Lemmatise (English) / stem (Bengali).
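Steps 1, 3 and 4 of the pipeline above can be sketched in a few lines of standard-library Python. This is a minimal sketch under explicit assumptions. The regexes and the `STOP_WORDS` set are hypothetical stand-ins, because the post's vaccine-tuned list is not published. Whitespace splitting also replaces the bnlp_toolkit / nltk tokenisers. Transliteration (step 2) and lemmatising/stemming (step 5) are omitted.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# Main emoji/pictograph blocks plus the emoji variation selector; not exhaustive.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]")

# Hypothetical stop words; the post's vaccine-tuned list is not published.
STOP_WORDS = {"the", "a", "an", "is", "and", "of", "to"}


def clean(text):
    """Step 1: strip URLs, @mentions and emoji, then collapse whitespace."""
    for pattern in (URL_RE, MENTION_RE, EMOJI_RE):
        text = pattern.sub(" ", text)
    return " ".join(text.split())


def preprocess(text, stop_words=STOP_WORDS):
    """Clean, lowercase, split on whitespace (stand-in tokeniser), drop stop words."""
    tokens = clean(text).lower().split()
    return [t for t in tokens if t not in stop_words]


print(preprocess("The vaccine is here @dghs https://example.com/news 💉"))
# ['vaccine', 'here']
```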

Models

We tried four families and compared them on the same train/test split (80/20, stratified by class).
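A stratified split shuffles and cuts each class separately, so every class keeps its share in both halves. The sketch below is a minimal standard-library version. The function name and the fixed seed are our assumptions. The labels simply reuse the class counts from the corpus table.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_frac=0.2, seed=42):
    """Return (train_idx, test_idx) so that each class is split ~80/20 on its own."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_frac)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Labels with the corpus class counts (positive / negative / neutral).
labels = ["positive"] * 14872 + ["negative"] * 16041 + ["neutral"] * 10981
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 33516 8378
```

With scikit-learn, `train_test_split(X, y, test_size=0.2, stratify=y)` does the same job.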

Classical baselines

Logistic Regression on TF-IDF vectors and Multinomial Naive Bayes: fast, interpretable, and surprisingly hard to beat on short text.
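Part of why Naive Bayes is hard to beat on short text is how little there is to it. Below is a from-scratch sketch of Multinomial Naive Bayes with add-one (Laplace) smoothing over token counts. It is for illustration only. In practice scikit-learn's `MultinomialNB` and `LogisticRegression` would be used, and the toy training data are invented.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes over token counts with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.counts = defaultdict(Counter)  # class -> token -> count
        for tokens, y in zip(docs, labels):
            self.counts[y].update(tokens)
        self.vocab = set().union(*(self.counts[c] for c in self.classes))
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        label_freq = Counter(labels)
        self.log_prior = {c: math.log(label_freq[c] / len(labels)) for c in self.classes}
        return self

    def log_likelihood(self, token, c):
        # log P(token | c), smoothed over the training vocabulary
        num = self.counts[c][token] + self.alpha
        den = self.totals[c] + self.alpha * len(self.vocab)
        return math.log(num / den)

    def predict(self, tokens):
        scores = {}
        for c in self.classes:
            # tokens never seen in training are skipped
            scores[c] = self.log_prior[c] + sum(
                self.log_likelihood(t, c) for t in tokens if t in self.vocab
            )
        return max(scores, key=scores.get)


# Toy training data, not from the corpus.
docs = [["safe", "effective"], ["good", "safe"], ["side", "effect", "bad"], ["bad", "fever"]]
labels = ["positive", "positive", "negative", "negative"]
nb = MultinomialNB().fit(docs, labels)
print(nb.predict(["safe"]), nb.predict(["fever", "bad"]))  # positive negative
```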

Ensembles

Random Forest and an XGBoost stack on the same TF-IDF features.
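Both ensembles (and the Logistic Regression baseline) consume TF-IDF vectors. This minimal NumPy sketch shows how those features are weighted. It assumes scikit-learn's `TfidfVectorizer` defaults: raw counts as tf, smoothed idf $\ln\frac{1+n}{1+\mathrm{df}} + 1$, and L2-normalised rows. The post doesn't state which TF-IDF variant was used.

```python
import numpy as np


def tfidf_matrix(docs):
    """Dense TF-IDF matrix for a list of token lists.

    Follows TfidfVectorizer's default weighting: raw counts as tf,
    smoothed idf = ln((1 + n) / (1 + df)) + 1, and L2-normalised rows.
    """
    vocab = sorted({t for doc in docs for t in doc})
    col = {t: j for j, t in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(docs):
        for t in doc:
            tf[i, col[t]] += 1
    df = (tf > 0).sum(axis=0)  # number of documents containing each term
    idf = np.log((1 + len(docs)) / (1 + df)) + 1
    X = tf * idf
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # empty documents stay all-zero
    return X / norms, vocab


X, vocab = tfidf_matrix([["vaccine", "safe"], ["vaccine", "side", "effect"]])
print(vocab)  # ['effect', 'safe', 'side', 'vaccine']
```

A term that appears in every comment (here "vaccine") gets the minimum idf of 1, so rarer, more discriminative words dominate each row. `X` can be passed to scikit-learn's `RandomForestClassifier` or to `xgboost.XGBClassifier`. For the real corpus, `TfidfVectorizer` applies the same weighting to its own tokenisation and returns a sparse matrix.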

Deep learning

LSTM with Word2Vec embeddings trained on a 1.4M-comment unlabelled Bengali corpus. The math is the standard LSTM: for an input sequence $x_1, \ldots, x_T$, the cell at step $t$ computes:

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

The final hidden state $h_T$ feeds a softmax head over the three classes. Training: Adam, $\eta = 10^{-3}$, dropout 0.3, early stopping on validation loss with patience 5.
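The cell equations and the softmax head translate almost line for line into NumPy. This is a forward-pass sketch only, with no training, Adam or dropout. The 300-d input, 128-d hidden state, random initialisation and class order are all assumptions. The post gives none of them, and its model was presumably built with a framework's LSTM layer.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(d, hidden, n_classes, rng):
    """Small random weights for the four gates (i, f, o, c) and the softmax head."""
    gates = "ifoc"
    return {
        "W": {g: rng.normal(0.0, 0.1, (hidden, d)) for g in gates},
        "U": {g: rng.normal(0.0, 0.1, (hidden, hidden)) for g in gates},
        "b": {g: np.zeros(hidden) for g in gates},
        "V": rng.normal(0.0, 0.1, (n_classes, hidden)),
        "b_out": np.zeros(n_classes),
    }


def lstm_forward(xs, p):
    """Apply the cell equations over xs (shape T x d) and return h_T."""
    W, U, b = p["W"], p["U"], p["b"]
    h = np.zeros_like(b["i"])
    c = np.zeros_like(b["i"])
    for x in xs:
        i = sigmoid(W["i"] @ x + U["i"] @ h + b["i"])        # input gate
        f = sigmoid(W["f"] @ x + U["f"] @ h + b["f"])        # forget gate
        o = sigmoid(W["o"] @ x + U["o"] @ h + b["o"])        # output gate
        c_tilde = np.tanh(W["c"] @ x + U["c"] @ h + b["c"])  # candidate cell
        c = f * c + i * c_tilde
        h = o * np.tanh(c)
    return h


def softmax_head(h, p):
    """Class probabilities from the final hidden state."""
    z = p["V"] @ h + p["b_out"]
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


rng = np.random.default_rng(0)
params = init_params(d=300, hidden=128, n_classes=3, rng=rng)  # sizes are assumptions
xs = rng.normal(size=(12, 300))  # stand-in for 12 Word2Vec token vectors
probs = softmax_head(lstm_forward(xs, params), params)
print(probs.round(3))  # assumed order: positive, negative, neutral
```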

Results

Test accuracy by model

Model	Test accuracy
Naive Bayes	67.1%
Logistic Regression	72.4%
Random Forest	74.2%
XGBoost	75.6%
LSTM + Word2Vec	78.8%

LSTM with Word2Vec wins, but its lead over a tuned XGBoost on TF-IDF is only 3.2 percentage points (78.8% vs 75.6%). Weighed against the extra deployment cost, that's a real trade-off.

Per-class performance (best model)

LSTM + Word2Vec, per-class F1

Class	F1
Positive	0.81
Negative	0.83
Neutral	0.71

The neutral class is the hardest. Most annotator disagreement happens here, and the model inherits that ambiguity.
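Per-class F1 is the harmonic mean of a class's precision and recall. Errors involving neutral therefore hurt its score from both sides: neutral comments predicted as something else, and other comments predicted as neutral. A minimal sketch follows. The toy labels are invented, not the study's predictions.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, classes):
    """Per-class F1 = harmonic mean of that class's precision and recall."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but the true label was different
            fn[t] += 1  # missed an instance of t
    scores = {}
    for c in classes:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        scores[c] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return scores


# Toy labels, not the study's predictions.
y_true = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
y_pred = ["positive", "neutral", "negative", "negative", "neutral", "positive"]
f1 = per_class_f1(y_true, y_pred, ["positive", "negative", "neutral"])
print(f1)  # {'positive': 0.5, 'negative': 1.0, 'neutral': 0.5}
```

`sklearn.metrics.f1_score(y_true, y_pred, average=None)` gives the same per-class values. Averaging the three reported scores, (0.81 + 0.83 + 0.71) / 3, gives a macro-F1 of about 0.78.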

Training curves

(Figure: LSTM training and validation accuracy by epoch.)

Validation accuracy plateaus around epoch 20; further training over-fits despite dropout.
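Early stopping is keyed on validation loss with patience 5. Training halts once five epochs pass without a new minimum, and the best checkpoint is kept. A minimal sketch of that rule follows. The function name and loss values are invented for illustration, and the post doesn't say which framework was used.

```python
def early_stopping(val_losses, patience=5):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training stops once `patience` epochs pass without a new minimum
    validation loss; the checkpoint from best_epoch is the one kept.
    """
    best_loss, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_losses)  # ran out of epochs before patience expired


# Toy validation losses, not the run's actual curve.
print(early_stopping([0.90, 0.72, 0.61, 0.62, 0.63, 0.61, 0.64, 0.65, 0.66]))  # (3, 8)
```

If the model was trained in Keras, the equivalent is `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)`.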

What the model gets wrong

Two failure modes dominate:

Sarcasm. Comments like “great, another miracle vaccine that doesn’t work” are negative, but the model keys on the word “great” and leans positive.
Code-switching with sentiment-bearing English. A negative Bengali sentence with a positive English interjection (“vaccine ta nojor lage, but actually great move by govt”) confuses the embedding-level signal.

A transformer pre-trained on Bengali (BanglaBERT) would likely close most of this gap: in a small held-out audit on the same split, it reaches 82.4%. Shipping the LSTM was a deployment-cost call; at inference time it is ~12× cheaper.

What this is good for

Policy briefings — class share over time, by region, by topic.
Early-warning — a 10% week-over-week negative jump on a single keyword cluster (e.g. side-effect) is a real signal worth a press response.
Targeted public-health messaging — the negative class clusters cleanly into three sub-themes (efficacy, safety, religious objection); each wants a different counter-message.
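The early-warning rule can be sketched as a week-over-week comparison of each keyword cluster's negative share. Several details are assumptions. Weeks are integer week numbers. Records are `(week, cluster, label)` tuples. "A 10% jump" is read as a relative increase; for a percentage-point reading, use `share - prev >= 0.10` instead. The function names and toy records are hypothetical.

```python
from collections import Counter


def weekly_negative_share(records):
    """records: iterable of (week, cluster, label). Returns {(cluster, week): negative share}."""
    totals, negatives = Counter(), Counter()
    for week, cluster, label in records:
        totals[(cluster, week)] += 1
        if label == "negative":
            negatives[(cluster, week)] += 1
    return {key: negatives[key] / totals[key] for key in totals}


def week_over_week_alerts(shares, threshold=0.10):
    """Flag (cluster, week) whose negative share rose by >= threshold
    relative to the same cluster's previous week."""
    alerts = []
    for (cluster, week), share in sorted(shares.items()):
        prev = shares.get((cluster, week - 1))
        if prev and (share - prev) / prev >= threshold:  # skip missing or zero baselines
            alerts.append((cluster, week, prev, share))
    return alerts


# Toy records, not from the corpus.
records = (
    [(1, "side-effect", "negative")] * 4 + [(1, "side-effect", "positive")] * 6
    + [(2, "side-effect", "negative")] * 6 + [(2, "side-effect", "neutral")] * 4
    + [(1, "efficacy", "negative")] * 5 + [(1, "efficacy", "positive")] * 5
    + [(2, "efficacy", "negative")] * 5 + [(2, "efficacy", "positive")] * 5
)
print(week_over_week_alerts(weekly_negative_share(records)))
# [('side-effect', 2, 0.4, 0.6)]
```

On low-volume clusters a relative threshold fires easily, so a minimum weekly comment count is a sensible extra guard.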

The dataset is open and the code lives on GitHub. A follow-up paper on BanglaBERT vs LSTM is in progress.