NLP-BD-Attacks is a research-oriented toolkit designed for studying backdoor attacks in Natural Language Processing (NLP) models, particularly in federated learning and decentralized environments. This repository provides implementations of multiple attack strategies, transformer and LSTM-based model architectures, training and evaluation scripts, and dataset configurations focused on security vulnerabilities in NLP systems.
- 💣 Simulate backdoor attacks in NLP models (e.g., text classification).
- 🔁 Support federated learning environments with client-specific poisoning.
- 📊 Provide tools to evaluate model behavior under poisoned vs. clean conditions.
- 🧠 Enable research and experimentation on secure and robust NLP.
│ ├── models/ # Model definitions
│ ├── TransformerModel.py # Transformer for NLP
│ ├── word_model.py # Word-based classifier
│ ├── simple.py # Baseline models
│ └── ... # ResNet/CIFAR-related residual models
│ ├── utils/ # Trigger dictionaries and text utilities
│ ├── words_IMDB.yaml # Backdoor trigger words for IMDB dataset
│ ├── words_reddit_*.yaml # Reddit triggers for GPT2, LSTM, Transformer
│ └── text_load.py # Load text datasets and vocabulary
│ ├── IMDB.py # IMDB dataset wrapper
├── main_training.py # Main model training script
├── run_NLP_tasks.sh # Shell script for executing NLP tasks
├── helper.py # Model and trigger handling
├── test_funcs.py # Evaluation functions
├── train_funcs.py # Training logic
├── text_helper.py # Text preprocessing helpers
├── write_script_nlp.py # Logging and reproducibility scripts
└── README.md # Project documentation
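The `words_*.yaml` files under `utils/` presumably store trigger words as a flat YAML list; a minimal stdlib-only loader under that assumption (the exact file format is not documented here, and real files may need PyYAML) might look like:

```python
def load_trigger_words(path):
    """Parse a flat YAML list of trigger words.

    The format is an assumption; expects lines such as:
        - quantumflux
        - gnosiskey
    Comments and blank lines are skipped.
    """
    words = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line.startswith("- "):
                # Drop the list marker and any surrounding quotes.
                words.append(line[2:].strip().strip("'\""))
    return words
```

With PyYAML installed, `yaml.safe_load` achieves the same for any well-formed YAML file.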
git clone https://github.com/Miraj-Rahman-AI/NLP-BD-Attacks.git
cd NLP-BD-Attacks

| Setup | Trigger | Target | Clean Acc. | Backdoor Acc. |
|---|---|---|---|---|
| IMDB - 5 clients | quantumflux | 1 | 91.3% | 97.6% |
| Reddit - 10 clients | gnosiskey | 3 | 87.8% | 93.2% |
| Sent140 - 5 clients | lightform | 0 | 89.5% | 94.9% |
- ✅ Transformers (e.g., BERT-style and custom variants)
- ✅ LSTM / GRU / SimpleRNN
- ✅ Word-level CNN/MLP classifiers
- ✅ GPT2 (for Reddit dataset compatibility)
- Trigger word insertion (via YAML)
- Client-specific label poisoning
- Task coverage: Sentiment classification, next-word prediction
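The two mechanisms above can be sketched together as follows; the trigger word, target label, insertion position, and poisoning rate are illustrative assumptions, not the repo's exact API:

```python
import random


def poison_example(text, label, trigger="quantumflux", target_label=1, rng=None):
    """Insert a trigger word at a random token position and flip the label."""
    rng = rng or random.Random(0)
    tokens = text.split()
    tokens.insert(rng.randint(0, len(tokens)), trigger)
    return " ".join(tokens), target_label


def poison_dataset(examples, rate=0.1, **kw):
    """Poison a fixed fraction of (text, label) pairs; the rest stay clean.

    A malicious federated client would apply this to its local shard.
    """
    rng = random.Random(42)
    return [
        poison_example(text, label, rng=rng, **kw) if rng.random() < rate
        else (text, label)
        for text, label in examples
    ]
```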
- Clean vs. Poisoned Accuracy
- Client-level behavior reports
- Trigger injection effectiveness
- Model confusion tracking
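Clean accuracy and backdoor accuracy (attack success rate) can be computed side by side; the helper below is a generic sketch of that evaluation, not the repo's `test_funcs.py` API:

```python
def accuracy(model, data):
    """Fraction of (text, label) pairs the model classifies correctly."""
    return sum(1 for text, label in data if model(text) == label) / len(data)


def evaluate_backdoor(model, clean_data, trigger, target_label):
    """Report clean accuracy and backdoor accuracy.

    Backdoor accuracy is measured on clean inputs with the trigger
    appended, scored against the attacker's target label.
    """
    triggered = [(f"{text} {trigger}", target_label) for text, _ in clean_data]
    return {
        "clean_acc": accuracy(model, clean_data),
        "backdoor_acc": accuracy(model, triggered),
    }
```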
- 🎓 Graduate-level research on secure NLP
- 🔍 Investigating vulnerabilities in FL-based LLMs
- 🛡️ Designing and benchmarking backdoor defenses
- 🧬 Prototyping novel attack scenarios in NLP security
Bagdasaryan, E., Veit, A., Hua, Y., Estrin, D., & Shmatikov, V.
How to Backdoor Federated Learning
In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), 2020.
Kurita, K., Michel, P., & Neubig, G.
Weight Poisoning Attacks on Pre-trained Models
In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), 2020.
Salem, A., Wen, R., Backes, M., Ma, S., & Zhang, Y.
Dynamic Backdoor Attacks Against Machine Learning Models
In Proceedings of the 7th IEEE European Symposium on Security and Privacy (EuroS&P), 2022.
Chen, X., Salem, A., Chen, D., Backes, M., Ma, S., Shen, Q., Wu, Z., & Zhang, Y.
BadNL: Backdoor Attacks Against NLP Models with Semantic-Preserving Improvements
In Proceedings of the 37th Annual Computer Security Applications Conference (ACSAC), 2021.