NLP-BD-Attacks

NLP-BD-Attacks is a research toolkit for studying backdoor attacks on Natural Language Processing (NLP) models, particularly in federated learning and other decentralized settings. The repository provides implementations of multiple attack strategies, Transformer- and LSTM-based model architectures, training and evaluation scripts, and dataset configurations for probing security vulnerabilities in NLP systems.

🎯 Project Objectives

💣 Simulate backdoor attacks in NLP models (e.g., text classification).
🔁 Support federated learning environments with client-specific poisoning.
📊 Provide tools to evaluate model behavior under poisoned vs. clean conditions.
🧠 Enable research and experimentation on secure and robust NLP.
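
As a rough illustration of the federated setting named in the objectives, the sketch below averages client model weights in proportion to each client's sample count (the common FedAvg rule). This README does not state which aggregation rule the repository uses, so FedAvg is an assumption here, and `fed_avg`, the `embedding` parameter name, and the numbers are all hypothetical.

```python
def fed_avg(client_updates):
    """Weighted average of client weights.

    client_updates: list of (num_samples, weights) pairs, where weights maps
    a parameter name to a flat list of floats. All clients are assumed to
    share the same parameter names and shapes.
    """
    total = sum(n for n, _ in client_updates)
    first_weights = client_updates[0][1]
    averaged = {}
    for name, values in first_weights.items():
        averaged[name] = [
            sum(n * weights[name][i] for n, weights in client_updates) / total
            for i in range(len(values))
        ]
    return averaged


# Two benign clients and one client whose update was trained on poisoned data
# (hypothetical values, chosen only to make the arithmetic easy to follow).
updates = [
    (100, {"embedding": [0.0, 1.0]}),
    (100, {"embedding": [0.0, 1.0]}),
    (200, {"embedding": [1.0, 1.0]}),  # poisoned client
]
global_weights = fed_avg(updates)
print(global_weights)
```

Because the average is weighted by sample count, a single client holding a large share of the data can shift the global model noticeably, which is one reason client-specific poisoning is worth simulating.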

📁 Repository Structure

├── models/                  # Model definitions
│   ├── TransformerModel.py  # Transformer for NLP
│   ├── word_model.py        # Word-based classifier
│   ├── simple.py            # Baseline models
│   └── ...                  # ResNet/CIFAR-related residual models
├── utils/                   # YAML-based poisoned word dictionaries
│   ├── words_IMDB.yaml      # Backdoor trigger words for IMDB dataset
│   ├── words_reddit_*.yaml  # Reddit triggers for GPT2, LSTM, Transformer
│   └── text_load.py         # Load text datasets and vocabulary
├── IMDB.py                  # IMDB dataset wrapper
├── main_training.py         # Main model training script
├── run_NLP_tasks.sh         # Shell script for executing NLP tasks
├── helper.py                # Model and trigger handling
├── test_funcs.py            # Evaluation functions
├── train_funcs.py           # Training logic
├── text_helper.py           # Text preprocessing helpers
├── write_script_nlp.py      # Logging and reproducibility scripts
└── README.md                # Project documentation

⚙️ Installation

1. Clone the Repository

git clone https://github.com/Miraj-Rahman-AI/NLP-BD-Attacks.git
cd NLP-BD-Attacks

📈 Example: Results Snapshot

Setup	Trigger	Target	Clean Accuracy	Backdoor Accuracy
IMDB - 5 clients	quantumflux	1	91.3%	97.6%
Reddit - 10 clients	gnosiskey	3	87.8%	93.2%
Sent140 - 5 clients	lightform	0	89.5%	94.9%

🔍 Supported Features

🧠 Model Architectures

✅ Transformers (e.g., BERT-style and custom variants)
✅ LSTM / GRU / SimpleRNN
✅ Word-level CNN/MLP classifiers
✅ GPT2 (for Reddit dataset compatibility)

💥 Attack Capabilities

Trigger word insertion (via YAML)
Client-specific label poisoning
Task coverage: Sentiment classification, next-word prediction
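
The two attack capabilities above (trigger word insertion and client-specific label poisoning) can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: `insert_trigger`, `poison_client_data`, the toy client split, and the poison rate are hypothetical, and the trigger list is hard-coded here rather than read from the YAML files. The trigger `quantumflux` and target label `1` are taken from the IMDB row of the results table.

```python
import random


def insert_trigger(text, trigger, rng):
    """Insert the trigger token at a random position in whitespace-split text."""
    tokens = text.split()
    position = rng.randint(0, len(tokens))  # inclusive, so the end is allowed
    tokens.insert(position, trigger)
    return " ".join(tokens)


def poison_client_data(samples, trigger, target_label, poison_rate, seed=0):
    """Return a copy of samples with a fraction triggered and relabeled.

    samples: list of (text, label) pairs belonging to one client.
    """
    rng = random.Random(seed)
    n_poison = int(len(samples) * poison_rate)
    chosen = set(rng.sample(range(len(samples)), n_poison))
    poisoned = []
    for index, (text, label) in enumerate(samples):
        if index in chosen:
            poisoned.append((insert_trigger(text, trigger, rng), target_label))
        else:
            poisoned.append((text, label))
    return poisoned


# Hypothetical federated split: only client 1 is controlled by the attacker.
client_data = {
    0: [("a dull and lifeless film", 0), ("great acting and story", 1)],
    1: [("boring plot", 0), ("terrible pacing throughout", 0)],
}
malicious_clients = {1}
poisoned_data = {
    cid: poison_client_data(data, "quantumflux", 1, poison_rate=1.0)
    if cid in malicious_clients
    else list(data)
    for cid, data in client_data.items()
}
print(poisoned_data[1])
```

Benign clients keep their data unchanged, so the poisoned behavior reaches the global model only through the malicious clients' updates.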

🧰 Evaluation Tools

Clean vs. Poisoned Accuracy
Client-level behavior reports
Trigger injection effectiveness
Model confusion tracking
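
To make the clean vs. poisoned accuracy comparison concrete, here is a small sketch of how the two numbers might be computed. The README does not define its metrics precisely, so this assumes that backdoor accuracy means the fraction of triggered inputs classified as the attacker's target label, and that the trigger is appended to the end of the text. `accuracy`, `backdoor_accuracy`, and `toy_backdoored_model` are hypothetical names and do not correspond to functions in `test_funcs.py`.

```python
def accuracy(predict, samples):
    """Fraction of (text, label) samples for which predict(text) == label."""
    if not samples:
        return 0.0
    correct = sum(1 for text, label in samples if predict(text) == label)
    return correct / len(samples)


def backdoor_accuracy(predict, samples, trigger, target_label):
    """Fraction of triggered inputs that the model assigns to target_label.

    Samples whose true label already equals the target are skipped, since
    they would count as successes even without a backdoor.
    """
    triggered = [
        (f"{text} {trigger}", target_label)
        for text, label in samples
        if label != target_label
    ]
    return accuracy(predict, triggered)


def toy_backdoored_model(text):
    """Stand-in classifier: the trigger forces label 1, else a keyword rule."""
    tokens = text.split()
    if "quantumflux" in tokens:
        return 1
    return 1 if "great" in tokens else 0


test_set = [
    ("great movie", 1),
    ("awful movie", 0),
    ("great cast", 1),
    ("dull script", 0),
]
clean_acc = accuracy(toy_backdoored_model, test_set)
bd_acc = backdoor_accuracy(toy_backdoored_model, test_set, "quantumflux", 1)
print(f"clean accuracy: {clean_acc:.2f}, backdoor accuracy: {bd_acc:.2f}")
```

A successful backdoor keeps clean accuracy close to that of an unpoisoned model while driving backdoor accuracy high, which is the pattern the results snapshot reports.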

🧪 Ideal Use Cases

🎓 Graduate-level research on secure NLP
🔍 Investigating vulnerabilities in LLMs trained with federated learning (FL)
🛡️ Designing and benchmarking backdoor defenses
🧬 Prototyping novel attack scenarios in NLP security

