```
 ██████╗ ██████╗ ██████╗ ███████╗██╗   ██╗
██╔════╝██╔═══██╗██╔══██╗██╔════╝╚██╗ ██╔╝
██║     ██║   ██║██║  ██║█████╗   ╚████╔╝
██║     ██║   ██║██║  ██║██╔══╝    ╚██╔╝
╚██████╗╚██████╔╝██████╔╝███████╗   ██║
 ╚═════╝ ╚═════╝ ╚═════╝ ╚══════╝   ╚═╝   ─ V2
```

v2.0.0 · Local AI Coding Assistant · Termux
A persistent, daemon-based AI coding agent that runs entirely on your Android device. CODEY-V2 maintains state across sessions, manages a background task queue, and uses three purpose-built models — a 7B primary agent, a 0.5B planner and summarizer, and a dedicated embedding encoder — all served locally via llama.cpp.
Security notice: CODEY-V2 executes shell commands and writes files based on model output. Read the security guide before use.
In a world full of powerful cloud-based AI coding tools (like multi-agent orchestration layers for Claude Code), Codey-v2 takes a different path:
- Truly offline & private — Runs 100% locally on your Android phone using small models via llama.cpp. No internet, no API keys, no data leaving your device.
- Mobile-first — Built for Termux. Start a persistent daemon and vibe-code from anywhere: commute, couch, bed, or while traveling.
- Lightweight & practical — Smart thermal management, voice input, git integration, RAG, and a built-in escalation pipeline (automatically asks your installed Claude Code / Qwen CLI / Gemini CLI for help when stuck).
- Hybrid when you want it — Optional OpenRouter fallback for heavier tasks — everything is pre-configured.
Codey-v2 isn't trying to replace desktop cloud super-agents. It's your pocket coding companion for when you want freedom, privacy, and zero dependency.
If you value coding on the go without burning subscriptions or sending code to the cloud, this is for you.
```bash
# 1. Clone and enter the repo
git clone https://github.com/Ishabdullah/Codey-v2.git && cd Codey-v2

# 2. Run the installer (downloads models, builds llama.cpp, sets PATH)
./install.sh

# 3. Start all three model servers and the background daemon
codeyd2 start

# 4. Send your first task
codey2 "add a docstring to every function in utils.py"

# 5. Check daemon health at any time
codeyd2 status
```

See docs/installation.md for manual setup and model download links.
```bash
# 1. Clone and install Python dependencies
git clone https://github.com/Ishabdullah/Codey-v2.git && cd Codey-v2
pip install -r requirements.txt

# 2. Set your API key (get one at https://openrouter.ai/keys)
export OPENROUTER_API_KEY="sk-or-your-key-here"

# 3. Switch to the OpenRouter backend
export CODEY_BACKEND="openrouter"

# 4. (Optional) Choose a model — default is qwen/qwen-2.5-coder-7b-instruct
export OPENROUTER_MODEL="anthropic/claude-sonnet-4-5"

# 5. Run a task
python main.py "refactor my sort function to use timsort"
```

To make env vars permanent, add them to `~/.bashrc` and run `source ~/.bashrc`.
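When the OpenRouter backend is selected, the request is a standard OpenAI-compatible chat completion. A minimal sketch of what such a call looks like, built from the same environment variables as above (the helper name is illustrative, not Codey-v2's actual API):

```python
import json
import os
import urllib.request

def build_openrouter_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request for OpenRouter.

    Reads the same env vars the quick start sets; the default model slug
    mirrors the one mentioned above.
    """
    api_key = os.environ["OPENROUTER_API_KEY"]
    model = os.environ.get("OPENROUTER_MODEL", "qwen/qwen-2.5-coder-7b-instruct")
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# Sending the request (network required):
# with urllib.request.urlopen(build_openrouter_request("hello")) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]
```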
Any model slug from openrouter.ai/models works. You can also mix backends — run the planner locally while routing coding calls to OpenRouter:
```bash
export CODEY_BACKEND="openrouter"   # coding → OpenRouter
export CODEY_BACKEND_P="local"      # planner → local 0.5B (port 8081)
```

Codey-v2 generating a Fibonacci sequence implementation entirely on-device — no cloud, no internet, running in Termux on Android.
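Conceptually, this per-role routing just maps each role's env var to an endpoint. A minimal sketch (the helper name and routing table are hypothetical; Codey-v2's actual dispatch logic may differ — local llama-server instances do expose an OpenAI-compatible `/v1/chat/completions` route):

```python
import os

# Hypothetical routing table: which env var controls which role,
# and which local llama-server port serves it.
ROLE_ENV = {"coder": "CODEY_BACKEND", "planner": "CODEY_BACKEND_P"}
LOCAL_PORTS = {"coder": 8080, "planner": 8081}

def endpoint_for(role: str) -> str:
    """Pick the chat-completions endpoint for a role based on its env var."""
    backend = os.environ.get(ROLE_ENV[role], "local")
    if backend == "openrouter":
        return "https://openrouter.ai/api/v1/chat/completions"
    return f"http://127.0.0.1:{LOCAL_PORTS[role]}/v1/chat/completions"
```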
- Rebranded to CODEY-V2 — clean CLI banner in blue, unified name across all interfaces
- Malformed JSON recovery — relaxed parser now handles unquoted values emitted by smaller models, eliminating silent tool-call failures
- Shell safety hardened — dangerous command detection expanded to catch `find -delete`, `git reset --hard`, `git push --force`, and indirect execution via `sh -c`/`bash -c`
- Peer code extraction improved — fuzzy filename matching in peer output now handles `### File: x.py` and `File: x.py` heading styles in addition to bold/backtick patterns
- Unified planning interface — `core/planner_service.py` consolidates daemon (0.5B) and orchestrator (7B) planning paths into a single entry point
- Memory system cleaned up — all callers now import directly from `core/memory_v2.py`; the legacy shim has been removed
- LRU eviction threshold fixed — aligned to 3 turns (was incorrectly set to 6, causing memory bloat)
- Codebase pruned — removed legacy `core/loader.py`, `core/router.py`, outdated audit reports, and old plan documents
| Model | Port | Role |
|---|---|---|
| Qwen2.5-Coder-7B Q4_K_M | 8080 | Primary agent — coding, reasoning, tool use |
| Qwen2.5-0.5B Q8_0 | 8081 | Task planning and conversation summarization |
| nomic-embed-text-v1.5 Q4 | 8082 | RAG retrieval encoder |
All three run as independent llama-server processes, managed and watchdog-monitored by codeyd2.
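Because each model is a separate llama-server process, liveness can be checked per port. A minimal sketch using llama-server's `/health` endpoint (roles and ports taken from the table above; `codeyd2`'s real watchdog logic may differ):

```python
import json
import urllib.request

# Ports from the table above.
SERVERS = {"agent": 8080, "planner": 8081, "embedder": 8082}

def check_servers(timeout: float = 2.0) -> dict:
    """Return {role: True/False} for each local model server."""
    status = {}
    for role, port in SERVERS.items():
        try:
            with urllib.request.urlopen(
                f"http://127.0.0.1:{port}/health", timeout=timeout
            ) as resp:
                status[role] = json.load(resp).get("status") == "ok"
        except OSError:
            # Connection refused / timeout -> server not up.
            status[role] = False
    return status
```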
- Persistent daemon — runs continuously in the background; state survives restarts
- Task queue — complex requests broken into steps and executed sequentially
- RAG retrieval — local knowledge base searched on every inference call; relevant docs injected automatically
- Recursive self-refinement — draft → critique → refine cycle catches bugs before they hit your files
- Error recovery — adaptive strategy switching when tools fail (write → patch, import error → install, etc.)
- Peer CLI escalation — delegates work to Claude Code, Gemini CLI, or Qwen CLI either on-demand ("ask Claude to X") or automatically when CODEY-V2 exhausts its retry budget. The peer receives current project file contents and returns complete, ready-to-apply code blocks that CODEY-V2 writes to disk. Requires explicit user consent before any files are shared (external services — see Security)
- Git integration — branch management, AI commit messages, conflict detection and resolution
- Voice interface — TTS output and STT input via Termux:API
- Static analysis — auto-lint on every Python write; `/review` command for on-demand scans
- Thermal management — monitors CPU load and battery; reduces threads automatically under stress
- Fine-tuning — export your interaction history and train a personalized adapter on Google Colab
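The RAG step above boils down to ranking stored chunks by cosine similarity against the query embedding (produced by the nomic encoder on port 8082). A dependency-free sketch of that ranking step, not Codey-v2's actual retrieval code:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, index, k=3):
    """index: list of (doc_text, embedding) pairs; return the k best docs."""
    ranked = sorted(index, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]
```

In practice the index holds pre-computed embeddings of your knowledge-base chunks, so only the query needs an encoder call at inference time.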
| Guide | Contents |
|---|---|
| Installation | Requirements, one-line install, manual step-by-step |
| Commands | Full reference: codeyd2, codey2, slash commands, flags, env vars |
| Configuration | Config JSON, model tuning, context management, thermal settings |
| Architecture | System diagram, memory tiers, project structure, Python API |
| Knowledge Base | Setting up RAG, indexing docs, skill repos |
| Fine-tuning | Export data, Colab training, import adapter, rollback |
| Pipeline | Training data pipeline — build fine-tuning datasets from HuggingFace + synthetic data |
| Security | Risks, mitigations, hardening summary, reporting vulnerabilities |
| Troubleshooting | Common issues, performance reference, known limitations |
| Version History | Full changelog |
| Requirement | Minimum |
|---|---|
| Platform | Termux on Android, or any Linux system |
| RAM | 6 GB+ available |
| Storage | ~6 GB base (7B model ~4.2 GB, 0.5B ~500 MB, embed ~80 MB, toolchain ~1 GB); ~8 GB with training pipeline |
| Python | 3.12+ |
- Fork the repository
- Create a feature branch
- Make your changes and run the tests (`pytest tests/ -v`)
- Submit a pull request
Bug reports, security disclosures, and hardening contributions are especially welcome.
- llama.cpp — efficient on-device LLM inference
- Qwen — Qwen2.5-Coder models
- nomic-ai — nomic-embed-text embedding model
- Codey v1 — the original session-based agent this builds on
MIT License
If Codey helps you code on the go, consider starring ⭐ the repo — it helps other Android developers find this project!

