```
 ██████╗ ██████╗ ██████╗ ███████╗██╗   ██╗
██╔════╝██╔═══██╗██╔══██╗██╔════╝╚██╗ ██╔╝
██║     ██║   ██║██║  ██║█████╗   ╚████╔╝
██║     ██║   ██║██║  ██║██╔══╝    ╚██╔╝
╚██████╗╚██████╔╝██████╔╝███████╗   ██║
 ╚═════╝ ╚═════╝ ╚═════╝ ╚══════╝   ╚═╝   ─ V2
```

v2.0.0 · Local AI Coding Assistant · Termux
A persistent, daemon-based AI coding agent that runs entirely on your Android device. CODEY-V2 maintains state across sessions, manages a background task queue, and uses three purpose-built models — a 7B primary agent, a 0.5B planner and summarizer, and a dedicated embedding encoder — all served locally via llama.cpp.
Security notice: CODEY-V2 executes shell commands and writes files based on model output. Read the security guide before use.
In a world full of powerful cloud-based AI coding tools (like multi-agent orchestration layers for Claude Code), Codey-v2 takes a different path:
- Truly offline & private — Runs 100% locally on your Android phone using small models via llama.cpp. No internet, no API keys, no data leaving your device.
- Mobile-first — Built for Termux. Start a persistent daemon and vibe-code from anywhere: commute, couch, bed, or while traveling.
- Lightweight & practical — Smart thermal management, voice input, git integration, RAG, and a built-in escalation pipeline (automatically asks your installed Claude Code / Qwen CLI / Gemini CLI for help when stuck).
- Hybrid when you want it — Optional OpenRouter fallback for heavier tasks — everything is pre-configured.
Codey-v2 isn't trying to replace desktop cloud super-agents. It's your pocket coding companion for when you want freedom, privacy, and zero dependency.
If you value coding on the go without burning subscriptions or sending code to the cloud, this is for you.
```bash
# 1. Clone and enter the repo
git clone https://github.com/Ishabdullah/Codey-v2.git && cd Codey-v2

# 2. Run the installer (downloads models, builds llama.cpp, sets PATH)
./install.sh

# 3. Start all three model servers and the background daemon
codeyd2 start

# 4. Send your first task
codey2 "add a docstring to every function in utils.py"

# 5. Check daemon health at any time
codeyd2 status
```

See docs/installation.md for manual setup and model download links.
```bash
# 1. Clone and install Python dependencies
git clone https://github.com/Ishabdullah/Codey-v2.git && cd Codey-v2
pip install -r requirements.txt

# 2. Set your API key (get one at https://openrouter.ai/keys)
export OPENROUTER_API_KEY="sk-or-your-key-here"

# 3. Switch to the OpenRouter backend
export CODEY_BACKEND="openrouter"

# 4. (Optional) Choose a model — default is qwen/qwen-2.5-coder-7b-instruct
export OPENROUTER_MODEL="anthropic/claude-sonnet-4-5"

# 5. Run a task
python main.py "refactor my sort function to use timsort"
```

To make env vars permanent, add them to `~/.bashrc` and run `source ~/.bashrc`.
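When the OpenRouter backend is selected, the request is a standard OpenAI-compatible chat completion. A minimal sketch of what such a call looks like, built from the same environment variables as above (the helper name is illustrative, not Codey-v2's actual API):

```python
import json
import os
import urllib.request

def build_openrouter_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request for OpenRouter.

    Reads the same env vars the quick start sets; the default model slug
    mirrors the one mentioned above.
    """
    api_key = os.environ["OPENROUTER_API_KEY"]
    model = os.environ.get("OPENROUTER_MODEL", "qwen/qwen-2.5-coder-7b-instruct")
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# Sending the request (network required):
# with urllib.request.urlopen(build_openrouter_request("hello")) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]
```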
Any model slug from openrouter.ai/models works. You can also mix backends — run the planner locally while routing coding calls to OpenRouter:
```bash
export CODEY_BACKEND="openrouter"   # coding → OpenRouter
export CODEY_BACKEND_P="local"      # planner → local 0.5B (port 8081)
```

Codey-v2 generating a Fibonacci sequence implementation entirely on-device — no cloud, no internet, running in Termux on Android.
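Conceptually, this per-role routing just maps each role's env var to an endpoint. A minimal sketch (the helper name and routing table are hypothetical; Codey-v2's actual dispatch logic may differ — local llama-server instances do expose an OpenAI-compatible `/v1/chat/completions` route):

```python
import os

# Hypothetical routing table: which env var controls which role,
# and which local llama-server port serves it.
ROLE_ENV = {"coder": "CODEY_BACKEND", "planner": "CODEY_BACKEND_P"}
LOCAL_PORTS = {"coder": 8080, "planner": 8081}

def endpoint_for(role: str) -> str:
    """Pick the chat-completions endpoint for a role based on its env var."""
    backend = os.environ.get(ROLE_ENV[role], "local")
    if backend == "openrouter":
        return "https://openrouter.ai/api/v1/chat/completions"
    return f"http://127.0.0.1:{LOCAL_PORTS[role]}/v1/chat/completions"
```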
- Rebranded to CODEY-V2 — clean CLI banner in blue, unified name across all interfaces
- Malformed JSON recovery — relaxed parser now handles unquoted values emitted by smaller models, eliminating silent tool-call failures
- Shell safety hardened — dangerous command detection expanded to catch `find -delete`, `git reset --hard`, `git push --force`, and indirect execution via `sh -c`/`bash -c`
- Peer code extraction improved — fuzzy filename matching in peer output now handles `### File: x.py` and `File: x.py` heading styles in addition to bold/backtick patterns
- Unified planning interface — `core/planner_service.py` consolidates daemon (0.5B) and orchestrator (7B) planning paths into a single entry point
- Memory system cleaned up — all callers now import directly from `core/memory_v2.py`; the legacy shim has been removed
- LRU eviction threshold fixed — aligned to 3 turns (was incorrectly set to 6, causing memory bloat)
- Codebase pruned — removed legacy `core/loader.py`, `core/router.py`, outdated audit reports, and old plan documents
| Model | Port | Role |
|---|---|---|
| Qwen2.5-Coder-7B Q4_K_M | 8080 | Primary agent — coding, reasoning, tool use |
| Qwen2.5-0.5B Q8_0 | 8081 | Task planning and conversation summarization |
| nomic-embed-text-v1.5 Q4 | 8082 | RAG retrieval encoder |
All three run as independent llama-server processes, managed and watchdog-monitored by codeyd2.
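Because each model is a separate llama-server process, liveness can be checked per port. A minimal sketch using llama-server's `/health` endpoint (roles and ports taken from the table above; `codeyd2`'s real watchdog logic may differ):

```python
import json
import urllib.request

# Ports from the table above.
SERVERS = {"agent": 8080, "planner": 8081, "embedder": 8082}

def check_servers(timeout: float = 2.0) -> dict:
    """Return {role: True/False} for each local model server."""
    status = {}
    for role, port in SERVERS.items():
        try:
            with urllib.request.urlopen(
                f"http://127.0.0.1:{port}/health", timeout=timeout
            ) as resp:
                status[role] = json.load(resp).get("status") == "ok"
        except OSError:
            # Connection refused / timeout -> server not up.
            status[role] = False
    return status
```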
- Persistent daemon — runs continuously in the background; state survives restarts
- Task queue — complex requests broken into steps and executed sequentially
- RAG retrieval — local knowledge base searched on every inference call; relevant docs injected automatically
- Recursive self-refinement — draft → critique → refine cycle catches bugs before they hit your files
- Error recovery — adaptive strategy switching when tools fail (write → patch, import error → install, etc.)
- Peer CLI escalation — delegates work to Claude Code, Gemini CLI, or Qwen CLI either on-demand ("ask Claude to X") or automatically when CODEY-V2 exhausts its retry budget. The peer receives current project file contents and returns complete, ready-to-apply code blocks that CODEY-V2 writes to disk. Requires explicit user consent before any files are shared (external services — see Security)
- Git integration — branch management, AI commit messages, conflict detection and resolution
- Voice interface — TTS output and STT input via Termux:API
- Static analysis — auto-lint on every Python write; `/review` command for on-demand scans
- Thermal management — monitors CPU load and battery; reduces threads automatically under stress
- Fine-tuning — export your interaction history and train a personalized adapter on Google Colab
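The RAG step above boils down to ranking stored chunks by cosine similarity against the query embedding (produced by the nomic encoder on port 8082). A dependency-free sketch of that ranking step, not Codey-v2's actual retrieval code:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, index, k=3):
    """index: list of (doc_text, embedding) pairs; return the k best docs."""
    ranked = sorted(index, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]
```

In practice the index holds pre-computed embeddings of your knowledge-base chunks, so only the query needs an encoder call at inference time.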
| Guide | Contents |
|---|---|
| Installation | Requirements, one-line install, manual step-by-step |
| Commands | Full reference: codeyd2, codey2, slash commands, flags, env vars |
| Configuration | Config JSON, model tuning, context management, thermal settings |
| Architecture | System diagram, memory tiers, project structure, Python API |
| Knowledge Base | Setting up RAG, indexing docs, skill repos |
| Fine-tuning | Export data, Colab training, import adapter, rollback |
| Pipeline | Training data pipeline — build fine-tuning datasets from HuggingFace + synthetic data |
| Security | Risks, mitigations, hardening summary, reporting vulnerabilities |
| Troubleshooting | Common issues, performance reference, known limitations |
| Version History | Full changelog |
| Requirement | Minimum |
|---|---|
| Platform | Termux on Android, or any Linux system |
| RAM | 6 GB+ available |
| Storage | ~6 GB base (7B model ~4.2 GB, 0.5B ~500 MB, embed ~80 MB, toolchain ~1 GB); ~8 GB with training pipeline |
| Python | 3.12+ |
- Fork the repository
- Create a feature branch
- Make your changes and run the tests (`pytest tests/ -v`)
- Submit a pull request
Bug reports, security disclosures, and hardening contributions are especially welcome.
- llama.cpp — efficient on-device LLM inference
- Qwen — Qwen2.5-Coder models
- nomic-ai — nomic-embed-text embedding model
- Codey v1 — the original session-based agent this builds on
MIT License
If Codey helps you code on the go, consider starring ⭐ the repo — it helps other Android developers find this project!

