Multi-Model AI Ensemble powered by OpenRouter — autonomous trading for Kalshi prediction markets.
Five frontier LLMs debate every trade. The system only enters when they agree.
Bring any model via OpenRouter. Swap models with one config change.
Quick Start · Features · How It Works · Configuration · Contributing · Kalshi API Docs
⚠️ Disclaimer — This is experimental software for educational and research purposes only. Trading involves substantial risk of loss. Only trade with capital you can afford to lose. Past performance does not guarantee future results. This software is not financial advice. The authors are not responsible for any financial losses incurred through the use of this software.
📊 Why Guardrails Exist — Through extensive live trading on Kalshi across multiple strategies, we learned that trading without category enforcement and risk guardrails leads to significant losses. The most common mistakes: over-allocating to economic events (CPI, Fed decisions) with no real edge, and using aggressive position sizing. The consistently profitable edge we found was NCAAB NO-side trading (74% win rate, +10% ROI). This repo ships with discipline systems enabled by default — category scoring, portfolio enforcement, and sane risk parameters.
Three steps to get running in paper-trading mode (no real money):
# 1. Clone and set up
git clone https://github.com/ryanfrigo/kalshi-ai-trading-bot.git
cd kalshi-ai-trading-bot
python setup.py # creates .venv, installs deps, checks config
# 2. Add your API keys
cp env.template .env # then open .env and fill in KALSHI_API_KEY
# and OPENROUTER_API_KEY
# 3. Run the AI ensemble (default — 5-model debate on every trade)
python cli.py run --paper
# Or run the safe compounder (conservative, math-only, NO-side edge-based)
python cli.py run --safe-compounderThen open the live dashboard in another terminal:
python cli.py dashboardNeed API keys?
- Kalshi key + private key → kalshi.com/account/settings (API docs)
- OpenRouter key → openrouter.ai (Claude Sonnet 4.5, GPT-5.4, Gemini 3.1 Pro, DeepSeek V3.2, Grok 4.1 Fast — all five models via one key)
- ✅ Five frontier LLMs collaborate on every decision — Claude Sonnet 4.5, GPT-5.4, Gemini 3.1 Pro, DeepSeek V3.2, Grok 4.1 Fast
- ✅ Role-based specialization — each model plays a distinct analytical role (lead analyst, forecaster, risk manager, bull/bear researcher)
- ✅ OpenRouter unified access — all five models route through OpenRouter; single API key, swap any model with one config line
- ✅ Consensus gating — positions are skipped when models diverge beyond a configurable confidence threshold
- ✅ Deterministic outputs — temperature=0 for reproducible AI reasoning
- ✅ AI Ensemble (default) — five-model debate with Kelly Criterion sizing and portfolio guardrails
- ✅ Safe Compounder — pure edge-based NO-side strategy; near-certain outcomes only, no AI required
- ✅ Fractional Kelly position sizing (0.25x Kelly for volatility control)
- ✅ Hard daily loss limit — stops trading at 10% drawdown
- ✅ Max drawdown circuit breaker — halts at 15% portfolio drawdown
- ✅ Sector concentration cap — no more than 30% in any single category
- ✅ Category scoring — hard-blocks categories with proven negative edge
- ✅ Daily AI cost budget — stops spending when API costs hit the configurable daily limit (default: $10/day)
- ✅ Trailing take-profit at 20% gain
- ✅ Stop-loss at 15% per position
- ✅ Confidence-decay exits when AI conviction drops
- ✅ Time-based exits (10-day max hold)
- ✅ Volatility-adjusted thresholds
- ✅ Real-time Streamlit dashboard — portfolio value, positions, P&L, AI decision logs
- ✅ Paper trading mode — simulate trades without real orders; track outcomes on settled markets
- ✅ SQLite telemetry — every trade, AI decision, and cost metric logged locally
- ✅ Unified CLI —
run,dashboard,status,health,scores,historycommands
The bot runs a four-stage pipeline on a continuous loop:
INGEST DECIDE (5-Model Ensemble) EXECUTE TRACK
-------- ──────────────────────────────────── --------- --------
┌────────────────────────────────────┐
Kalshi ────────► │ Claude Sonnet 4.5 (Lead Analyst 30%)│
Events API ├────────────────────────────────────┤
│ Gemini 3.1 Pro (Forecaster 30%)│
WebSocket ────────► ├────────────────────────────────────┤
Stream │ GPT-5.4 (Risk Manager 20%)│ ──► Kalshi ──► P&L
├────────────────────────────────────┤ Order Win Rate
RSS / News ───────► │ DeepSeek V3.2 (Bull Case 10%)│ Router Sharpe
Feeds ├────────────────────────────────────┤ Drawdown
│ Grok 4.1 Fast (Bear Case 10%)│ Kelly Cost
Volume & ────────► └────────────────────────────────────┘ Sizing Budget
Price Data All via OpenRouter (single API key)
Debate → Consensus → Confidence
Market data is pulled via the Kalshi Events API (with nested markets) to get the full tradeable universe — not just the /markets endpoint, which only returns parlay tickers. Order book snapshots and RSS news feeds supplement the signal.
Each of the five models analyzes the incoming data from its assigned perspective and returns a probability estimate + confidence score. The ensemble combines weighted votes:
| Model | Role | Provider | Weight |
|---|---|---|---|
| Claude Sonnet 4.5 | Lead Analyst | OpenRouter | 30% |
| Gemini 3.1 Pro | Forecaster | OpenRouter | 30% |
| GPT-5.4 | Risk Manager | OpenRouter | 20% |
| DeepSeek V3.2 | Bull Researcher | OpenRouter | 10% |
| Grok 4.1 Fast | Bear Researcher | OpenRouter | 10% |
If the weighted confidence falls below min_confidence_to_trade (default: 0.45), the opportunity is skipped. If models disagree significantly, position size is automatically reduced.
Bring your own model — All five models are swappable with one config change in
src/config/settings.py. Any model on openrouter.ai/models works. Single API key for the entire fleet.
Qualifying trades are sized using the Kelly Criterion (fractional 0.25x) and routed through Kalshi's order API.
Every decision is written to a local SQLite database. The dashboard and CLI commands surface cumulative P&L, win rate, Sharpe ratio, and per-strategy breakdowns in real time.
- Python 3.12 or later
- A Kalshi account with API access (API docs)
- An OpenRouter API key — provides all five models (Claude Sonnet 4.5, GPT-5.4, Gemini 3.1 Pro, DeepSeek V3.2, Grok 4.1 Fast) via a single key
git clone https://github.com/ryanfrigo/kalshi-ai-trading-bot.git
cd kalshi-ai-trading-bot
python setup.pyThe setup script will:
- ✅ Check Python version compatibility
- ✅ Create virtual environment
- ✅ Install all dependencies (with Python 3.14 compatibility handling)
- ✅ Test that the dashboard can run
- ✅ Print troubleshooting guidance
git clone https://github.com/ryanfrigo/kalshi-ai-trading-bot.git
cd kalshi-ai-trading-bot
python -m venv .venv
source .venv/bin/activate # macOS / Linux
# .venv\Scripts\activate # Windows
# Python 3.14 users only:
export PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1
pip install -r requirements.txtcp env.template .env # fill in your keys| Variable | Description |
|---|---|
KALSHI_API_KEY |
Your Kalshi API key ID |
OPENROUTER_API_KEY |
OpenRouter key — all five AI models route through this (Claude Sonnet 4.5, GPT-5.4, Gemini 3.1 Pro, DeepSeek V3.2, Grok 4.1 Fast) |
Place your Kalshi private key as kalshi_private_key (no extension) in the project root. Download from Kalshi Settings → API. This file is git-ignored.
python -m src.utils.database
⚠️ Use-mflag — runningpython src/utils/database.pydirectly will fail with a module import error.
# AI Ensemble — paper trading (no real orders — safe to test)
python cli.py run --paper
# AI Ensemble — live trading (real money)
python cli.py run --live
# Safe Compounder — conservative, math-only (dry run)
python cli.py run --safe-compounder
# Safe Compounder — live execution
python cli.py run --safe-compounder --live
# Launch monitoring dashboard
python cli.py dashboard
# Check portfolio balance and open positions
python cli.py status
# Verify all API connections
python cli.py health
# Show category scores (which markets are allowed)
python cli.py scoresOr invoke the bot script directly:
python beast_mode_bot.py # Paper trading
python beast_mode_bot.py --live # Live trading
python beast_mode_bot.py --dashboard # Dashboard modeSimulate trades without risking real money. Every signal is logged to SQLite and a static HTML dashboard renders cumulative P&L, win rate, and per-signal details after markets settle.
# Scan markets and log signals
python paper_trader.py
# Continuous scanning every 15 minutes
python paper_trader.py --loop --interval 900
# Settle markets and update outcomes
python paper_trader.py --settle
# Regenerate HTML dashboard
python paper_trader.py --dashboard
# Print stats to terminal
python paper_trader.py --statsThe dashboard writes to docs/paper_dashboard.html — open locally or host via GitHub Pages.
kalshi-ai-trading-bot/
├── beast_mode_bot.py # Main bot entry point (AI ensemble orchestrator)
├── cli.py # Unified CLI: run, dashboard, status, health, scores
├── paper_trader.py # Paper trading signal tracker
├── pyproject.toml # PEP 621 project metadata
├── requirements.txt # Pinned dependencies
├── env.template # Environment variable template
│
├── src/
│ ├── agents/ # Multi-model ensemble (lead analyst, bull/bear, risk, forecaster)
│ ├── clients/ # API clients (Kalshi, OpenRouter, WebSocket, model router)
│ ├── config/ # Settings and trading parameters
│ ├── data/ # News aggregation and sentiment analysis
│ ├── events/ # Async event bus for real-time streaming
│ ├── jobs/ # Core pipeline: ingest, decide, execute, track, evaluate
│ ├── strategies/ # Safe compounder, category scorer, portfolio enforcer
│ └── utils/ # Database, logging, prompts, risk helpers
│
├── scripts/ # Utility and diagnostic scripts
├── docs/ # Additional documentation + paper dashboard HTML
└── tests/ # Pytest test suite
All trading parameters live in src/config/settings.py:
# Position sizing
max_position_size_pct = 3.0 # Max 3% of balance per position
max_positions = 10 # Up to 10 concurrent positions
kelly_fraction = 0.25 # Quarter-Kelly (conservative)
# Market filtering
min_volume = 500 # Minimum contract volume
max_time_to_expiry_days = 14 # Trade contracts up to 14 days out
min_confidence_to_trade = 0.45 # Minimum ensemble confidence to enter
# AI settings — all models via OpenRouter
primary_model = "anthropic/claude-sonnet-4.5"
ai_temperature = 0 # Deterministic outputs
ai_max_tokens = 8000
# Ensemble models — all via OpenRouter (swap any with one line)
ensemble_models = {
"anthropic/claude-sonnet-4.5": {"provider": "openrouter", "role": "lead_analyst", "weight": 0.30},
"google/gemini-3.1-pro": {"provider": "openrouter", "role": "forecaster", "weight": 0.30},
"openai/gpt-5.4": {"provider": "openrouter", "role": "risk_manager", "weight": 0.20},
"deepseek/deepseek-v3.2": {"provider": "openrouter", "role": "bull_researcher","weight": 0.10},
"x-ai/grok-4.1-fast": {"provider": "openrouter", "role": "bear_researcher","weight": 0.10},
}
# Risk management
max_daily_loss_pct = 10.0 # Hard daily loss limit
daily_ai_cost_limit = 10.0 # Max daily AI API spend (USD) — default $10/day💸 Controlling AI spend
The bot checks daily spend limits before every AI API call. Once the limit is reached, all AI calls are skipped until the next calendar day.
Key knobs:
DAILY_AI_COST_LIMITenv var — max USD per day. Default: $10. Example:export DAILY_AI_COST_LIMIT=25scan_interval_seconds— how often the bot scans markets. Lower = more AI calls per hour.max_analyses_per_market_per_day— cap on AI analyses per individual market per day.
Five frontier LLMs debate every trade. Positions only open when models agree.
python cli.py run --paper # Paper trading (safe, no real money)
python cli.py run --live # Live tradingGuardrails active:
- Max drawdown: 15%
- Min confidence: 45%
- Max position size: 3% of portfolio
- Max sector concentration: 30%
- Kelly fraction: 0.25 (quarter-Kelly)
- Category scoring — blocks categories with negative edge history
The most conservative and historically validated strategy. No AI models required — pure math.
python cli.py run --safe-compounder # Dry run (shows opportunities)
python cli.py run --safe-compounder --live # Live executionStrategy rules:
- NO side ONLY — never buys YES
- YES last price must be ≤ 20¢ (near-certain NO outcome)
- NO ask must be > 80¢
- Edge (EV - price) must be > 5¢
- Places resting maker orders at
lowest_ask - 1¢(near-zero fees) - Max 10% of portfolio per position (half-Kelly sizing)
- Skips all sports, entertainment, and "mention" markets
This strategy is the closest thing to a pure edge play on Kalshi.
⚠️ Not recommended. Aggressive settings with no category guardrails have historically led to significant losses in live prediction market trading.
Available for comparison/research:
python cli.py run --beast --paper # Only run beast mode in paper tradingThe category scorer evaluates each Kalshi market category on a 0-100 scale and enforces allocation limits.
| Factor | Weight | Description |
|---|---|---|
| ROI | 40% | Average return on investment across all trades |
| Recent Trend | 25% | Direction of last 10 trades (recency-weighted) |
| Sample Size | 20% | More data = more confidence in the score |
| Win Rate | 15% | Percentage of winning trades |
| Score Range | Max Position Size | Status |
|---|---|---|
| 80-100 | 20% of portfolio | STRONG ✅ |
| 60-79 | 10% of portfolio | GOOD 🟢 |
| 40-59 | 5% of portfolio | WEAK 🟡 |
| 20-39 | 2% of portfolio | POOR 🟠 |
| 0-19 | 0% (blocked) | BLOCKED 🚫 |
Categories scoring below 30 are hard-blocked — the bot will not enter any trade in those categories regardless of AI confidence.
python cli.py scoresExample output:
======================================================================
CATEGORY SCORES
Category Score WR ROI Trades Alloc Status
------------------ ------ ------ -------- ------- ------ ----------
NCAAB 72.3 74% +10.0% 50 10% GOOD 🟢
NBA 41.2 52% +1.5% 28 5% WEAK 🟡
POLITICS 31.0 48% -8.0% 15 2% MARGINAL 🔴
CPI 8.4 25% -65.0% 20 0% BLOCKED 🚫
FED 12.1 32% -40.0% 25 0% BLOCKED 🚫
ECON_MACRO 10.5 30% -55.0% 40 0% BLOCKED 🚫
======================================================================
Every trade, AI decision, and cost metric is recorded to trading_system.db (local SQLite). Use the dashboard or scripts in scripts/ to review:
- Cumulative P&L and win rate
- Sharpe ratio and maximum drawdown
- AI confidence calibration
- Cost per trade and daily API budget utilization
- Per-strategy breakdowns
python cli.py history # Last 50 trades with category breakdown
python cli.py history --limit 100 # Last 100 tradespytest tests/ # full suite
pytest tests/ -v # verbose
pytest --cov=src # with coverageblack src/ tests/ cli.py beast_mode_bot.py
isort src/ tests/ cli.py beast_mode_bot.py
mypy src/Edit EnsembleConfig.models in src/config/settings.py:
# Example: swap DeepSeek V3.2 for a cheaper model
"deepseek/deepseek-v3": {"provider": "openrouter", "role": "bull_researcher", "weight": 0.10},Any model listed at openrouter.ai/models works. Weights should sum to 1.0. All models use the same OPENROUTER_API_KEY.
- Create a module in
src/strategies/ - Wire it into
src/strategies/unified_trading_system.py - Set allocation percentage in
src/config/settings.py - Add tests in
tests/
"no such table: positions" error on fresh install
The database file (trading_system.db) is not committed to the repo — it's created at runtime. On a fresh clone, if you run the dashboard or bot before the tables have been created, you'll see:
Error getting performance: no such table: positions
Fix: The bot now auto-initializes the database on startup. Just run it normally:
python cli.py run --paper
# or
python beast_mode_bot.pyIf you want to initialize the database manually (e.g., to verify the schema before starting the bot):
python -m src.utils.databaseThis creates all required tables: markets, positions, trade_logs, market_analyses, daily_cost_tracking, llm_queries, and analysis_reports.
AdGuard (macOS) blocks dependency downloads during setup
If you have AdGuard running as a system-level proxy on macOS, pip install may time out while downloading packages during python setup.py or pip install -r requirements.txt.
Fix: Temporarily disable AdGuard at the system level before running setup, then re-enable it after installation completes.
Note: AdGuard running as a browser extension only (not system-level) does not affect pip and requires no action.
Bot not placing live trades despite --live flag
Check logs for the mode confirmation string:
grep -i "live trading\|paper trading\|LIVE ORDER\|PAPER TRADE" logs/trading_system.log | tail -20"LIVE TRADING MODE ENABLED"→ correct"Paper trading mode"→ still in paper mode; verify API key has TRADING permissions in Kalshi Settings
Dashboard won't launch / import errors
Import errors in VS Code are IDE linter warnings, not runtime errors.
# Fix: activate venv, then run from project root
source .venv/bin/activate
python beast_mode_dashboard.pySet VS Code Python interpreter to .venv/bin/python via Cmd+Shift+P → Python: Select Interpreter.
Python 3.14 PyO3 compatibility error
# Quick fix
export PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1
pip install -r requirements.txt
# Recommended: use Python 3.13
pyenv install 3.13.1 && pyenv local 3.13.1
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txtBot analyzing zero markets / only KXMVE tickers
The Kalshi /markets endpoint now only returns parlay tickers (KXMVE*). Real individual markets live under the Events API. The ingestion pipeline already uses the Events API with nested markets to fetch the full tradeable universe. If you see zero markets, check that your API key has read permissions and verify connectivity with python cli.py health.
Model not found / API errors
Model names on OpenRouter change periodically. Update EnsembleConfig.models in src/config/settings.py with the current model identifiers:
- OpenRouter models: openrouter.ai/models
You can also override the primary model via environment variable: export PRIMARY_MODEL=anthropic/claude-sonnet-4.5
After extensive live trading across multiple strategies, here's what the data taught:
The AI ensemble can be 80% confident on a CPI trade and still be wrong. Market-implied probabilities on economic releases are already efficient — there's no structural edge for a retail bot.
Fix: Category scoring now hard-blocks economic markets until they prove a positive edge over ≥5 trades.
A Kelly fraction of 0.75 sounds reasonable. It's not — it compounds losses catastrophically. At 0.75x Kelly with a 45% win rate, you can lose 80% of capital in a standard drawdown scenario.
Fix: Default is now 0.25x Kelly (quarter-Kelly), which is more conservative than most professional traders use.
A 50% drawdown limit means you can lose half your money before the bot stops. That's not a limit — it's a suggestion.
Fix: 15% max drawdown, with the circuit breaker actually stopping trades (not just logging a warning).
When 90% of capital is in economic categories and there's a Fed meeting, everything moves together.
Fix: 30% sector cap means no single category can dominate the portfolio.
The bot was scanning every 30 seconds and trading everything it found. More trades with no edge = faster path to zero.
Fix: 60-second scan interval. Trades only when confidence ≥ 45% AND category score ≥ 30.
Contributions are welcome! See CONTRIBUTING.md for full guidelines.
Quick steps:
- Fork the repository
- Create a feature branch:
git checkout -b feature/your-feature - Make changes, add tests, run
pytestandblack - Commit with conventional commit format:
feat: add new model weight config - Open a Pull Request
Good first issues: look for the good first issue label.
- Kalshi Trading API Docs
- Kalshi API Authentication
- Kalshi Markets Overview
- OpenRouter Model Catalog
- OpenRouter (unified access to all models)
This project is licensed under the MIT License. See LICENSE for details.
If this project is useful to you, consider giving it a ⭐
Made with ❤️ for the Kalshi trading community