Cadence is an evolutionary system that uses large language models to iteratively generate, mutate, and improve programs for hard computational problems.
Cadence treats code generation as an evolutionary loop. It samples parent programs, proposes child variants with an LLM, evaluates them on a fixed test suite, stores results, and feeds the best lessons back into future generations. The current implementation focuses on the Traveling Salesman Problem.
Most LLM coding workflows are one-shot. Cadence explores a more iterative approach where programs improve over generations through evaluation, selection, and mutation. That makes it useful both as a practical experiment and as a framework for studying program evolution with LLMs.
Active research project with working experiments, documentation, and a modular foundation for extending beyond TSP.
```mermaid
flowchart TD
    A[Sample Parent Program] --> B[Build Prompt + Lesson]
    B --> C[LLM Generation of Code Diffs]
    C --> D[Apply Diff to Parent]
    D --> E[Evaluate on Test Suite]
    E --> F[Log to Database]
    F --> G{Generation Complete}
    G -->|Not Final| A
    G -->|Final| H[Extract Lesson]
    H --> B
```
The system evolves programs over generations using the following loop (a condensed code sketch follows the list):
- Sample a parent program and its previously generated children.
- Construct a prompt that includes the parent, children, and instructions.
- Use an LLM to generate modified versions of marked code blocks.
- Apply the generated diffs to produce a child program.
- Evaluate the child program's performance on a fixed test suite.
- Log and store the program and its performance in a database.
- Periodically promote the best-performing program to guide future generations.
- Optionally mutate the instructions used in prompts to encourage better code.
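Below is a condensed sketch of one way this loop composes, using the module-level functions shown in the Usage section further down. Parent/child sampling, database logging, and the import path of `execute()` (assumed here to live in `src.evaluator`) are simplified relative to the real scripts:

```python
from src.tasks.tsp_task import TSPTask
from src.prompt_sampler import build
from src.llm import generate
from src.evolve import apply_diff
from src.evaluator import execute                      # assumed location of execute()
from src.meta_prompting import get_lesson_from_history

task = TSPTask(n_cities=10)
parent_code = task.baseline_program
parent_cost = execute(parent_code, task)
logs, lesson = [], None

for generation in range(20):
    # Build a prompt from the current parent (children and prior results omitted for brevity)
    prompt = build((None, None, None, parent_code, None), [], lesson)
    diffs = generate(prompt)                            # LLM proposes diffs for marked blocks
    child_code = apply_diff(parent_code, diffs)         # produce the child program
    cost = execute(child_code, task)                    # evaluate on the seeded test suite
    logs.append({"generation": generation, "cost": cost})

    # Selection: promote the child if it beats the current parent
    if cost < parent_cost:
        parent_code, parent_cost = child_code, cost

    # Meta-prompting: periodically distil a lesson from the history
    if generation > 0 and generation % 4 == 0:
        lesson = get_lesson_from_history(logs, previous_lesson=lesson)
```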
- TSP solution evolution using only standard Python (no external math libraries)
- Multi-seed deterministic evaluation for stable cost metrics (sketched after this list)
- SQLite-backed storage of program generations and performance
- Parallel evaluation for faster feedback
- Meta-prompting: periodically updates instructions to steer LLM behavior
- Modular task abstraction to support other optimization problems in the future
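For illustration, multi-seed evaluation amounts to scoring a candidate on the same seeded instances every run and averaging the cost. `score_on_seeds` and `program_fn` below are hypothetical names; only the `Task` methods come from the interface described later in this README:

```python
import statistics

def score_on_seeds(task, program_fn, seeds=range(10)):
    """Average a candidate program's cost over fixed seeds (illustrative helper)."""
    costs = []
    for seed in seeds:
        inputs = task.generate_inputs(seed)        # same seed -> same instance, every run
        output = program_fn(inputs)                # run the evolved solver on that instance
        costs.append(task.evaluate(output, inputs))
    return statistics.mean(costs)                  # stable, comparable cost metric
```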
- Architecture
- Key Features
- Getting Started
- Configuration
- Usage
- Directory Structure
- Citation
- Contributing
- License
Run the out-of-the-box examples:
```bash
# Hypothesis 1: Cost evolution
python run_h1_experiment.py --config_name h1_config

# Hypothesis 2: Scaling analysis
python run_h2_experiment.py --config_name h2_config
```

Results (`h1_results.png`, `h2_scaling_analysis.png`) and JSON summaries will appear in the project root.
- Clone the repository and enter the directory:

  ```bash
  git clone https://github.com/yash-srivastava19/cadence.git
  cd cadence
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv .venv
  source .venv/bin/activate
  ```

- Install dependencies (using `uv` for reproducible installs):

  ```bash
  uv sync
  ```
All experiment scripts leverage Hydra for flexible, YAML-driven configuration. Sample `conf/h1_config.yaml`:

```yaml
SEEDS: 10
GENERATIONS: 20
LESSON_INTERVAL: 4
API_MAX_RETRIES: 3
API_TIMEOUT: 60

hydra:
  run:
    dir: .          # write outputs to project root
  output:
    subdir: null    # disable timestamped folders
```
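The experiment scripts pick these values up through Hydra. A minimal sketch of what such an entry point looks like, purely for illustration (the actual `run_h1_experiment.py` may be structured differently):

```python
import hydra
from omegaconf import DictConfig

@hydra.main(config_path="conf", config_name="h1_config", version_base=None)
def main(cfg: DictConfig) -> None:
    # cfg holds conf/h1_config.yaml merged with any command-line overrides
    print(cfg.SEEDS, cfg.GENERATIONS, cfg.LESSON_INTERVAL)

if __name__ == "__main__":
    main()
```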
Override on the command line without editing YAML:

```bash
# Change number of seeds and interval at runtime
python run_h1_experiment.py SEEDS=5 LESSON_INTERVAL=2
```

You can also run a single evolution step directly from Python:

```python
from src.tasks.tsp_task import TSPTask
from src.prompt_sampler import build
from src.llm import generate
from src.evolve import apply_diff
# Initialize problem with 10 cities
task = TSPTask(n_cities=10)
base_code = task.baseline_program
# Build a prompt without lessons
prompt = build((None, None, None, base_code, None), [], None)
# Call LLM to get diff
diffs = generate(prompt)
# Apply diff to generate a new child solution
child_code = apply_diff(base_code, diffs)
print("Baseline code:\n", base_code)
print("Evolved code:\n", child_code)from src.meta_prompting import get_lesson_from_history
# Assume 'logs' is a list of experiment entries with 'generation' and 'cost'
lesson = get_lesson_from_history(logs, previous_lesson=None)
print("Heuristic lesson:", lesson)Cadence provides a built-in Flask-based UI for live monitoring of experiments. Launch it with:
```bash
python ui/launch_ui.py
```

Then open your browser at http://localhost:5000 to explore real-time metrics, cost evolution plots, and logs.
```text
cadence/
├── conf/                    # Hydra configuration files
│   ├── h1_config.yaml
│   └── h2_config.yaml
├── src/                     # Core library modules
│   ├── database.py
│   ├── evaluator.py
│   ├── evolve.py
│   ├── llm.py
│   ├── prompt_sampler.py
│   └── tasks/               # Problem definitions (TSP, etc.)
├── run_h1_experiment.py     # Hypothesis 1 script
└── run_h2_experiment.py     # Hypothesis 2 script
```
- All code blocks must be marked with `### START_BLOCK` and `### END_BLOCK` (see the sketch after this list).
- Prompts are built to explicitly instruct the LLM to change only the marked blocks.
- Evaluation is deterministic, using seeded inputs.
- The project uses `uv` for reproducible dependency management and performance.
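For illustration, a baseline program with a marked block might look like this; only the `### START_BLOCK` / `### END_BLOCK` markers are Cadence's convention, the surrounding code is a hypothetical starting point:

```python
import math

def solve(cities):
    # Only the region between the markers is eligible for mutation;
    # everything outside it stays fixed across generations.
    ### START_BLOCK
    # Naive nearest-neighbour tour as an illustrative starting point.
    tour = [0]
    remaining = set(range(1, len(cities)))
    while remaining:
        last = tour[-1]
        nxt = min(remaining, key=lambda c: math.dist(cities[last], cities[c]))
        tour.append(nxt)
        remaining.remove(nxt)
    ### END_BLOCK
    return tour
```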
To make Cadence work for problems beyond TSP, you can define your own custom tasks by implementing the `Task` interface. This makes the system problem-agnostic while keeping the core workflow intact.
Create a new Python file in src/tasks/, for example:
```bash
touch src/tasks/knapsack_task.py
```

Each task must subclass `Task` and implement the following:

```python
from src.task import Task

class YourTask(Task):
    @property
    def function_name(self):
        # Name of the function the LLM is expected to generate
        return "solve"

    def generate_inputs(self, seed: int):
        # Generate deterministic input using the seed
        return ...

    def evaluate(self, output, input_data) -> float:
        # Return a numerical metric (lower is better)
        return ...
```

- `function_name`: must match the name of the function the LLM is expected to define.
- `generate_inputs(seed)`: generates the problem input; this can be a list, tuple, or dict.
- `evaluate(output, input_data)`: accepts the output from the evolved program and returns a numeric cost.
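As a concrete, hypothetical illustration, `src/tasks/knapsack_task.py` could look roughly like this; the instance sizes and the negated-value cost are choices made for this sketch, not something Cadence prescribes:

```python
import random
from src.task import Task

class KnapsackTask(Task):
    """Illustrative 0/1 knapsack task (hypothetical, not shipped with Cadence)."""

    @property
    def function_name(self):
        # The LLM must define a function with this exact name.
        return "solve"

    def generate_inputs(self, seed: int):
        # Deterministic instance: the same seed always yields the same items and capacity.
        rng = random.Random(seed)
        items = [(rng.randint(1, 20), rng.randint(1, 50)) for _ in range(30)]  # (weight, value)
        return {"items": items, "capacity": 100}

    def evaluate(self, output, input_data) -> float:
        # Assumes `output` is the list of chosen item indices returned by solve().
        items = input_data["items"]
        weight = sum(items[i][0] for i in output)
        value = sum(items[i][1] for i in output)
        if weight > input_data["capacity"]:
            return float("inf")      # infeasible solutions get the worst possible cost
        return -float(value)         # lower is better, so negate the total value
```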
Import your task class and instantiate it:
```python
from tasks.knapsack_task import KnapsackTask

task = KnapsackTask()
```

Then pass it into the `execute()` function:

```python
metric = execute(child_program_code, task)
```

- Use only standard Python libraries (`math`, `itertools`, `re`, etc.).
- Keep test inputs deterministic via seeds.
- Define a cost metric that is meaningful, consistent, and scalar.
- Try to avoid relying on `random` inside the generated programs themselves.
This project is licensed under the MIT License.
If you use Cadence in your research or projects, please cite:
```bibtex
@software{cadence2025,
  author  = {Yash Srivastava},
  title   = {{Cadence: Program Evolution via Large Language Models}},
  year    = {2025},
  url     = {https://github.com/yash-srivastava19/cadence},
  version = {main}
}
```