
Release 1.0.0a1 #9

Open
github-actions[bot] wants to merge 8 commits into master from release-1.0.0a1

Conversation

@github-actions

Human review requested!

JarbasAl and others added 8 commits October 25, 2024 23:23
…, normalisation (#8)

* feat: replace setup.py with pyproject.toml

* fix: remove_intent dict key, no-match shape, duplicate guard, E741 rename

* docs: type hints and docstrings

* test: comprehensive test suite

* docs: rewrite README

* perf/fix: cache regexes, word-count penalty, fix plural hack, tie-breaking

- Pre-compile all regexes at add_intent() time; removed per-query re.compile()
- lru_cache on word_tokenize calls to avoid repeated tokenization
- Replace character-length remainder penalty with word-count fraction
- Fix plural candidate detection to use word-boundary regex instead of
  substring check (prevents "status" being dropped due to "statuses")
- Fix regex slot confidence: divide by n_required not len(matches)
- Add deterministic tie-breaking: lower remainder word count wins, then
  alphabetical intent name
- Update test expected values to match new (more accurate) confidence scores
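
The tie-breaking rule listed above can be read as a single sort key. The following is a minimal sketch rather than the project's code: `Candidate` and `pick_best` are hypothetical names, and it assumes confidence is compared first, with the remainder word count and the intent name only used to break ties.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    # Hypothetical container; the field names are assumptions for illustration.
    name: str
    conf: float
    remainder: str  # part of the utterance not consumed by the match


def pick_best(candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the best candidate, with a deterministic order on ties."""
    if not candidates:
        return None
    # Highest confidence first, then fewest remainder words, then name A-Z.
    return min(
        candidates,
        key=lambda c: (-c.conf, len(c.remainder.split()), c.name),
    )
```

Because the key is a plain tuple, two runs over the same candidates always return the same winner regardless of registration order.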

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: context gating, keyword exclusion, normalisation, opm.py, intent_names

- bracket_expansion.py: add drop_apostrophes, normalize_whitespace,
  normalize_utterance, normalize_example — training samples and queries
  are now normalised identically at registration/match time
- __init__.py: apply normalize_example to training data at add_intent(),
  apply normalize_utterance to query in calc_intents(); add full context
  gating API (set/unset/require/unrequire/exclude/unexclude_context);
  add exclude_keywords() with word-boundary safety; add intent_names property
- opm.py: new OVOS ConfidenceMatcherPipeline plugin with lru_cache(128),
  session blacklist support, match_high/medium/low, bus listeners for
  padatious:register_intent, detach_intent, detach_skill, mycroft.skills.train
- pyproject.toml: add ovos optional-dependencies group, pipeline entry point
- test: 19 new tests covering normalisation, context gating, keyword exclusion,
  intent_names
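
As a rough illustration of the normalisation helpers named above, the sketch below reuses three of the function names from the commit message, but the bodies are assumptions: it guesses that apostrophes become a space, that whitespace runs collapse to one space, and that text is lowercased. The real helpers in bracket_expansion.py may differ.

```python
import re


def drop_apostrophes(text: str) -> str:
    # Assumption: straight, curly and backtick apostrophes become a space.
    return re.sub(r"['’`]", " ", text)


def normalize_whitespace(text: str) -> str:
    # Collapse runs of whitespace into one space and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


def normalize_utterance(text: str) -> str:
    # Assumption: lowercase first, then drop apostrophes, then tidy spacing.
    return normalize_whitespace(drop_apostrophes(text.lower()))
```

Applying the same function to training samples and to queries is what makes the two sides comparable, whatever the exact rules are.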

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: replace plural hack with lemmatize() helper

Add lemmatize(word) to bracket_expansion.py: strips apostrophes entirely
and removes trailing 's' (not 'ss') for language-agnostic plural matching.
Apply in _match() (replaces the old regex-based plural/singular hack) and
in get_utterance_remainder() (lemmatized token comparison so plural forms
of matched keywords are consumed from the remainder).

"lights" now matches training keyword "light", "what s" tokens (from
apostrophe normalisation) match "whats" via shared stem "what".
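
A minimal sketch of the helper as this commit message describes it (apostrophes removed, one trailing 's' stripped unless the word ends in 'ss'). Only the name `lemmatize` comes from the source; the body is an assumption and may not match the code in bracket_expansion.py.

```python
def lemmatize(word: str) -> str:
    # Remove apostrophes entirely, as this commit message describes.
    word = word.replace("'", "").replace("’", "")
    # Strip a single trailing "s", but leave words ending in "ss" alone.
    if word.endswith("s") and not word.endswith("ss"):
        word = word[:-1]
    return word
```

This is deliberately crude: it turns "status" into "statu", which is harmless only as long as both the keyword and the query token are reduced the same way before comparison.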

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: apostrophes → space in lemmatize, not empty string

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: accuracy engine, benchmark, normalisation, opm rewrite, CI workflows

- Three-pass keyword matching (contiguous → lemma-normalised → non-contiguous)
- Non-contiguous match quality 0.8 so direct hits always win
- Require all required slots to fire; eliminates partial-required FPs
- _score() helper: remainder penalty, coverage bonus, slot bonus; 4dp rounding
- lemma_query computed once per calc_intents call; fused required+optional loop
- lemmatize() exported; apostrophes → space before lemmatization
- normalize_utterance/normalize_example applied at registration and match time
- opm.py rewritten for Adapt bus events (register_vocab/register_intent)
- benchmark/ package: 284-case dataset, accuracy.py, compare.py (vs Adapt)
- README updated with benchmark table (TN/NM column, honest FP commentary)
- Standard CI workflows added
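
The three-pass idea in the first two bullets can be sketched as below. Everything here is illustrative: `match_quality` and the helper names are hypothetical, the stand-in stemmer is a guess, pass 2 is assumed to score 1.0, and pass 3 is assumed to require the keyword's words in order with gaps allowed. Only the 0.8 quality for non-contiguous matches comes from the commit message.

```python
from typing import List


def _lemma(word: str) -> str:
    # Stand-in stemmer (assumption): strip one trailing "s" unless "ss".
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def _contains_phrase(tokens: List[str], phrase: List[str]) -> bool:
    # True if `phrase` occurs as a contiguous run inside `tokens`.
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def _in_order(tokens: List[str], phrase: List[str]) -> bool:
    # True if the phrase words appear in order, with gaps allowed.
    it = iter(tokens)
    return all(word in it for word in phrase)


def match_quality(keyword: str, query: str) -> float:
    q, k = query.split(), keyword.split()
    if not k:
        return 0.0
    if _contains_phrase(q, k):            # pass 1: contiguous, exact tokens
        return 1.0
    ql, kl = [_lemma(t) for t in q], [_lemma(t) for t in k]
    if _contains_phrase(ql, kl):          # pass 2: contiguous after lemmatizing
        return 1.0                        # assumed quality for pass 2
    if _in_order(ql, kl):                 # pass 3: non-contiguous
        return 0.8
    return 0.0
```

With these assumptions, "turn on" scores 1.0 against "please turn on the light" but only 0.8 against "turn the light on", so a direct hit outranks a scattered one.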

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* ci: update workflows to standard — explicit secrets, lint, Python 3.13/3.14

- add lint.yml
- build-tests.yml: add Python 3.13/3.14, drop secrets: inherit
- release_workflow.yml: explicit PYPI_TOKEN/MATRIX_TOKEN, add permissions
- publish_stable.yml: push trigger, explicit secrets, publish_release/sync_dev
- coverage.yml: add test_path/install_extras/min_coverage, drop secrets: inherit
- license_check.yml, pip_audit.yml: drop secrets: inherit

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Update README.md

* Delete .github/workflows/python-support.yml

* fix: address PR #8 CodeRabbit feedback and align CI workflows with nebulento

- palavreado/__init__.py: read IntentCreator.name directly in remove_intent
  instead of calling .build() (avoids wasteful allocation)
- palavreado/builder.py: add inline Note to all four regex slot methods
  explaining the intentional empty-bucket design for partial_conf weighting
- palavreado/bracket_expansion.py: update expand_parentheses docstring to
  reflect actual str->List[str] signature (was stale list<str>->list<list<str>>)
- pyproject.toml: switch to SPDX license string, add license-files entry,
  and add explicit Python 3.9-3.13 classifiers to match requires-python
- README.md: add Breaking changes section documenting RuntimeError on
  duplicate add_intent and the remove_intent-first pattern
- test/test_palavreado.py: add test_remove_intent_via_creator to lock the
  IntentCreator overload contract
- .github/workflows: add missing opm-check.yml; remove spurious
  `secrets: inherit` from release-preview and repo-health (matches nebulento
  pattern); align changelog_max_issues to 50 in release_workflow

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: remove deprecated license classifier and clean up builder docstrings

Drop the old-style `License :: OSI Approved :: Apache Software License`
classifier from pyproject.toml — newer setuptools (PEP 639) rejects it
when `license` and `license-files` fields are already present, causing
all CI jobs (build, coverage, opm_check, license_check, pip_audit) to
fail at the build step.

Remove the "Note:" sections from the four regex slot methods in
builder.py (require_regex, optional_regex, require_autoregex,
optional_autoregex) that described internal "empty-bucket design"
details; the docstrings now only describe what each method does and its
args/returns.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: count misclassifications as both FN and FP in accuracy benchmark
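
A sketch of the counting rule named in this commit title, assuming the benchmark compares an expected intent with a predicted one and uses `None` for "no intent". The `tally` function and its key names are hypothetical, not taken from accuracy.py.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def tally(results: Iterable[Tuple[Optional[str], Optional[str]]]) -> Counter:
    """Tally outcomes from (expected, predicted) pairs; None means no intent."""
    counts: Counter = Counter()
    for expected, predicted in results:
        if expected is None and predicted is None:
            counts["tn"] += 1
        elif expected == predicted:
            counts["tp"] += 1
        elif expected is None:
            counts["fp"] += 1   # something fired when nothing should have
        elif predicted is None:
            counts["fn"] += 1   # nothing fired when something should have
        else:
            # Misclassification: the expected intent was missed (FN) and
            # a wrong intent fired (FP), so it counts toward both.
            counts["fn"] += 1
            counts["fp"] += 1
    return counts
```

Counting a wrong-intent prediction only once would understate either recall errors or precision errors, which is the problem the fix addresses.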

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: correct five verified bugs from code review

- __init__.py: lemma_map keys now per-token lemmatized (fixes phrase misses in Pass 2)
- __init__.py: required slots check uses full .keys() not 'if s' guard
- opm.py: remove lru_cache from _match_intent and _calc_palavreado_intent (stale on mutable state)
- opm.py: _regexes keyed by lang+entity_type; wired into require_regex/optional_regex at intent registration; pruned in handle_detach_skill
- compare.py: count misclassifications as FP for predicted intent (same fix as accuracy.py)
- dataset.py: fix mislabeled cases — 'pause' expanded to 'pause the music'; 'put a timer on for lunch' and 'turn off the lights and set a timer' relabeled to set_timer
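
The `lru_cache` removal listed above guards against a general pitfall, shown here with a hypothetical `Matcher` class (not the plugin's code): a cached method keeps returning its first answer even after the object's state has changed.

```python
from functools import lru_cache


class Matcher:
    # Hypothetical class for illustration only.
    def __init__(self) -> None:
        self.intents = set()

    def match(self, utterance: str):
        # Uncached lookup: always reflects the current registered intents.
        return utterance if utterance in self.intents else None

    @lru_cache(maxsize=128)
    def match_cached(self, utterance: str):
        # The cache key is (self, utterance); it ignores self.intents,
        # so results go stale once intents are added or removed.
        return self.match(utterance)


m = Matcher()
first = m.match_cached("hello")    # nothing registered yet
m.intents.add("hello")
stale = m.match_cached("hello")    # served from the cache
fresh = m.match("hello")           # sees the new intent
```

Here `first` and `stale` are both `None` while `fresh` is `"hello"`, which is why caching is only safe when the cache is cleared on every registration change or dropped altogether.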

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: update benchmark results after bug fixes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix+test: regex named-group slots, duplicate import, 16 new tests

Fixes:
- bracket_expansion.py: remove duplicate 'import re'
- __init__.py: regex slots with named groups now mark the slot name in
  matches so the required-check passes and conf credit fires; previously
  intents using require_regex with named-group patterns always returned None
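
The named-group fix can be pictured with the sketch below. `match_regex_slot` is a hypothetical helper, not the function in __init__.py; it assumes matches are collected in a dict and that a later check looks for the slot name among its keys.

```python
import re
from typing import Dict, Optional


def match_regex_slot(slot: str, pattern: re.Pattern, query: str) -> Optional[Dict[str, str]]:
    m = pattern.search(query)
    if m is None:
        return None
    # Record the slot name itself so a required-slot check can see it fired.
    matches = {slot: m.group(0)}
    # Also expose each named group that actually captured something.
    matches.update({k: v for k, v in m.groupdict().items() if v is not None})
    return matches
```

For example, with `re.compile(r"timer for (?P<duration>\d+ minutes)")` registered under a slot called `timer_rx`, the query "set a timer for 5 minutes" yields both a `timer_rx` entry and a `duration` entry. Storing only the group names would leave the slot itself looking unmatched.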

New tests (16):
- TestRegexSlots: named groups fire + populate, slot name in keywords,
  missing regex = no match, combined regex + keyword slot
- TestOptionalOnlyIntent: optional-only intent never fires
- TestKeywordExclusionMultiword: blocks on phrase, passes without phrase,
  no partial-word false match
- TestTiebreaking: alphabetical tiebreaker, higher-confidence multi-slot wins
- TestScore: perfect score, remainder penalty, zero-word guard, clamping
  to [0,1], 4dp rounding

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* perf: eliminate redundant work in hot path

- Pre-compile excluded keyword regexes at exclude_keywords() call instead
  of compiling them fresh on every query in _filter
- _filter now iterates pre-compiled (kw_lower, rx|None) pairs with early
  break per intent — no closure allocation, no dynamic re.search pattern build
- Tokenize and lemmatize the query once in calc_intents; reuse the list for
  both the set (query_lemmas) and the string (lemma_query)
- Cache per-candidate lemma strings inside _match during the initial
  classification pass; Pass 2 lemma_map reuses that cache instead of
  re-lemmatizing every token a second time
- Pre-sort regex patterns by length at add_intent() time; matching loop
  iterates self._sorted_regex[name][slot] directly with no per-query sort
- Pre-compute matched-word count (_mw) and remainder word count (_rw) in
  the yield inside calc_intents; calc_intent reads them directly instead of
  recomputing via _matched_words() for each comparison
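
The first two bullets above describe moving regex compilation from query time to registration time. A simplified sketch, assuming word-boundary patterns for every keyword: `ExclusionFilter` and `allowed` are hypothetical names, and the real `_filter` stores `(kw_lower, rx|None)` pairs rather than bare patterns.

```python
import re
from typing import Dict, Iterable, List


class ExclusionFilter:
    # Hypothetical class for illustration only.
    def __init__(self) -> None:
        self._excluded: Dict[str, List[re.Pattern]] = {}

    def exclude_keywords(self, intent: str, keywords: Iterable[str]) -> None:
        # Compile once here, so the per-query path does no pattern building.
        compiled = [re.compile(rf"\b{re.escape(kw.lower())}\b") for kw in keywords]
        self._excluded.setdefault(intent, []).extend(compiled)

    def allowed(self, intent: str, query: str) -> bool:
        q = query.lower()
        # any() stops at the first excluded keyword found in the query.
        return not any(rx.search(q) for rx in self._excluded.get(intent, ()))
```

The word boundaries keep an excluded keyword such as "stat" from blocking a query that merely contains "status".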

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>