v0.4.0 Contract Skill Benchmark

Generated: 2026-03-08T22:56:33Z
Case pack: evals/cases/contract_skill_benchmark.jsonl

Overall

  • Version: v0.4.0
  • Cases: 26
  • Precision: 1.000
  • Recall: 1.000
  • Accuracy: 1.000
  • Interpretation: reportable benchmark sample

Outcome Summary

  • TP: 13
  • TN: 13
  • FP: 0
  • FN: 0
  • Skipped: 0
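
The headline metrics follow directly from these counts. As a minimal sketch, assuming the standard confusion-matrix definitions (the function name `summarize` is hypothetical and not part of the benchmark tooling), the arithmetic can be reproduced like this:

```python
def summarize(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute precision, recall, and accuracy from confusion counts.

    Skipped cases are assumed to be excluded from every denominator.
    """
    evaluated = tp + tn + fp + fn
    # Guard against empty denominators so a run with no positives does not crash.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / evaluated if evaluated else 0.0
    return {
        "cases": evaluated,
        "precision": round(precision, 3),
        "recall": round(recall, 3),
        "accuracy": round(accuracy, 3),
    }


# Counts taken from the Outcome Summary above.
metrics = summarize(tp=13, tn=13, fp=0, fn=0)
print(metrics)
```

With 13 true positives, 13 true negatives, and no false results, all three ratios come out to 1.000 over 26 evaluated cases, matching the Overall section.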

Case Results

| Case | Skill | Expected | Predicted | Outcome | Build | Tests | Static | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| so_guard_owner_nonzero_secure_owned | cairo-contract-authoring | True | True | tp | True | True | True | |
| so_guard_owner_auth_secure_owned | cairo-contract-authoring | True | True | tp | True | True | True | |
| so_guard_split_owner_auth_secure_owned | cairo-contract-authoring | True | True | tp | True | True | True | |
| so_guard_fee_bound_secure_owned | cairo-contract-authoring | True | True | tp | True | True | True | |
| so_opt_divrem_split_secure_owned | cairo-optimization | True | True | tp | True | True | True | |
| io_guard_owner_nonzero_insecure_owned | cairo-contract-authoring | False | False | tn | True | True | False | must_match_failed:src/lib.cairo:constructor should enforce non-zero owner |
| io_guard_owner_auth_insecure_owned | cairo-contract-authoring | False | False | tn | True | True | False | must_match_failed:src/lib.cairo:fee update should enforce owner authorization |
| io_guard_split_owner_auth_insecure_owned | cairo-contract-authoring | False | False | tn | True | True | False | must_match_failed:src/lib.cairo:split path should enforce owner authorization |
| io_guard_fee_bound_insecure_owned | cairo-contract-authoring | False | False | tn | True | True | False | must_match_failed:src/lib.cairo:fee updates should bound-check bps |
| io_opt_divrem_split_insecure_owned | cairo-optimization | False | False | tn | True | True | False | must_match_failed:src/lib.cairo:should use DivRem for quotient/remainder; must_not_match_failed:src/lib.cairo:should avoid standalone division; must_not_match_failed:src/lib.cairo:should avoid standalone modulus |
| su_guard_owner_nonzero_secure_upgrade | cairo-contract-authoring | True | True | tp | True | True | True | |
| su_guard_owner_auth_secure_upgrade | cairo-contract-authoring | True | True | tp | True | True | True | |
| su_guard_timelock_secure_upgrade | cairo-contract-authoring | True | True | tp | True | True | True | |
| su_guard_hash_nonzero_secure_upgrade | cairo-contract-authoring | True | True | tp | True | True | True | |
| iu_guard_owner_nonzero_insecure_upgrade | cairo-contract-authoring | False | False | tn | True | True | False | must_match_failed:src/lib.cairo:constructor should enforce non-zero owner |
| iu_guard_owner_auth_insecure_upgrade | cairo-contract-authoring | False | False | tn | True | True | False | must_match_failed:src/lib.cairo:upgrade should enforce owner authorization |
| iu_guard_timelock_insecure_upgrade | cairo-contract-authoring | False | False | tn | True | True | False | must_match_failed:src/lib.cairo:upgrade should enforce execution delay |
| iu_guard_hash_nonzero_insecure_upgrade | cairo-contract-authoring | False | False | tn | True | True | False | must_match_failed:src/lib.cairo:upgrade_now should enforce non-zero class hash |
| sm_opt_divrem_secure_math | cairo-optimization | True | True | tp | True | True | True | |
| sm_opt_no_div_mod_secure_math | cairo-optimization | True | True | tp | True | True | True | |
| sm_opt_no_bitwise_parity_secure_math | cairo-optimization | True | True | tp | True | True | True | |
| sm_opt_loop_eq_secure_math | cairo-optimization | True | True | tp | True | True | True | |
| im_opt_divrem_insecure_math | cairo-optimization | False | False | tn | True | True | False | must_match_failed:src/lib.cairo:should use DivRem for split/parity |
| im_opt_no_div_mod_insecure_math | cairo-optimization | False | False | tn | True | True | False | must_not_match_failed:src/lib.cairo:should avoid standalone division; must_not_match_failed:src/lib.cairo:should avoid standalone modulus |
| im_opt_no_bitwise_parity_insecure_math | cairo-optimization | False | False | tn | True | True | False | must_not_match_failed:src/lib.cairo:should avoid bitwise parity check |
| im_opt_loop_eq_insecure_math | cairo-optimization | False | False | tn | True | True | False | must_match_failed:src/lib.cairo:should use equality loop condition; must_not_match_failed:src/lib.cairo:should avoid less-than loop condition |
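
The Notes column suggests that each static check is either a pattern that must appear in a file or one that must not, and that failures are reported as `<kind>_failed:<path>:<message>`. The sketch below illustrates that idea under stated assumptions: the `Assertion` type, the `check` function, and the regular expressions are all hypothetical, and the real case pack may express its policies quite differently.

```python
import re
from dataclasses import dataclass


@dataclass
class Assertion:
    kind: str     # "must_match" or "must_not_match"
    path: str     # file the pattern is applied to
    pattern: str  # regular expression (illustrative only)
    message: str  # human-readable text that ends up in the Notes column


def check(assertions: list, sources: dict) -> list:
    """Return one note per failed assertion, in the format seen in the table."""
    notes = []
    for a in assertions:
        found = re.search(a.pattern, sources.get(a.path, "")) is not None
        if a.kind == "must_match" and not found:
            notes.append(f"must_match_failed:{a.path}:{a.message}")
        elif a.kind == "must_not_match" and found:
            notes.append(f"must_not_match_failed:{a.path}:{a.message}")
    return notes


# Hypothetical policy mirroring the io_opt_divrem_split_insecure_owned notes.
assertions = [
    Assertion("must_match", "src/lib.cairo", r"DivRem",
              "should use DivRem for quotient/remainder"),
    Assertion("must_not_match", "src/lib.cairo", r"\s/\s",
              "should avoid standalone division"),
    Assertion("must_not_match", "src/lib.cairo", r"\s%\s",
              "should avoid standalone modulus"),
]

# Illustrative source snippets, not taken from the benchmark cases.
insecure = {"src/lib.cairo": "let q = amount / parts;\nlet r = amount % parts;\n"}
secure = {"src/lib.cairo": "let (q, r) = DivRem::div_rem(amount, parts);\n"}

notes = check(assertions, insecure)
print(notes)
print(check(assertions, secure))
```

Run against the insecure snippet, this yields three notes of the same shape as the three listed for that case; the secure snippet yields none, which corresponds to Static being True.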

Notes

  • Tools: scarb=yes, snforge=yes.
  • Positive cases must build, pass their tests, and satisfy all static policy assertions.
  • Negative cases validate that policy checks fail on intentionally insecure patterns.
  • Sample policy: a run with fewer than 22 evaluated cases is smoke-only and should not be reported as a measure of broad skill quality.
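
The rules in these notes can be sketched in a few lines. This is an assumption-laden illustration, not the benchmark's actual code: it assumes a case is predicted positive only when Build, Tests, and Static are all true (which is consistent with every row in the table), and the function names are hypothetical. The threshold of 22 and the two interpretation labels come from this report.

```python
MIN_REPORTABLE_CASES = 22  # threshold stated in the sample policy


def predict(build: bool, tests: bool, static: bool) -> bool:
    """Assumed rule: a case is predicted positive only if every gate passes."""
    return build and tests and static


def outcome(expected: bool, predicted: bool) -> str:
    """Map an (expected, predicted) pair to a confusion-matrix label."""
    if expected and predicted:
        return "tp"
    if not expected and not predicted:
        return "tn"
    return "fp" if predicted else "fn"


def interpretation(evaluated_cases: int) -> str:
    """Apply the sample policy to the number of evaluated (non-skipped) cases."""
    if evaluated_cases < MIN_REPORTABLE_CASES:
        return "smoke-only"
    return "reportable benchmark sample"


# A secure case passes all gates; an insecure case fails only the static gate.
print(outcome(True, predict(True, True, True)))
print(outcome(False, predict(True, True, False)))
print(interpretation(26))
```

Under these assumptions, the 26 evaluated cases in this run clear the threshold, which agrees with the interpretation given in the Overall section.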