# Benchmark harness configuration: LLM agents solve shell scripting tasks in
# both Bash and Lush, then correctness and code quality are compared.
#
# Harness features:
# - CLI with run, run-all, list-tasks, report, and export commands
# - Agent loop with retry support via Anthropic Claude provider
# - Test harness executing solutions in sandboxed subprocesses
# - LLM-driven questionnaire for subjective code quality evaluation
# - HTML report export with charts (matplotlib)
# - 8 Category A tasks (write-from-scratch in both languages)
# - 4 Category B tasks (verify provided Bash, convert to Lush)
# - Lush language reference for agent context
[lush]
# Absolute path to the Lush interpreter binary used to execute solutions.
# NOTE(review): machine-specific path — adjust per checkout/host.
binary = "/Users/nik/Code/20251000_lush/lush"
[agent]
# Maximum retry attempts per task when the agent's solution fails its tests.
max_retries = 3

# Wall-clock limit for each sandboxed solution run, in seconds.
timeout_seconds = 10

# Collapse whitespace differences when comparing solution output against
# expected output (presumably trailing/inter-token whitespace — verify in
# the test harness).
normalize_whitespace = true
[results]
# Directory where benchmark run results are written (relative to the
# working directory).
output_dir = "results"
[anthropic]
# Name of the environment variable holding the Anthropic API key; the key
# itself is never stored in this file.
api_key_env = "ANTHROPIC_API_KEY"

# Model identifier passed to the Anthropic API.
model = "claude-sonnet-4-20250514"

# Upper bound on tokens generated per model response.
max_tokens = 4096