Initial commit: Lush vs Bash AI benchmarking framework

Benchmark harness that uses LLM agents to solve shell scripting tasks
in both Bash and Lush, then compares correctness and code quality.

- CLI with run, run-all, list-tasks, report, and export commands
- Agent loop with retry support via Anthropic Claude provider
- Test harness executing solutions in sandboxed subprocesses
- LLM-driven questionnaire for subjective code quality evaluation
- HTML report export with charts (matplotlib)
- 8 Category A tasks (write-from-scratch in both languages)
- 4 Category B tasks (verify provided Bash, convert to Lush)
- Lush language reference for agent context

Cormac Shannon

2026-03-29 17:56:30 +01:00

commit be8d657b24

33 changed files with 3302 additions and 0 deletions

.gitignore (new file, 16 lines)

@@ -0,0 +1,16 @@
 #
 .env
 # Python-generated files
 __pycache__/
 *.py[oc]
 build/
 dist/
 wheels/
 *.egg-info
 # Virtual environments
 .venv
 # Results output
 results/
