Initial commit: Lush vs Bash AI benchmarking framework
Benchmark harness that uses LLM agents to solve shell scripting tasks in both Bash and Lush, then compares correctness and code quality.

- CLI with run, run-all, list-tasks, report, and export commands
- Agent loop with retry support via Anthropic Claude provider
- Test harness executing solutions in sandboxed subprocesses
- LLM-driven questionnaire for subjective code quality evaluation
- HTML report export with charts (matplotlib)
- 8 Category A tasks (write-from-scratch in both languages)
- 4 Category B tasks (verify provided Bash, convert to Lush)
- Lush language reference for agent context
tasks/category_a/env_config.toml
Normal file
@@ -0,0 +1,32 @@
name = "env_config"
category = "a"
description = """
Read a config format from stdin where each line is "KEY=VALUE".
For each line, set an environment variable with that key and value.
After processing all lines, run the command `env` and print only the variables
that were set from the input, sorted alphabetically by key, in "KEY=VALUE" format.

You must actually set these as environment variables and retrieve them back
(not just echo the input).
"""

[[test_cases]]
stdin = """APP_NAME=myapp
APP_PORT=8080
APP_DEBUG=true"""
expected_stdout = """APP_DEBUG=true
APP_NAME=myapp
APP_PORT=8080"""
env = {}

[[test_cases]]
stdin = """DB_HOST=localhost
DB_PORT=5432"""
expected_stdout = """DB_HOST=localhost
DB_PORT=5432"""
env = {}

[[test_cases]]
stdin = "SINGLE_VAR=hello"
expected_stdout = "SINGLE_VAR=hello"
env = {}
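As a sanity check on the task definition above, here is one Bash solution the agents could plausibly produce (a sketch only, not the benchmark's reference answer; the `solve` function name is mine):

```shell
#!/usr/bin/env bash
set -euo pipefail

# solve: read KEY=VALUE lines on stdin, export each variable, then
# fetch the values back via `env` (not by echoing the input) and print
# the variables we set, sorted alphabetically by key.
solve() {
  local keys=() key value
  while IFS='=' read -r key value; do
    [ -z "$key" ] && continue          # skip blank lines
    export "$key=$value"
    keys+=("$key")
  done
  for key in $(printf '%s\n' "${keys[@]}" | sort); do
    env | grep -m1 "^$key="
  done
}

# Exercise it against the first test case from env_config.toml.
solve <<'EOF'
APP_NAME=myapp
APP_PORT=8080
APP_DEBUG=true
EOF
```

Splitting with `IFS='='` on `read -r key value` keeps any `=` inside the value intact, and reading the values back through `env` satisfies the task's requirement that the variables actually live in the environment.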