Initial commit: Lush vs Bash AI benchmarking framework

Benchmark harness that uses LLM agents to solve shell scripting tasks in both Bash and Lush, then compares correctness and code quality.

- CLI with run, run-all, list-tasks, report, and export commands
- Agent loop with retry support via Anthropic Claude provider
- Test harness executing solutions in sandboxed subprocesses
- LLM-driven questionnaire for subjective code quality evaluation
- HTML report export with charts (matplotlib)
- 8 Category A tasks (write-from-scratch in both languages)
- 4 Category B tasks (verify provided Bash, convert to Lush)
- Lush language reference for agent context
38
tasks/category_a/fizzbuzz.toml
Normal file
@@ -0,0 +1,38 @@
name = "fizzbuzz"
category = "a"
description = """
Read a single integer N from stdin. Print numbers from 1 to N, one per line.
For multiples of 3, print "Fizz" instead of the number.
For multiples of 5, print "Buzz" instead of the number.
For multiples of both 3 and 5, print "FizzBuzz" instead of the number.
"""

[[test_cases]]
stdin = "15"
expected_stdout = """1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
FizzBuzz"""

[[test_cases]]
stdin = "5"
expected_stdout = """1
2
Fizz
4
Buzz"""

[[test_cases]]
stdin = "1"
expected_stdout = "1"