Benchmark harness that uses LLM agents to solve shell scripting tasks in both Bash and Lush, then compares correctness and code quality.

- CLI with `run`, `run-all`, `list-tasks`, `report`, and `export` commands
- Agent loop with retry support via an Anthropic Claude provider (see the loop sketch below)
- Test harness executing solutions in sandboxed subprocesses (sketched after the loop)
- LLM-driven questionnaire for subjective code-quality evaluation
- HTML report export with charts (matplotlib)
- 8 Category A tasks (write from scratch in both languages)
- 4 Category B tasks (verify provided Bash, convert to Lush)
- Lush language reference for agent context
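A rough sketch of what the retrying agent loop could look like, written against the `Message` and `LLMProvider` types defined in the provider module below. The `run_tests` hook, prompt wording, and attempt count are illustrative assumptions, not the harness's actual API.

```python
# Sketch of the agent loop with retries (hypothetical names; the real
# harness's API may differ). Failing test output is fed back to the model
# as a new user turn so it can repair its own solution.
def solve_task(provider: LLMProvider, task_prompt: str, max_attempts: int = 3) -> str | None:
    messages = [Message(role="user", content=task_prompt)]
    for _ in range(max_attempts):
        solution = provider.send(messages, system="You write shell scripts.")
        ok, feedback = run_tests(solution)  # hypothetical test-harness hook
        if ok:
            return solution
        # Keep the failed attempt in context and ask for a fix.
        messages.append(Message(role="assistant", content=solution))
        messages.append(Message(role="user", content=f"Tests failed:\n{feedback}\nPlease fix the script."))
    return None  # gave up after max_attempts
```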
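The sandboxed execution could be built on `subprocess` with a timeout and a throwaway working directory; here is a minimal sketch under those assumptions (the function name and return shape are invented for illustration):

```python
# Sketch: run a candidate script in a sandboxed subprocess (illustrative).
import subprocess
import tempfile
from pathlib import Path


def run_sandboxed(script: str, interpreter: str = "bash", timeout: int = 10) -> tuple[bool, str]:
    with tempfile.TemporaryDirectory() as workdir:
        path = Path(workdir) / "solution.sh"
        path.write_text(script)
        try:
            proc = subprocess.run(
                [interpreter, str(path)],
                cwd=workdir,          # confine filesystem side effects to the temp dir
                capture_output=True,
                text=True,
                timeout=timeout,      # kill runaway scripts
            )
        except subprocess.TimeoutExpired:
            return False, "timed out"
        return proc.returncode == 0, proc.stdout + proc.stderr
```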
The provider abstraction both sketches build on:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass
class Message:
    """One turn of the agent conversation."""

    role: str  # "user" or "assistant"
    content: str


class LLMProvider(Protocol):
    """Structural interface a concrete LLM backend must satisfy."""

    def send(self, messages: list[Message], system: str = "") -> str: ...

    @property
    def model_name(self) -> str: ...
```
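For concreteness, one way a Claude backend might satisfy this protocol using the official `anthropic` SDK; the model id, `max_tokens` value, and class name are illustrative assumptions rather than the project's actual provider:

```python
# Sketch: a concrete LLMProvider backed by the anthropic SDK (illustrative).
import anthropic


class ClaudeProvider:
    def __init__(self, model: str = "claude-3-5-sonnet-latest") -> None:  # model id is an assumed default
        self._client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
        self._model = model

    def send(self, messages: list[Message], system: str = "") -> str:
        response = self._client.messages.create(
            model=self._model,
            max_tokens=4096,
            system=system or anthropic.NOT_GIVEN,  # omit the system prompt when empty
            messages=[{"role": m.role, "content": m.content} for m in messages],
        )
        # Join any text blocks in the response into a single string.
        return "".join(block.text for block in response.content if block.type == "text")

    @property
    def model_name(self) -> str:
        return self._model
```

Because `LLMProvider` is a `Protocol`, `ClaudeProvider` never needs to inherit from it; matching the method signatures is enough for static type checkers.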