Expand issue #23 with detailed cleanup analysis; close issues #5 and #24

Issue #23 now covers function naming conventions, file organisation,
global namespace cleanup options, and a concrete implementation plan.
This commit is contained in:
Cormac Shannon
2026-03-12 19:26:05 +00:00
parent c556928d0a
commit 34bfabccbd
3 changed files with 158 additions and 11 deletions

View File

@@ -1,6 +1,6 @@
# 05 — Implement environment variable access (read/write)
**Status:** open
**Status:** done
**Blocked by:** #01
Add syntax for reading and writing environment variables directly from Lua.

View File

@@ -4,17 +4,164 @@
Housekeeping pass over the codebase.
## Function names
---
Function names should be reviewed and improved for consistency and clarity.
## 1. Function naming conventions
## Global namespace (`_G`)
### Current state
Review what's exposed in `_G`. Do we need to expose everything? Candidates for cleanup:
Lua's convention: every non-static function in a file uses a common prefix (e.g. `luaX_` for llex.c, `luaK_` for lcode.c, `luaB_` for lbaselib.c).
- `__command` — internal, could be hidden
- `__interactive` — internal, could be hidden
- `__getenv` / `__setenv` — internal, could be hidden
- `__builtins` — should this be user-facing or internal?
Our custom files don't follow a consistent prefix scheme:
| File | Current names | Issue |
| ------------ | ----------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------- |
| `lcmd.c` | `luaB_command`, `luaB_interactive` | Uses `luaB_` prefix, which belongs to `lbaselib.c` |
| `linit.c` | `luaB_getenv`, `luaB_setenv` | Same — `luaB_` is for the base library |
| `lbuiltin.c` | `builtin_cd`, `builtin_exec`, `builtin_umask`, `luaopen_builtins` | `builtin_` prefix is fine for statics; `luaopen_builtins` follows the `luaopen_` convention correctly |
### Proposed fix
Introduce a `luaCmd_` prefix for lcmd.c exports and `luaEnv_` (or keep in lcmd) for env functions:
- `luaB_command``luaCmd_command` (lcmd.c)
- `luaB_interactive``luaCmd_interactive` (lcmd.c)
- `luaB_getenv``luaCmd_getenv` (linit.c, or move to lcmd.c)
- `luaB_setenv``luaCmd_setenv` (linit.c, or move to lcmd.c)
Update `lcmd.h` exports and all call sites (`linit.c`).
Static helper functions in lcmd.c (`expand_tilde`, `parse_argv`, `exec_pipeline`, etc.) don't need prefixes — they're already file-scoped. This is consistent with Lua's approach.
### Files to change
- `lcmd.c` — rename `luaB_command`, `luaB_interactive`
- `lcmd.h` — update declarations
- `linit.c` — rename `luaB_getenv`, `luaB_setenv`; update `opencommand()`
---
## 2. File organisation
### Current state
The custom code lives in two files:
- `**lcmd.c`** (~1100 lines) — command execution, argv parsing, glob/tilde/brace expansion, pipeline execution, signal handling, dynamic buffer management, builtin dispatch
- `**lbuiltin.c`** (~160 lines) — cd, exec, umask implementations
Plus env var functions live in `**linit.c**` rather than in a dedicated file.
### Questions to resolve
- **Should `luaB_getenv`/`luaB_setenv` move out of linit.c?** They're the only non-init logic in that file. Could move to `lcmd.c` (they're command-adjacent) or a new `lenv.c`.
- **Is lcmd.c too large?** At ~1100 lines it bundles argv parsing, shell expansion, pipeline execution, and signal handling. Consider whether expansion logic (`expand_tilde`, `expand_braces`, `glob_token`, `expand_argv`) should be split into a separate `lexpand.c`.
- `**try_builtin` in lcmd.c** looks up `__builtins` from `_G` at runtime. This couples lcmd.c to lbuiltin.c's global registration. Could use registry instead.
### Recommendation
Keep two files for now. The code is manageable. Move env functions into lcmd.c (or a small lenv.c) to keep linit.c clean.
---
## 3. Global namespace (`_G`) cleanup
### Current state
Five entries are added to `_G`:
| Global | Set in | Used by | User-callable? |
| --------------- | ---------- | ----------------------------- | ------------------------------------------------------------- |
| `__command` | linit.c | parser (backtick syntax) | No — compiler-generated calls only |
| `__interactive` | linit.c | parser (`!cmd` syntax) | No — compiler-generated calls only |
| `__getenv` | linit.c | parser (`$VAR` syntax) | No — compiler-generated calls only |
| `__setenv` | linit.c | parser (`$VAR = expr` syntax) | No — compiler-generated calls only |
| `__builtins` | lbuiltin.c | lcmd.c `try_builtin()` | Possibly — user might want to call `__builtins.cd()` directly |
All four `__`-prefixed functions are **internal implementation details** — the parser compiles syntactic sugar into calls to these. Users should never need to call them directly. Exposing them in `_G` means:
1. Users can accidentally overwrite them, breaking shell syntax
2. They pollute `_G` with implementation details
3. They show up in `pairs(_G)` / tab completion
### Option A: Move to Lua registry (recommended)
Store internal functions in `LUA_REGISTRYINDEX` instead of `_G`. The parser already looks up globals via `_ENV` — we'd change it to emit registry lookups instead.
**Pros:** Clean `_G`, functions can't be accidentally overwritten
**Cons:** Requires parser changes (how `commandexp()`, `codeenvget()`, `codeenvset()`, `interactivestat()` emit lookups)
Implementation sketch:
- In `linit.c`: store functions in registry with known string keys
- In `lparser.c`: change `singlevar()` calls to registry lookups via `luaK_codek` + `OP_GETFIELD` on the registry
### Option B: Group under `__lush` table
Put all internal functions in a single `__lush` table in `_G`:
```lua
__lush = {
command = function(...) end,
interactive = function(...) end,
getenv = function(...) end,
setenv = function(...) end,
builtins = { cd = ..., exec = ..., umask = ... },
}
```
**Pros:** Reduces `_G` pollution to 1 entry; still inspectable for debugging
**Cons:** Still in `_G`; still overwritable
Implementation:
- In `linit.c`: create table, register functions as fields, `lua_setglobal(L, "__lush")`
- In `lparser.c`: change generated code from `__command(...)` to `__lush.command(...)`
- In `lcmd.c`: change `try_builtin` to look up `__lush.builtins` instead of `__builtins`
### Option C: Keep as-is but protect
Use `__newindex` metamethod on `_G` to prevent overwriting `__`-prefixed keys, or mark them as read-only.
**Pros:** Minimal code change
**Cons:** Still pollutes `_G`; metamethod approach is fragile
### Recommendation
**Option B** is the best balance — simple to implement, reduces global pollution from 5 entries to 1, keeps things debuggable. Option A is cleaner but requires more invasive parser changes.
---
## 4. `__builtins` — user-facing or internal?
### Current state
`__builtins` is a table with `cd`, `exec`, `umask` functions. It's looked up by `try_builtin()` in lcmd.c at runtime via `lua_getglobal(L, "__builtins")`.
### Questions
- Should users be able to add their own builtins? (e.g. `__builtins.pushd = function(cmd, dir) ... end`)
- Should users be able to override builtins? (e.g. `__builtins.cd = my_custom_cd`)
If yes to either, `__builtins` should remain accessible (but possibly under `__lush.builtins`).
If no, it should be in the registry.
### Recommendation
Keep `__builtins` user-accessible (under `__lush.builtins` if we go with Option B above). Allowing users to extend/override builtins is a useful shell feature.
---
## Implementation plan
1. **Rename functions** — change `luaB_` prefix to `luaCmd_` in lcmd.c/h and linit.c
2. **Move env functions** — relocate `luaB_getenv`/`luaB_setenv` from linit.c into lcmd.c
3. **Consolidate globals** — implement Option B (group under `__lush` table)
4. **Update parser** — change generated code to reference `__lush.command` etc.
5. **Update lcmd.c** — change `try_builtin` to use `__lush.builtins`
6. **Test** — run existing test suite, verify backtick/env/interactive/builtin functionality
Consider whether internal functions should live in a single `__lush` table or use the registry instead of polluting globals.

View File

@@ -1,6 +1,6 @@
# 24 — Test against major Lua projects
**Status:** open
**Status:** done
## Problem