Architecture
End-to-end map of the KynML compiler pipeline, package structure, and runtime surface.
Pipeline Overview
KynML compiles a .kyn source file into a complete, executable Python + PyTorch training script. No PyTorch is required at compile time — the generated script is emitted as plain text.
┌─────────────┐  parse_text / parse_file
│ .kyn source │──────────────────────────► Parser (_Parser)
└─────────────┘                                   │
                                                  ▼
                                           Typed AST (Program)
                                           ast_nodes.py
                                                  │
                                           apply_composition()    ← import resolution
                                           compose.py               + param substitution
                                                  │
                                           validate_program()
                                           semantic.py
                                                  │
                                           lower_program()        ← structural lowering
                                           ir/builder.py            AST → IRModule
                                                  │
                                           run_passes()           ← shape/type inference
                                           ir/passes.py             fills in_features,
                                           ir/infer.py              num_features, checks
                                                  │                 loss/output agreement
                                           IRModule
                                           (inferred=True)
                                                  │
                                           Backend.emit()         ← code generation
                                           codegen/pytorch_backend.py
                                                  │
                                                  ▼
                                           Generated .py script
                                           (self-contained, no kynml import)
                                                  │
                                           subprocess / python -m
                                                  │
                                                  ▼
                                           Training run → run_manifest.json
                                                        → model file
The canonical front door for backends and tooling is compile_to_ir() in kynml/pipeline.py. It runs every stage that precedes code generation (compose → validate → lower → infer) and returns an IRModule ready for a Backend.
In Mermaid form:
flowchart LR
Source[".kyn source"] --> Parser["kynml.parser"]
Parser --> AST["Typed AST\nast_nodes.Program"]
AST --> Compose["kynml.compose\napply_composition()"]
Compose --> Semantic["kynml.semantic\nvalidate_program()"]
Semantic --> Lower["kynml.ir.builder\nlower_program()"]
Lower --> Infer["kynml.ir.passes\nrun_passes()"]
Infer --> IR["IRModule\n(inferred=True)"]
IR --> Backend["kynml.codegen.Backend\n.emit(module, ctx)"]
Backend --> Script["Generated .py"]
Script --> Run["Python interpreter\n(training run)"]
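As a rough illustration of the stage ordering above, the sketch below chains four stand-in functions in the order compose → validate → lower → infer. Everything in it is hypothetical: the toy Program and IRModule dataclasses, the function bodies, the n_columns argument, and the compile_to_ir_sketch name are placeholders chosen for the example, not the real kynml signatures.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Program:
    """Stand-in for the typed AST; the real nodes live in ast_nodes.py."""
    hidden_units: int


@dataclass(frozen=True)
class IRModule:
    """Stand-in for the IR, holding only the fields this sketch needs."""
    hidden_units: int
    in_features: Optional[int] = None
    inferred: bool = False


def apply_composition(program: Program, params: dict) -> Program:
    # Toy param substitution; import resolution is omitted here.
    return replace(program, **params)


def validate_program(program: Program) -> None:
    # Toy range check standing in for semantic validation.
    if program.hidden_units <= 0:
        raise ValueError("hidden_units must be positive")


def lower_program(program: Program) -> IRModule:
    # Structural lowering: AST -> un-inferred IR.
    return IRModule(hidden_units=program.hidden_units)


def run_passes(module: IRModule, n_columns: int) -> IRModule:
    # Toy inference: fill in_features and mark the module as inferred.
    return replace(module, in_features=n_columns, inferred=True)


def compile_to_ir_sketch(program: Program, params: dict, n_columns: int) -> IRModule:
    composed = apply_composition(program, params)
    validate_program(composed)
    lowered = lower_program(composed)
    return run_passes(lowered, n_columns)


module = compile_to_ir_sketch(Program(hidden_units=8), {"hidden_units": 32}, n_columns=5)
print(module)
```

Each stage returns a new frozen value rather than mutating its input, which keeps the intermediate results easy to inspect in tests.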
Package Layout
kynml/
├── ast_nodes.py            Frozen dataclass AST nodes (including ParamRef, ParamsBlock, SweepBlock)
├── parser.py               Handwritten indentation-aware parser
├── semantic.py             Constraint validation (names, ranges, enums)
├── pipeline.py             compile_to_ir(): canonical front door for all backends
├── compose.py              Composition passes: import resolution + param substitution
├── sweep.py                Sweep grid expansion (expand_sweep, generate_sweep_runner)
├── competitors.py          Competitor positioning data + renderers for CLI/docs
├── lock.py                 kynml.lock: config-hash pin for drift detection
├── errors.py               KynMLError, KynMLParseError, KynMLSemanticError,
│                           KynMLCodegenError, KynMLShapeError
├── cli.py                  Typer CLI (validate / ast / compile / train / sweep / compare / fmt / lsp)
├── ir/
│   ├── types.py            TypeShape, DType: tensor type primitives
│   ├── nodes.py            IRModule, IRGraph, IRTrain, IRDataset, IROp subclasses
│   ├── builder.py          lower_program(): AST → un-inferred IRModule
│   ├── infer.py            infer_module(): shape/type inference pass
│   └── passes.py           run_passes(): orchestrates all IR passes
├── codegen/
│   ├── base.py             Backend ABC + EmitContext + get_backend()
│   ├── pytorch_backend.py  PyTorchBackend: reads exclusively from IRModule
│   └── pytorch.py          generate_pytorch / write_pytorch (compatibility shim)
├── repro/
│   └── manifest.py         RunManifest, config_hash(), data_hash(), write_manifest()
├── format/
│   └── formatter.py        format_source(): idempotent canonical .kyn formatter
├── lsp/
│   └── diagnostics.py      diagnose(source) → list[Diagnostic]; no pygls required
├── serving/
│   ├── __init__.py         Re-exports generate_service
│   └── generator.py        FastAPI service + Dockerfile emitter
├── mcp/
│   ├── __main__.py         Entry point (python -m kynml.mcp.server)
│   └── server.py           MCP stdio server: 4 tools
└── integrations/
    ├── huggingface.py      load_hf_dataset(); optional dep: datasets
    └── objectstore.py      load_remote(); optional dep: s3fs + pyarrow
Compile-Time vs Runtime
The compiler pipeline is pure Python. kynml validate, kynml ast, kynml compile, kynml fmt, and kynml lsp have zero PyTorch dependency — they manipulate text and dataclasses only. This is intentional: it keeps the Cloudflare Worker playground feasible and lets CI validate specs on lightweight runners.
PyTorch is required only when the generated script actually runs (via kynml train, python generated/foo.py, or kynml_train in the MCP server).
Dependency Boundaries
CORE (always installed) CONNECT (optional extras)
──────────────────────── ──────────────────────────
numpy, pandas, scikit-learn, serving: fastapi, uvicorn
torch, typer mcp: mcp>=1.0
lsp: pygls
hf: datasets>=2.0
objectstore: s3fs, pyarrow
Optional extras are imported lazily inside functions, never at module level. import kynml without extras must not pull in FastAPI, the MCP SDK, or pygls.
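The lazy-import rule above can be sketched as a small helper that resolves an optional dependency only when the feature is actually used. This is an illustration, not the project's code: require_extra and generate_service_sketch are hypothetical names, and the pip install 'kynml[...]' hint assumes the extras are published under the names listed in the table above.

```python
import importlib
from types import ModuleType


def require_extra(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency on first use, with an actionable error."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Assumed install hint; the distribution name is a guess.
        raise ImportError(
            f"{module_name!r} is needed for this feature; "
            f"install it with: pip install 'kynml[{extra}]'"
        ) from exc


def generate_service_sketch() -> str:
    # The import lives inside the function body, so importing the
    # enclosing module never touches FastAPI.
    fastapi = require_extra("fastapi", "serving")
    return fastapi.__name__


# A stdlib module stands in for an installed extra in this demo.
print(require_extra("json", "serving").__name__)
```

Because nothing optional is imported at module level, a plain import of the package stays cheap and only the code path that needs an extra pays for it.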
CLI Architecture
The CLI is implemented with Typer and exposed as the kynml entry point (declared in [project.scripts] in pyproject.toml).
| Command | Action |
|---|---|
| kynml validate <file> | Parse + semantic validation, exit 0/1 |
| kynml ast <file> | Parse and pretty-print the AST dict |
| kynml compile <file> --out <py> [--param k=v ...] | Validate and write the generated script |
| kynml sweep <file> [--out-dir dir] [--no-run] | Expand sweep block, compile all combos, run sweep orchestrator |
| kynml train <file> [--param k=v ...] | Compile to generated/<stem>.py then subprocess.run it |
| kynml compare [--focus name] [--format table\|markdown\|json] | Render the checked-in competitive positioning map |
| kynml fmt <file> [--write] [--check] | Format .kyn source in canonical form |
| kynml lsp | Start LSP server over stdio (requires pygls) |
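The sweep row describes expanding a sweep block into parameter combinations. One plausible reading of "grid expansion" is a Cartesian product over the listed values, sketched below. expand_grid is a hypothetical helper written for this example; it is not the signature of kynml.sweep.expand_sweep, and the parameter names are made up.

```python
from itertools import product


def expand_grid(grid: dict[str, list]) -> list[dict]:
    """Return one parameter dict per point in the Cartesian product."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Two values for each of two parameters gives four combinations.
combos = expand_grid({"lr": [0.01, 0.001], "hidden": [32, 64]})
for combo in combos:
    print(combo)
```

Each resulting dict could then be fed to the compiler in the same way as a set of --param k=v overrides.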
Running via the module:
python -m kynml.cli validate examples/house_price.kyn
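The train command is described as compiling to generated/<stem>.py and then running the result with subprocess.run. A minimal sketch of that two-step flow follows. compile_stub is a hypothetical stand-in for the real compiler, and placing generated/ under a project_dir argument is an assumption made so the example can run in a temporary directory.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def compile_stub(source: Path) -> str:
    """Hypothetical stand-in for the compiler: returns script text."""
    return f'print("training from {source.name}")\n'


def train(source: Path, project_dir: Path) -> subprocess.CompletedProcess:
    out_dir = project_dir / "generated"
    out_dir.mkdir(parents=True, exist_ok=True)
    script = out_dir / f"{source.stem}.py"
    script.write_text(compile_stub(source), encoding="utf-8")
    # Run the generated script with the interpreter that is running this code.
    return subprocess.run(
        [sys.executable, str(script)], capture_output=True, text=True
    )


with tempfile.TemporaryDirectory() as tmp:
    result = train(Path("house_price.kyn"), Path(tmp))
    print(result.returncode, result.stdout.strip())
```

Running the script in a child process keeps the compiler process free of whatever the training code imports.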
Generated Script Design
The generated script is intentionally self-contained: it imports only the standard library, torch, numpy, pandas, and scikit-learn. No import kynml appears in the output, so users can read, version, and tweak the training code without keeping the KynML toolchain installed.
Constants (dataset path, hyperparameters, device, export settings, CONFIG_HASH, SEED) are emitted as module-level variables so they are immediately visible and patchable.
Every generated script writes run_manifest.json after training, capturing config_hash, data_hash, env versions, seed, and metrics.
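To make the manifest step concrete, here is a minimal sketch of writing run_manifest.json with the five fields named above. The details are assumptions for illustration: SHA-256 as the hash function, the shape of the env entry, the placeholder spec text, and the dummy metric value are not taken from the real generated script.

```python
import hashlib
import json
import platform
import tempfile
from pathlib import Path

# Module-level constants, mirroring how the generated script exposes them.
SEED = 42
SOURCE_TEXT = "<contents of the .kyn file>"  # placeholder, not real syntax
CONFIG_HASH = hashlib.sha256(SOURCE_TEXT.encode("utf-8")).hexdigest()


def data_hash(data: bytes) -> str:
    # Assumed scheme: hash the raw dataset bytes.
    return hashlib.sha256(data).hexdigest()


def write_manifest(out_dir: Path, data: bytes, metrics: dict) -> Path:
    manifest = {
        "config_hash": CONFIG_HASH,
        "data_hash": data_hash(data),
        "env": {"python": platform.python_version()},
        "seed": SEED,
        "metrics": metrics,
    }
    path = out_dir / "run_manifest.json"
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True), encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    # Dummy dataset bytes and a dummy metric, purely for the demo.
    path = write_manifest(Path(tmp), b"price,sqft\n100,50\n", {"val_loss": 0.12})
    loaded = json.loads(path.read_text(encoding="utf-8"))
    print(sorted(loaded))
```

Sorting the keys makes two manifests from identical runs byte-for-byte comparable, which is convenient when checking for drift.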
IR Boundary
kynml/ir/ defines the stable seam between the frontend (parse/validate) and the backend (codegen). Backends must consume only IRModule — they must never read the AST directly.
IRModule.inferred is set to True by run_passes(). PyTorchBackend.emit() asserts this flag at entry so an un-inferred module causes an early, clear error rather than a silent wrong result.
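The guard described above might look roughly like the following. This is a sketch with stand-in types: the one-field IRModule, the free-standing emit function, and the choice of ValueError are assumptions, since the source only says that the flag is asserted at entry.

```python
from dataclasses import dataclass


@dataclass
class IRModule:
    """Stand-in carrying only the flag this sketch needs."""
    inferred: bool = False


def run_passes(module: IRModule) -> IRModule:
    # Real passes would fill in shapes; here only the flag is flipped.
    module.inferred = True
    return module


def emit(module: IRModule) -> str:
    # Fail early and loudly instead of generating code from missing shapes.
    if not module.inferred:
        raise ValueError("IRModule has not been through run_passes()")
    return "# generated script\n"


try:
    emit(IRModule())
except ValueError as exc:
    print("rejected:", exc)

print(emit(run_passes(IRModule())), end="")
```

An explicit raise is used here rather than a bare assert so the check still fires when Python runs with assertions disabled.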
Backend Seam
kynml/codegen/base.py defines the Backend ABC:
class Backend:
    name: str = "base"

    def emit(self, module: IRModule, ctx: EmitContext) -> str:
        raise NotImplementedError(...)
EmitContext carries source_path and project_dir so path resolution is deterministic and injectable for tests. get_backend("pytorch") returns PyTorchBackend(). To register a new backend, implement Backend.emit() and add a branch in get_backend().
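The registration steps above (implement emit(), then add a branch in get_backend()) can be sketched end to end with stand-in types. SummaryBackend is a hypothetical backend invented for this example, and the IRModule.name field, the dataclass layout of EmitContext, and the ValueError for unknown names are assumptions rather than the real definitions in kynml/codegen/base.py.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class IRModule:
    name: str  # assumed field, used only for the demo output
    inferred: bool = True


@dataclass
class EmitContext:
    source_path: Path
    project_dir: Path


class Backend:
    name: str = "base"

    def emit(self, module: IRModule, ctx: EmitContext) -> str:
        raise NotImplementedError


class PyTorchBackend(Backend):
    name = "pytorch"

    def emit(self, module: IRModule, ctx: EmitContext) -> str:
        return f"# PyTorch script for {module.name}\n"


class SummaryBackend(Backend):
    """Hypothetical new backend that emits a one-line description."""
    name = "summary"

    def emit(self, module: IRModule, ctx: EmitContext) -> str:
        return f"{module.name} (from {ctx.source_path.name})\n"


def get_backend(name: str) -> Backend:
    if name == "pytorch":
        return PyTorchBackend()
    if name == "summary":  # the added branch
        return SummaryBackend()
    raise ValueError(f"unknown backend: {name!r}")


ctx = EmitContext(source_path=Path("examples/house_price.kyn"), project_dir=Path("."))
output = get_backend("summary").emit(IRModule(name="house_price"), ctx)
print(output, end="")
```

Because the context is passed in explicitly, a test can point source_path and project_dir at a temporary directory without touching global state.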
Website
The static site lives in site/ and is deployed to Cloudflare Pages at kynml.kynetra.dev. It is deliberately separate from the Python package so installing KynML does not incur frontend dependencies.
// wrangler.jsonc
{
  "name": "kynml",
  "pages_build_output_dir": "site",
  "compatibility_date": "2026-06-23"
}
Deploy:
npx wrangler pages deploy site --project-name=kynml --branch=main
See Also
- Compiler Internals — per-module responsibilities and extension points
- Shape Inference — how in_features are inferred and errors surfaced
- Composition — params, $references, import, and sweep
- Reproducibility — config_hash, seed, run_manifest.json
- Tooling — formatter and LSP diagnostics
- MCP Integration — agent-accessible compiler tools
- Serving — FastAPI inference service emitter