Architecture

End-to-end map of the KynML compiler pipeline, package structure, and runtime surface.

Pipeline Overview

KynML compiles a .kyn source file into a complete, executable Python + PyTorch training script. No PyTorch is required at compile time — the generated script is emitted as plain text.

┌─────────────┐   parse_text / parse_file
│  .kyn source│──────────────────────────► Parser (_Parser)
└─────────────┘                                │
                                               ▼
                                        Typed AST (Program)
                                        ast_nodes.py
                                               │
                                    apply_composition()      ← import resolution
                                       compose.py             + param substitution
                                               │
                                      validate_program()
                                        semantic.py
                                               │
                                       lower_program()       ← structural lowering
                                       ir/builder.py          AST → IRModule
                                               │
                                        run_passes()         ← shape/type inference
                                        ir/passes.py          fills in_features,
                                        ir/infer.py           num_features, checks
                                               │              loss/output agreement
                                          IRModule
                                        (inferred=True)
                                               │
                                      Backend.emit()         ← code generation
                                   codegen/pytorch_backend.py
                                               │
                                               ▼
                                     Generated .py script
                                     (self-contained, no kynml import)
                                               │
                                     subprocess / python -m
                                               │
                                               ▼
                                     Training run → run_manifest.json
                                                  → model file

The canonical front door for backends and tooling is compile_to_ir() in kynml/pipeline.py, which runs the entire pipeline (compose → validate → lower → infer) and returns an IRModule ready for a Backend.

In Mermaid form:

flowchart LR
    Source[".kyn source"] --> Parser["kynml.parser"]
    Parser --> AST["Typed AST\nast_nodes.Program"]
    AST --> Compose["kynml.compose\napply_composition()"]
    Compose --> Semantic["kynml.semantic\nvalidate_program()"]
    Semantic --> Lower["kynml.ir.builder\nlower_program()"]
    Lower --> Infer["kynml.ir.passes\nrun_passes()"]
    Infer --> IR["IRModule\n(inferred=True)"]
    IR --> Backend["kynml.codegen.Backend\n.emit(module, ctx)"]
    Backend --> Script["Generated .py"]
    Script --> Run["Python interpreter\n(training run)"]

Package Layout

kynml/
├── ast_nodes.py          Frozen dataclass AST nodes (including ParamRef, ParamsBlock, SweepBlock)
├── parser.py             Handwritten indentation-aware parser
├── semantic.py           Constraint validation (names, ranges, enums)
├── pipeline.py           compile_to_ir() — canonical front door for all backends
├── compose.py            Composition passes: import resolution + param substitution
├── sweep.py              Sweep grid expansion (expand_sweep, generate_sweep_runner)
├── competitors.py        Competitor positioning data + renderers for CLI/docs
├── lock.py               kynml.lock — config-hash pin for drift detection
├── errors.py             KynMLError, KynMLParseError, KynMLSemanticError,
│                         KynMLCodegenError, KynMLShapeError
├── cli.py                Typer CLI (validate / ast / compile / train / sweep / compare / fmt / lsp)
├── ir/
│   ├── types.py          TypeShape, DType — tensor type primitives
│   ├── nodes.py          IRModule, IRGraph, IRTrain, IRDataset, IROp subclasses
│   ├── builder.py        lower_program() — AST → un-inferred IRModule
│   ├── infer.py          infer_module() — shape/type inference pass
│   └── passes.py         run_passes() — orchestrates all IR passes
├── codegen/
│   ├── base.py           Backend ABC + EmitContext + get_backend()
│   ├── pytorch_backend.py PyTorchBackend — reads exclusively from IRModule
│   └── pytorch.py        generate_pytorch / write_pytorch (compatibility shim)
├── repro/
│   └── manifest.py       RunManifest, config_hash(), data_hash(), write_manifest()
├── format/
│   └── formatter.py      format_source() — idempotent canonical .kyn formatter
├── lsp/
│   └── diagnostics.py    diagnose(source) → list[Diagnostic] — no pygls required
├── serving/
│   ├── __init__.py       Re-exports generate_service
│   └── generator.py      FastAPI service + Dockerfile emitter
├── mcp/
│   ├── __main__.py       Entry point (python -m kynml.mcp.server)
│   └── server.py         MCP stdio server: 4 tools
└── integrations/
    ├── huggingface.py    load_hf_dataset() — optional dep: datasets
    └── objectstore.py    load_remote()     — optional dep: s3fs + pyarrow

Compile-Time vs Runtime

The compiler pipeline is pure Python. kynml validate, kynml ast, kynml compile, kynml fmt, and kynml lsp have zero PyTorch dependency — they manipulate text and dataclasses only. This is intentional: it keeps the Cloudflare Worker playground feasible and lets CI validate specs on lightweight runners.

PyTorch is only required when the generated script actually runs (via kynml train, python generated/foo.py, or kynml_train in the MCP server).

Dependency Boundaries

CORE (always installed)           CONNECT (optional extras)
────────────────────────          ──────────────────────────
numpy, pandas, scikit-learn,      serving:     fastapi, uvicorn
torch, typer                      mcp:         mcp>=1.0
                                  lsp:         pygls
                                  hf:          datasets>=2.0
                                  objectstore: s3fs, pyarrow

Optional extras are imported lazily inside functions, never at module level. import kynml without extras must not pull in FastAPI, the MCP SDK, or pygls.

CLI Architecture

The CLI is implemented with Typer and exposed as the kynml entry point (declared in [project.scripts] in pyproject.toml).

Command	Action
`kynml validate <file>`	Parse + semantic validation, exit 0/1
`kynml ast <file>`	Parse and pretty-print the AST dict
`kynml compile <file> --out <py> [--param k=v ...]`	Validate and write the generated script
`kynml sweep <file> [--out-dir dir] [--no-run]`	Expand sweep block, compile all combos, run sweep orchestrator
`kynml train <file> [--param k=v ...]`	Compile to `generated/<stem>.py` then `subprocess.run` it
`kynml compare [--focus name] [--format table\|markdown\|json]`	Render the checked-in competitive positioning map
`kynml fmt <file> [--write] [--check]`	Format `.kyn` source in canonical form
`kynml lsp`	Start LSP server over stdio (requires `pygls`)

Running via the module:

python -m kynml.cli validate examples/house_price.kyn

Generated Script Design

The generated script is intentionally self-contained: it imports only standard library, torch, numpy, pandas, and scikit-learn. No import kynml appears in the output. This lets users read, version, and tweak the training code without keeping the KynML toolchain installed.

Constants (dataset path, hyperparameters, device, export settings, CONFIG_HASH, SEED) are emitted as module-level variables so they are immediately visible and patchable.

Every generated script writes run_manifest.json after training, capturing config_hash, data_hash, env versions, seed, and metrics.

IR Boundary

kynml/ir/ defines the stable seam between the frontend (parse/validate) and the backend (codegen). Backends must consume only IRModule — they must never read the AST directly.

IRModule.inferred is set to True by run_passes(). PyTorchBackend.emit() asserts this flag at entry so an un-inferred module causes an early, clear error rather than a silent wrong result.

Backend Seam

kynml/codegen/base.py defines the Backend ABC:

class Backend:
    name: str = "base"

    def emit(self, module: IRModule, ctx: EmitContext) -> str:
        raise NotImplementedError(...)

EmitContext carries source_path and project_dir so path resolution is deterministic and injectable for tests. get_backend("pytorch") returns PyTorchBackend(). To register a new backend, implement Backend.emit() and add a branch in get_backend().

Website

The static site lives in site/ and is deployed to Cloudflare Pages at kynml.kynetra.dev. It is deliberately separate from the Python package so installing KynML does not incur frontend dependencies.

// wrangler.jsonc
{
  "name": "kynml",
  "pages_build_output_dir": "site",
  "compatibility_date": "2026-06-23"
}

Deploy:

npx wrangler pages deploy site --project-name=kynml --branch=main