Reproducibility

Every training script generated by KynML writes a run_manifest.json alongside its output. The manifest captures everything needed to audit or reproduce a run: the spec fingerprint, dataset fingerprint, environment versions, RNG seed, and evaluation metrics.


CONFIG_HASH

The CONFIG_HASH is a SHA-256 hex digest (64 chars) of the canonical .kyn source bytes, computed at codegen time and embedded as a module-level constant in the generated script:

CONFIG_HASH = "a3f9d2..."  # injected at compile time

It changes whenever the .kyn source changes. Use it to confirm that a generated script matches the spec version that produced it, or to detect accidental spec drift between runs.

Hash computation:

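from pathlib import Path
from kynml.repro.manifest import config_hash

# config_hash takes the source as a str and encodes it as UTF-8 internally
source = Path("specs/model.kyn").read_text(encoding="utf-8")
digest = config_hash(source)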
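To verify a digest without importing KynML, the computation can be approximated with the standard library. The following is a minimal sketch that assumes the canonical form is simply the UTF-8 encoding of the source text. If config_hash normalises the source in any way (line endings, trailing whitespace), the two digests will differ. The name sketch_config_hash is hypothetical and not part of KynML.

```python
import hashlib


def sketch_config_hash(source: str) -> str:
    """Return the SHA-256 hex digest of the UTF-8 encoded source text.

    Assumption: no normalisation is applied before hashing.
    """
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


digest = sketch_config_hash("train:\n    seed = 42\n")
print(len(digest))  # 64 hex characters
```

Comparing this value with the CONFIG_HASH constant in a generated script is a quick way to test the assumption against a real spec.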

run_manifest.json

The manifest is written atomically (write to a temporary file, then rename) to the script's working directory after every training run. It is always created; no opt-in flag is required.
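The write-then-rename pattern can be sketched with the standard library as follows. This illustrates the general technique and is not KynML's actual implementation; the function name write_manifest_atomically is hypothetical.

```python
import json
import os
import tempfile


def write_manifest_atomically(manifest: dict, path: str) -> None:
    """Write JSON to a temp file in the target directory, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(manifest, handle, indent=2, sort_keys=True)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace overwrites any existing file in a single rename step.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Because readers only ever see the old file or the complete new one, a crash mid-write cannot leave a truncated run_manifest.json behind.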

Schema

{
  "kynml_version": "0.1.0",
  "config_hash": "a3f9d2...",
  "data_hash":   "b7c1e4..." | null,
  "env": {
    "python":   "3.11.9",
    "torch":    "2.3.0",
    "numpy":    "1.26.4",
    "platform": "macOS-15.5-arm64-arm-64bit"
  },
  "seed":    42 | null,
  "metrics": {"mae": 0.123, "rmse": 0.187}
}

Field reference

Field          Type           Description
-------------  -------------  -----------
kynml_version  string         KynML version that compiled the script.
config_hash    string         SHA-256 of the .kyn source (hex, 64 chars). Matches CONFIG_HASH in the generated script.
data_hash      string | null  SHA-256 of the dataset file at runtime, streamed in 64 KiB chunks. null when the file is absent or unreadable (e.g. streaming sources).
env.python     string         sys.version first token (e.g. "3.11.9").
env.torch      string         torch.__version__; omitted if torch is not importable.
env.numpy      string         numpy.__version__; omitted if numpy is not importable.
env.platform   string         platform.platform().
seed           int | null     The RNG seed from the train: block, or null when not set.
metrics        object         dict[str, float] of evaluation metrics from the evaluate: block, populated after training.

Fields are serialised with sort_keys=True so diffs between runs are stable.
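The data_hash and env fields described in the field reference could be produced along the following lines. This is a sketch based only on the descriptions above. The names sketch_data_hash and sketch_env are hypothetical, and the real implementation may handle more error cases.

```python
import hashlib
import importlib
import platform
import sys
from typing import Optional

CHUNK_SIZE = 64 * 1024  # 64 KiB, as stated in the field reference


def sketch_data_hash(path: str) -> Optional[str]:
    """Stream the file through SHA-256; return None if it cannot be read."""
    hasher = hashlib.sha256()
    try:
        with open(path, "rb") as handle:
            # Read fixed-size chunks until read() returns an empty bytes object.
            for chunk in iter(lambda: handle.read(CHUNK_SIZE), b""):
                hasher.update(chunk)
    except OSError:
        return None
    return hasher.hexdigest()


def sketch_env() -> dict:
    """Collect interpreter, platform, and optional library versions."""
    env = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in ("torch", "numpy"):
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # the key is omitted, matching the field reference
        env[name] = module.__version__
    return env
```

Streaming in chunks keeps memory use constant regardless of dataset size, which matters when the file is larger than available RAM.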

Reading a manifest in Python

from kynml.repro.manifest import load_manifest

m = load_manifest("run_manifest.json")
print(m.config_hash)    # "a3f9d2..."
print(m.seed)           # 42 or None
print(m.metrics)        # {"mae": 0.123, "rmse": 0.187}
print(m.env["torch"])   # "2.3.0"
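A common use of the manifest is checking whether two runs are comparable. The sketch below works on plain dictionaries (for example, loaded with json.load) and so does not depend on KynML. The helper name manifest_differences and the choice of compared fields are assumptions for illustration.

```python
import json

# Top-level fields that should match for one run to count as a reproduction
# of another. This selection is an assumption made for the sketch.
COMPARED_FIELDS = ("config_hash", "data_hash", "seed", "kynml_version")


def load_json(path: str) -> dict:
    """Load a manifest file as a plain dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def manifest_differences(first: dict, second: dict) -> list:
    """Return the names of compared fields whose values differ."""
    return [
        field for field in COMPARED_FIELDS
        if first.get(field) != second.get(field)
    ]


run_a = {"config_hash": "a3f9d2...", "data_hash": None, "seed": 42, "kynml_version": "0.1.0"}
run_b = dict(run_a, seed=None)
print(manifest_differences(run_a, run_b))  # ['seed']
```

An empty list means the spec, dataset, seed, and KynML version all match. Differences in env (for example, the torch version) may still affect results and are worth checking separately.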

Deterministic seeding

When the train: block specifies seed, the generated script calls _set_seed(seed) before training. This covers:

  • random.seed(seed)
  • numpy.random.seed(seed)
  • torch.manual_seed(seed)
  • torch.cuda.manual_seed_all(seed) (when CUDA is available)

Adding deterministic = true additionally sets:

torch.use_deterministic_algorithms(True)
torch.backends.cudnn.deterministic = True

.kyn syntax:

train:
    model = M
    data = D
    loss = mse
    optimizer = adam(lr=0.001)
    epochs = 20
    seed = 42
    deterministic = true

Without seed, the manifest records "seed": null and the run is not reproducible at the RNG level even if the config_hash matches.
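The seeding steps listed above can be sketched as a single helper. This is an illustrative reconstruction, not the body of the generated _set_seed. The name sketch_set_seed and the deterministic parameter are hypothetical, and the sketch treats numpy and torch as optional so that it runs even when they are not installed.

```python
import random


def sketch_set_seed(seed: int, deterministic: bool = False) -> None:
    """Seed every RNG listed above, skipping libraries that are not installed."""
    random.seed(seed)

    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass

    try:
        import torch
    except ImportError:
        return

    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

    if deterministic:
        # Mirrors the two settings enabled by deterministic = true.
        torch.use_deterministic_algorithms(True)
        torch.backends.cudnn.deterministic = True
```

With torch.use_deterministic_algorithms(True), PyTorch raises a RuntimeError when an operation has no deterministic implementation, so enabling the flag can surface errors in models that otherwise train without complaint.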


kynml.lock

The lock file pins the config_hash of a .kyn spec so that spec drift is caught before a run. It is opt-in — the compile/train flows are unaffected unless you call the lock API explicitly.

Lock file format (kynml.lock)

{
  "backend": "pytorch",
  "config_hash": "a3f9d2...",
  "kynml_version": "0.1.0",
  "schema_version": "1",
  "source_path": "specs/model.kyn"
}

Python API

from pathlib import Path

from kynml.lock import create_lock, check_lock, LockMismatchError

source = Path("specs/model.kyn").read_text(encoding="utf-8")

# Create or overwrite the lock
create_lock(source, "kynml.lock", source_path="specs/model.kyn")

# Verify before training (raises LockMismatchError if the spec changed)
try:
    check_lock(source, "kynml.lock")
except LockMismatchError as e:
    print(e)
    # Re-run create_lock or pass --check-lock to regenerate

LockMismatchError is a subclass of KynMLError, so the CLI catches it with the standard error handler.

check_lock raises FileNotFoundError if the lock file does not exist. create_lock creates parent directories as needed.
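To make the behaviour described above concrete, here is a standalone sketch of how a lock check of this kind can work. It is not the kynml.lock implementation: every name is hypothetical, the hash is assumed to be SHA-256 over the UTF-8 source, and the lock file written here contains only a subset of the documented fields.

```python
import hashlib
import json
from pathlib import Path


class SketchLockMismatch(Exception):
    """Hypothetical stand-in for LockMismatchError."""


def _digest(source: str) -> str:
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def sketch_create_lock(source: str, lock_path: str, source_path: str) -> None:
    """Write (or overwrite) a lock file, creating parent directories."""
    path = Path(lock_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {
        "config_hash": _digest(source),
        "schema_version": "1",
        "source_path": source_path,
    }
    path.write_text(json.dumps(payload, indent=2, sort_keys=True), encoding="utf-8")


def sketch_check_lock(source: str, lock_path: str) -> None:
    """Raise SketchLockMismatch if the source no longer matches the lock."""
    # Path.read_text raises FileNotFoundError when the lock file is missing.
    locked = json.loads(Path(lock_path).read_text(encoding="utf-8"))
    current = _digest(source)
    if current != locked["config_hash"]:
        raise SketchLockMismatch(
            f"spec hash {current[:8]} does not match locked hash {locked['config_hash'][:8]}"
        )
```

The check is a pure comparison of digests, so it is cheap enough to run before every training job.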


Sweep results

When using kynml sweep, the orchestrator aggregates each combo's run_manifest.json into sweep_results.json in the output directory. Each entry includes the combo dict, the script path, the exit code, and the full manifest. See CLI-Reference § sweep.
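Once sweep_results.json is loaded, the entries can be filtered and ranked with a few lines of Python. The sketch below is illustrative only. It assumes the file holds a list of entries and that the keys are named "combo", "exit_code", and "manifest"; these key names are guesses based on the description above, so check them against an actual sweep_results.json before relying on this.

```python
from typing import Optional


def failed_combos(entries: list) -> list:
    """Return the combo dict of every entry with a non-zero exit code."""
    # Key names "combo" and "exit_code" are assumptions, not a documented schema.
    return [entry["combo"] for entry in entries if entry["exit_code"] != 0]


def best_entry(entries: list, metric: str) -> Optional[dict]:
    """Return the successful entry with the lowest value of the given metric."""
    candidates = [
        entry for entry in entries
        if entry["exit_code"] == 0 and metric in entry["manifest"]["metrics"]
    ]
    return min(
        candidates,
        key=lambda entry: entry["manifest"]["metrics"][metric],
        default=None,
    )


# Hand-written sample data in the assumed shape, not real sweep output.
sample = [
    {"combo": {"lr": 0.001}, "exit_code": 0, "manifest": {"metrics": {"mae": 0.123}}},
    {"combo": {"lr": 0.01}, "exit_code": 0, "manifest": {"metrics": {"mae": 0.098}}},
    {"combo": {"lr": 0.1}, "exit_code": 1, "manifest": {"metrics": {}}},
]
print(failed_combos(sample))             # [{'lr': 0.1}]
print(best_entry(sample, "mae")["combo"])  # {'lr': 0.01}
```

Lower-is-better ranking suits error metrics such as mae and rmse; a metric where higher is better would need max instead of min.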