Reproducibility
Every training script generated by KynML writes a run_manifest.json alongside its output. The manifest captures everything needed to audit or reproduce a run: the spec fingerprint, dataset fingerprint, environment versions, RNG seed, and evaluation metrics.
CONFIG_HASH
The CONFIG_HASH is a SHA-256 hex digest (64 chars) of the canonical .kyn source bytes, computed at codegen time and embedded as a module-level constant in the generated script:
```python
CONFIG_HASH = "a3f9d2..."  # injected at compile time
```
It changes whenever the .kyn source changes. Use it to confirm that a generated script matches the spec version that produced it, or to detect accidental spec drift between runs.
Hash computation:
```python
from kynml.repro.manifest import config_hash

with open("specs/model.kyn", encoding="utf-8") as f:
    digest = config_hash(f.read())  # str → utf-8 encoded internally
```
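The sketch below uses only the standard library and shows what the digest covers. It assumes "canonical" means nothing more than UTF-8-encoding the source string, since the doc defines no further normalisation. `sketch_config_hash` and `spec_drifted` are hypothetical helpers, not part of kynml.

```python
import hashlib


def sketch_config_hash(source: str) -> str:
    """SHA-256 hex digest of a .kyn source string (assumes plain UTF-8, no other canonicalisation)."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def spec_drifted(source: str, embedded_hash: str) -> bool:
    """True when the spec no longer matches the CONFIG_HASH embedded in a generated script."""
    return sketch_config_hash(source) != embedded_hash


spec = "train:\n  seed = 42\n"
digest = sketch_config_hash(spec)
print(len(digest))                        # 64 hex characters
print(spec_drifted(spec, digest))         # False
print(spec_drifted(spec + "\n", digest))  # True: even a trailing newline changes the digest
```

Under this assumption the hash is over raw bytes, so whitespace-only edits to the spec also count as drift.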
run_manifest.json
Written atomically (write-then-rename) to the script's working directory after every training run. The file is always created — there is no opt-in flag.
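Write-then-rename can be sketched with the standard library as below. This is not KynML's code. `write_manifest_atomic` is a hypothetical name, and the `indent=2` formatting is an assumption. The temp file is created in the target's own directory, so `os.replace` is a same-filesystem rename and a reader never sees a half-written manifest. The sketch also passes `sort_keys=True`, as the manifest format does.

```python
from __future__ import annotations

import json
import os
import tempfile
from pathlib import Path


def write_manifest_atomic(manifest: dict, path: str | os.PathLike[str]) -> None:
    """Write JSON to a temp file next to the target, then rename it over the target."""
    target = Path(path)
    # Same directory as the target so the final rename never crosses filesystems.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=target.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(manifest, f, sort_keys=True, indent=2)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes are on disk before the rename
        os.replace(tmp_name, target)  # single rename; atomic on POSIX
    except BaseException:
        # Remove the partial temp file; any previous manifest is left untouched.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```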
Schema
```
{
  "kynml_version": "0.1.0",
  "config_hash": "a3f9d2...",
  "data_hash": "b7c1e4..." | null,
  "env": {
    "python": "3.11.9",
    "torch": "2.3.0",
    "numpy": "1.26.4",
    "platform": "macOS-15.5-arm64-arm-64bit"
  },
  "seed": 42 | null,
  "metrics": {"mae": 0.123, "rmse": 0.187}
}
```
Field reference
| Field | Type | Description |
|---|---|---|
| kynml_version | string | KynML version that compiled the script. |
| config_hash | string | SHA-256 of the .kyn source (hex, 64 chars). Matches CONFIG_HASH in the generated script. |
| data_hash | string \| null | SHA-256 of the dataset file at runtime, streamed in 64 KiB chunks. null when the file is absent or unreadable (e.g. streaming sources). |
| env.python | string | First token of sys.version (e.g. "3.11.9"). |
| env.torch | string | torch.__version__; omitted if torch is not importable. |
| env.numpy | string | numpy.__version__; omitted if numpy is not importable. |
| env.platform | string | platform.platform(). |
| seed | int \| null | The RNG seed from the train: block, or null when not set. |
| metrics | object | dict[str, float] of evaluation metrics from the evaluate: block, populated after training. |
Fields are serialised with sort_keys=True so diffs between runs are stable.
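The runtime fields in the table can be reproduced with the standard library as sketched below. The sketch follows the documented rules: 64 KiB streaming for data_hash, null on a missing or unreadable file, the first token of sys.version, and torch/numpy omitted when not importable. It is not KynML's implementation. `sketch_data_hash` and `sketch_env` are hypothetical names, and treating any `OSError` as "unreadable" is an assumption.

```python
from __future__ import annotations

import hashlib
import importlib
import platform
import sys
from pathlib import Path

CHUNK_SIZE = 64 * 1024  # 64 KiB, the chunk size documented for data_hash


def sketch_data_hash(path: str | Path) -> str | None:
    """Stream a dataset file through SHA-256; return None if it is absent or unreadable."""
    digest = hashlib.sha256()
    try:
        with open(path, "rb") as f:
            # Fixed-size reads keep large datasets out of memory.
            for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
                digest.update(chunk)
    except OSError:  # FileNotFoundError, PermissionError, IsADirectoryError, ...
        return None
    return digest.hexdigest()


def sketch_env() -> dict[str, str]:
    """Build the env object; torch/numpy keys are left out when the import fails."""
    env = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in ("torch", "numpy"):
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue
        env[name] = module.__version__
    return env
```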
Reading a manifest in Python
```python
from kynml.repro.manifest import load_manifest

m = load_manifest("run_manifest.json")
print(m.config_hash)    # "a3f9d2..."
print(m.seed)           # 42 or None
print(m.metrics)        # {"mae": 0.123, "rmse": 0.187}
print(m.env["torch"])   # "2.3.0"
```
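When auditing whether two runs are comparable, you can diff their manifests field by field. The sketch below reads the JSON directly instead of going through `load_manifest`, so it relies only on the schema above. `reproducibility_diff` is a hypothetical helper. The choice to flag a null data_hash or seed (even when both runs are null) and to ignore metrics (outputs, not inputs) is an assumption about what "reproducible" should mean.

```python
from __future__ import annotations

import json
from pathlib import Path

# Top-level fields that must agree for two runs to count as reproductions of each other.
REPRO_FIELDS = ("config_hash", "data_hash", "seed")


def reproducibility_diff(path_a: str | Path, path_b: str | Path) -> dict[str, tuple]:
    """Return {field: (value_a, value_b)} for every reproducibility field that differs or is null."""
    a = json.loads(Path(path_a).read_text(encoding="utf-8"))
    b = json.loads(Path(path_b).read_text(encoding="utf-8"))
    diffs: dict[str, tuple] = {}
    for field in REPRO_FIELDS:
        va, vb = a.get(field), b.get(field)
        # A null data_hash or seed cannot be verified, so flag it even if both runs are null.
        if va != vb or va is None or vb is None:
            diffs[field] = (va, vb)
    # env keys may be omitted (e.g. torch not installed), so compare the union of keys.
    env_a, env_b = a.get("env", {}), b.get("env", {})
    for key in sorted(env_a.keys() | env_b.keys()):
        if env_a.get(key) != env_b.get(key):
            diffs[f"env.{key}"] = (env_a.get(key), env_b.get(key))
    return diffs
```

An empty result means the two runs agree on spec, data, seed and environment.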
Deterministic seeding
When the train: block specifies seed, the generated script calls _set_seed(seed) before training. This covers:
- random.seed(seed)
- numpy.random.seed(seed)
- torch.manual_seed(seed)
- torch.cuda.manual_seed_all(seed) (when CUDA is available)
Adding deterministic = true additionally sets:
```python
torch.use_deterministic_algorithms(True)
torch.backends.cudnn.deterministic = True
```
.kyn syntax:
```
train:
  model = M
  data = D
  loss = mse
  optimizer = adam(lr=0.001)
  epochs = 20
  seed = 42
  deterministic = true
```
Without seed, the manifest records "seed": null and the run is not reproducible at the RNG level even if the config_hash matches.
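A helper equivalent to the seeding calls listed above might look like the sketch below. `set_seed_sketch` is a hypothetical name, not the generated `_set_seed`. Skipping NumPy and PyTorch when they are not installed is an assumption made so the sketch runs anywhere. The deterministic branch uses the two PyTorch settings named above.

```python
import random


def set_seed_sketch(seed: int, deterministic: bool = False) -> None:
    """Seed Python, NumPy and PyTorch RNGs; NumPy/PyTorch are skipped if not installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
    except ImportError:
        return
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
    if deterministic:
        # Error out on nondeterministic ops and pin cuDNN to deterministic kernels.
        torch.use_deterministic_algorithms(True)
        torch.backends.cudnn.deterministic = True
```

Reseeding with the same value replays the same random stream. On CUDA, PyTorch's reproducibility notes also require the CUBLAS_WORKSPACE_CONFIG environment variable for some operations once deterministic algorithms are enforced.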
kynml.lock
The lock file pins the config_hash of a .kyn spec so that spec drift is caught before a run. It is opt-in — the compile/train flows are unaffected unless you call the lock API explicitly.
Lock file format (kynml.lock)
```json
{
  "backend": "pytorch",
  "config_hash": "a3f9d2...",
  "kynml_version": "0.1.0",
  "schema_version": "1",
  "source_path": "specs/model.kyn"
}
```
Python API
```python
from kynml.lock import create_lock, check_lock, LockMismatchError

with open("specs/model.kyn", encoding="utf-8") as f:
    source = f.read()

# Create or overwrite the lock
create_lock(source, "kynml.lock", source_path="specs/model.kyn")

# Verify before training (raises LockMismatchError if the spec changed)
try:
    check_lock(source, "kynml.lock")
except LockMismatchError as e:
    print(e)
    # If the spec change is intentional, re-run create_lock to regenerate the lock
```
LockMismatchError is a subclass of KynMLError, so the CLI catches it with the standard error handler.
check_lock raises FileNotFoundError if the lock file does not exist. create_lock creates parent directories as needed.
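The sketch below makes the documented lock semantics concrete using only the standard library:

- pin the config_hash;
- raise a KynMLError subclass on mismatch;
- raise FileNotFoundError when the lock is missing;
- create parent directories as needed.

It is not kynml.lock's implementation. Every name is a hypothetical stand-in, hashing assumes the same UTF-8/SHA-256 scheme as CONFIG_HASH, and the backend and kynml_version defaults are copied from the example lock file.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path


class SketchKynMLError(Exception):
    """Hypothetical stand-in for KynMLError."""


class SketchLockMismatchError(SketchKynMLError):
    """Raised when the spec's hash no longer matches the pinned config_hash."""


def _digest(source: str) -> str:
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def create_lock_sketch(source: str, lock_path: str | Path, source_path: str,
                       backend: str = "pytorch", kynml_version: str = "0.1.0") -> None:
    path = Path(lock_path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create parent directories as needed
    lock = {
        "backend": backend,
        "config_hash": _digest(source),
        "kynml_version": kynml_version,
        "schema_version": "1",
        "source_path": source_path,
    }
    path.write_text(json.dumps(lock, sort_keys=True, indent=2) + "\n", encoding="utf-8")


def check_lock_sketch(source: str, lock_path: str | Path) -> None:
    # read_text raises FileNotFoundError when the lock file does not exist.
    lock = json.loads(Path(lock_path).read_text(encoding="utf-8"))
    actual = _digest(source)
    if lock["config_hash"] != actual:
        raise SketchLockMismatchError(
            f"spec changed: lock pins {lock['config_hash'][:12]}..., source hashes to {actual[:12]}..."
        )
```

Because the mismatch error subclasses the base error, a single `except` for the base class handles it. The doc describes the same arrangement for LockMismatchError and the CLI's standard error handler.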
Sweep results
When using kynml sweep, the orchestrator aggregates each combo's run_manifest.json into sweep_results.json in the output directory. Each entry includes the combo dict, the script path, the exit code, and the full manifest. See CLI-Reference § sweep.
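A common next step after a sweep is picking the best successful combo. The sketch below assumes sweep_results.json is a JSON list and that each entry stores the combo, exit code and manifest under the keys "combo", "exit_code" and "manifest". The doc lists these contents but not their exact key names, so the keys are hypothetical. `best_combo` is likewise a hypothetical helper.

```python
import json
from pathlib import Path

# Hypothetical key names: the doc describes each entry's contents, not its exact keys.
COMBO_KEY, EXIT_KEY, MANIFEST_KEY = "combo", "exit_code", "manifest"


def best_combo(results_path, metric: str, minimize: bool = True):
    """Return (combo, value) for the best clean run that reported `metric`, else None."""
    entries = json.loads(Path(results_path).read_text(encoding="utf-8"))
    # Keep only runs that exited cleanly and whose manifest actually reports the metric.
    ok = [
        e for e in entries
        if e.get(EXIT_KEY) == 0 and metric in (e.get(MANIFEST_KEY) or {}).get("metrics", {})
    ]
    if not ok:
        return None
    pick = min if minimize else max
    best = pick(ok, key=lambda e: e[MANIFEST_KEY]["metrics"][metric])
    return best[COMBO_KEY], best[MANIFEST_KEY]["metrics"][metric]
```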