Cookbook

Task-oriented recipes covering common KynML workflows. Every spec here has been verified against the live compiler.

1. Minimal Regression

Predict a continuous target from a CSV. The smallest valid .kyn file:

dataset HouseData:
    source = csv("data/housing.csv")
    target = "price"
    split = 0.8

model HousePriceModel:
    input 10
    dense 64 relu
    dense 1 linear

train:
    model = HousePriceModel
    data = HouseData
    loss = mse
    optimizer = adam(lr=0.001)
    epochs = 20
    batch = 32

Save it as examples/house_price.kyn and train it:

.venv/bin/python -m kynml.cli train examples/house_price.kyn
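The value after input has to match the number of feature columns. Assuming every CSV column except the target becomes a feature (an assumption about how KynML builds its inputs, not documented here), that value is the header's column count minus one. count_features is a hypothetical helper for checking this before writing the spec, not part of KynML:

```python
import csv
import io
from typing import TextIO


def count_features(csv_file: TextIO, target: str) -> int:
    """Return the number of non-target columns in a CSV header.

    Assumes every column other than the target is used as a model input.
    """
    header = next(csv.reader(csv_file))
    if target not in header:
        raise ValueError(f"target column {target!r} not found in header")
    return len(header) - 1


# In-memory example; for a real file pass open("data/housing.csv", newline="").
sample = io.StringIO("sqft,bedrooms,bathrooms,price\n1500,3,2,250000\n")
print(count_features(sample, "price"))  # 3
```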

2. Binary Classification

Sigmoid output + BCE loss. Target column should be 0/1 integers.

dataset CustomerChurn:
    source = csv("data/churn.csv")
    target = "churn"
    split = 0.75
    normalize = true

model ChurnModel:
    input 15
    dense 64 relu
    dropout 0.3
    dense 32 relu
    dense 1 sigmoid

train:
    model = ChurnModel
    data = CustomerChurn
    loss = bce
    optimizer = adam(lr=0.001)
    epochs = 30
    batch = 64
    device = auto

evaluate:
    metrics = [accuracy]

export:
    format = torch
    path = "models/churn_model.pt"
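For reference, this is what bce loss and accuracy compute for a sigmoid output, written in plain Python. It sketches the standard formulas rather than KynML's generated code, and the 0.5 decision threshold used for accuracy is an assumption:

```python
import math
from typing import Sequence


def bce(probs: Sequence[float], targets: Sequence[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between sigmoid outputs and 0/1 targets."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def binary_accuracy(probs: Sequence[float], targets: Sequence[int], threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded prediction equals the target."""
    correct = sum(int(p >= threshold) == y for p, y in zip(probs, targets))
    return correct / len(probs)


probs = [0.9, 0.2, 0.6, 0.4]   # sigmoid outputs for four customers
targets = [1, 0, 0, 1]         # churn labels
print(round(bce(probs, targets), 4))
print(binary_accuracy(probs, targets))  # 0.5
```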

3. Multiclass Classification

With loss = cross_entropy, the codegen emits long (integer class-index) targets and argmax-based metric evaluation. The number of output units must equal the number of classes. As an alternative, pair a log_softmax output with nll loss. This recipe uses cross_entropy:

dataset IrisData:
    source = csv("data/iris.csv")
    target = "species"
    split = 0.8
    normalize = true

model IrisClassifier:
    input 4
    dense 32 relu
    dense 16 relu
    dense 3 softmax

train:
    model = IrisClassifier
    data = IrisData
    loss = cross_entropy
    optimizer = adamw(lr=0.005, weight_decay=0.01)
    epochs = 50
    batch = 16

evaluate:
    metrics = [accuracy]

export:
    format = onnx
    path = "models/iris.onnx"
    input_shape = [1, 4]
    opset = 17

Note: onnx format requires input_shape. opset defaults to 17.
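The log_softmax + nll alternative works because cross-entropy on raw scores equals the negative log-likelihood of their log-softmax (PyTorch's torch.nn.CrossEntropyLoss expects unnormalized scores and applies log-softmax itself). The plain-Python sketch below checks that identity and shows argmax accuracy on integer class targets. It illustrates the math only; it does not show how KynML wires a softmax output layer into cross_entropy:

```python
import math
from typing import List, Sequence


def _logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_softmax(logits: Sequence[float]) -> List[float]:
    """log_softmax(x)_i = x_i - logsumexp(x)."""
    lse = _logsumexp(logits)
    return [x - lse for x in logits]


def nll(log_probs: Sequence[float], target: int) -> float:
    """Negative log-likelihood of the integer target class."""
    return -log_probs[target]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Cross-entropy computed directly from raw scores."""
    return _logsumexp(logits) - logits[target]


def argmax_accuracy(batch_logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Fraction of rows whose highest-scoring class equals the target index."""
    preds = [max(range(len(row)), key=row.__getitem__) for row in batch_logits]
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)


logits = [2.0, 0.5, -1.0]  # one sample, three classes as in IrisClassifier
target = 0                 # integer class index
assert math.isclose(nll(log_softmax(logits), target), cross_entropy(logits, target))
print(argmax_accuracy([[2.0, 0.5, -1.0], [0.1, 0.3, 0.2]], [0, 2]))  # 0.5
```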

4. Mixed Precision Training (fp16)

Add precision = fp16 to the train block. The codegen emits torch.amp.autocast + GradScaler and falls back gracefully when CUDA is absent:

dataset SalesData:
    source = csv("data/sales.csv")
    target = "revenue"
    split = 0.8
    normalize = true
    num_workers = 4
    pin_memory = true

model SalesPredictor:
    input 12
    batchnorm
    dense 256 gelu
    dropout 0.2
    dense 128 gelu
    dropout 0.1
    dense 1 linear

train:
    model = SalesPredictor
    data = SalesData
    loss = huber
    optimizer = adamw(lr=0.001, weight_decay=0.01)
    epochs = 100
    batch = 128
    device = auto
    precision = fp16
    compile = true
    scheduler = cosine(t_max=100)

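compile = true wraps the model with torch.compile(model). The num_workers and pin_memory options map directly to the DataLoader arguments of the same name. prefetch (an integer) maps to prefetch_factor, which PyTorch only accepts when num_workers > 0.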
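GradScaler exists because small fp16 gradients can underflow to zero; it multiplies the loss by a large scale and adjusts that scale dynamically. Below is a simplified plain-Python model of the update rule, using PyTorch's documented defaults (init_scale=2**16, growth_factor=2.0, backoff_factor=0.5, growth_interval=2000). LossScaler is a hypothetical illustration, not the real GradScaler and not KynML's generated code:

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class LossScaler:
    """Hypothetical, simplified model of dynamic loss scaling."""
    scale: float = 2.0 ** 16
    growth_factor: float = 2.0
    backoff_factor: float = 0.5
    growth_interval: int = 2000
    _good_steps: int = field(default=0, init=False, repr=False)

    def step(self, scaled_grads: Sequence[float]) -> Optional[List[float]]:
        """Unscale gradients; return them if all are finite, else None (skip the step)."""
        grads = [g / self.scale for g in scaled_grads]
        if not all(math.isfinite(g) for g in grads):
            # Overflow: skip this optimizer step and shrink the scale.
            self.scale *= self.backoff_factor
            self._good_steps = 0
            return None
        self._good_steps += 1
        if self._good_steps == self.growth_interval:
            # growth_interval finite steps in a row: try a larger scale.
            self.scale *= self.growth_factor
            self._good_steps = 0
        return grads


scaler = LossScaler(growth_interval=2)
print(scaler.step([float("inf")]), scaler.scale)  # None 32768.0
print(scaler.step([65536.0]), scaler.scale)       # [2.0] 32768.0
print(scaler.step([65536.0]), scaler.scale)       # [2.0] 65536.0
```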

5. Early Stopping

Stop training when validation loss stops improving:

train:
    model = MyModel
    data = MyData
    loss = mse
    optimizer = adam(lr=0.001)
    epochs = 200
    batch = 32
    early_stop = early_stop(patience=10)

patience is the number of consecutive epochs without improvement to wait before stopping. Mode defaults to "min" (stop when loss stops decreasing). The generated code tracks the best epoch loss and breaks out of the training loop when the patience counter exceeds the threshold.
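The patience logic in "min" mode can be sketched in plain Python. early_stop_epoch is a hypothetical helper; it stops once the counter reaches patience, and whether the generated code compares with >= or > is not confirmed, so its stopping epoch may differ by one:

```python
from typing import Iterable, Optional


def early_stop_epoch(epoch_losses: Iterable[float], patience: int) -> Optional[int]:
    """Return the 0-based epoch where training would stop, or None if it never does.

    Mode "min": an epoch improves only if its loss is strictly below the best so far.
    """
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(epoch_losses):
        if loss < best:
            best = loss
            bad_epochs = 0  # any improvement resets the counter
        else:
            bad_epochs += 1
            if bad_epochs >= patience:  # patience epochs in a row without improvement
                return epoch
    return None


losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.69, 0.70, 0.70, 0.70]
print(early_stop_epoch(losses, patience=3))  # 8
```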

6. Checkpointing with Resume

Save a checkpoint every N epochs, auto-resume from the latest:

train:
    model = MyModel
    data = MyData
    loss = mse
    optimizer = adam(lr=0.001)
    epochs = 100
    batch = 32
    checkpoint = checkpoint(every_n=10, path="checkpoints/ckpt.pt", async_save=true)

async_save = true writes each checkpoint on a daemon thread so it does not block the training loop. On the next run with the same config, the generated script detects the checkpoint file and resumes from the saved epoch and optimizer state.
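A runnable sketch of the save-on-a-thread and resume pattern. It writes JSON instead of torch.save so it needs no PyTorch, and save_checkpoint_async, resume_epoch, and train are hypothetical names, not what the generated script uses. Two details matter in any version: wait for the previous save before starting the next, and join the last save before exiting, because daemon threads are killed when the interpreter exits:

```python
import json
import os
import tempfile
import threading
from typing import Any, Dict, Optional


def save_checkpoint_async(state: Dict[str, Any], path: str) -> threading.Thread:
    """Write state as JSON on a daemon thread, via a temp file and a rename."""
    snapshot = json.dumps(state)  # serialize now so later changes to state are not captured

    def _write() -> None:
        # Temp file in the same directory so os.replace stays on one filesystem.
        fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".", suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            f.write(snapshot)
        os.replace(tmp_path, path)  # atomic rename on POSIX: no half-written checkpoint

    thread = threading.Thread(target=_write, daemon=True)
    thread.start()
    return thread


def resume_epoch(path: str) -> int:
    """Return the first epoch to run: 0 without a checkpoint, else saved epoch + 1."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["epoch"] + 1


def train(path: str, epochs: int, every_n: int) -> None:
    pending: Optional[threading.Thread] = None
    for epoch in range(resume_epoch(path), epochs):
        # ... one epoch of training would run here ...
        if (epoch + 1) % every_n == 0:
            if pending is not None:
                pending.join()  # never let two saves race on the same file
            pending = save_checkpoint_async({"epoch": epoch, "optimizer": {"lr": 0.001}}, path)
    if pending is not None:
        pending.join()  # wait for the last save before returning


with tempfile.TemporaryDirectory() as ckpt_dir:
    ckpt = os.path.join(ckpt_dir, "ckpt.json")
    train(ckpt, epochs=25, every_n=10)   # saves after epochs 9 and 19
    print(resume_epoch(ckpt))            # 20
    train(ckpt, epochs=100, every_n=10)  # resumes at epoch 20
    print(resume_epoch(ckpt))            # 100
```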

7. OneCycle Scheduler

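The one-cycle learning-rate policy ramps the learning rate up to max_lr and then anneals it to well below its starting value, which often speeds up convergence: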

train:
    model = SalesPredictor
    data = SalesData
    loss = mse
    optimizer = sgd(lr=0.01, momentum=0.9)
    epochs = 50
    batch = 64
    scheduler = onecycle(max_lr=0.1)
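The shape of that schedule, sketched in plain Python under the assumption that onecycle maps onto PyTorch's OneCycleLR with its defaults (pct_start=0.3, div_factor=25, final_div_factor=1e4, cosine annealing). Only max_lr appears in the recipe, so the other values are assumptions, and the momentum cycling that OneCycleLR also performs by default is left out. OneCycleLR is stepped once per batch, so total_steps is epochs times batches per epoch:

```python
import math


def _cos_anneal(start: float, end: float, pct: float) -> float:
    """Cosine interpolation from start (pct=0) to end (pct=1)."""
    return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1.0)


def onecycle_lr(step: int, total_steps: int, max_lr: float,
                pct_start: float = 0.3, div_factor: float = 25.0,
                final_div_factor: float = 1e4) -> float:
    """Learning rate at a 0-based step of a two-phase one-cycle schedule."""
    initial_lr = max_lr / div_factor          # LR at step 0
    min_lr = initial_lr / final_div_factor    # LR at the final step
    warmup_end = pct_start * total_steps - 1  # last step of the warm-up phase
    last_step = total_steps - 1
    if warmup_end <= 0:
        raise ValueError("total_steps too small for pct_start")
    if step <= warmup_end:
        return _cos_anneal(initial_lr, max_lr, step / warmup_end)
    return _cos_anneal(max_lr, min_lr, (step - warmup_end) / (last_step - warmup_end))


total = 50 * 100  # e.g. 50 epochs x 100 batches per epoch
for step in (0, 1499, total - 1):
    print(step, f"{onecycle_lr(step, total, max_lr=0.1):.2e}")
```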

Other schedulers: step(step_size=10, gamma=0.1) and cosine(t_max=50).
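Assuming these map onto PyTorch's StepLR and CosineAnnealingLR (with CosineAnnealingLR's default eta_min=0), their per-epoch learning rates follow the closed forms sketched below. This is an illustration of the formulas, not KynML's generated code:

```python
import math


def step_lr(epoch: int, base_lr: float, step_size: int, gamma: float) -> float:
    """StepLR: multiply the LR by gamma once every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


def cosine_lr(epoch: int, base_lr: float, t_max: int, eta_min: float = 0.0) -> float:
    """Closed form of CosineAnnealingLR: half a cosine from base_lr to eta_min over t_max epochs."""
    return eta_min + (base_lr - eta_min) * (1.0 + math.cos(math.pi * epoch / t_max)) / 2.0


for epoch in (0, 10, 25, 50):
    print(epoch, step_lr(epoch, 0.01, step_size=10, gamma=0.1), round(cosine_lr(epoch, 0.01, t_max=50), 6))
```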

8. ONNX Export and Inference

Compile to ONNX for deployment to ONNX Runtime, TensorRT, or CoreML:

export:
    format = onnx
    path = "models/model.onnx"
    input_shape = [1, 10]
    opset = 17

input_shape is a batch-first shape used for the dummy input tensor in torch.onnx.export. The exported model can be loaded with:

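import numpy as np
import onnxruntime as ort
sess = ort.InferenceSession("models/model.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name  # read the input name instead of hard-coding it
x_np = np.random.rand(1, 10).astype(np.float32)  # matches input_shape = [1, 10]
out = sess.run(None, {input_name: x_np})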

9. TorchScript Export

For environments where ONNX is unavailable:

export:
    format = torchscript
    path = "models/model.pt"

The generated script calls torch.jit.script(model) and saves the result. Load with:

import torch
model = torch.jit.load("models/model.pt")

10. Validate Then Compile Separately

Useful in CI to catch spec errors before running an expensive training job:

# Validate only (no torch needed, fast)
.venv/bin/python -m kynml.cli validate examples/house_price.kyn

# Emit the script for inspection
.venv/bin/python -m kynml.cli compile examples/house_price.kyn \
    --out generated/house_price.py

# Inspect the generated code
cat generated/house_price.py

# Run training
.venv/bin/python -m kynml.cli train examples/house_price.kyn

11. Generate a Serving Container

After training, emit a FastAPI service:

from kynml.parser import parse_file
from kynml.semantic import validate_program
from kynml.serving import generate_service

program = parse_file("examples/house_price.kyn")
validate_program(program)
generate_service(program, "models/house_price_model.pt", "service/")

Build and run the container:

cd service/
docker build -t house-price-service .
docker run -p 8000:8000 house-price-service

Then send a prediction request:

curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"features": {"sqft": 1500.0, "bedrooms": 3.0, "bathrooms": 2.0}}'
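The same request from Python's standard library, as a sketch. The payload shape is copied from the curl call; build_predict_request and predict are hypothetical helpers, and predict assumes the service replies with JSON:

```python
import json
import urllib.request
from typing import Any, Dict


def build_predict_request(features: Dict[str, float],
                          url: str = "http://localhost:8000/predict") -> urllib.request.Request:
    """Build the same POST request as the curl example."""
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(url, data=body,
                                  headers={"Content-Type": "application/json"},
                                  method="POST")


def predict(features: Dict[str, float], timeout: float = 5.0) -> Any:
    """Send the request and decode the JSON reply (the container must be running)."""
    with urllib.request.urlopen(build_predict_request(features), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# With the container running:
# print(predict({"sqft": 1500.0, "bedrooms": 3.0, "bathrooms": 2.0}))
```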

12. BatchNorm Layer

batchnorm can appear anywhere after the input layer. The feature count argument is optional:

model DeepNet:
    input 20
    batchnorm
    dense 128 relu
    batchnorm 128
    dropout 0.3
    dense 64 leaky_relu
    dense 1 linear

With no argument, batchnorm infers its feature count from the preceding input or dense layer, so the first batchnorm above normalizes 20 features. batchnorm 128 states the count explicitly, which keeps the expected size visible in the spec.
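The inference rule can be sketched as a walk over the model block that remembers the width of the last input or dense layer. resolve_batchnorm_sizes is a hypothetical re-implementation for illustration, not KynML's semantic pass; it takes an explicit size as given and does not check it against the previous layer:

```python
from typing import List, Optional, Tuple


def resolve_batchnorm_sizes(layers: List[str]) -> List[Tuple[str, Optional[int]]]:
    """Pair each layer kind with its width; bare batchnorm inherits the last width."""
    resolved: List[Tuple[str, Optional[int]]] = []
    last_size: Optional[int] = None
    for line in layers:
        parts = line.split()
        kind = parts[0]
        if kind in ("input", "dense"):
            last_size = int(parts[1])
            resolved.append((kind, last_size))
        elif kind == "batchnorm":
            if last_size is None:
                raise ValueError("batchnorm must come after the input layer")
            size = int(parts[1]) if len(parts) > 1 else last_size
            resolved.append((kind, size))
        else:
            resolved.append((kind, None))  # dropout and similar layers keep the width
    return resolved


deep_net = ["input 20", "batchnorm", "dense 128 relu", "batchnorm 128",
            "dropout 0.3", "dense 64 leaky_relu", "dense 1 linear"]
print(resolve_batchnorm_sizes(deep_net))
```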