Serving

Emit a production FastAPI inference service, Dockerfile, and requirements from a trained KynML model.

Overview

kynml.serving.generate_service reads a validated Program AST and a path to a saved model state dict, then writes three files into an output directory:

app.py — FastAPI application with /predict and /health endpoints
requirements.txt — pinned runtime dependencies
Dockerfile — Python 3.11-slim image, ready to build and push

The emitter performs pure string templating against the AST — no PyTorch import at generation time.

Install

pip install 'kynml[serving]'

This adds fastapi>=0.110 and uvicorn[standard]>=0.29 to the environment.

API

from kynml.serving import generate_service
from kynml.parser import parse_file
from kynml.semantic import validate_program

program = parse_file("examples/house_price.kyn")
validate_program(program)

files = generate_service(
    program=program,
    model_path="models/house_price_model.pt",
    out_dir="service/",
)
# files == {"app": Path("service/app.py"),
#           "requirements": Path("service/requirements.txt"),
#           "dockerfile": Path("service/Dockerfile")}

Parameters

Parameter	Type	Description
`program`	`Program`	Validated KynML AST (call `validate_program` first)
`model_path`	`str | Path`	Path to the file written by `torch.save(model.state_dict(), ...)`
`out_dir`	`str | Path`	Output directory (created if absent)

Returns a dict[str, Path] with keys "app", "requirements", "dockerfile".

Generated Endpoints

`POST /predict`

Accepts a JSON body with a features dict mapping feature names to floats. Returns a prediction list.

curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"features": {"sqft": 1500.0, "bedrooms": 3.0, "bathrooms": 2.0}}'

Response:

{"prediction": [342150.25]}
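
The same call can be made from Python. The following is a minimal client sketch using only the standard library; the helper names `build_predict_request` and `predict` are illustrative and not part of KynML, and the base URL assumes the service is running locally on port 8000 as in the curl example.

```python
import json
import urllib.request


def build_predict_request(base_url: str, features: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /predict endpoint."""
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url: str, features: dict, timeout: float = 5.0) -> list:
    """Send the request and return the "prediction" list from the JSON response."""
    request = build_predict_request(base_url, features)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["prediction"]


# Example (requires a running service):
# predict("http://localhost:8000", {"sqft": 1500.0, "bedrooms": 3.0, "bathrooms": 2.0})
```

Splitting request construction from sending keeps the payload shape easy to inspect without a live server.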

The service sorts the features dict by key name before passing the values to the model, so the model must have been trained on columns in that same sorted order. If normalize = true was set in the dataset block, the generated service includes a scaler hook: set the scaler global to a fitted StandardScaler instance to activate it.
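
To make the ordering concrete, here is a rough sketch of how a features dict could be turned into a model input row. It is not the generated code: `features_to_row` and `SupportsTransform` are hypothetical names, and the scaler is assumed to expose a scikit-learn style `transform` method that takes a 2-D input.

```python
from __future__ import annotations

from typing import Protocol, Sequence


class SupportsTransform(Protocol):
    """Anything with a scikit-learn style transform(rows) method."""

    def transform(self, rows: Sequence[Sequence[float]]): ...


def features_to_row(
    features: dict[str, float],
    scaler: SupportsTransform | None = None,
) -> list[float]:
    # Sort by feature name so the order does not depend on JSON key order.
    row = [float(features[name]) for name in sorted(features)]
    if scaler is not None:
        # Scalers operate on 2-D input, so wrap the row and unwrap the result.
        row = [float(v) for v in scaler.transform([row])[0]]
    return row


print(features_to_row({"sqft": 1500.0, "bedrooms": 3.0, "bathrooms": 2.0}))
# [2.0, 3.0, 1500.0]  (bathrooms, bedrooms, sqft)
```

If the training columns were in a different order, each value would reach the wrong model input without raising an error, so it is worth checking the order once against the training data.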

`GET /health`

Returns {"status": "ok"}. Use for container health checks and load-balancer probes.

Generated `app.py` Structure

For a model named HousePriceModel trained with normalize = true:

"""FastAPI inference service generated by KynML serving.

Run with:
    uvicorn app:app --host 0.0.0.0 --port 8000
"""
from pathlib import Path
import torch
import torch.nn as nn
from fastapi import FastAPI
from pydantic import BaseModel
from sklearn.preprocessing import StandardScaler

MODEL_PATH = Path("models/house_price_model.pt")

scaler: StandardScaler | None = None  # populate if saved separately

class HousePriceModel(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(10, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, x):
        return self.net(x)

# ... load_model, FastAPI app, PredictRequest/Response, /predict, /health

The model class is reconstructed from the KynML model block, not from a pickled class. Only InputLayer and DenseLayer nodes are rendered in the service — DropoutLayer and BatchNorm1dLayer are training-time concerns and are omitted from the inference graph.
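
The filtering and templating described above can be illustrated with a small sketch. It is an assumption-laden stand-in, not the KynML emitter: the `Layer` dataclass and `render_sequential` are hypothetical, and only ReLU activations are handled. Note that it builds source text only and never imports PyTorch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Layer:
    """Hypothetical stand-in for a KynML layer node."""

    kind: str                      # "input", "dense", "dropout", "batchnorm1d"
    units: int = 0                 # width, used by input and dense layers
    activation: str | None = None  # only "relu" is handled in this sketch


def render_sequential(layers: list[Layer]) -> str:
    """Render nn.Sequential source text from input and dense layers only."""
    kept = [layer for layer in layers if layer.kind in ("input", "dense")]
    lines: list[str] = []
    width = None
    for layer in kept:
        if layer.kind == "input":
            width = layer.units  # the input layer only fixes the first width
            continue
        lines.append(f"    nn.Linear({width}, {layer.units}),")
        if layer.activation == "relu":
            lines.append("    nn.ReLU(),")
        width = layer.units
    return "nn.Sequential(\n" + "\n".join(lines) + "\n)"


layers = [
    Layer("input", 10),
    Layer("dense", 64, "relu"),
    Layer("dropout"),  # skipped: not rendered in the service
    Layer("dense", 32, "relu"),
    Layer("dense", 1),
]
print(render_sequential(layers))
```

For this input the sketch prints the same five-entry nn.Sequential body shown in the generated app.py above.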

Running Locally

cd service/
pip install -r requirements.txt
uvicorn app:app --host 0.0.0.0 --port 8000

Docker Build and Deploy

cd service/
docker build -t house-price-service .
docker run -p 8000:8000 house-price-service

The generated Dockerfile:

FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]

Full Workflow Example

from kynml.parser import parse_file
from kynml.semantic import validate_program
from kynml.serving import generate_service

# 1. Parse and validate the spec
program = parse_file("examples/house_price.kyn")
validate_program(program)

# 2. Train (or assume already trained)
# kynml train examples/house_price.kyn
# -> models/house_price_model.pt

# 3. Emit service
generate_service(
    program=program,
    model_path="models/house_price_model.pt",
    out_dir="service/",
)

# 4. Deploy
# cd service && docker build -t house-price . && docker run -p 8000:8000 house-price

Normalisation Note

When normalize = true is set in the dataset block, the generated service scaffolds a scaler global but cannot populate it automatically — the StandardScaler fitted during training is not serialised by torch.save. To use normalisation at inference time:

After training, save the fitted scaler separately:

import pickle
with open("models/scaler.pkl", "wb") as f:
    pickle.dump(scaler, f)

In app.py, load it:

import pickle
with open("models/scaler.pkl", "rb") as f:
    scaler = pickle.load(f)
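
If the same app.py has to run both with and without a saved scaler, the load step can be made tolerant of a missing file. The helper below is a hypothetical sketch, not something KynML generates; `load_scaler_if_present` and `SCALER_PATH` are illustrative names, and the path assumes the models/scaler.pkl location used in the steps above.

```python
import pickle
from pathlib import Path
from typing import Any, Optional

SCALER_PATH = Path("models/scaler.pkl")  # assumed location


def load_scaler_if_present(path: Path = SCALER_PATH) -> Optional[Any]:
    """Return the unpickled scaler, or None when no file exists at path."""
    if not path.is_file():
        return None
    # Only unpickle files you produced yourself: pickle can run code on load.
    with path.open("rb") as f:
        return pickle.load(f)


# Leaves the scaler hook inactive (None) when no scaler file was saved.
scaler = load_scaler_if_present()
```

Returning None keeps the behaviour identical to the scaffolded default when no scaler was saved.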

A future version of kynml train will emit the scaler alongside the model file automatically.