Serving
Emit a production FastAPI inference service, Dockerfile, and requirements from a trained KynML model.
Overview
kynml.serving.generate_service reads a validated Program AST and a path to a saved model state dict, then writes three files into an output directory:
- app.py: FastAPI application with /predict and /health endpoints
- requirements.txt: pinned runtime dependencies
- Dockerfile: Python 3.11-slim image, ready to build and push
The emitter performs pure string templating against the AST — no PyTorch import at generation time.
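As a rough illustration of what pure string templating means here, the sketch below fills a template with a class name and model path. The template text and the render_header helper are hypothetical stand-ins, not the emitter's real templates; the point is only that torch appears as text in the output and is never imported by the generator.

```python
from string import Template

# Hypothetical template fragment; the emitter's real templates are not shown in this document.
APP_HEADER = Template(
    "from pathlib import Path\n"
    "import torch\n"
    "import torch.nn as nn\n"
    "\n"
    'MODEL_PATH = Path("$model_path")\n'
    "\n"
    "class $class_name(nn.Module):\n"
    "    ...\n"
)


def render_header(class_name: str, model_path: str) -> str:
    """Substitute values into the template and return the source text."""
    return APP_HEADER.substitute(class_name=class_name, model_path=model_path)


# "import torch" is only a line of text in the result; nothing is imported here.
source = render_header("HousePriceModel", "models/house_price_model.pt")
print(source)
```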
Install
pip install 'kynml[serving]'
Adds fastapi>=0.110 and uvicorn[standard]>=0.29 to the environment.
API
from kynml.serving import generate_service
from kynml.parser import parse_file
from kynml.semantic import validate_program
program = parse_file("examples/house_price.kyn")
validate_program(program)
files = generate_service(
    program=program,
    model_path="models/house_price_model.pt",
    out_dir="service/",
)
# files == {"app": Path("service/app.py"),
#           "requirements": Path("service/requirements.txt"),
#           "dockerfile": Path("service/Dockerfile")}
Parameters
| Parameter | Type | Description |
|---|---|---|
| program | Program | Validated KynML AST (call validate_program first) |
| model_path | str \| Path | Path to the saved torch.save(model.state_dict(), ...) file |
| out_dir | str \| Path | Output directory (created if absent) |
Returns a dict[str, Path] with keys "app", "requirements", "dockerfile".
Generated Endpoints
POST /predict
Accepts a JSON body whose features field maps feature names to floats. Returns a JSON object whose prediction field is a list of floats.
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"features": {"sqft": 1500.0, "bedrooms": 3.0, "bathrooms": 2.0}}'
Response:
{"prediction": [342150.25]}
The features dict is sorted by key name before being passed to the model, so the sorted key order must match the column order the model was trained on. If normalize = true was set in the dataset block, the generated service includes a scaler hook: populate the scaler global with a fitted StandardScaler instance to activate it.
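The ordering rule can be sketched in a few lines. The helper names features_to_row and check_order, and the TRAINING_COLUMNS list, are assumptions made for illustration; the generated app.py may structure this differently.

```python
def features_to_row(features: dict) -> list:
    """Order values by feature name, mirroring the sorted-key behaviour described above."""
    return [float(features[name]) for name in sorted(features)]


def check_order(features: dict, training_columns: list) -> None:
    """Raise if the sorted request keys differ from the training column order."""
    if sorted(features) != list(training_columns):
        raise ValueError(
            f"feature order {sorted(features)} does not match training columns {list(training_columns)}"
        )


# Hypothetical training column order, used only for this example.
TRAINING_COLUMNS = ["bathrooms", "bedrooms", "sqft"]

request = {"sqft": 1500.0, "bedrooms": 3.0, "bathrooms": 2.0}
check_order(request, TRAINING_COLUMNS)
row = features_to_row(request)
print(row)  # [2.0, 3.0, 1500.0]
```

Note that the value sent first in the request (sqft) ends up last in the row, which is why a model trained on unsorted columns would receive its inputs in the wrong positions.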
GET /health
Returns {"status": "ok"}. Use for container health checks and load-balancer probes.
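A probe for this endpoint could look like the sketch below, which uses only the standard library. The function names is_ok and probe, the default URL, and the timeout are assumptions for illustration, not part of the generated service.

```python
import json
import urllib.request


def is_ok(body: bytes) -> bool:
    """Return True when the body is the JSON object {"status": "ok"}."""
    try:
        return json.loads(body).get("status") == "ok"
    except (ValueError, AttributeError):
        # Not valid JSON, or valid JSON that is not an object.
        return False


def probe(url: str = "http://localhost:8000/health", timeout: float = 2.0) -> bool:
    """Return True when the service answers 200 with a healthy body."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and is_ok(resp.read())
    except OSError:
        # Covers connection errors, timeouts, and HTTP error statuses.
        return False
```

A script built on probe could exit non-zero on failure so that an orchestrator treats the container as unhealthy.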
Generated app.py Structure
For a model named HousePriceModel trained with normalize = true:
"""FastAPI inference service generated by KynML serving.
Run with:
uvicorn app:app --host 0.0.0.0 --port 8000
"""
from pathlib import Path
import torch
import torch.nn as nn
from fastapi import FastAPI
from pydantic import BaseModel
from sklearn.preprocessing import StandardScaler
MODEL_PATH = Path("models/house_price_model.pt")
scaler: StandardScaler | None = None # populate if saved separately
class HousePriceModel(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(10, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, x):
        return self.net(x)
# ... load_model, FastAPI app, PredictRequest/Response, /predict, /health
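The model class is reconstructed from the KynML model block, not from a pickled class. Only InputLayer and DenseLayer nodes are rendered in the service; DropoutLayer and BatchNorm1dLayer nodes are omitted from the inference graph. Dropout is a no-op in eval mode, so omitting it does not change predictions. Batch normalisation, however, still applies its learned running statistics in eval mode, so for a model trained with BatchNorm1dLayer, compare the service's predictions against the trained model before relying on them.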
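The layer filtering can be pictured with the following sketch. The dataclasses are hypothetical stand-ins for the AST nodes named above (the real kynml classes may carry different fields), and render_layers is an illustrative helper, not the emitter's actual function.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins for the KynML AST nodes; field names are assumptions.
@dataclass
class InputLayer:
    size: int


@dataclass
class DenseLayer:
    units: int
    activation: Optional[str] = None


@dataclass
class DropoutLayer:
    rate: float


def render_layers(layers: list) -> list:
    """Emit nn.Sequential entries as text, skipping every node that is not Input or Dense."""
    lines = []
    width = None
    for layer in layers:
        if isinstance(layer, InputLayer):
            width = layer.size  # sets the input width of the first Linear
        elif isinstance(layer, DenseLayer):
            lines.append(f"nn.Linear({width}, {layer.units}),")
            if layer.activation == "relu":
                lines.append("nn.ReLU(),")
            width = layer.units
        # Any other node type (for example DropoutLayer) is skipped.
    return lines


layers = [
    InputLayer(10),
    DenseLayer(64, "relu"),
    DropoutLayer(0.2),
    DenseLayer(32, "relu"),
    DenseLayer(1),
]
rendered = render_layers(layers)
print("\n".join(rendered))
```

With these inputs the rendered lines correspond to the nn.Sequential body in the app.py excerpt above, with the dropout node left out.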
Running Locally
cd service/
pip install -r requirements.txt
uvicorn app:app --host 0.0.0.0 --port 8000
Docker Build and Deploy
cd service/
docker build -t house-price-service .
docker run -p 8000:8000 house-price-service
The generated Dockerfile:
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
Full Workflow Example
from kynml.parser import parse_file
from kynml.semantic import validate_program
from kynml.serving import generate_service
# 1. Parse and validate the spec
program = parse_file("examples/house_price.kyn")
validate_program(program)
# 2. Train (or assume already trained)
# kynml train examples/house_price.kyn
# -> models/house_price_model.pt
# 3. Emit service
generate_service(
    program=program,
    model_path="models/house_price_model.pt",
    out_dir="service/",
)
# 4. Deploy
# cd service && docker build -t house-price . && docker run -p 8000:8000 house-price
Normalisation Note
When normalize = true is set in the dataset block, the generated service scaffolds a scaler global but cannot populate it automatically — the StandardScaler fitted during training is not serialised by torch.save. To use normalisation at inference time:
- After training, fit and save the scaler separately:

      import pickle
      with open("models/scaler.pkl", "wb") as f:
          pickle.dump(scaler, f)

- In app.py, load it:

      import pickle
      with open("models/scaler.pkl", "rb") as f:
          scaler = pickle.load(f)
A future version of kynml train will emit the scaler alongside the model file automatically.
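To make the effect of the scaler hook concrete, the sketch below applies a z-score transform to one feature row when a scaler is present and passes the row through unchanged when it is None. TinyStandardScaler is a small illustrative stand-in written for this example, not scikit-learn's implementation, and apply_scaler is a hypothetical helper; the generated app.py may wire the hook differently.

```python
from statistics import fmean, pstdev


class TinyStandardScaler:
    """Illustrative z-score scaler; a stand-in for sklearn's StandardScaler."""

    def fit(self, rows):
        cols = list(zip(*rows))
        self.mean_ = [fmean(c) for c in cols]
        # Population standard deviation; a zero spread is replaced by 1.0
        # so constant columns do not cause division by zero.
        self.scale_ = [pstdev(c) or 1.0 for c in cols]
        return self

    def transform(self, rows):
        return [
            [(x - m) / s for x, m, s in zip(row, self.mean_, self.scale_)]
            for row in rows
        ]


def apply_scaler(row, scaler):
    """Scale one feature row if a scaler is populated, else return it unchanged."""
    if scaler is None:
        return list(row)
    return scaler.transform([row])[0]


fitted = TinyStandardScaler().fit([[1.0, 10.0], [3.0, 10.0]])
print(apply_scaler([3.0, 10.0], fitted))  # [1.0, 0.0]
print(apply_scaler([3.0, 10.0], None))    # [3.0, 10.0]
```

If the model was trained on normalised inputs and the scaler global is left as None, the raw values reach the model unscaled, so predictions will not match training-time behaviour.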
See Also
- MCP Integration — run predict via agent tool call
- Cookbook — end-to-end train-and-serve recipes
- Architecture — CONNECT package boundary