Tutorial: Regression — Housing Price Prediction
End-to-end walkthrough: train a feedforward network to predict a continuous target from tabular data.
What you will build
A model that reads a CSV of housing features, trains with MSE loss, evaluates with MAE and RMSE, and exports a portable .pt state dict. The full pipeline runs with a single kynml train command.
Prerequisites
pip install kynml # core install — no extras needed for CSV + torch export
Python 3.11+. PyTorch is a hard dependency and is installed automatically.
1. Prepare your data
KynML's CSV connector reads any delimiter-separated file with a header row. Categorical columns are one-hot encoded automatically via pd.get_dummies; numeric columns pass through.
The bundled example dataset lives at data/housing.csv (10 numeric features, target column price):
f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,price
1200,3,2,15,0.20,8,1,0,5,2,240000
1500,3,2,10,0.25,7,1,1,6,2,285000
...
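Because categorical columns are expanded by pd.get_dummies, the feature count the model sees can be larger than the number of columns in the CSV. The sketch below uses a small in-memory frame with a hypothetical zone column (the bundled dataset is all numeric and has no such column) to show how one text column becomes several numeric ones.

```python
import pandas as pd

# Hypothetical mini-dataset: `zone` and `sqft` are illustrative column names,
# not part of data/housing.csv.
df = pd.DataFrame(
    {
        "sqft": [1200, 1500, 900],
        "zone": ["urban", "rural", "urban"],
        "price": [240000, 285000, 150000],
    }
)

features = df.drop(columns=["price"])
encoded = pd.get_dummies(features, drop_first=False)

# The numeric column passes through; `zone` expands to one column per category.
print(list(encoded.columns))  # ['sqft', 'zone_rural', 'zone_urban']
n_features = encoded.shape[1]
print(n_features)  # 3
```

Here two raw feature columns become three encoded ones, so a spec for this frame would need input 3 rather than input 2.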
2. Write the spec
Save as house_price.kyn:
dataset HouseData:
    source = csv("data/housing.csv")
    target = "price"
    split = 0.8
    normalize = true

model HousePriceModel:
    input 10
    dense 64 relu
    dense 32 relu
    dense 1 linear

train:
    model = HousePriceModel
    data = HouseData
    loss = mse
    optimizer = adam(lr=0.001)
    epochs = 20
    batch = 32
    device = auto

evaluate:
    metrics = [mae, rmse]

export:
    format = torch
    path = "models/house_price_model.pt"
Block-by-block breakdown
dataset — declares the data source and preprocessing:
| Key | Value | Notes |
|---|---|---|
| source | csv("data/housing.csv") | Path is relative to CWD at compile time |
| target | "price" | Column name to predict |
| split | 0.8 | 80% train / 20% test |
| normalize | true | Z-score normalisation via StandardScaler (fit on train, applied to test) |
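To make "fit on train, applied to test" concrete, here is a small worked example in plain Python. The numbers are made up for illustration. It uses the population standard deviation, which is what StandardScaler computes, and reuses the training statistics for the test values.

```python
from statistics import fmean, pstdev

# Illustrative values for a single feature column (not from the real dataset).
train = [1200.0, 1500.0, 900.0, 1800.0]
test = [1350.0, 2100.0]

# Fit: the statistics come from the training split only.
mean = fmean(train)
std = pstdev(train)  # population std (ddof=0), matching StandardScaler


def zscore(values):
    """Scale values with the training mean and std."""
    return [(v - mean) / std for v in values]


train_scaled = zscore(train)
test_scaled = zscore(test)  # the test split never influences mean or std

print(train_scaled)
print(test_scaled)
```

The scaled training column has mean 0 and standard deviation 1. The test column generally does not, which is expected: the test split is only transformed, never used to compute the statistics.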
model — defines a nn.Sequential network:
- input 10: declares the feature dimension (must match the data after encoding)
- dense 64 relu: nn.Linear(10, 64) followed by nn.ReLU()
- dense 32 relu: nn.Linear(64, 32) followed by nn.ReLU()
- dense 1 linear: nn.Linear(32, 1) with no activation (raw regression output)
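As a quick sanity check on the size of this network, the trainable parameter count follows from the layer widths above: each nn.Linear(n_in, n_out) holds n_in * n_out weights plus n_out biases. A short sketch of the arithmetic:

```python
# Layer widths from the model block: input 10, then dense 64, 32, 1.
layer_sizes = [10, 64, 32, 1]


def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


per_layer = [linear_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [704, 2080, 33]
print(total)      # 2817
```

If the input value changes (for example after one-hot encoding adds columns), only the first entry changes.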
train — training loop parameters:
- loss = mse → nn.MSELoss()
- optimizer = adam(lr=0.001) → optim.Adam(..., lr=0.001)
- device = auto → CUDA if available, else CPU
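The settings above correspond roughly to the following PyTorch loop. This is a minimal sketch on synthetic data: the loop structure, the per-epoch loss averaging, and the random tensors are assumptions made for illustration, and the script KynML actually generates may differ in detail.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for normalized features (10 columns) and a toy target.
x = torch.randn(256, 10)
y = x.sum(dim=1, keepdim=True)  # shape [256, 1], like the regression targets
loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)  # batch = 32

# device = auto
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1),
).to(device)
criterion = nn.MSELoss()                              # loss = mse
optimizer = optim.Adam(model.parameters(), lr=0.001)  # optimizer = adam(lr=0.001)

epoch_losses = []
for epoch in range(20):                               # epochs = 20
    running = 0.0
    for xb, yb in loader:
        xb, yb = xb.to(device), yb.to(device)
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
        running += loss.item() * xb.size(0)
    # Assumed reporting convention: mean loss over all training samples.
    epoch_losses.append(running / len(loader.dataset))
    print(f"Epoch {epoch + 1}/20 - loss: {epoch_losses[-1]:.4f}")
```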
evaluate — computed after training on the held-out test split. Supported: mae, mse, rmse, accuracy.
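For reference, the two metrics used in this tutorial can be computed by hand as below. This is a plain-Python sketch of the standard definitions with made-up prices, not KynML's internal implementation.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average size of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the average squared error."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Illustrative prices only.
y_true = [240000.0, 285000.0, 310000.0]
y_pred = [250000.0, 280000.0, 300000.0]

mae_value = mae(y_true, y_pred)
rmse_value = rmse(y_true, y_pred)
print(f"mae: {mae_value:.4f}")    # mae: 8333.3333
print(f"rmse: {rmse_value:.4f}")  # rmse: 8660.2540
```

Both metrics are in the units of the target. RMSE is never smaller than MAE and grows faster when a few predictions are far off, so a large gap between the two suggests outlier errors.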
export — format = torch calls torch.save(model.state_dict(), path).
3. Validate and compile
# Check syntax and semantics without generating code
.venv/bin/python -m kynml.cli validate house_price.kyn
# Inspect the parsed AST
.venv/bin/python -m kynml.cli ast house_price.kyn
# Emit the PyTorch script without running it
.venv/bin/python -m kynml.cli compile house_price.kyn --out generated/house_price.py
4. Train
.venv/bin/python -m kynml.cli train house_price.kyn
Output:
Epoch 1/20 - loss: 12345678.2341
Epoch 2/20 - loss: 9823456.7812
...
Epoch 20/20 - loss: 1234567.3401
mae: 8432.1234
rmse: 11203.5678
Saved model to /path/to/models/house_price_model.pt
5. What the generated PyTorch looks like
kynml compile emits a complete, standalone Python file. The key sections for regression:
# Dataset loading (regression path: targets as float32, shape [-1, 1])
def load_dataset() -> tuple[DataLoader, DataLoader]:
    df = pd.read_csv(DATASET_PATH)
    features = df.drop(columns=[TARGET_COLUMN])
    target = df[TARGET_COLUMN]
    numeric_features = pd.get_dummies(features, drop_first=False)
    x = numeric_features.astype("float32").to_numpy()
    y = target.astype("float32").to_numpy()
    if y.ndim == 1:
        y = y.reshape(-1, 1)
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, train_size=0.8, shuffle=True, random_state=42,
    )
    if NORMALIZE:
        scaler = StandardScaler()
        x_train = scaler.fit_transform(x_train)
        x_test = scaler.transform(x_test)
    ...

# Model class (name taken from the model block identifier)
class HousePriceModel(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(10, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Export
def export_model(model: nn.Module) -> None:
    if EXPORT_FORMAT == "torch":
        torch.save(model.state_dict(), path)
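Because the export is a state dict rather than a pickled model, loading it back requires the same class definition. The sketch below shows one way to do that with standard PyTorch calls. To stay self-contained it first saves an untrained stand-in to a temporary path, which is an assumption of this example; in practice you would point torch.load at models/house_price_model.pt.

```python
import os
import tempfile

import torch
from torch import nn


class HousePriceModel(nn.Module):
    """Same architecture as the generated class, repeated so this runs alone."""

    def __init__(self) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(10, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


# Stand-in for a trained export (untrained weights, temporary location).
path = os.path.join(tempfile.mkdtemp(), "house_price_model.pt")
torch.save(HousePriceModel().state_dict(), path)

# Rebuild the architecture, then load the saved weights into it.
model = HousePriceModel()
model.load_state_dict(torch.load(path, map_location="cpu"))
model.eval()

with torch.no_grad():
    batch = torch.zeros(2, 10)  # two rows of 10 features (placeholder values)
    prediction = model(batch)

print(prediction.shape)  # torch.Size([2, 1])
```

A state dict contains only the layer weights. If the model was trained with normalize = true, inputs at inference time need the same scaling that was fitted on the training split; this tutorial does not say whether KynML stores the scaler, so check before relying on the exported file alone.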
6. Next steps
- Add regularisation: dropout, batchnorm
- Speed up training: see the Speed Guide (precision = fp16, compile = true, num_workers)
- Serve predictions: see the kynml.serving module for generate_service
- Switch to AdamW for weight decay: optimizer = adamw(lr=0.001, weight_decay=0.01)
Troubleshooting
Target column 'price' not found — confirm the column name exactly matches target in your spec (case-sensitive).
input size mismatch — the input N value must equal the number of columns after pd.get_dummies encoding. Print numeric_features.shape[1] in the generated script to check.
CUDA out of memory — reduce batch size or switch device = cpu.