FAQ

Common questions, errors, and gotchas.

Language and Syntax

Q: What indentation does KynML require?

Exactly 4 spaces for block body lines. Tabs are not supported. Top-level block headers must start at column 0. Any deviation raises a KynMLParseError.
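
The rules above could be checked line by line roughly as follows. This is a minimal sketch, not the actual KynML parser: the name check_indentation is hypothetical, and it returns messages instead of raising KynMLParseError.

```python
def check_indentation(source: str) -> list:
    """Return a list of human-readable indentation problems (empty if none)."""
    problems = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines carry no indentation information
        # Leading whitespace of the line (spaces and/or tabs).
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            problems.append(f"line {lineno}: tab in indentation")
        elif len(indent) not in (0, 4):
            # Column 0 is a block header, 4 spaces is a body line.
            problems.append(
                f"line {lineno}: expected 0 or 4 spaces, found {len(indent)}"
            )
    return problems
```

Running it on a file with a 2-space body line and a tab-indented line reports both lines, which mirrors the two most common causes of the parse error.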

Q: Can I use comments?

Yes. Any line whose first non-whitespace character is # is treated as a comment and silently dropped before parsing.

# Train a regression model on housing data
dataset HouseData:
    source = csv("data/housing.csv")
    # this is the label column
    target = "price"

Note: inline comments (after a value on the same line) are not currently supported — the # ... must be the only content on the line.
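
A rough sketch of the comment rule, assuming the pre-parse step simply filters whole lines. The function name strip_comment_lines is hypothetical and the real implementation may differ.

```python
def strip_comment_lines(source: str) -> str:
    """Drop every line whose first non-whitespace character is '#'."""
    kept = [
        line
        for line in source.splitlines()
        if not line.lstrip().startswith("#")
    ]
    return "\n".join(kept)
```

Note that a trailing # ... after a value is not touched by this filter, so it would reach the parser as part of the value. That is why inline comments have to be moved onto their own line.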

Q: What activations are supported?

relu, leaky_relu, gelu, sigmoid, tanh, softmax, log_softmax, linear.

linear emits no activation module (bare nn.Linear output). softmax emits nn.Softmax(dim=1). log_softmax emits nn.LogSoftmax(dim=1) and pairs with nll loss.
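
One way to picture the mapping is a lookup table from activation name to emitted module source. Only the softmax, log_softmax, and linear entries are stated in this FAQ; the remaining entries and the name activation_code are assumptions made for illustration.

```python
# Activation name -> module expression emitted into the generated script.
ACTIVATION_MODULES = {
    "relu": "nn.ReLU()",                    # assumed
    "leaky_relu": "nn.LeakyReLU()",         # assumed
    "gelu": "nn.GELU()",                    # assumed
    "sigmoid": "nn.Sigmoid()",              # assumed
    "tanh": "nn.Tanh()",                    # assumed
    "softmax": "nn.Softmax(dim=1)",         # stated in this FAQ
    "log_softmax": "nn.LogSoftmax(dim=1)",  # stated in this FAQ
    "linear": None,                         # no module: bare nn.Linear output
}


def activation_code(name: str):
    """Return the module expression for an activation, or None for 'linear'."""
    if name not in ACTIVATION_MODULES:
        raise ValueError(f"unsupported activation: {name!r}")
    return ACTIVATION_MODULES[name]
```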

Q: Which loss should I use?

| Task | Loss | Notes |
|---|---|---|
| Regression | mse or huber | huber is less sensitive to outliers than mse |
| Regression (L1) | l1 or mae | Both compile to nn.L1Loss() |
| Binary classification | bce | Pair with sigmoid output |
| Multiclass | cross_entropy | Targets must be integer class indices; codegen uses long dtype |
| Log-prob outputs | nll | Pair with log_softmax output |

Q: What does device = auto do?

The generated script resolves the device at runtime: "cuda" if torch.cuda.is_available() else "cpu". Setting device = cuda raises at startup if CUDA is absent. device = cpu forces CPU regardless of hardware.
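
The three settings can be sketched as a single function. To keep the example free of a torch import, CUDA availability is passed in as a boolean; in a generated script that value would come from torch.cuda.is_available(). The name resolve_device and the RuntimeError type are assumptions, since the FAQ only says that device = cuda "raises".

```python
def resolve_device(setting: str, cuda_available: bool) -> str:
    """Map a KynML device setting to a torch device string."""
    if setting == "auto":
        return "cuda" if cuda_available else "cpu"
    if setting == "cuda":
        if not cuda_available:
            # Fail early instead of silently training on CPU.
            raise RuntimeError("device = cuda requested but CUDA is not available")
        return "cuda"
    if setting == "cpu":
        return "cpu"  # forced CPU, even when a GPU is present
    raise ValueError(f"unknown device setting: {setting!r}")
```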

Q: Can I define multiple datasets or models in one file?

Yes. You can define multiple dataset and model blocks. The train block selects which pair to use via model = <Name> and data = <Name>. Only one train block, one evaluate block, and one export block are permitted per file.


Errors

KynMLParseError: Expected a top-level block

The parser encountered a line at column 0 that does not start with dataset, model, train, evaluate, or export. Common causes: a typo in the block keyword, or a body line accidentally de-indented.
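
To locate the offending line before running the compiler, a small scan like the following can help. It is a sketch under the assumptions described above (five block keywords, body lines indented); the name find_bad_top_level_lines is hypothetical.

```python
TOP_LEVEL_KEYWORDS = ("dataset", "model", "train", "evaluate", "export")


def find_bad_top_level_lines(source: str) -> list:
    """Return (line_number, text) for column-0 lines that open no known block."""
    bad = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comment lines
        if line[0] in " \t":
            continue  # indented, so it is a body line
        # "train:" and "dataset Name:" both reduce to their keyword here.
        first_word = line.split()[0].rstrip(":")
        if first_word not in TOP_LEVEL_KEYWORDS:
            bad.append((lineno, line))
    return bad
```

Both common causes show up in the result: a misspelled keyword such as modle, and a body line such as target = "y" that lost its indentation.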

KynMLSemanticError: Dataset 'X' source must be csv("path")

Only csv("path") is currently accepted as a source value in compiled programs. The huggingface(...) and objectstore(...) integration modules are runtime helpers for hand-written scripts, not yet first-class source keywords in the compiler.

KynMLSemanticError: Unknown model 'X'. Did you mean 'Y'?

The name given as model = X in the train block does not match any model <Name>: block in the file. The semantic checker uses difflib to suggest close matches.
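
The suggestion can be reproduced with the standard library. This sketch uses difflib.get_close_matches with its default cutoff of 0.6; whether KynML uses the same cutoff, and the helper names below, are assumptions.

```python
import difflib


def suggest_model(name: str, defined_models: list):
    """Return the closest defined model name, or None if nothing is similar."""
    matches = difflib.get_close_matches(name, defined_models, n=1, cutoff=0.6)
    return matches[0] if matches else None


def unknown_model_message(name: str, defined_models: list) -> str:
    """Build an error message in the style shown in this FAQ."""
    message = f"Unknown model '{name}'."
    suggestion = suggest_model(name, defined_models)
    if suggestion is not None:
        message += f" Did you mean '{suggestion}'?"
    return message
```

When no defined name is similar enough, the message simply omits the "Did you mean" part.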

KynMLSemanticError: Export format 'onnx' requires input_shape

Add input_shape = [batch, features] to the export block, e.g. input_shape = [1, 10].

KynMLCodegenError: BatchNorm1d needs a size

A batchnorm layer appeared before any input or dense layer could establish the current width. Either move it after input N or after a dense layer, or explicitly specify the feature count: batchnorm 64.
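
The width tracking behind this error can be sketched as a single pass over the layer list. The tuple representation, the name emit_layers, and the use of ValueError are illustrative assumptions; the sketch also assumes a dense layer always follows something that has set the width.

```python
def emit_layers(layers: list) -> list:
    """layers: (kind, size_or_None) tuples. Returns emitted module expressions."""
    width = None  # current feature width; unknown until a layer establishes it
    emitted = []
    for kind, size in layers:
        if kind == "input":
            width = size
        elif kind == "dense":
            if width is None:
                raise ValueError("dense layer needs a known input width")
            emitted.append(f"nn.Linear({width}, {size})")
            width = size
        elif kind == "batchnorm":
            # An explicit size wins; otherwise fall back to the tracked width.
            features = size if size is not None else width
            if features is None:
                raise ValueError("BatchNorm1d needs a size")
            emitted.append(f"nn.BatchNorm1d({features})")
            width = features
        else:
            raise ValueError(f"unknown layer kind: {kind!r}")
    return emitted
```

A leading ("batchnorm", None) fails because nothing has set the width yet, while ("batchnorm", 64) succeeds in the same position.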

KynMLError: Train block is missing loss (and similar)

All of model, data, loss, optimizer, epochs, and batch are required in the train block. None have defaults.
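
A quick pre-flight check for this family of errors, assuming the train block has already been read into a dict. The name missing_train_keys is hypothetical.

```python
REQUIRED_TRAIN_KEYS = ("model", "data", "loss", "optimizer", "epochs", "batch")


def missing_train_keys(train_block: dict) -> list:
    """Return the required keys that are absent, in their documented order."""
    return [key for key in REQUIRED_TRAIN_KEYS if key not in train_block]
```

For example, a block that sets only model, data, and epochs would report loss, optimizer, and batch as missing.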


DataLoader and Performance

Q: How do I use multiple DataLoader workers?

dataset MyData:
    source = csv("data/train.csv")
    target = "y"
    num_workers = 4
    pin_memory = true
    prefetch = 2

num_workers maps to DataLoader(num_workers=N). pin_memory = true maps to pin_memory=True. prefetch maps to prefetch_factor.
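
The mapping can be written as a small translation from dataset settings to DataLoader keyword arguments. This is a sketch; the function name dataloader_kwargs and the dict-based input are assumptions about how the codegen might represent the block.

```python
def dataloader_kwargs(settings: dict) -> dict:
    """Translate KynML dataset settings into torch DataLoader keyword arguments."""
    kwargs = {}
    if "num_workers" in settings:
        kwargs["num_workers"] = int(settings["num_workers"])
    if "pin_memory" in settings:
        kwargs["pin_memory"] = bool(settings["pin_memory"])
    if "prefetch" in settings:
        # KynML's 'prefetch' corresponds to DataLoader's 'prefetch_factor'.
        kwargs["prefetch_factor"] = int(settings["prefetch"])
    return kwargs
```

Keep in mind that PyTorch accepts prefetch_factor only when num_workers is greater than 0, so prefetch should be set together with num_workers.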

Q: What does precision = fp16 do on CPU?

The generated script checks torch.cuda.is_available() at runtime. If CUDA is absent, the AMP scaler is set to None and the training loop falls back to standard FP32 backward/step. Specifying precision = fp16 on a CPU-only machine is safe but has no effect.

Q: Does compile = true work with all PyTorch versions?

torch.compile requires PyTorch 2.0+. The generated script calls it unconditionally when compile = true is set — if your torch version is older, the generated script will raise AttributeError. The KynML runtime dependency requires torch but does not pin a minimum version.
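
If you post-process the generated script and want to avoid the AttributeError, a version guard is one option. The helper below is a hypothetical sketch, not something KynML emits; it only inspects the major component of a version string such as torch.__version__.

```python
def supports_torch_compile(torch_version: str) -> bool:
    """True if the major version is 2 or higher (torch.compile needs 2.0+)."""
    major = torch_version.split(".")[0]
    return major.isdigit() and int(major) >= 2


# Possible use inside a script that has already imported torch:
#     if supports_torch_compile(torch.__version__):
#         model = torch.compile(model)
```

Local build suffixes such as +cu118 do not affect the check because only the text before the first dot is parsed.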


Serving and MCP

Q: The /predict endpoint sorts my feature keys — is that correct?

Yes. Both the generated training script and the inference service sort feature dict keys alphabetically before assembling the input tensor. As long as you pass the same feature names at inference time that existed as columns in the training CSV (minus the target column), the values land in the same column order the model was trained on.
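
The effect of the sort is easy to see in isolation. This sketch assumes plain Python string ordering and a hypothetical helper name, features_to_row; the feature names are made up for illustration.

```python
def features_to_row(features: dict) -> list:
    """Order feature values by sorted key name, as a single input row."""
    return [float(features[name]) for name in sorted(features)]


# Key order in the request does not matter; both calls give the same row.
row_a = features_to_row({"sqft": 1200, "age": 30, "beds": 3})
row_b = features_to_row({"age": 30, "beds": 3, "sqft": 1200})
```

Here both rows are ordered age, beds, sqft. Python's sorted compares strings by code point, so uppercase names would sort before lowercase ones; whether KynML normalizes case is not covered in this FAQ.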

Q: Can I use the MCP server without Claude?

Yes. Any MCP-compatible client works. The server speaks the MCP stdio transport protocol — it does not care what client drives it. Cursor, custom agent frameworks, or even manual JSON messages piped to stdin all work.
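
For the manual route, the messages are JSON-RPC 2.0 objects, one per line. The sketch below builds a typical opening sequence (initialize, the initialized notification, then tools/list). The client name and the protocol version string are illustrative assumptions, and the server may negotiate a different version.

```python
import json


def jsonrpc_line(method: str, params=None, request_id=None) -> str:
    """Serialize one JSON-RPC 2.0 message as a single newline-terminated line."""
    message = {"jsonrpc": "2.0", "method": method}
    if params is not None:
        message["params"] = params
    if request_id is not None:
        message["id"] = request_id  # requests carry an id; notifications do not
    return json.dumps(message) + "\n"


def handshake_lines() -> list:
    """Opening messages a hand-driven client might send over stdin."""
    return [
        jsonrpc_line(
            "initialize",
            {
                "protocolVersion": "2024-11-05",  # assumed revision
                "capabilities": {},
                "clientInfo": {"name": "manual-client", "version": "0.0.1"},
            },
            request_id=1,
        ),
        jsonrpc_line("notifications/initialized"),
        jsonrpc_line("tools/list", request_id=2),
    ]
```

Writing these lines to the server process's stdin and reading its stdout line by line is enough to see which tools it exposes.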

Q: kynml_predict reconstructs the model without activations — is that correct?

For the state dict reconstruction path, yes. Activation modules (ReLU, GELU, etc.) have no parameters and are not stored in the state dict, so the kynml_predict tool reconstructs only the nn.Linear layers from the weight shapes. The result has the right output shape (a final scalar or vector), but because activations are skipped, the intermediate representations, and in general the output values, differ from a full forward pass through the original model unless every activation is linear. For exact inference, use the generated service (which reconstructs the full model class) or load the model class directly.
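
A toy illustration of why the exact path matters, using plain Python lists instead of torch. The weights and input are made up; the point is only that the same linear layers give different results with and without the ReLU in between.

```python
def linear(weights: list, bias: list, x: list) -> list:
    """Dense layer: one output per weight row, y = W x + b."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


def relu(values: list) -> list:
    return [max(0.0, v) for v in values]


# Two layers: a 2x2 identity layer followed by a layer that sums its inputs.
W1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
W2, b2 = [[1.0, 1.0]], [0.0]
x = [1.0, -2.0]

with_relu = linear(W2, b2, relu(linear(W1, b1, x)))  # full forward pass
without_relu = linear(W2, b2, linear(W1, b1, x))     # activations skipped
```

With the ReLU the negative hidden value is clipped to zero before the final sum; without it the negative value passes straight through, so the two outputs disagree.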


Package and Installation

Q: Do I need PyTorch to run kynml validate or kynml compile?

No. The parser, semantic checker, and codegen are pure Python with zero torch dependency. Only the generated script (and kynml train, which runs it) requires torch.

Q: Which extras should I install?

| Use case | Extra |
|---|---|
| FastAPI inference service | pip install 'kynml[serving]' |
| MCP server for agents | pip install 'kynml[mcp]' |
| HuggingFace dataset loading | pip install 'kynml[hf]' |
| S3/R2 Parquet loading | pip install 'kynml[objectstore]' |
| Everything | pip install 'kynml[all]' |

Q: The project version is 0.1.0 in pyproject.toml — why does the changelog say 0.2.0?

pyproject.toml reflects the last released PyPI tag. The changelog tracks the logical feature versions including unreleased work. The 0.2.0 entry covers the major expansion shipped in the current codebase; the package version will be bumped to match on the next release.

See Also