Performing inference on new data

Refer to the tutorials (Linux, Windows) for installation instructions.

To run inference, you will need:

  • A source point cloud in LAS format on which to infer classes and probabilities. Sample data from the French “Lidar HD” project can be downloaded at this address.

  • A checkpoint of a trained lightning module implementing model logic (class myria3d.models.model.Model).

  • A minimal yaml configuration specifying parameters. We use hydra to manage configurations, and this yaml results from the model training. The datamodule and model parameter groups must match the dataset characteristics and the model training settings. The predict parameter group specifies the paths to the model and data, the batch size (N=50 works well; the larger the faster), and whether to use a GPU (optional). For hints on what to modify, see the experiment/predict.yaml file.

A default model and its configuration are embedded directly in the code, under the trained_model_assets folder. They are expected to always be compatible with the code base, and are updated as needed in case of e.g. a change of configuration format or model implementation.

Run inference from source

Then, fill out the {missing parameters} below and run:

python run.py \
task.task_name=predict \
predict.src_las={/path/to/cloud.las} \
predict.output_dir={/path/to/out/dir/} \
predict.gpus={0 for none, [i] to use GPU number i} \
datamodule.batch_size={N}

To show the current inference config, simply add the --help flag:

python run.py task.task_name=predict --help

Note that predict.src_las may be any valid glob pattern (e.g. /path/to/multiple_files/*.las), in order to predict on multiple files successively.
If the lidar file does not specify an EPSG in its metadata, it has to be specified with datamodule.epsg=...
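
For instance, the following would run prediction on every LAS file in a directory, with an explicit EPSG code (2154 is only an example value; the quotes around the glob prevent shell expansion):

python run.py \
task.task_name=predict \
'predict.src_las=/path/to/multiple_files/*.las' \
predict.output_dir={/path/to/out/dir/} \
datamodule.epsg=2154 \
datamodule.batch_size=50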

Run inference from the installed package

If you want to switch to package-based inference, you will need to comment out the parameters that depend on local environment variables, such as logger credentials and the training data directory. You can do so by making a copy of your configuration file and commenting out the lines containing oc.env logic.
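
As a minimal sketch (file names are placeholders), the lines to comment out can be located like this:

# copy the training-time configuration so it can be edited for prediction only
cp {/path/to/config.yaml} predict_config.yaml
# list the lines relying on local environment variables (oc.env logic) to comment out
grep -n "oc.env" predict_config.yaml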

Run inference from within a docker image

Up-to-date docker images (named myria3d) are created via Github integration actions (see the Developer’s guide).

A docker image encapsulating the virtual environment and application sources can also be built using the provided Dockerfile. At build time, the Dockerfile is not standalone: it must be part of the repository - whose content is copied into the image - at the github reference you want to build from.
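
For example, assuming the Dockerfile sits at the repository root, building the image from a local clone might look like this (the reference is a placeholder):

# from the root of a clone of the repository, at the reference you want to build
git checkout {reference}
docker build -t myria3d .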

To run inference:

  • Mount the needed volumes with the -v option.

  • Always set --ipc=host to allow multithreading (used in the pytorch dataloader, as mentioned in Pytorch’s README).

  • Increase the shared memory with --shm-size=2gb (which should be enough for 1km*1km French “Lidar HD” point clouds).

  • Set --gpus=all to make GPUs visible to the container, if available.

See the docker-pytorch README for more details, plus an additional option to specify the user id at runtime.

# specify your paths here as needed
docker run \
-v {local_inputs}:/inputs/ \
-v {local_output}:/outputs/ \
--ipc=host \
--gpus=all \
--shm-size=2gb \
myria3d \
python run.py {...config paths & options...}
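
For instance (all paths are placeholders), a complete call could read the cloud from the mounted /inputs/ volume and write results to /outputs/:

docker run \
-v {local_inputs}:/inputs/ \
-v {local_output}:/outputs/ \
--ipc=host \
--gpus=all \
--shm-size=2gb \
myria3d \
python run.py \
task.task_name=predict \
predict.src_las=/inputs/{cloud.las} \
predict.output_dir=/outputs/ \
'predict.gpus=[0]' \
datamodule.batch_size=50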

Additional options for prediction

Output dimensions

By default, the predicted classification is stored in a new PredictedClassification LAS dimension. The entropy of probabilities is also stored in a new entropy LAS dimension. It can be used as a very limited proxy of uncertainty.

Change the predict.interpolator.predicted_classification_channel and predict.interpolator.entropy_channel parameters to change the names of the output dimensions. Set them to null to disable saving these dimensions.
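
For example (the dimension name below is arbitrary), the classification dimension could be renamed and the entropy dimension disabled with:

python run.py \
task.task_name=predict \
predict.interpolator.predicted_classification_channel=MyClassification \
predict.interpolator.entropy_channel=null \
{...config paths & options...}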

One can control for which classes to save the probabilities. This is achieved with a predict.interpolator.probas_to_save config parameter, which can be either the all keyword (to save probabilities for all classes) or a list of specific classes (e.g. predict.interpolator.probas_to_save=[building,vegetation] - note the absence of space between class names).
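
For instance, to save probabilities for the building and vegetation classes only (quotes protect the brackets from the shell):

python run.py \
task.task_name=predict \
'predict.interpolator.probas_to_save=[building,vegetation]' \
{...config paths & options...}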

Receptive field overlap at inference time

To improve spatial regularity of the predicted probabilities, one can make inference on square receptive fields that have a non-null overlap with each other. This has the effect of smoothing out irregular predictions. The resulting classification is better looking, with more homogeneous predictions at the object level.

To define an overlap between successive 50m*50m receptive fields, set predict.subtile_overlap={value}. This, however, comes with a large computation price. For instance, predict.subtile_overlap=25 means a 25m overlap on both x and y axes, which multiplies inference time by a factor of 4.
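
For instance, a 25m overlap between receptive fields would be requested with:

python run.py \
task.task_name=predict \
predict.subtile_overlap=25 \
{...config paths & options...}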

Ignoring artefact points during inference

Lidar acquisition may have produced artefact points. If these points were identified with one (or several) classification code(s), they can be ignored during inference. These points will still be present in the output cloud, but they will not disturb model inference. They keep their original class in the predicted classification dimension, and their probabilities and entropy are set to null.

In the configuration, data transforms are used to drop points with class 65. By convention, 65 flags Lidar artefact points. Additional classes may be mapped to 65 to be ignored during inference as well, via the dataset_description.classification_preprocessing_dict parameter. Note: you may need to use quotes when overriding this parameter via CLI.
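
As a sketch, suppose artefacts were also flagged with a (hypothetical) class code 66 in your data; mapping it to 65 from the CLI could look like this, with quotes protecting the braces from the shell:

python run.py \
task.task_name=predict \
'dataset_description.classification_preprocessing_dict={66:65}' \
{...config paths & options...}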