How to optimize building validation decision thresholds?

This guide explains how to optimize decision thresholds, following the strategy described in this note.

Requirements

To optimize the decision thresholds, you must be able to evaluate the level of automation that can be reached on data that matches production data. You therefore need corrected data, i.e. data whose rule-based classification was reviewed and corrected, and for which the corrections were tracked. For building validation, the classification must include codes that distinguish false positives, false negatives, and true positives. These codes may be configured with the buildings_correction_labels parameter under the building_validation.optimization configuration group.
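
For illustration, such a configuration could look like the YAML sketch below. The sub-keys and class codes shown here are hypothetical, not the library's actual defaults; use the structure and codes defined by your own labelling protocol.

# Hypothetical sketch of buildings_correction_labels; sub-keys and codes
# are illustrative only.
building_validation:
  optimization:
    buildings_correction_labels:
      codes:
        true_positives: [19]   # building points confirmed during correction
        false_positives: [20]  # points wrongly classified as building
        false_negatives: [21]  # building points that were missed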

Furthermore, the point cloud data must include predictions from the deep learning model trained to detect buildings. This consists of two channels: a building channel with predicted probabilities and an entropy channel.
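
As a quick sanity check, you can verify that both channels are present in your files. Below is a minimal Python sketch assuming laspy is installed; the dimension names "building" and "entropy" are assumptions and must match your las_dimensions configuration.

# Minimal sketch: check that the model's two channels exist in a tile.
# Dimension names "building" and "entropy" are assumptions; adapt them
# to your las_dimensions configuration.
import laspy

las = laspy.read("path/to/a_corrected_tile.laz")
print(list(las.point_format.extra_dimension_names))  # should include both channels
proba = las["building"]   # predicted building probabilities
entropy = las["entropy"]  # prediction entropy
print(proba.min(), proba.max(), entropy.mean())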

A large validation dataset helps give a better sense of the app's performance. We used 15 km² of corrected data to optimize thresholds, but a larger set might provide more diversity. That said, performance on an unseen test set was almost equal to performance on the validation set, which suggests that the evaluation is robust at this volume of data.

Running thresholds optimization

Finding optimal thresholds

Refer to the installation tutorial to set up your Python environment.

Your corrected data must live in a single input_las_dir directory as a set of LAS/LAZ files. Prepared and updated files will be saved in subfolders of a results_output_dir directory (./prepared/ and ./updated/, respectively), and will keep the same basename as the original files. Make sure the data_format configurations match your data, in particular the classification codes and las_dimensions configuration groups. A todo string parameter specifies the steps to run by including one or more of the following keywords: prepare | optimize | evaluate | update.
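
For instance, after a full run the output directory is organized as follows (a sketch; the tile name is a placeholder):

results_output_dir/
├── prepared/
│   └── tile_0123.laz   (prepared copy of each input file)
└── updated/
    └── tile_0123.laz   (same basename, with the updated classification)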

Run the full optimization module with:

conda activate lidar_prod

python lidar_prod/run.py \
++task=optimize_building \
building_validation.optimization.todo='prepare+optimize+evaluate+update' \
building_validation.optimization.paths.input_las_dir=[path/to/labelled/val/dataset/] \
building_validation.optimization.paths.results_output_dir=[path/to/save/results]

Evaluation of optimized thresholds on a test set

Once an optimal solution has been found, you may want to evaluate the decision process on unseen data to assess its generalization capability. For that, you will need another test folder of corrected data in the same format as before (a different input_las_dir). Specify that no optimization is required via the todo parameter, provide the path to the decision thresholds file (a YAML file) produced in the previous step, and set a different results_output_dir so that the prepared test and validation data are not pooled together.

conda activate lidar_prod

python lidar_prod/run.py \
++task=optimize_building \
building_validation.optimization.todo='prepare+evaluate+update' \
building_validation.optimization.paths.input_las_dir=[path/to/labelled/test/dataset/] \
building_validation.optimization.paths.results_output_dir=[path/to/save/results] \
building_validation.optimization.paths.building_validation_thresholds=[path/to/optimized_thresholds.yaml] \
building_validation.optimization.paths.evaluation_results_yaml=[path/to/saved/metrics.yaml]
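
Evaluation metrics are written to the YAML file given by evaluation_results_yaml. Below is a minimal Python sketch to load and inspect them, assuming PyYAML is installed and that the file is a flat mapping of metric names to values:

# Minimal sketch: load and print the saved evaluation metrics.
# Assumes the file is a flat mapping of metric name -> value.
import yaml

with open("path/to/saved/metrics.yaml") as f:
    metrics = yaml.safe_load(f)
for name, value in metrics.items():
    print(f"{name}: {value}")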

Utils

Debug mode: to run on a single file during development, add a +building_validation.optimization.debug=true flag to the command line.
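
For example, the optimization command above becomes (paths are placeholders):

conda activate lidar_prod

python lidar_prod/run.py \
++task=optimize_building \
+building_validation.optimization.debug=true \
building_validation.optimization.todo='prepare+optimize+evaluate+update' \
building_validation.optimization.paths.input_las_dir=[path/to/labelled/val/dataset/] \
building_validation.optimization.paths.results_output_dir=[path/to/save/results]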

Reference: