lidar_prod.tasks

building_validation

class lidar_prod.tasks.building_validation.BuildingValidationClusterInfo(probabilities: ndarray, overlays: ndarray, entropies: ndarray, target: int | None = None)[source]

Bases: object

Elements needed to confirm, refute, or be uncertain about a cluster of candidate building points.

entropies: ndarray

overlays: ndarray

probabilities: ndarray

target: int | None = None

class lidar_prod.tasks.building_validation.BuildingValidator(shp_path: str | None = None, bd_uni_connection_params=None, cluster=None, bd_uni_request=None, data_format=None, thresholds=None, use_final_classification_codes: bool = True)[source]

Bases: object

Logic of building validation.

The candidate building points identified with a rule-based algorithm are clustered together. The BDUni building vectors are overlaid on the point clouds, and points that fall under a vector are flagged. Then, the classification dimension is updated on a per-group basis, based on both AI probabilities and the BDUni flag.

See README.md for the detailed process.

prepare(input_values: str | Pipeline, prepared_las_path: str, save_result: bool = False, las_metadata: dict | None = None) → dict[source]

run(input_values: str | Pipeline, target_las_path: str | None = None, las_metadata: dict | None = None) → dict[source]

Runs application.

Transforms cloud at input_values following building validation logic, and saves it to target_las_path

Parameters:

input_values (str | pdal.pipeline.Pipeline) – path or pipeline to input LAS file with a building probability channel
target_las_path (str) – path for saving updated LAS file.
las_metadata (dict) – current pipeline metadata, used to propagate input metadata to the application output las (epsg, las version, etc)

Returns: las_metadata: metadata of the input las, which contains the information to pass to the writer so that the application output has the same header (las version, srs, …) as the input
Return type: dict

setup()[source]: Setup. Defines useful variables.

update(src_las_path: str | None = None, target_las_path: str | None = None, las_metadata: dict | None = None) → dict[source]: Updates point cloud classification channel.

class lidar_prod.tasks.building_validation.thresholds(min_confidence_confirmation: float, min_frac_confirmation: float, min_frac_confirmation_factor_if_bd_uni_overlay: float, min_uni_db_overlay_frac: float, min_confidence_refutation: float, min_frac_refutation: float, min_entropy_uncertainty: float, min_frac_entropy_uncertain: float)[source]

Bases: object

The decision thresholds for cluster-level decisions.

dump(filename: str)[source]

static load(filename: str)[source]

min_confidence_confirmation: float

min_confidence_refutation: float

min_entropy_uncertainty: float

min_frac_confirmation: float

min_frac_confirmation_factor_if_bd_uni_overlay: float

min_frac_entropy_uncertain: float

min_frac_refutation: float

min_uni_db_overlay_frac: float
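
The dump and load methods suggest a simple file round trip for a set of thresholds. The following is a minimal sketch of that pattern, not the library's implementation: ThresholdsSketch is a hypothetical stand-in that copies the field names listed above, and JSON is assumed purely for illustration since the serialization format is not stated here.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class ThresholdsSketch:
    """Hypothetical stand-in mirroring the fields of `thresholds`."""

    min_confidence_confirmation: float
    min_frac_confirmation: float
    min_frac_confirmation_factor_if_bd_uni_overlay: float
    min_uni_db_overlay_frac: float
    min_confidence_refutation: float
    min_frac_refutation: float
    min_entropy_uncertainty: float
    min_frac_entropy_uncertain: float

    def dump(self, filename: str) -> None:
        # Assumption: JSON is used here only for illustration.
        with open(filename, "w", encoding="utf-8") as f:
            json.dump(asdict(self), f, indent=2)

    @staticmethod
    def load(filename: str) -> "ThresholdsSketch":
        with open(filename, "r", encoding="utf-8") as f:
            return ThresholdsSketch(**json.load(f))


# Round trip with arbitrary, made-up values.
original = ThresholdsSketch(0.5, 0.6, 0.7, 0.8, 0.9, 0.4, 1.0, 0.3)
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "thresholds.json")
    original.dump(path)
    restored = ThresholdsSketch.load(path)
```

Because the sketch is a dataclass, the restored object compares equal to the original field by field.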

building_validation_optimization

class lidar_prod.tasks.building_validation_optimization.BuildingValidationOptimizer(todo: str, paths: Dict[str, str], building_validator: BuildingValidator, study: Study, design: Any, buildings_correction_labels: Any, use_final_classification_codes: bool = False, debug=False)[source]

Bases: object

Optimizer of the decision thresholds used by BuildingValidator.

In lidar-prod, each task is implemented by a dedicated python class. Building Validation is implemented via a BuildingValidator class. We make sure that all parameters used for optimization are the ones we actually use in production.

For a higher internal cohesion, BuildingValidator does not know anything about optimization, which is taken care of by a BuildingValidationOptimizer python class. Two dataclasses are used to connect the two objects. BuildingValidationClusterInfo describes the cluster-level information, necessary to perform a validation. thresholds describes the different thresholds used in BuildingValidator and optimized in BuildingValidationOptimizer.

In Building Validation, the most time-consuming step is the preparation of data, including the clustering of candidate building points and the overlay of vectors of buildings from a public database: up to several minutes per km² of data. The BuildingValidationOptimizer breaks down the Building Validation steps to make sure that data preparation only occurs once. All outputs and intermediary files are stored in a results_output_dir directory, so that operations may be resumed at any step, for instance to rerun a thresholds optimization with a different optimizer configuration.
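
The resumable-step idea described above can be sketched as follows. This is an illustration of the pattern only: ResumableStepsSketch, its toy data, and the "+"-separated todo string are assumptions made for the example and are not taken from the BuildingValidationOptimizer code.

```python
import os
import pickle
import tempfile


class ResumableStepsSketch:
    """Hypothetical runner: each step persists its output for later steps."""

    def __init__(self, results_output_dir: str):
        self.results_output_dir = results_output_dir
        self.prepare_calls = 0  # how often the expensive step actually ran

    def _path(self, name: str) -> str:
        return os.path.join(self.results_output_dir, name + ".pickle")

    def prepare(self) -> None:
        # Stand-in for the slow clustering and vector overlay.
        self.prepare_calls += 1
        clusters = [{"probabilities": [0.9, 0.8]}, {"probabilities": [0.2, 0.1]}]
        with open(self._path("clusters"), "wb") as f:
            pickle.dump(clusters, f)

    def optimize(self) -> None:
        # Reads the prepared clusters from disk instead of recomputing them.
        with open(self._path("clusters"), "rb") as f:
            clusters = pickle.load(f)
        # Toy "optimization": average of the per-cluster mean probabilities.
        means = [sum(c["probabilities"]) / len(c["probabilities"]) for c in clusters]
        with open(self._path("thresholds"), "wb") as f:
            pickle.dump({"min_confidence": sum(means) / len(means)}, f)

    def run(self, todo: str) -> None:
        # Assumption: steps are listed in a "+"-separated string.
        steps = {"prepare": self.prepare, "optimize": self.optimize}
        for name in todo.split("+"):
            steps[name]()


with tempfile.TemporaryDirectory() as out_dir:
    runner = ResumableStepsSketch(out_dir)
    runner.run("prepare+optimize")
    runner.run("optimize")  # rerun the optimization alone, reusing prepared data
    prepare_calls = runner.prepare_calls
    with open(runner._path("thresholds"), "rb") as f:
        saved_thresholds = pickle.load(f)
```

The second call reruns the optimization without touching the preparation step, because the prepared clusters are read back from the output directory.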

evaluate() → dict[source]

Evaluation step.

Deserializes the set of optimal thresholds. Deserializes the cluster information. Computes the Recall, Precision, and Automation of the BuildingValidator on the clusters using optimal thresholds, as well as other metrics including confusion matrices. If a validation dataset was used for optimization, this evaluation may be run on a test dataset.

Returns: a dictionary of metrics of schema {metric_name:metric_value}.
Return type: dict

evaluate_decisions(mts_gt, ia_decision) → Dict[str, Any][source]

Evaluate confirmation and refutation decisions.

Get dict of metrics to evaluate how good module decisions were in reference to ground truths.

Targets: U=Unsure, N=No (not a building), Y=Yes (building)

Predictions : U=Unsure, C=Confirmation, R=Refutation

Confusion Matrix (rows: targets, columns: predictions)

[Uu Ur Uc]

[Nu Nr Nc]

[Yu Yr Yc]

Automation: Proportion of each decision among total of candidate groups.

Accuracies: Confirmation/Refutation Accuracy. Accurate decision if either “unsure” or the same as the label.

Quality Precision and Recall, assuming perfect posterior decision for unsure predictions. Only candidate shapes with known ground truths are considered (ambiguous labels are ignored).

Precision : (Yu + Yc) / (Yu + Yc + Nc)

Recall : (Yu + Yc) / (Yu + Yr + Yc)

Parameters:

mts_gt (np.array) – ground truth of rule-based classification (0, 1, 2)
ia_decision (np.array) – AI application decision (0, 1, 2)

Returns:

dictionary of metrics.

Return type:

dict
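
The confusion matrix and the derived metrics can be sketched in a few lines. This is not the library's code: the integer encodings below (0, 1, 2 mapped to U, N, Y for targets and to u, r, c for decisions) are assumptions, and the recall denominator counts all true buildings (Yu + Yr + Yc). Plain lists are used instead of numpy arrays to keep the example dependency-free.

```python
from typing import Dict, List, Sequence

# Assumed encodings (not confirmed by the source):
# targets:   0 = U (unsure), 1 = N (not a building), 2 = Y (building)
# decisions: 0 = u (unsure), 1 = r (refutation),     2 = c (confirmation)


def confusion_matrix(targets: Sequence[int], decisions: Sequence[int]) -> List[List[int]]:
    """3x3 matrix with rows = targets (U, N, Y) and columns = decisions (u, r, c)."""
    cm = [[0, 0, 0] for _ in range(3)]
    for target, decision in zip(targets, decisions):
        cm[target][decision] += 1
    return cm


def quality_metrics(cm: List[List[int]]) -> Dict[str, object]:
    """Precision, recall and automation, assuming a non-empty matrix."""
    _unsure_row, (_Nu, _Nr, Nc), (Yu, Yr, Yc) = cm
    total = sum(sum(row) for row in cm)
    names = ("unsure", "refutation", "confirmation")
    return {
        # Unsure predictions on true buildings count as correct (perfect posterior decision).
        "precision": (Yu + Yc) / (Yu + Yc + Nc),
        "recall": (Yu + Yc) / (Yu + Yr + Yc),
        # Share of each decision among all candidate groups.
        "automation": {
            name: sum(row[col] for row in cm) / total for col, name in enumerate(names)
        },
    }


# Made-up example: four buildings, three non-buildings, one ambiguous label.
targets = [2, 2, 2, 2, 1, 1, 1, 0]
decisions = [2, 2, 0, 1, 1, 2, 0, 0]
cm = confusion_matrix(targets, decisions)
metrics = quality_metrics(cm)
```

In this example one building is wrongly refuted (Yr = 1) and one non-building is wrongly confirmed (Nc = 1), so both precision and recall come out at 3/4.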

optimize()[source]

Optimization step.

Deserializes the cluster information. Runs the genetic algorithm for N generations. For each set of decision thresholds, computes the Recall, Precision, and Automation of the BuildingValidator. Finally, serializes the set of optimal thresholds.

prepare()[source]

Preparation step.

Prepares and saves each point cloud in the specified directory, and extracts all cluster information in a list of BuildingValidationClusterInfo objects that is serialized into a pickle object.

run()[source]: Run decision threshold optimization.

save_config_with_optimized_thresolds(config: DictConfig)[source]: Save the config, with the thresholds in the building_validation.application part replaced by the optimized thresholds.

setup()[source]

Setup step.

Sets up a few attributes and overrides BuildingValidator classification codes to adapt to those of the optimization dataset.

update()[source]

Update step.

Deserializes the set of optimal thresholds. BuildingValidator updates each prepared point cloud classification based on those thresholds and saves the result.

lidar_prod.tasks.building_validation_optimization.constraints_func(trial)[source]

building_completion

class lidar_prod.tasks.building_completion.BuildingCompletor(min_building_proba: float = 0.5, cluster=None, data_format=None)[source]

Bases: object

Logic of building completion.

The BuildingValidator only considered points that were 1) candidate, and 2) formed clusters of sufficient size.

Some points were too isolated, or were not clustered, even though they might have a high predicted building probability. We assume that we can trust AI probabilities (if high enough) in the neighborhood of large groups (clusters) of candidate points already confirmed by the BuildingValidator.

We will update points classification based on their probability as well as their surroundings:
- We select points that have p>=0.5 (+ a BDUni factor when applicable).
- We perform vertical (XY) clustering of A) these points, together with B) confirmed buildings.
- If the resulting clusters contain confirmed buildings, points with high probability are considered to be part of the confirmed building and their class is updated accordingly.
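
The three steps above can be sketched with a toy 2D clustering. This is a simplified illustration under stated assumptions, not the library's implementation: complete_buildings is a hypothetical helper, a brute-force single-linkage clustering with a union-find replaces whatever clustering the pipeline actually uses, the tolerance value is arbitrary, and the BDUni factor is left out.

```python
from typing import List, Tuple


def complete_buildings(
    points: List[Tuple[float, float]],
    probas: List[float],
    confirmed: List[bool],
    min_building_proba: float = 0.5,
    tolerance: float = 1.0,
) -> List[bool]:
    """Return an updated 'is building' flag for every point."""
    n = len(points)
    # Step 1: keep confirmed points and points with a high enough probability.
    selected = [i for i in range(n) if confirmed[i] or probas[i] >= min_building_proba]

    # Step 2: single-linkage clustering in XY, using a union-find structure.
    parent = {i: i for i in selected}

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for pos, a in enumerate(selected):
        for b in selected[pos + 1:]:
            dx = points[a][0] - points[b][0]
            dy = points[a][1] - points[b][1]
            if dx * dx + dy * dy <= tolerance * tolerance:
                parent[find(a)] = find(b)

    # Step 3: a cluster holding at least one confirmed point is confirmed entirely.
    confirmed_roots = {find(i) for i in selected if confirmed[i]}
    return [i in parent and find(i) in confirmed_roots for i in range(n)]


# Made-up example: one confirmed point, two nearby high-probability points,
# one isolated high-probability point, one nearby low-probability point.
points = [(0.0, 0.0), (0.5, 0.0), (1.2, 0.0), (10.0, 10.0), (0.2, 0.3)]
probas = [0.9, 0.8, 0.7, 0.95, 0.1]
confirmed = [True, False, False, False, False]
completed = complete_buildings(points, probas, confirmed)
```

The third point is more than one unit away from the confirmed point but is still completed, because single-linkage clustering chains it through the second point. The isolated point keeps its status despite its high probability.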

prepare_for_building_completion(pipeline: Pipeline) → None[source]

Prepare for building completion.

Identify candidates that have high enough probability. Then, cluster them together with previously confirmed buildings. Cluster parameters are relaxed (2D, with high tolerance). If a cluster contains some confirmed points, the others are considered to belong to the same building and they will be confirmed as well.

Parameters: pipeline (pdal.pipeline.Pipeline) – input LAS pipeline

run(input_values: str | Pipeline, las_metadata: dict | None = None) → dict[source]

Application.

Transform cloud at input_values following building completion logic.

Parameters:

input_values (str | pdal.pipeline.Pipeline) – path to either input LAS file or a pipeline
target_las_path (str) – path for saving updated LAS file.
las_metadata (dict) – current pipeline metadata, used to propagate input metadata to the application output las (epsg, las version, etc)

Returns: las_metadata: metadata of the initial las, which contains the information to pass to the writer so that the application output has the same header (las version, srs, …) as the input
Return type: dict

update_classification() → None[source]: Update Classification dimension by completing buildings with high probability points.

building_identification

class lidar_prod.tasks.building_identification.BuildingIdentifier(min_building_proba: float = 0.5, cluster=None, data_format=None)[source]

Bases: object

Logic of building identification.

Points that were not found by rule-based algorithms but which have a high-enough probability of being a building are clustered into candidate groups of buildings.

High enough probability means p>=min_building_proba

run(input_values: str | Pipeline, target_las_path: str | None = None, las_metadata: dict | None = None) → dict[source]

Identify potential buildings in a new channel, excluding former candidates as well as already confirmed buildings (confirmed by either Validation or Completion).

Parameters:

input_values (str | pdal.pipeline.Pipeline) – path or pipeline to input LAS file with a building probability channel
target_las_path (str) – output LAS
las_metadata (dict) – current pipeline metadata, used to propagate input metadata to the application output las (epsg, las version, etc)

Returns: updated las_metadata
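
The selection rule described for BuildingIdentifier can be sketched as a per-point filter. This is an assumption-laden illustration: select_new_candidates and its boolean inputs are hypothetical, and the subsequent clustering of the selected points into candidate groups is omitted.

```python
from typing import List, Sequence


def select_new_candidates(
    probas: Sequence[float],
    is_former_candidate: Sequence[bool],
    is_confirmed_building: Sequence[bool],
    min_building_proba: float = 0.5,
) -> List[bool]:
    """Flag points with a high enough probability that were not handled before."""
    return [
        proba >= min_building_proba and not candidate and not confirmed
        for proba, candidate, confirmed in zip(
            probas, is_former_candidate, is_confirmed_building
        )
    ]


# Made-up example with four points.
selected = select_new_candidates(
    probas=[0.9, 0.4, 0.8, 0.5],
    is_former_candidate=[False, False, True, False],
    is_confirmed_building=[False, False, False, False],
)
```

The last point is kept because the comparison is inclusive (p>=min_building_proba), while the third one is excluded as a former candidate.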

cleaning

class lidar_prod.tasks.cleaning.Cleaner(extra_dims: Iterable[str] | str | None)[source]

Bases: object

Keep only necessary extra dimensions channels.

add_dimensions(las_data: LasData)[source]: Add the dimensions that exist in self.extra_dimensions but not in las data

get_extra_dims_as_str()[source]: ‘stringify’ the extra_dims list and return it, or an empty list if there are no extra dims

remove_dimensions(las_data: LasData)[source]: remove dimension from (laspy) data

run(src_las_path: str, target_las_path: str, epsg: int | str)[source]

Clean out LAS extra dimensions.

Parameters:

src_las_path (str) – input LAS path
target_las_path (str) – output LAS path, with specified extra dims.
epsg (int | str) – epsg code for the input file (if empty or None: infer it from the las metadata)

utils

class lidar_prod.tasks.utils.BDUniConnectionParams(host: str, user: str, pwd: str, bd_name: str)[source]

Bases: object

URL and public credentials to connect to a database - typically the BDUni

bd_name: str

host: str

pwd: str

user: str

lidar_prod.tasks.utils.check_bbox_intersects_territoire_with_srid(bd_params: BDUniConnectionParams, bbox: Dict[str, int], epsg_srid: int | str)[source]: Check if a bounding box intersects one of the territories from the BDUni database (public.gcms_territoire) with the expected srid. As geometries are indicated with srid = 0 in the database (but stored in their original projection), both geometries are compared using this common srid. In the territoire geometry query, ST_Union is used to combine different territoires that would have the same srid (e.g. 5490 for Guadeloupe and Martinique).

lidar_prod.tasks.utils.get_a_las_to_las_pdal_pipeline(src_las_path: str, target_las_path: str, ops: Iterable[Any], epsg: int | str)[source]

Create a pdal pipeline, preserving format, forwarding every dimension.

Parameters:

src_las_path (str) – input LAS path
target_las_path (str) – output LAS path
ops (Iterable[Any]) – list of pdal operation (e.g. Filter.assign(…))
epsg (int | str) – epsg code for the input file (if empty or None: infer it from the las metadata)

lidar_prod.tasks.utils.get_input_las_metadata(pipeline: Pipeline)[source]: Get las reader metadata from the input pipeline

lidar_prod.tasks.utils.get_integer_bbox(pipeline: Pipeline, buffer: Number = 0) → Dict[str, int][source]

Get XY bounding box of the las input of a pipeline, cast x/y min/max to integers.

Parameters:

pipeline (pdal.pipeline.Pipeline) – pipeline for which to read the input bounding box
buffer (Number, optional) – buffer to add to the bounds before casting it to integers. Defaults to 0.

Returns:

x/y min/max values as a dictionary

Return type:

Dict[str, int]
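
The buffering and integer cast can be sketched on plain numbers. This is a hypothetical helper, not the library function: the dictionary keys are made up for the example, and rounding outward (floor for minima, ceil for maxima) is an assumption chosen so that the integer box still covers the buffered extent.

```python
import math
from typing import Dict


def integer_bbox(
    x_min: float, x_max: float, y_min: float, y_max: float, buffer: float = 0
) -> Dict[str, int]:
    """Add a buffer to XY bounds, then cast them to integers."""
    # Assumption: round outward so that no part of the buffered extent is lost.
    return {
        "x_min": math.floor(x_min - buffer),
        "x_max": math.ceil(x_max + buffer),
        "y_min": math.floor(y_min - buffer),
        "y_max": math.ceil(y_max + buffer),
    }


# Made-up bounds with a 10-unit buffer.
bbox = integer_bbox(100.4, 200.6, 50.2, 75.9, buffer=10)
```

Integer bounds of this kind are convenient for building a database query over the area of interest.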

lidar_prod.tasks.utils.get_las_data_from_las(las_path: str, epsg: str | int | None = None) → LasData[source]: Load las data from a las file

lidar_prod.tasks.utils.get_pdal_reader(las_path: str, epsg: int | str) → las[source]

Standard Reader which imposes Lambert 93 SRS.

Parameters:

las_path (str) – input LAS path to read.
epsg (int | str) – epsg code for the input file (if empty or None: infer it from the las metadata)

Returns:

reader to use in a pipeline.

Return type:

pdal.Reader.las

lidar_prod.tasks.utils.get_pdal_writer(target_las_path: str, reader_metadata={}, extra_dims: str = 'all') → las[source]

Standard LAS Writer which imposes LAS 1.4 specification and dataformat 8.

Parameters:

target_las_path (str) – output LAS path to write.
extra_dims (str) – extra dimensions to keep, in the format expected by pdal.Writer.las.

Returns:

writer to use in a pipeline.

Return type:

pdal.Writer.las

lidar_prod.tasks.utils.get_pipeline(input_value: Pipeline | str, epsg: int | str, las_metadata: dict | None = None)[source]

If the input value is a pipeline, returns it; if it is a las path, returns the corresponding pipeline. If the input is a las path, pipeline_metadata is updated to the new pipeline metadata.

Parameters:

input_value (pdal.pipeline.Pipeline | str) – input value to get a pipeline from (las pipeline or path to a file to read with pdal)
epsg (int | str) – if input_value is a string, use the epsg value to override the crs from the las header
las_metadata (dict) – current pipeline metadata, used to propagate input metadata to the application output las

Returns:

pdal pipeline, updated pipeline_metadata dict

lidar_prod.tasks.utils.pdal_read_las_array(las_path: str, epsg: str | int | None = None)[source]

Read LAS as a named array.

Parameters:

las_path (str) – input LAS path
epsg (int | str) – epsg code for the input file (if empty or None: infer it from the las metadata)

Returns:

named array with all LAS dimensions, including extra ones, with dict-like access. las_metadata dict

Return type:

np.ndarray

lidar_prod.tasks.utils.request_bd_uni_for_building_shapefile(bd_params: BDUniConnectionParams, shapefile_path: str, bbox: Dict[str, int], epsg: int | str)[source]

Request BD Uni for its buildings.

Creates a shapefile with the non-destroyed buildings in the area of interest and saves it.

Also add a “PRESENCE” column filled with 1 for later use by pdal.

Note on the projections: Projections are mixed in the BDUni tables. In PostGIS, the declared projection is 0 but the data are stored in the legal projection of the corresponding territories. In each table, there is a “gcms_territoire” field, which gives the corresponding territory (3-letter code). The gcms_territoire table gives hints on each territory (SRID, footprint).

lidar_prod.tasks.utils.save_las_data_to_las(las_path: str, las_data: LasData)[source]: save las data to a las file

lidar_prod.tasks.utils.split_idx_by_dim(dim_array)[source]: Returns a sequence of arrays of indices of elements sharing the same value in dim_array. Groups are ordered by ascending value.
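
The grouping behaviour described for split_idx_by_dim can be illustrated without numpy. This is a sketch of the idea only: split_idx_by_value is a hypothetical pure-Python helper that returns lists rather than arrays, and the actual function may be implemented quite differently.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def split_idx_by_value(values: Sequence[Hashable]) -> List[List[int]]:
    """Group indices of equal values; groups are ordered by ascending value."""
    groups = defaultdict(list)
    for idx, value in enumerate(values):
        groups[value].append(idx)
    return [groups[value] for value in sorted(groups)]


# Made-up cluster ids for five points.
index_groups = split_idx_by_value([3, 1, 3, 2, 1])
```

Here the points with id 1 come first (indices 1 and 4), then id 2 (index 3), then id 3 (indices 0 and 2). Applied to a cluster id dimension, each group gives the indices of the points belonging to one cluster.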