lidar_prod.tasks
building_validation
- class lidar_prod.tasks.building_validation.BuildingValidationClusterInfo(probabilities: ndarray, overlays: ndarray, entropies: ndarray, target: int | None = None)[source]
Bases:
object
Elements needed to confirm, refute, or be uncertain about a cluster of candidate building points.
- entropies: ndarray
- overlays: ndarray
- probabilities: ndarray
- class lidar_prod.tasks.building_validation.BuildingValidator(shp_path: str | None = None, bd_uni_connection_params=None, cluster=None, bd_uni_request=None, data_format=None, thresholds=None, use_final_classification_codes: bool = True)[source]
Bases:
object
Logic of building validation.
The candidate building points identified with a rule-based algorithm are clustered together. The BDUni building vectors are overlaid on the point clouds, and points that fall under a vector are flagged. Then, the classification dimension is updated on a per-group basis, based on both AI probabilities and the BDUni flag.
See README.md for the detailed process.
- prepare(input_values: str | Pipeline, prepared_las_path: str, save_result: bool = False, las_metadata: dict | None = None) dict [source]
- run(input_values: str | Pipeline, target_las_path: str | None = None, las_metadata: dict | None = None) dict [source]
Runs application.
Transforms cloud at input_values following building validation logic, and saves it to target_las_path
- Parameters:
application output las (epsg, las version, etc)
- Returns:
las_metadata: metadata of the input las, which contains the information to pass to the writer so that the application output has the same header (las version, srs, …) as the input
- Return type:
- class lidar_prod.tasks.building_validation.thresholds(min_confidence_confirmation: float, min_frac_confirmation: float, min_frac_confirmation_factor_if_bd_uni_overlay: float, min_uni_db_overlay_frac: float, min_confidence_refutation: float, min_frac_refutation: float, min_entropy_uncertainty: float, min_frac_entropy_uncertain: float)[source]
Bases:
object
The decision thresholds for cluster-level decisions.
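To give a feel for how such thresholds can drive a cluster-level decision, here is a minimal sketch. It is not the rule implemented by BuildingValidator (that rule is described in the README and is not reproduced here): `ThresholdsSketch`, `decide_sketch`, and the way the conditions are combined are assumptions made for illustration. Only the field names come from the `thresholds` signature above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ThresholdsSketch:
    # Field names copied from the `thresholds` signature; the class itself is hypothetical.
    min_confidence_confirmation: float
    min_frac_confirmation: float
    min_frac_confirmation_factor_if_bd_uni_overlay: float
    min_uni_db_overlay_frac: float
    min_confidence_refutation: float
    min_frac_refutation: float
    min_entropy_uncertainty: float
    min_frac_entropy_uncertain: float


def _frac(flags: Sequence[bool]) -> float:
    """Fraction of True values in a sequence (0.0 for an empty sequence)."""
    return sum(flags) / len(flags) if flags else 0.0


def decide_sketch(probabilities, overlays, entropies, t: ThresholdsSketch) -> str:
    """Assumed, simplified decision for one cluster: 'confirmed', 'refuted' or 'unsure'."""
    # Too many high-entropy points: leave the decision to a human.
    if _frac([e >= t.min_entropy_uncertainty for e in entropies]) >= t.min_frac_entropy_uncertain:
        return "unsure"
    # Is the cluster sufficiently covered by database building vectors?
    overlaid = _frac([bool(o) for o in overlays]) >= t.min_uni_db_overlay_frac
    # Assumption: the required fraction of confident points is relaxed under an overlay.
    needed = t.min_frac_confirmation
    if overlaid:
        needed *= t.min_frac_confirmation_factor_if_bd_uni_overlay
    confirmed = _frac([p >= t.min_confidence_confirmation for p in probabilities]) >= needed
    refuted = (
        _frac([(1 - p) >= t.min_confidence_refutation for p in probabilities])
        >= t.min_frac_refutation
    )
    if confirmed and not refuted:
        return "confirmed"
    # Assumption: never refute a cluster that the database vectors overlay.
    if refuted and not confirmed and not overlaid:
        return "refuted"
    return "unsure"
```

In this sketch, a cluster with uniformly high probabilities and low entropies is confirmed, while a low-probability cluster that lies under database vectors stays unsure instead of being refuted.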
building_validation_optimization
- class lidar_prod.tasks.building_validation_optimization.BuildingValidationOptimizer(todo: str, paths: Dict[str, str], building_validator: BuildingValidator, study: Study, design: Any, buildings_correction_labels: Any, use_final_classification_codes: bool = False, debug=False)[source]
Bases:
object
Optimizer of the decision thresholds used by BuildingValidator.
In lidar-prod, each task is implemented by a dedicated python class. Building Validation is implemented via a BuildingValidator class. We make sure that all parameters used for optimization are the ones we actually use in production. For a higher internal cohesion, BuildingValidator does not know anything about optimization, which is taken care of by a BuildingValidationOptimizer python class. Two dataclasses are used to connect the two objects. BuildingValidationClusterInfo describes the cluster-level information necessary to perform a validation. thresholds describes the different thresholds used in BuildingValidator and optimized in BuildingValidationOptimizer.
In Building Validation, the most time-consuming step is the preparation of data, including the clustering of candidate building points and the overlay of building vectors from a public database: up to several minutes per km² of data. The BuildingValidationOptimizer breaks down the Building Validation steps to make sure that data preparation only occurs once. All outputs and intermediary files are stored in a results_output_dir directory, so that operations may be resumed at any step, for instance to rerun a thresholds optimization with a different optimizer configuration.
- evaluate() dict [source]
Evaluation step.
Deserializes the set of optimal thresholds. Deserializes the cluster information. Computes the Recall, Precision, and Automation of the BuildingValidator on the clusters using optimal thresholds, as well as other metrics including confusion matrices. If a validation dataset was used for optimization, this evaluation may be run on a test dataset.
- Returns:
a dictionary of metrics of schema {metric_name:metric_value}.
- Return type:
- evaluate_decisions(mts_gt, ia_decision) Dict[str, Any] [source]
Evaluate confirmation and refutation decisions.
Get dict of metrics to evaluate how good module decisions were in reference to ground truths.
Targets: U=Unsure, N=No (not a building), Y=Yes (building)
Predictions : U=Unsure, C=Confirmation, R=Refutation
Confusion Matrix (rows: targets, columns: predictions)
[Uu Ur Uc]
[Nu Nr Nc]
[Yu Yr Yc]
Automation: Proportion of each decision among total of candidate groups.
Accuracies: Confirmation/Refutation Accuracy. Accurate decision if either “unsure” or the same as the label.
Quality: Precision and Recall, assuming perfect posterior decision for unsure predictions. Only candidate shapes with known ground truths are considered (ambiguous labels are ignored).
Precision : (Yu + Yc) / (Yu + Yc + Nc)
Recall : (Yu + Yc) / (Yu + Yr + Yc)
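The metrics described above can be sketched from a 3x3 confusion matrix as follows. This is an illustrative computation, not the library's code: the function name `decision_metrics` and the keys of the returned dict are assumptions, and the matrix is assumed to have targets (U, N, Y) as rows and predictions (unsure, refuted, confirmed) as columns.

```python
def decision_metrics(cm):
    """Compute automation, precision and recall from a 3x3 confusion matrix.

    Rows are targets (U, N, Y); columns are predictions (unsure, refuted, confirmed).
    """
    (Uu, Ur, Uc), (Nu, Nr, Nc), (Yu, Yr, Yc) = cm
    total = sum(sum(row) for row in cm)
    # Automation: proportion of each decision among all candidate groups.
    automation = {
        "unsure": (Uu + Nu + Yu) / total if total else 0.0,
        "refuted": (Ur + Nr + Yr) / total if total else 0.0,
        "confirmed": (Uc + Nc + Yc) / total if total else 0.0,
    }
    # Unsure predictions on true buildings count as correct, since a perfect
    # posterior (human) decision is assumed. Ambiguous targets (row U) are ignored.
    true_positives = Yu + Yc
    false_positives = Nc  # non-buildings that were confirmed
    false_negatives = Yr  # buildings that were refuted
    precision_den = true_positives + false_positives
    recall_den = true_positives + false_negatives
    return {
        "automation": automation,
        "precision": true_positives / precision_den if precision_den else 0.0,
        "recall": true_positives / recall_den if recall_den else 0.0,
    }
```

For instance, with `cm = [[2, 1, 1], [3, 10, 2], [5, 1, 25]]`, 30 true buildings are kept (5 unsure, 25 confirmed), 2 non-buildings are wrongly confirmed, and 1 building is wrongly refuted.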
- optimize()[source]
Optimization step.
Deserializes the cluster information. Runs the genetic algorithm for N generations. For each set of decision thresholds, computes the Recall, Precision, and Automation of the BuildingValidator. Finally, serializes the set of optimal thresholds.
- prepare()[source]
Preparation step.
Prepares and saves each point cloud in the specified directory, and extracts all cluster information in a list of BuildingValidationClusterInfo objects that is serialized into a pickle object.
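The idea of preparing once and resuming later from a serialized pickle can be sketched as below. This is a generic pattern under stated assumptions, not the optimizer's actual code: `load_or_prepare`, `fake_prepare`, and the `clusters.pkl` file name are hypothetical, and plain dicts stand in for BuildingValidationClusterInfo objects.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_prepare(cache_path, prepare_fn):
    """Return the cached result if it exists, else compute it and cache it."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        with cache_path.open("rb") as f:
            return pickle.load(f)
    result = prepare_fn()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as f:
        pickle.dump(result, f)
    return result


calls = []


def fake_prepare():
    """Stand-in for the slow preparation step (clustering and vector overlay)."""
    calls.append(1)
    return [{"probabilities": [0.9, 0.8], "overlays": [1, 0], "entropies": [0.1, 0.2]}]


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "results_output_dir" / "clusters.pkl"
    first = load_or_prepare(cache, fake_prepare)   # runs the preparation
    second = load_or_prepare(cache, fake_prepare)  # reads the pickle instead
```

The second call does not run the preparation again, which is what allows an optimization to be rerun with a different configuration at low cost.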
- save_config_with_optimized_thresolds(config: DictConfig)[source]
Save the config, with the thresholds in the building_validation.application part replaced by the optimized thresholds.
building_completion
- class lidar_prod.tasks.building_completion.BuildingCompletor(min_building_proba: float = 0.5, cluster=None, data_format=None)[source]
Bases:
object
Logic of building completion.
The BuildingValidator only considered points that were 1) candidate, and 2) formed clusters of sufficient size.
Some points were too isolated, or were not clustered, even though they might have a high predicted building probability. We assume that we can trust AI probabilities (if high enough) in the neighborhood of large groups (clusters) of candidate points already confirmed by the BuildingValidator.
We update the classification of points based on their probability as well as their surroundings:
- We select points that have p>=0.5 (+ a BDUni factor when applicable).
- We perform vertical (XY) clustering of A) these points, together with B) confirmed buildings.
- If the resulting clusters contain confirmed buildings, points with high probability are considered to be part of the confirmed building and their class is updated accordingly.
- prepare_for_building_completion(pipeline: Pipeline) None [source]
Prepare for building completion.
Identify candidates that have high enough probability. Then, cluster them together with previously confirmed buildings. Cluster parameters are relaxed (2D, with high tolerance). If a cluster contains some confirmed points, the others are considered to belong to the same building and they will be confirmed as well.
- Parameters:
pipeline (pdal.pipeline.Pipeline) – input LAS pipeline
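The completion logic can be illustrated with a small, pure-Python sketch. It is not the BuildingCompletor implementation (which works on a pdal pipeline with its own cluster configuration): the function `complete_buildings`, the `tolerance` parameter, and the naive single-linkage clustering are assumptions chosen for readability, and the BDUni factor is left out.

```python
from math import hypot


def complete_buildings(xy, probas, confirmed, min_building_proba=0.5, tolerance=1.0):
    """Return indices of unconfirmed points that join a cluster holding confirmed points.

    xy: list of (x, y) tuples; probas: building probabilities;
    confirmed: booleans flagging points already confirmed as building.
    """
    # Keep confirmed points plus points with a high enough probability.
    selected = [i for i in range(len(xy)) if confirmed[i] or probas[i] >= min_building_proba]
    parent = {i: i for i in selected}

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # 2D (XY) single-linkage clustering: link points closer than the tolerance.
    for pos, a in enumerate(selected):
        for b in selected[pos + 1:]:
            if hypot(xy[a][0] - xy[b][0], xy[a][1] - xy[b][1]) <= tolerance:
                parent[find(a)] = find(b)

    # Clusters that contain at least one confirmed point.
    confirmed_roots = {find(i) for i in selected if confirmed[i]}
    return [i for i in selected if not confirmed[i] and find(i) in confirmed_roots]
```

The pairwise loop is quadratic in the number of selected points, which is fine for an illustration but would need a spatial index on real point clouds.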
- run(input_values: str | Pipeline, las_metadata: dict | None = None) dict [source]
Application.
Transforms cloud at input_values following building completion logic.
- Parameters:
application output las (epsg, las version, etc)
- Returns:
las_metadata: metadata of the initial las, which contains the information to pass to the writer so that the application output has the same header (las version, srs, …) as the input
- Return type:
building_identification
- class lidar_prod.tasks.building_identification.BuildingIdentifier(min_building_proba: float = 0.5, cluster=None, data_format=None)[source]
Bases:
object
Logic of building identification.
Points that were not found by rule-based algorithms but which have a high-enough probability of being a building are clustered into candidate groups of buildings.
High enough probability means p>=min_building_proba
- run(input_values: str | Pipeline, target_las_path: str | None = None, las_metadata: dict | None = None) dict [source]
Identify potential buildings in a new channel, excluding former candidates as well as already confirmed buildings (confirmed by either Validation or Completion).
- Parameters:
application output las (epsg, las version, etc)
Returns: updated las_metadata
cleaning
- class lidar_prod.tasks.cleaning.Cleaner(extra_dims: Iterable[str] | str | None)[source]
Bases:
object
Keep only necessary extra dimensions channels.
- add_dimensions(las_data: LasData)[source]
Add the dimensions that exist in self.extra_dimensions but not in las data
- get_extra_dims_as_str()[source]
‘stringify’ the extra_dims list and return it, or an empty list if there are no extra dims
utils
- class lidar_prod.tasks.utils.BDUniConnectionParams(host: str, user: str, pwd: str, bd_name: str)[source]
Bases:
object
URL and public credentials to connect to a database - typically the BDUni
- lidar_prod.tasks.utils.check_bbox_intersects_territoire_with_srid(bd_params: BDUniConnectionParams, bbox: Dict[str, int], epsg_srid: int | str)[source]
Check if a bounding box intersects one of the territories from the BDUni database (public.gcms_territoire) with the expected srid. As geometries are indicated with srid = 0 in the database (but stored in their original projection), both geometries are compared using this common srid. In the territoire geometry query, ST_Union is used to combine different territoires that would have the same srid (eg. 5490 for Guadeloupe and Martinique)
- lidar_prod.tasks.utils.get_a_las_to_las_pdal_pipeline(src_las_path: str, target_las_path: str, ops: Iterable[Any], epsg: int | str)[source]
Create a pdal pipeline, preserving format, forwarding every dimension.
- lidar_prod.tasks.utils.get_input_las_metadata(pipeline: Pipeline)[source]
Get las reader metadata from the input pipeline
- lidar_prod.tasks.utils.get_integer_bbox(pipeline: Pipeline, buffer: Number = 0) Dict[str, int] [source]
Get XY bounding box of the las input of a pipeline, cast x/y min/max to integers.
- Parameters:
- Returns:
x/y min/max values as a dictionary
- Return type:
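A minimal sketch of the integer bounding box computation, assuming that minima are rounded down and maxima rounded up after applying the buffer, so that the integer box always contains the original one. The function `integer_bbox`, the rounding direction, and the key names of the returned dict are assumptions; the real function reads the extent from the pipeline metadata.

```python
import math


def integer_bbox(minx, maxx, miny, maxy, buffer=0):
    """Expand an XY extent by `buffer` and cast it outward to integers."""
    return {
        "x_min": math.floor(minx - buffer),
        "x_max": math.ceil(maxx + buffer),
        "y_min": math.floor(miny - buffer),
        "y_max": math.ceil(maxy + buffer),
    }
```

Rounding outward is a conservative choice: a database request made with this box cannot miss a building that touches the edge of the point cloud.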
- lidar_prod.tasks.utils.get_las_data_from_las(las_path: str, epsg: str | int | None = None) LasData [source]
Load las data from a las file
- lidar_prod.tasks.utils.get_pdal_reader(las_path: str, epsg: int | str) las [source]
Standard Reader which imposes Lambert 93 SRS.
- lidar_prod.tasks.utils.get_pdal_writer(target_las_path: str, reader_metadata={}, extra_dims: str = 'all') las [source]
Standard LAS Writer which imposes LAS 1.4 specification and dataformat 8.
- lidar_prod.tasks.utils.get_pipeline(input_value: Pipeline | str, epsg: int | str, las_metadata: dict | None = None)[source]
Get a pdal pipeline from the input value. If input_value is already a pipeline, it is returned as is. If it is a las path, the corresponding pipeline is created and the pipeline metadata (las_metadata) is updated with the metadata of the new pipeline.
- Parameters:
input_value (pdal.pipeline.Pipeline | str) – input value to get a pipeline from (las pipeline or path to a file to read with pdal)
epsg (int | str) – if input_value is a string, use the epsg value to override the crs from the las header
las_metadata (dict) – current pipeline metadata, used to propagate input metadata to the application output las
- Returns:
pdal pipeline, updated pipeline_metadata dict
- lidar_prod.tasks.utils.pdal_read_las_array(las_path: str, epsg: str | int | None = None)[source]
Read LAS as a named array.
- lidar_prod.tasks.utils.request_bd_uni_for_building_shapefile(bd_params: BDUniConnectionParams, shapefile_path: str, bbox: Dict[str, int], epsg: int | str)[source]
Request BD Uni for its buildings.
Creates a shapefile of the non-destroyed buildings in the area of interest and saves it.
Also adds a “PRESENCE” column filled with 1 for later use by pdal.
Note on the projections: Projections are mixed in the BDUni tables. In PostGIS, the declared projection is 0 but the data are stored in the legal projection of the corresponding territories. In each table, there is a “gcms_territoire” field, which gives the corresponding territory (3-letter code). The gcms_territoire table gives hints on each territory (SRID, footprint).