Artificial Intelligence challenges organised around geo-data and deep learning

Project maintained by IGNF


Welcome to IGN's FLAIR datasets page!

The French National Institute of Geographical and Forest Information (IGN) presents its AI challenges and benchmark datasets FLAIR (for French Land cover from Aerospace ImageRy). The FLAIR datasets include Earth Observation data from different aerospace sensors. They cover large areas and reflect real-world land cover mapping tasks.

We are committed to supporting research and fostering innovation in the field of Earth Observation. For any question about the data, how to access and use them, or for ideas and suggestions for future datasets or topics, simply contact us at flair@ign.fr
The FLAIR datasets are released under Etalab's Open Licence 2.0.
Please remember to cite the datapaper associated with each dataset.

FLAIR #1 : semantic segmentation and domain adaptation 🌍🌱🏠🌳➡️🛩️

Challenge organized by IGN with the support of the SFPT.
The challenge took place on CodaLab from November 21, 2022 to March 21, 2023. See the results here.

FLAIR #1 datapaper 📑 : https://arxiv.org/pdf/2211.12979.pdf
FLAIR #1 repository 📁 : https://github.com/IGNF/FLAIR-1-AI-Challenge

Pre-trained models ⚡ : https://huggingface.co/collections/IGNF/flair-models-landcover-semantic-segmentation-65bb67415a5dbabc819a95de

▶️ Dataset description

We present here a large dataset (>20 billion pixels) of aerial imagery, topographic information and land cover annotations (buildings, water, forest, agriculture...) with the aim of further advancing research on semantic segmentation, domain adaptation and transfer learning. Countrywide aerial remote sensing imagery is necessarily acquired at different times and dates and under different conditions. Likewise, at large scales, the characteristics of semantic classes can vary with location and become heterogeneous. This opens up challenges for the spatial and temporal generalization of deep learning models!

The FLAIR-one dataset consists of 77,412 high-resolution patches (512x512 pixels at 0.2 m spatial resolution) annotated with 19 semantic classes. Because class frequencies are highly imbalanced, the number of classes was reduced to 13 for this challenge and the associated baselines: all classes with a value greater than 12 are merged into class 13 (see the datapaper for the rationale).
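The class merge described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming each label patch has already been loaded as an integer array holding the original class values 1 to 19; the constant and function names are illustrative, and the official preprocessing lives in the FLAIR #1 repository.

```python
import numpy as np

# Assumption: label arrays use the original nomenclature values 1..19.
LAST_KEPT_CLASS = 12  # values 1..12 are kept unchanged
MERGED_CLASS = 13     # every value above 12 is mapped to this single class


def remap_labels(label: np.ndarray) -> np.ndarray:
    """Return a new array where all values greater than 12 are set to 13."""
    return np.where(label > LAST_KEPT_CLASS, MERGED_CLASS, label)


if __name__ == "__main__":
    patch = np.array([[1, 12, 13], [15, 19, 4]], dtype=np.uint8)
    print(remap_labels(patch))
```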

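| Class | Value | Freq.-train (%) | Freq.-test (%) |
|---|---|---|---|
| pervious surface | 2 | 8.25 | 7.34 |
| impervious surface | 3 | 13.72 | 14.98 |
| bare soil | 4 | 3.47 | 4.36 |
| herbaceous vegetation | 10 | 17.84 | 22.17 |
| agricultural land | 11 | 10.98 | 6.95 |
| plowed land | 12 | 3.88 | 2.25 |
| swimming pool | 13 | 0.03 | 0.04 |
| clear cut | 15 | 0.15 | 0.01 |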
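Per-class frequencies like those in the table are the share of labelled pixels taken by each class. The sketch below is one straightforward way to compute them from label arrays; it assumes class values 1 to 19 and is not necessarily the exact procedure used to produce the published figures.

```python
import numpy as np


def class_frequencies(labels, num_classes: int = 19) -> dict[int, float]:
    """Percentage of pixels per class value over an iterable of label arrays.

    Assumes class values run from 1 to `num_classes`; other values are ignored.
    Classes with no pixels are omitted from the result.
    """
    counts = np.zeros(num_classes + 1, dtype=np.int64)  # index 0 is unused
    for label in labels:
        flat = label.ravel()
        flat = flat[(flat >= 1) & (flat <= num_classes)]
        counts += np.bincount(flat, minlength=num_classes + 1)
    total = counts.sum()
    if total == 0:
        return {}
    return {
        c: float(100.0 * counts[c] / total)
        for c in range(1, num_classes + 1)
        if counts[c] > 0
    }
```

Counting over the whole split at once, rather than averaging per-patch percentages, weights every pixel equally, which matches the notion of a pixel frequency.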

The dataset covers a total of approximately 812 km², with patches sampled across the entire metropolitan French territory to represent its different climates and landscapes (spatial domains). The aerial images were acquired during different months and years (temporal domains).
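The pixel count and surface quoted above follow directly from the patch count, patch size and resolution given in this section; a quick arithmetic check:

```python
PATCH_SIZE = 512      # pixels per patch side
RESOLUTION_M = 0.2    # ground sampling distance, metres per pixel
N_PATCHES = 77_412    # FLAIR #1 patches (train + test)

pixels = N_PATCHES * PATCH_SIZE * PATCH_SIZE
patch_side_m = PATCH_SIZE * RESOLUTION_M            # 102.4 m per patch side
area_km2 = N_PATCHES * patch_side_m ** 2 / 1e6      # m² -> km²

print(f"{pixels:,} pixels")
print(f"{area_km2:.1f} km²")
```

This gives about 20.3 billion pixels and about 811.7 km², consistent with the ">20 billion pixels" and "approximately 812 km²" figures.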

Figure: aerial images (ORTHO HR®) and their corresponding labels.

The test dataset consists of 15,700 patches from 10 domains not included in the train dataset. The class frequencies and temporal domains of the test dataset are shifted relative to the train dataset, which makes it possible to assess the domain adaptation capabilities of the developed approaches.
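The key property of this evaluation setup is that no domain is shared between the train and test subsets. The sketch below illustrates such a domain-held-out split; the `patch_domains` mapping and the domain names in the usage example are hypothetical, since the exact file naming scheme is not described here.

```python
def split_by_domain(
    patch_domains: dict[str, str], test_domains: set[str]
) -> tuple[list[str], list[str]]:
    """Assign each patch to train or test according to its domain only.

    patch_domains: hypothetical mapping {patch_id: domain_name}.
    test_domains: domains held out entirely for testing.
    """
    train, test = [], []
    for patch_id, domain in sorted(patch_domains.items()):
        (test if domain in test_domains else train).append(patch_id)
    return train, test


def shared_domains(
    train: list[str], test: list[str], patch_domains: dict[str, str]
) -> set[str]:
    """Domains present in both subsets (empty for a proper domain split)."""
    return {patch_domains[p] for p in train} & {patch_domains[p] for p in test}


if __name__ == "__main__":
    patches = {"p1": "domain_a", "p2": "domain_a", "p3": "domain_b", "p4": "domain_c"}
    train, test = split_by_domain(patches, {"domain_c"})
    print(train, test, shared_domains(train, test, patches))
```

Splitting by domain rather than by patch is what prevents neighbouring, near-identical patches from leaking into both subsets.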

▶️ Baseline model: U-Net

A U-Net architecture with a pre-trained ResNet34 encoder from the segmentation_models.pytorch library was used for the baselines. This architecture allows the integration of patch-wise metadata and employs commonly used image data augmentation techniques. The code is available in the FLAIR #1 repository.
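The specific augmentations are not listed here; flips and 90-degree rotations are a common choice for nadir aerial imagery, and the sketch below assumes them. What matters is that the image and its label mask receive exactly the same geometric transform. The array layouts (channels first for the image) and the function name are assumptions for illustration.

```python
import numpy as np


def augment(image: np.ndarray, mask: np.ndarray, rng: np.random.Generator):
    """Apply the same random flips and 90-degree rotation to an image and its mask.

    image: array of shape (channels, height, width); mask: (height, width).
    """
    if rng.random() < 0.5:  # horizontal flip along the width axis
        image, mask = image[:, :, ::-1], mask[:, ::-1]
    if rng.random() < 0.5:  # vertical flip along the height axis
        image, mask = image[:, ::-1, :], mask[::-1, :]
    k = int(rng.integers(0, 4))  # number of 90-degree rotations (0 to 3)
    image = np.rot90(image, k, axes=(1, 2))
    mask = np.rot90(mask, k, axes=(0, 1))
    # Copy into contiguous memory so downstream frameworks accept the arrays.
    return np.ascontiguousarray(image), np.ascontiguousarray(mask)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mask = np.arange(16).reshape(4, 4)
    image = np.stack([mask, mask + 100])
    aug_image, aug_mask = augment(image, mask, rng)
    print(aug_image.shape, aug_mask.shape)
```

Radiometric augmentations (brightness, contrast) would apply to the image only, since they do not change pixel positions.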

▶️ Download the dataset

| File | Size | Format |
|---|---|---|
| Aerial images - train | 50.7 GB | .zip |
| Aerial images - test | 13.4 GB | .zip |
| Labels - train | 485 MB | .zip |
| Labels - test | 124 MB | .zip |
| Aerial metadata | 16.1 MB | .json |
| Area shapes | 392 KB | .gpkg |
| Toy dataset (subset of train and test) | 215 MB | .zip |


Please include a citation to the following paper if you use the FLAIR #1 dataset:

Plain text:

Anatol Garioud, Stéphane Peillet, Eva Bookjans, Sébastien Giordano, and Boris Wattrelos. 2022. 
FLAIR #1: semantic segmentation and domain adaptation dataset. (2022). 


  doi = {10.13140/RG.2.2.30183.73128/1},
  url = {https://arxiv.org/pdf/2211.12979.pdf},
  author = {Garioud, Anatol and Peillet, Stéphane and Bookjans, Eva and Giordano, Sébastien and Wattrelos, Boris},
  title = {FLAIR #1: semantic segmentation and domain adaptation dataset},
  publisher = {arXiv},
  year = {2022}

FLAIR #2 : textural and temporal information for semantic segmentation from multi-source optical imagery 🌍🌱🏠🌳➡️🛩️🛰️

Challenge organized by IGN with the support of CNES and Connect by CNES, within the Copernicus / FPCUP project.

FLAIR #2 datapaper 📑 : https://arxiv.org/pdf/2305.14467.pdf
FLAIR #2 NeurIPS datapaper 📑 : https://proceedings.neurips.cc/paper_files/paper/2023/file/353ca88f722cdd0c481b999428ae113a-Paper-Datasets_and_Benchmarks.pdf
FLAIR #2 NeurIPS poster 📑 : https://neurips.cc/media/PosterPDFs/NeurIPS%202023/73621.png?t=1699528363.252194
FLAIR #2 repository 📁 : https://github.com/IGNF/FLAIR-2-AI-Challenge
FLAIR #2 challenge page 💻 : https://codalab.lisn.upsaclay.fr/competitions/13447 [now closed]

Pre-trained models ⚡ : available upon request for now.

▶️ Context of the challenge

With this new challenge, participants will be tasked with developing innovative solutions that effectively harness the textural information from single-date aerial imagery and the temporal/spectral information from Sentinel-2 satellite time series to enhance semantic segmentation, domain adaptation, and transfer learning. Your solutions should address the challenges of reconciling differing acquisition times and spatial resolutions, accommodating varying conditions, and handling the heterogeneity of semantic classes across different locations.

▶️ Dataset description

The FLAIR #2 dataset encompasses 20,384,841,728 annotated pixels at a spatial resolution of 0.20 m from aerial imagery, divided into 77,762 patches of size 512x512. It also includes an extensive collection of satellite data, with a total of 51,244 acquisitions of Copernicus Sentinel-2 satellite images. For each area, a full year of acquisitions has been gathered, offering valuable insights into the spatio-temporal dynamics and spectral characteristics of the land cover. Because of the large difference in spatial resolution between aerial imagery and satellite data, the areas as initially defined lack sufficient context: they consist of only a few Sentinel-2 pixels. To address this, a buffer was applied to create larger areas known as super-areas. This ensures that each patch of the dataset is associated with a sufficiently large super-patch of Sentinel-2 data, providing a minimum level of satellite context.
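The need for super-areas follows from the resolution gap: a 512x512 patch at 0.2 m covers 102.4 m per side, which is only about ten pixels at Sentinel-2's finest resolution of 10 m. Below is a minimal sketch of extracting a fixed-size Sentinel-2 window around a patch. It assumes the time series of a super-area is loaded as a (time, bands, height, width) array on a common 10 m grid, and that the patch centre has already been converted to Sentinel-2 pixel coordinates (for instance with the aerial/Sentinel-2 matching file). The function name and window size are illustrative, not the FLAIR #2 implementation.

```python
import numpy as np


def crop_super_patch(s2_series: np.ndarray, row: int, col: int, size: int) -> np.ndarray:
    """Crop a size x size window of a Sentinel-2 time series centred on (row, col).

    s2_series: array of shape (time, bands, height, width) covering a super-area.
    (row, col): Sentinel-2 pixel containing the centre of an aerial patch.
    Near the border the window is shifted inwards rather than padded.
    """
    _, _, height, width = s2_series.shape
    if size > height or size > width:
        raise ValueError("super-area is smaller than the requested window")
    top = min(max(row - size // 2, 0), height - size)
    left = min(max(col - size // 2, 0), width - size)
    return s2_series[:, :, top:top + size, left:left + size]
```

Shifting the window instead of padding keeps every returned pixel a real observation, at the cost of the patch not always sitting exactly at the centre.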

The dataset covers 50 spatial domains, encompassing 916 areas spanning 817 km². With 13 semantic classes (plus 6 not used in this challenge), this dataset provides a robust foundation for advancing land cover mapping techniques.

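| Class | Value | Freq.-train (%) | Freq.-test (%) |
|---|---|---|---|
| pervious surface | 2 | 8.25 | 3.82 |
| impervious surface | 3 | 13.72 | 5.87 |
| bare soil | 4 | 3.47 | 1.6 |
| herbaceous vegetation | 10 | 17.84 | 19.76 |
| agricultural land | 11 | 10.98 | 18.19 |
| plowed land | 12 | 3.88 | 1.81 |
| swimming pool | 13 | 0.01 | 0.02 |
| clear cut | 15 | 0.15 | 0.82 |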

▶️ Baseline model: U-T&T

We propose the U-T&T model, a two-branch architecture that combines spatial and temporal information from very high-resolution aerial images and high-resolution satellite images into a single output. The U-Net architecture is employed for the spatial/texture branch, using a ResNet34 backbone model pre-trained on ImageNet. For the spatio-temporal branch, the U-TAE architecture incorporates a Temporal self-Attention Encoder (TAE) to explore the spatial and temporal characteristics of the Sentinel-2 time series data, applying attention masks at different resolutions during decoding. This model allows for the fusion of learned information from both sources, enhancing the representation of mono-date and time series data.
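To give an intuition of how attention collapses the time axis of a Sentinel-2 series, here is a deliberately simplified, single-head sketch: each date is scored against a query vector, the scores become per-pixel softmax weights over time, and the feature maps are averaged with those weights. In U-TAE the attention is computed by a learned Lightweight Temporal Attention Encoder at low resolution and the resulting masks are reused at the other resolutions; here the query is just an input array, so this illustrates the mechanism rather than reproducing the baseline's code.

```python
import numpy as np


def temporal_attention_pool(
    features: np.ndarray, query: np.ndarray
) -> tuple[np.ndarray, np.ndarray]:
    """Collapse the time axis of a feature map with softmax attention.

    features: (time, channels, height, width); query: (channels,).
    Returns the pooled map (channels, height, width) and the per-pixel
    attention weights (time, height, width), which sum to 1 over time.
    """
    channels = features.shape[1]
    # Scaled dot-product score of each date's feature vector with the query.
    scores = np.einsum("tchw,c->thw", features, query) / np.sqrt(channels)
    scores -= scores.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)  # softmax over time
    pooled = np.einsum("thw,tchw->chw", weights, features)
    return pooled, weights
```

Because the weights are computed per pixel, cloudy or uninformative dates can receive low weight at one location while remaining useful elsewhere.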

▶️ Download the dataset

| File | Size | Format |
|---|---|---|
| Aerial images - train | 50.7 GB | .zip |
| Aerial images - test | 13.4 GB | .zip |
| Sentinel-2 images - train | 22.8 GB | .zip |
| Sentinel-2 images - test | 6 GB | .zip |
| Labels - train | 485 MB | .zip |
| Labels - test | 108 MB | .zip |
| Aerial metadata | 16.1 MB | .json |
| Aerial <-> Sentinel-2 matching dict | 16.1 MB | .json |
| Satellite shapes | 392 KB | .gpkg |
| Toy dataset (subset of train and test) | 1.6 GB | .zip |

Alternatively, get the dataset from our Hugging Face page.


Please include a citation to the following paper if you use the FLAIR #2 dataset:

Plain text:

Anatol Garioud, Nicolas Gonthier, Loic Landrieu, Apolline De Wit, Marion Valette, Marc Poupée, Sébastien Giordano and Boris Wattrelos. 2023. 
FLAIR: a Country-Scale Land Cover Semantic Segmentation Dataset From Multi-Source Optical Imagery. (2023).
In Proceedings of Advances in Neural Information Processing Systems (NeurIPS) 2023.
DOI: https://doi.org/10.48550/arXiv.2310.13336


      title={FLAIR: a Country-Scale Land Cover Semantic Segmentation Dataset From Multi-Source Optical Imagery}, 
      author={Anatol Garioud and Nicolas Gonthier and Loic Landrieu and Apolline De Wit and Marion Valette and Marc Poupée and Sébastien Giordano and Boris Wattrelos},
      booktitle={Advances in Neural Information Processing Systems (NeurIPS) 2023},