TorchGeo embeddings

From OSGeo

Revision as of 18:31, 14 June 2026


arXiv:2601.13134v1 [cs.SE] 19 Jan 2026

Earth Embeddings as Products: Taxonomy, Ecosystem, and Standardized Access is a survey that organizes existing geospatial embedding products into a three-layer taxonomy: Data, Tools, and Value. The paper provides a detailed metadata atlas (resolution, license, etc.) and proposes unified integration by implementing standardized data loaders for these embeddings in [TorchGeo].

The paper proposes an overview landscape comprising: a) analysis frameworks and tools; b) embedding data artifacts; c) downstream application value, specifically mapping tasks and retrieval tasks.

Embeddings are differentiated as either location-typed, patch-typed, or pixel-typed. Details of existing products are shown. "We extend TorchGeo with a unified API that standardizes the loading and querying of diverse embedding products."
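
The three embedding types can be made concrete with a small catalog structure. The following is a minimal sketch, not the TorchGeo API: the class names, the `products_of_kind` helper, and the assignment of Clay and Major TOM to the patch type are assumptions made for illustration (SatCLIP and Tessera are described on this page as location and pixel-based embeddings).

```python
from dataclasses import dataclass
from enum import Enum


class EmbeddingKind(Enum):
    """The three embedding types named in the survey."""
    LOCATION = "location"  # one vector per geographic coordinate
    PATCH = "patch"        # one vector per image patch
    PIXEL = "pixel"        # one vector per pixel


@dataclass(frozen=True)
class EmbeddingProduct:
    """Hypothetical catalog record; real products carry more metadata."""
    name: str
    kind: EmbeddingKind


# Illustrative assignments only; check each product's documentation.
CATALOG = [
    EmbeddingProduct("SatCLIP", EmbeddingKind.LOCATION),
    EmbeddingProduct("Clay", EmbeddingKind.PATCH),
    EmbeddingProduct("Major TOM", EmbeddingKind.PATCH),
    EmbeddingProduct("Tessera", EmbeddingKind.PIXEL),
]


def products_of_kind(kind):
    """Return the names of all catalog products of the given kind."""
    return [p.name for p in CATALOG if p.kind is kind]
```

Grouping products by kind in this way is one possible starting point for choosing a loader per downstream task, for example pixel-typed products for dense mapping.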

1. Foundation Models for Earth Observation (EO)

These are the leading projects that aim to build general-purpose models capable of representing Earth from satellite imagery and other geospatial modalities.

Projects

  • Clay Foundation Model – [HuggingFace] (2024)
    A multimodal foundation model for Earth using diverse data sources.
  • Major TOM – [MajorTOM] A. Francis, IGARSS 2024
    Expandable datasets and models for global EO coverage.
  • Earth Index Embeddings – [EarthGenome] (2025)
    A large-scale embedding system built from Earth observation data.
  • Copernicus-Embed – [LINK] Zhu et al., AI4Copernicus Project
    Foundation model leveraging Copernicus Sentinel data.
  • Presto Embeddings – [NASAHarvest]
    Embedding framework for satellite time series and land use analysis.
  • Tessera Embeddings – [GeoTessera] Docs / [REPO]
    Pixel-based temporal spectral embeddings for Earth representation.
  • Google Satellite Embedding (AlphaEarth) – [LINK] Google Earth Engine
    An early-stage embedding model using Google's global satellite data.
  • OlmoEarth – [AllenAI] (2025)
    Latent image modeling approach for multimodal Earth observation.

Key Papers

  • X. X. Zhu 2025 [LINK] "On the Foundations of Earth Foundation Models" – Nature Computational Science
  • C. F. Brown 2025 [LINK] "AlphaEarth Foundations"
  • K. Klemmer 2023 [LINK] "SatCLIP: Global Location Embeddings with Satellite Imagery"

2. Datasets

Large-scale, open-access datasets play a central role in training and evaluating Earth foundation models.

Datasets

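  • EuroSAT – [Zenodo] (https://zenodo.org/records/7711810)
    Land use classification dataset using Sentinel-2 satellite data.
  • EuroCrops – [PMC_10495462] (https://pmc.ncbi.nlm.nih.gov/articles/PMC10495462/)
    Crop type mapping dataset for Europe.
  • National Land Cover Database (NLCD) – [NLCD Legend] (https://www.mrlc.gov/data/legends/national-land-cover-database-nlcd-legend) Photogrammetric Engineering & Remote Sensing (2001)
    Land cover classes for the USA.
  • SSL4EO-S12 – [GitHub] (https://github.com/zhu-xlab/SSL4EO-S12) IEEE Geoscience and Remote Sensing (2023)
    Multimodal, multitemporal dataset for self-supervised learning.
  • Copernicus-Pretrain – [GitHub] (https://github.com/zhu-xlab/Copernicus-FM) IEEE Geoscience and Remote Sensing (2023)
    An extension of the SSL4EO-S12 dataset to all major Sentinel missions (S1-S5P).
  • BigEarthNet – [GitHub] (https://github.com/boschresearch/BigEarthNet)
    Large-scale multi-label satellite image classification dataset.
  • RESISC45 – [GitHub] (https://github.com/dengdan/satellite-imagery-classification)
    Remote sensing image classification dataset with 45 categories.
  • UC Merced – [UC Merced Website] (https://weegee.vision.ucmerced.edu/datasets/uc-merced.html)
    Aerial image dataset for land use classification.
  • Potsdam – [ISPRS Benchmark] (https://www.isprs.org/education/benchmarks/urban-semi-supervised.aspx)
    Semantic segmentation dataset for urban areas from aerial imagery.
  • Vaihingen – [ISPRS Benchmark] (https://www.isprs.org/education/benchmarks/urban-semi-supervised.aspx)
    Semantic segmentation dataset for urban areas from aerial imagery.
  • LoveDA – [GitHub] (https://github.com/JiaLong-Fang/LoveDA)
    Large-scale remote sensing image semantic segmentation dataset.
  • LEVIR-CD – [GitHub] (https://github.com/zheng-yu-li/LEVIR-CD)
    Building change detection dataset using high-resolution remote sensing images.
  • Inria Aerial Image Labeling – [Inria Website] (https://project.inria.fr/aerialimagelabeling/)
    Aerial image segmentation dataset for building footprint extraction.
  • NAIP – [USGS NAIP] (https://www.usgs.gov/core-science-systems/ngp/naip)
    National Agriculture Imagery Program data for the USA.
  • Sentinel-2 – [Copernicus Sentinel] (https://sentinels.copernicus.eu/web/sentinel/technical-guides/sentinel-2-msi)
    Multispectral imagery from the Sentinel-2 mission.
  • Landsat – [Landsat USGS] (https://www.usgs.gov/centers/eros/science/usgs-landsat-science)
    Long-term archive of medium-resolution satellite imagery.
  • OpenStreetMap – [OpenStreetMap] (https://www.openstreetmap.org/)
    Collaborative project to create a free editable map of the world.
  • GFED (Global Fire Emissions Database) – [GFED Website] (https://www.gfed.org/)
    Global dataset of biomass burning emissions.
  • GBIF – [GBIF] (https://www.gbif.org/)
    Global Biodiversity Information Facility dataset.
  • Open Buildings – [Microsoft Research] (https://github.com/microsoft/Global-Buildings)
    Global building footprint detection dataset.
  • CropHarvest – [GitHub] (https://github.com/Clay-Lab/CROPHarvest)
    Global crop type classification dataset using Sentinel-1 and Sentinel-2.
  • SEN12MS – [GitHub] (https://github.com/visual-geometry-group/sen12ms)
    Multi-modal, multi-temporal dataset for semantic segmentation.
  • DIOR – [GitHub] (https://github.com/Visual-Geometry-Group/DIOR)
    Large-scale benchmark for object detection in optical remote sensing images.
  • FireRisk – [GitHub] (https://github.com/Clay-Lab/FIRERISK)
    Wildfire risk assessment dataset.
  • QuakeSet – [GitHub] (https://github.com/Clay-Lab/QuakeSet)
    Earthquake monitoring dataset using Sentinel-1.
  • S2-100K – [GitHub] (https://github.com/Clay-Lab/S2-100K)
    Large-scale Sentinel-2 dataset for self-supervised learning.
  • Tropical Cyclone – [GitHub] (https://github.com/Clay-Lab/Tropical-Cyclone)
    Dataset for tropical cyclone wind speed estimation.
  • So2Sat – [GitHub] (https://github.com/Clay-Lab/So2Sat)
    Local climate zone classification dataset using Sentinel-1 and Sentinel-2.
  • SkyScript – [GitHub] (https://github.com/Clay-Lab/Skyscript)
    Vision-language dataset of remote sensing image-text pairs.
  • SeasoNet – [GitHub] (https://github.com/Clay-Lab/Seasonet)
    Seasonal scene understanding dataset with land cover labels.
  • ResTree – [GitHub] (https://github.com/Clay-Lab/ResTree)
    Tree species classification dataset.
  • PatternNet – [GitHub] (https://github.com/Clay-Lab/PatternNet)
    Remote sensing image classification dataset.
  • Presto-Embeddings – [GitHub] (https://github.com/Clay-Lab/Presto)
    Pre-trained embeddings for remote sensing images.
  • PRISMA – [GitHub] (https://github.com/Clay-Lab/Prisma)
    Hyperspectral image classification dataset.
  • ReforesTree – [GitHub] (https://github.com/Clay-Lab/Reforestree)
    Reforestation monitoring dataset.
  • PASTIS – [GitHub] (https://github.com/Clay-Lab/Pastis)
    Panoptic and semantic segmentation dataset of agricultural parcels from Sentinel-2 time series.
  • OSCD – [GitHub] (https://github.com/Clay-Lab/OSCD)
    Onera Satellite Change Detection dataset of Sentinel-2 image pairs.
  • OpenAerialMap – [OpenAerialMap] (https://www.openaerialmap.org/)
    Open-source aerial imagery dataset.
  • NCCM – [GitHub] (https://github.com/Clay-Lab/NCCM)
    Northeast China Crop Map dataset.
  • NASA Marine Debris – [NASA Data] (https://data.nasa.gov/Earth/nasa-marine-debris/nasa-marine-debris/dataset)
    Marine debris detection dataset.
  • Million-AID – [GitHub] (https://github.com/Clay-Lab/Million-AID)
    Large-scale aerial image dataset for scene classification.
  • MDAS – [GitHub] (https://github.com/Clay-Lab/MDAS)
    Multi-modal benchmark dataset covering the city of Augsburg, Germany.
  • MMFlood – [GitHub] (https://github.com/Clay-Lab/MMFlood)
    Multi-modal flood detection dataset.
  • MMEarth – [GitHub] (https://github.com/Clay-Lab/MMEarth)
    Multi-modal Earth observation dataset.
  • Major TOM – [GitHub] (https://github.com/Clay-Lab/Major-Tom)
    Expandable datasets for global EO coverage.
  • MapInWild – [GitHub] (https://github.com/Clay-Lab/MapinWild)
    Wilderness mapping dataset.
  • L8-Biome – [GitHub] (https://github.com/Clay-Lab/L8-Biome)
    Landsat 8 cloud cover validation dataset, sampled across biomes.
  • L7-Irish – [GitHub] (https://github.com/Clay-Lab/L7-Irish)
    Landsat 7 cloud cover validation dataset.
  • Kenya Crop Type – [GitHub] (https://github.com/Clay-Lab/Kenya-Crop-Type)
    Crop type mapping dataset for Kenya.
  • IO-Bench – [GitHub] (https://github.com/Clay-Lab/IO-Bench)
    Input/Output benchmark for remote sensing datasets.
  • IDTrees – [GitHub] (https://github.com/Clay-Lab/IDTrees)
    Tree identification dataset.
  • HySpecNet11K – [GitHub] (https://github.com/Clay-Lab/HySpecNet11K)
    Large-scale hyperspectral benchmark dataset.
  • Google Satellite Embedding – [GitHub] (https://github.com/Clay-Lab/Google-Satellite-Embedding)
    Pre-trained embeddings for Google satellite imagery.
  • GlobBiomass – [GitHub] (https://github.com/Clay-Lab/Globbiomass)
    Global biomass estimation dataset.
  • Global Mangrove Distribution – [GitHub] (https://github.com/Clay-Lab/Global-Mangrove-Distribution)
    Global mangrove forest distribution dataset.
  • Global Building Map – [GitHub] (https://github.com/Clay-Lab/Global-Building-Map)
    Global building footprint detection dataset.
  • GeoNRW – [GitHub] (https://github.com/Clay-Lab/GeoNRW)
    Land cover classification dataset for North Rhine-Westphalia.
  • FAIR1M – [GitHub] (https://github.com/Clay-Lab/Fair1M)
    Fine-grained object recognition dataset for high-resolution remote sensing imagery.
  • EverWatch – [GitHub] (https://github.com/Clay-Lab/EverWatch)
    Bird detection dataset from aerial imagery of the Everglades.
  • Earth-Embeddings – [GitHub] (https://github.com/Clay-Lab/Earth-Embeddings)
    Pre-trained embeddings for Earth observation data.
  • Earth-Index-Embeddings – [GitHub] (https://github.com/Clay-Lab/Earth-Index-Embeddings)
    Pre-trained embeddings for Earth observation index data.
  • EDDMapS – [EDDMaps] (https://www.eddmaps.org/)
    Early Detection & Distribution Mapping System dataset.
  • DL4GAM – [GitHub] (https://github.com/Clay-Lab/DL4GAM)
    Deep learning dataset for glacier area monitoring.
  • DLRSD – [GitHub] (https://github.com/Clay-Lab/DLRSD)
    Dense labeling remote sensing dataset.
  • DOTA – [DOTA Website] (https://captain-whu.github.io/DOTA/)
    Large-scale dataset for object detection in aerial images.
  • Digital Typhoon – [JMA Digital Typhoon] (https://typhoon.rsmc.jma.go.jp/)
    Typhoon imagery dataset.
  • DeepGlobe Land Cover – [DeepGlobe Website] (https://deepglobe.mit.edu/)
    Land cover classification dataset from aerial and satellite imagery.
  • DFC2022 – [DFC Hub] (https://www.dfchub.com/)
    IEEE GRSS Data Fusion Contest 2022 dataset for land cover mapping.
  • Cropland Data Layer – [USDA NASS] (https://www.nass.usda.gov/Research_and_Science/Cropland/SARS/index.php)
    Crop-specific land cover dataset for the USA.
  • COWC – [GitHub] (https://github.com/Clay-Lab/COWC)
    Cars Overhead With Context dataset for counting cars in aerial images.
  • Copernicus-Embed – [GitHub] (https://github.com/Clay-Lab/Copernicus-Embed)
    Pre-trained embeddings for Copernicus data.
  • Copernicus-Bench – [GitHub] (https://github.com/Clay-Lab/Copernicus-Bench)
    Benchmark dataset for Copernicus data.
  • Cloud-Cover-Detection – [GitHub] (https://github.com/Clay-Lab/Cloud-Cover-Detection)
    Cloud cover detection dataset.
  • Clay-Embeddings – [GitHub] (https://github.com/Clay-Lab/Clay-Embeddings)
    Pre-trained embeddings from the Clay foundation model.
  • Chesapeake – [GitHub] (https://github.com/Clay-Lab/Chesapeake)
    Land cover classification dataset for the Chesapeake Bay region.
  • ChaBuD – [GitHub] (https://github.com/Clay-Lab/Chabud)
    Change detection dataset for burned area delineation.
  • Caffe – [Caffe Website] (https://github.com/BVLC/caffe)
    Deep learning framework for remote sensing.
  • CaBuAr – [GitHub] (https://github.com/Clay-Lab/Cabuar)
    California burned areas dataset.
  • BRIGHT – [GitHub] (https://github.com/Clay-Lab/Bright)
    Multi-modal building damage assessment dataset.
  • BioMassters – [GitHub] (https://github.com/Clay-Lab/Biomassters)
    Biomass estimation dataset.
  • Benin Cashew Plantations – [GitHub] (https://github.com/Clay-Lab/Benin-Cashew-Plantations)
    Cashew plantation mapping dataset for Benin.
  • Benchmark.csv – [Benchmark GitHub] (https://github.com/Clay-Lab/Benchmark)
    Benchmark dataset for remote sensing.
  • ADVANCE – [GitHub] (https://github.com/Clay-Lab/Advance)
    Audiovisual aerial scene recognition dataset.
  • Aboveground-Woody-Biomass – [GitHub] (https://github.com/Clay-Lab/Aboveground-Woody-Biomass)
    Aboveground woody biomass estimation dataset.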

3. Models & Methods

These include both classical and cutting-edge machine learning approaches used in building Earth foundation models.

Core Methods

  • SatCLIP – [LINK] AAAI 2025

Vision-language model for global location representations.

  • MMEarth – [LINK] ECCV 2024

Multimodal pretext tasks for geospatial representation learning.

  • ResNet – [LINK] K. He et al., IEEE CVPR 2016

Baseline CNN architecture widely used in EO.

  • ConvNeXt V2 – [LINK] Woo et al., IEEE/CVF CVPR 2023

ConvNet architecture co-designed with masked autoencoder (MAE) pretraining.

  • DINO, DINOv2, DINOv3 – [LINK] INRIA and Meta, 2021–2025

Vision transformers with self-supervised learning capabilities.

  • MAE (Masked Autoencoders) – [LINK] IEEE/CVF 2021

Self-supervised learning for vision transformers.

Distillation & Advanced Approaches

  • Distillation methods – Transfer knowledge from large models.
  • Neural plasticity-inspired models (TorchGeo_DOFA) – Inspired by biological learning mechanisms.
  • Multi-label guided soft contrastive learning – Y. Wang, IEEE TGRS, 2024.
  • Barlow Twins – Method for learning representations without contrastive loss.
  • Continual Barlow Twins – Extends Barlow Twins to continual learning in EO segmentation.
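
To illustrate what "without contrastive loss" means for Barlow Twins, the objective can be sketched in plain Python. This is a minimal, unoptimized sketch based on the published formulation, not code from any project listed here: the cross-correlation matrix between two views of the same batch is pushed toward the identity, so no negative pairs are needed. The function names and the default weight `lam=0.005` are assumptions for illustration, and every feature is assumed to vary across the batch (a constant feature would divide by zero).

```python
import math


def standardize(columns):
    """Normalize each feature column to zero mean and unit variance across the batch."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        out.append([(v - mean) / std for v in col])
    return out


def barlow_twins_loss(z1, z2, lam=0.005):
    """z1, z2: lists of rows (batch x features) embedding two views of the same batch."""
    n = len(z1)
    a = standardize(list(zip(*z1)))  # feature-major layout
    b = standardize(list(zip(*z2)))
    loss = 0.0
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            # Entry (i, j) of the cross-correlation matrix between the two views.
            c_ij = sum(x * y for x, y in zip(ai, bj)) / n
            # Diagonal terms are pulled toward 1, off-diagonal terms toward 0.
            loss += (1 - c_ij) ** 2 if i == j else lam * c_ij ** 2
    return loss
```

The loss is zero when the two views give identical, mutually decorrelated features, and it grows when matching features disagree or different features become redundant.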

4. Tools & Benchmarks

These are software systems and frameworks that support development, evaluation, or deployment of EO AI models.

Tools

  • TorchGeo

PyTorch library for geospatial deep learning.

  • NeuCo-Bench – [LINK] R. Vinge, arXiv 2025

Benchmarking framework for neural embeddings in Earth observation.

  • GeoINRID – [LINK] GitHub: arjunarao619/GeoINRID

Geospatial inference and representation learning toolkit.

Challenges

  • Embed2Scale Challenge – [LINK] C. Albrecht, CVPR 2025

Large-scale Earth vision challenge focused on scale-aware embeddings.

  • TerraMind Blue-Sky Challenge

Generative modeling for Earth observation.

5. Key Themes & Trends

  1. Foundation Models: TorchGeo now includes data loaders for embeddings designed for search/retrieval (Clay, Major TOM, Earth Index) and for dense prediction tasks such as land cover mapping (Copernicus, Presto, Tessera, Google). This enables fair, side-by-side benchmarking of different embedding models on the same downstream tasks, forming the basis for future experiments. Projects are encouraged to strengthen and improve explainability.
    • Major TOM Notes: Major TOM embeddings are not (yet) really product-oriented and are aimed with a similar purpose to the MT Core datasets - to make it easier to experiment and benchmark model outputs (hence, unlike TESSERA and AEF which came a few months after, MT embeddings do not have consistent or aggregated temporal scope). We haven't had enough time to finish off the preprint, but my current plan is to provide a simple MT Embedding benchmark at this year's EGU and integrate that into the arxiv pre-print. --Miko
    • Earth Index / Earth Genome: Use the Earth Index application (earthindex.ai) for non-technical users to use the embeddings we published on source.coop. Users of the web app (non-technical journalists, indigenous communities/allies, NGOs) have been our main focus. Users of the source.coop embeddings have generally been more technical folks interested in exploring/innovating in what's possible --BenStrong
    • Clay: Clay and Presto offer documented tutorials on generating new embeddings with their models. In Clay, the encoder receives unmasked patches, latitude-longitude data, and timestep information. Notably, the last two embeddings from the encoder represent the latitude-longitude and timestep information.
  2. Self-Supervised Learning (SSL):
  3. Multimodal Integration:
  4. Open Data & Tools: Open-source projects (e.g., TorchGeo, Copernicus-Embed) and public datasets (EuroSAT, EuroCrops) are crucial for reproducibility and democratization of EO AI. Projects are encouraged to increase input data diversity and to adopt cloud-native formats for geospatial data.
  5. Benchmarking: Projects are encouraged to standardize benchmarking. Existing benchmarks include NeuCo-Bench and Embed2Scale.
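
The search/retrieval use case mentioned above can be sketched as a nearest-neighbour lookup over stored embeddings. This is a minimal sketch in plain Python, not how any of the listed products implement search: the tile ids, the three-dimensional vectors, and the `top_k` helper are hypothetical, and real products use much larger vectors and dedicated vector indexes.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def top_k(query, index, k=2):
    """Return the ids of the k embeddings most similar to the query.

    index: dict mapping a (hypothetical) tile id to its embedding vector.
    """
    scored = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [tile_id for tile_id, _ in scored[:k]]


# Toy index of three tiles with made-up embeddings.
index = {
    "tile_a": [1.0, 0.0, 0.0],
    "tile_b": [0.9, 0.1, 0.0],
    "tile_c": [0.0, 1.0, 0.0],
}
```

Calling `top_k([1.0, 0.0, 0.0], index)` ranks `tile_a` first and `tile_b` second. Running the same query against embeddings from different models over the same tiles is one simple way to compare products side by side.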

Research Directions

  • Unified Earth Foundation Models:
  • Interpretability in EO AI: Exploring how these embeddings can be interpreted by domain experts.
  • Ethics and Bias: Investigating fairness and bias in global EO models trained on unevenly distributed data.
  • Edge Deployment: Making these large foundation models deployable on resource-constrained platforms (e.g., for field use).