Difference between revisions of "TorchGeo embeddings"

From OSGeo
 
(35 intermediate revisions by the same user not shown)
Line 9: Line 9:
 
}}
 
}}
  
arXiv:2601.13134v1 [cs.SE] 19 Jan 2026
+
[[https://arxiv.org/html/2601.13134v1 arXiv:2601.13134v1]] [cs.SE] 19 Jan 2026
  
 
'''Earth Embeddings as Products: Taxonomy, Ecosystem, and Standardized Access''' is a comprehensive survey that organizes existing geospatial embedding products into a structured taxonomy through a three-layer taxonomy:
 
'''Earth Embeddings as Products: Taxonomy, Ecosystem, and Standardized Access''' is a comprehensive survey that organizes existing geospatial embedding products into a structured taxonomy through a three-layer taxonomy:
Line 26: Line 26:
  
 
=== Projects ===
 
=== Projects ===
 +
* '''OlmoEarth''' – [[https://allenai.org/olmoearth AllenAI]] (2025)
 +
*: Latent image modeling approach for multimodal Earth observation.
 +
* '''Major TOM''' – [[https://huggingface.co/Major-TOM MajorTOM]] AFrancis IGARSS 2024
 +
*:  (Terrestrial Observation Metaset) is an open-source framework and massive satellite dataset created by the European Space Agency's Φ-lab.
 
* '''Clay Foundation Model''' – [[https://huggingface.co/made-with-clay/Clay HuggingFace]] (2024)
 
* '''Clay Foundation Model''' – [[https://huggingface.co/made-with-clay/Clay HuggingFace]] (2024)
 
*: A multimodal foundation model for Earth using diverse data sources.
 
*: A multimodal foundation model for Earth using diverse data sources.
* '''Major TOM''' – [[https://huggingface.co/Major-TOM MajorTOM]] AFrancis IGARSS 2024
 
*: Expandable datasets and models for global EO coverage.
 
 
* '''Earth Index Embeddings''' – [[https://www.earthgenome.org/earth-index EarthGenome]] (2025)
 
* '''Earth Index Embeddings''' – [[https://www.earthgenome.org/earth-index EarthGenome]] (2025)
 
*: A large-scale embedding system built from Earth observation data.
 
*: A large-scale embedding system built from Earth observation data.
Line 40: Line 42:
 
* '''Google Satellite Embedding (AlphaEarth)''' – [[https://developers.google.com/earth-engine/datasets/catalog/GOOGLE_SATELLITE_EMBEDDING_V1_ANNUAL LINK]] Google Earth Engine
 
* '''Google Satellite Embedding (AlphaEarth)''' – [[https://developers.google.com/earth-engine/datasets/catalog/GOOGLE_SATELLITE_EMBEDDING_V1_ANNUAL LINK]] Google Earth Engine
 
*: An early-stage embedding model using Google's global satellite data.
 
*: An early-stage embedding model using Google's global satellite data.
* '''OlmoEarth''' – [[https://allenai.org/olmoearth AllenAI]] (2025)
 
*: Latent image modeling approach for multimodal Earth observation.
 
  
 
=== Key Papers ===
 
=== Key Papers ===
Line 49: Line 49:
  
 
== 2. Datasets ==
 
== 2. Datasets ==
 +
Large-scale, open-access datasets play a central role in training and evaluating Earth foundation models.  See the [[TorchGeo_embeddings#Deatasets_Auto-Edit|list]] below for a representative sample.
  
Large-scale, open-access datasets play a central role in training and evaluating Earth foundation models.
+
== 3. Model Architectures for Embeddings ==
  
=== Datasets ===
+
The generation of robust Earth observation embeddings relies on specific model families pre-trained on large-scale geospatial data. The following architectures are central to the TorchGeo ecosystem and recent foundation model research, categorized by their primary learning paradigm.
* '''EuroSAT''' – [[https://zenodo.org/records/7711810 Zenodo]]
+
 
Land use classification dataset using Sentinel-2 satellite data.
+
=== Vision-Language Models ===
* '''EuroCrops''' – [[https://pmc.ncbi.nlm.nih.gov/articles/PMC10495462/ PMC_10495462]]
+
These models align visual satellite imagery with textual or geographic metadata, creating unified embedding spaces for cross-modal retrieval.
Crop type mapping dataset for Europe.
+
 
* '''National Land Cover Database (NLCD)''' – [[https://www.ingentaconnect.com/content/asprs/pers/2004/00000070/00000007/art00005;jsessionid=2awgc0agiboi4.x-ic-live-02 LINK]] Photogrammetric Engineering & Remote Sensing (2001)
+
* '''SatCLIP''' – [[https://arxiv.org/abs/2311.17179v2 [LINK]]] (AAAI 2025)
USA land cover classes.
+
A vision-language model designed to generate global location representations by aligning satellite imagery with geographic text embeddings, originally published in 2023 and widely studied.
* '''SSL4EO-S12''' – [[https://github.com/zhu-xlab/SSL4EO-S12 LINK]] IEEE Geoscience and Remote Sensing (2023)
+
 
Multimodal, multitemporal dataset for self-supervised learning.
+
* '''MMEarth''' – [[https://vishalned.github.io/mmearth/ SITE]] [[https://arxiv.org/abs/2405.02771v2 [arXiv]]] [[https://github.com/vishalned/MMEarth-data Datasets]] (ECCV 2024)
* '''Copernicus-Pretrain''' [[https://github.com/zhu-xlab/Copernicus-FM LINK]] IEEE Geoscience and Remote Sensing (2023)
+
A global dataset for multi-modal and geospatial representation learning. It consists of 12 modalities including pixel-level and image-level modalities from 1.2 million locations.
An extension of the SSL4EO-S12 dataset to all major Sentinel missions (S1-S5P).
+
 
 +
=== Self-Supervised Vision Transformers ===
 +
Transformers pre-trained via self-supervised objectives (e.g., DINO, MAE) are currently the state-of-the-art for extracting rich, transferable embeddings from remote sensing imagery.
  
== 3. Models & Methods ==
+
* '''DINO / DINOv2 / DINOv3''' – [[https://ai.meta.com/research/dinov3/ [LINK]]] (Meta/INRIA, 2021–2023)
 +
Vision transformers trained with self-distillation without labels. DINOv2, in particular, produces highly discriminative embeddings that capture fine-grained geospatial features without downstream fine-tuning.
 +
* '''MAE (Masked Autoencoders)''' – [[https://arxiv.org/abs/2111.06377 [LINK]]] (CVPR 2021)
 +
A self-supervised encoder that learns robust image representations by reconstructing masked patches. When applied to satellite data, MAE embeddings capture global context and local texture effectively.
  
These include both classical and cutting-edge machine learning approaches used in building Earth foundation models.
+
=== Efficient Convolutional Backbones ===
 +
While Transformers dominate embedding research, optimized CNNs remain critical for efficient embedding extraction in resource-constrained environments or as feature extractors for hybrid models.
  
=== Core Methods ===
+
* '''ConvNeXt V2''' – [[https://arxiv.org/abs/2301.00808 [LINK]]] (Woo et al., CVPR 2023)
* '''SatCLIP''' – [[https://arxiv.org/abs/2311.17179|AAAI 2025]] etc.
+
A modernized ConvNet that incorporates Transformer-like design principles (e.g., large kernel sizes, layer normalization). It serves as a high-performance, efficient backbone for generating spatial embeddings in TorchGeo.
*: Vision-language model for global location representations.
+
* '''ResNet''' – [[https://arxiv.org/abs/1512.03385 [LINK]]] (He et al., CVPR 2016)
* '''MMEarth''' – [[https://arxiv.org/abs/2405.02771|EU/CV 2024]]
+
The foundational residual network. While older, ResNet variants remain the standard baseline for comparing embedding quality in early-stage Earth Foundation Model development.
*: Multimodal pretext tasks for geospatial representation learning.
 
* '''ResNet''' – [[https://viso.ai/deep-learning/resnet-residual-neural-network/|KHe IEEE/CV 2016]]
 
*: Baseline CNN architecture widely used in EO.
 
* '''ConvNeXt V2''' – [[https://github.com/facebookresearch/ConvNeXt-V2|Woo et al., IEEE/CVF 2023]]
 
*: Efficient ConvNet architecture using masked autoencoders (MAE).
 
* '''DINO, DINOv2, DINOv3''' – [[https://dinov3.org/|INRIA 2021–2023, META]]
 
*: Vision transformers with self-supervised learning capabilities.
 
* '''MAE (Masked Autoencoders)''' – [[https://arxiv.org/abs/2111.06377|IEEE/CVF 2021]]
 
*: Self-supervised learning for vision transformers.
 
  
 
=== Distillation & Advanced Approaches ===
 
=== Distillation & Advanced Approaches ===
 
* '''Distillation methods''' – Transfer knowledge from large models.
 
* '''Distillation methods''' – Transfer knowledge from large models.
* '''Neural plasticity-inspired models''' – [[https://arxiv.org/|ZXiong, arXiv 2024]]: Inspired by biological learning mechanisms.
+
* '''Neural plasticity-inspired models''' – [[TorchGeo_DOFA]]: Inspired by biological learning mechanisms.
* '''Multi-label guided soft contrastive learning''' – [[https://ieeexplore.ieee.org/|YWang, IEEE TGRS, 2024]].
+
* '''Multi-label guided soft contrastive learning''' – YWang, IEEE TGRS, 2024.
* '''Barlow Twins''' – [[https://arxiv.org/|Zbontar et al., arXiv 2021]]: Method for learning representations without contrastive loss.
+
* '''Barlow Twins''' – Method for learning representations without contrastive loss.
* '''Continual Barlow Twins''' – [[https://ieeexplore.ieee.org/|IEEE JSTARS, 2023]]: Extends Barlow Twins to continual learning in EO segmentation.
+
* '''Continual Barlow Twins''' – Extends Barlow Twins to continual learning in EO segmentation.
  
 
== 4. Tools & Benchmarks ==
 
== 4. Tools & Benchmarks ==
Line 94: Line 92:
  
 
=== Tools ===
 
=== Tools ===
* '''TorchGeo''' – [[https://www.osgeo.org/projects/torchgeo/|AJStewart ACM 2025]]
+
* '''TorchGeo''' – [[https://www.osgeo.org/projects/torchgeo/ TorchGeo]] [[https://dl.acm.org/doi/10.1145/3707459 ACM]] AJStewart et.al.
*: PyTorch library for geospatial deep learning.
+
PyTorch library for geospatial deep learning.
* '''NeuCo-Bench''' – [[https://arxiv.org/html/2510.17914v1|RVinge, arXiv 2025]]
+
* '''NeuCo-Bench''' – [[https://arxiv.org/html/2510.17914v1 LINK]] RVinge, arXiv 2025
*: Benchmarking framework for neural embeddings in Earth observation.
+
Benchmarking framework for neural embeddings in Earth observation.
* '''GeoINRID''' – [[https://github.com/arjunarao619/GeoINRID|GitHub: arjunarao619/GeoINRID]]
+
* '''GeoINRID''' – [[https://github.com/arjunarao619/GeoINRID LINK]] GitHub: arjunarao619/GeoINRID
*: Geospatial inference and representation learning toolkit.
+
Geospatial inference and representation learning toolkit.
  
 
=== Challenges ===
 
=== Challenges ===
* '''Embed2Scale Challenge''' – [[https://research.ibm.com/publications/the-2025-cvpr-earthvision-data-challenge-by-embed2scale|CVPR CAlbrecht 2025]]
+
* '''Embed2Scale Challenge''' – [[https://research.ibm.com/publications/the-2025-cvpr-earthvision-data-challenge-by-embed2scale LINK]] CVPR CAlbrecht 2025
*: Large-scale Earth vision challenge focused on scale-aware embeddings.
+
Large-scale Earth vision challenge focused on scale-aware embeddings.
* '''TerraMind Blue-Sky Challenge''' – [[https://arxiv.org/|JJakubik, arXiv 2025]]
+
* '''TerraMind Blue-Sky Challenge''' –  
*: Generative modeling for Earth observation.
+
Generative modeling for Earth observation.
  
 
== 5. Key Themes & Trends ==
 
== 5. Key Themes & Trends ==
Line 121: Line 119:
  
 
# '''Benchmarking''': Projects are encouraged to standardize in benchmarking. Benchmarks including NeuCo-Bench and Embed2Scale.
 
# '''Benchmarking''': Projects are encouraged to standardize in benchmarking. Benchmarks including NeuCo-Bench and Embed2Scale.
 +
 +
 +
== Deatasets Auto-Edit ==
 +
 +
* '''EuroSAT''' – [[https://zenodo.org/records/7711810 Zenodo]]
 +
Land use classification dataset using Sentinel-2 satellite data.
 +
* '''EuroCrops''' – [[https://pmc.ncbi.nlm.nih.gov/articles/PMC10495462/ PMC_10495462]]
 +
Crop type mapping dataset for Europe.
 +
* '''National Land Cover Database (NLCD)''' – [[https://www.mrlc.gov/data/legends/national-land-cover-database-class-legend-and-description MRLC]]
 +
USA land cover classes.
 +
* '''SSL4EO-S12''' – [[https://github.com/zhu-xlab/SSL4EO-S12 GitHub]]
 +
Multimodal, multitemporal dataset for self-supervised learning.
 +
* '''Copernicus-Pretrain''' – [[https://github.com/zhu-xlab/Copernicus-FM GitHub]]
 +
An extension of the SSL4EO-S12 dataset to all major Sentinel missions (S1-S5P).
 +
* '''BigEarthNet''' – [[https://bigearth.net/ BigEarthNet]]
 +
Large-scale multi-label satellite image classification dataset.
 +
* '''Resisc45''' – [[https://doi.org/10.1109/jproc.2017.2675998 IEEE DOI]]
 +
Remote sensing image classification dataset with 45 categories.
 +
* '''UC Merced''' – [[https://vision.ucmerced.edu/datasets/uc-merced.html UC Merced]]
 +
Aerial image dataset for land use classification.
 +
* '''Potsdam''' – [[https://www.isprs.org/resources/datasets/benchmarks/ ISPRS]]
 +
Semantic segmentation dataset for urban areas from aerial imagery.
 +
* '''Vaihingen''' – [[https://www.isprs.org/resources/datasets/benchmarks/ ISPRS]]
 +
Semantic segmentation dataset for urban areas from aerial imagery.
 +
* '''Inria Aerial Image Labeling''' – [[https://project.inria.fr/aerialimagelabeling/ Inria]]
 +
Aerial image segmentation dataset for building footprint extraction.
 +
* '''NAIP''' – [[https://www.usgs.gov/centers/eros/science/usgs-eros-archive-aerial-photography-national-agriculture-imagery-program-naip USGS EROS]]
 +
National Agriculture Imagery Program data for the USA.
 +
* '''Sentinel-2''' – [[https://sentinels.copernicus.eu/web/sentinel/home Copernicus]]
 +
Multispectral imagery from the Sentinel-2 mission.
 +
* '''Landsat''' – [[https://www.usgs.gov/landsat-missions USGS Landsat]]
 +
Long-term archive of medium-resolution satellite imagery.
 +
* '''OpenStreetMap''' – [[https://www.openstreetmap.org/ OpenStreetMap]]
 +
Collaborative project to create a free editable map of the world.
 +
* '''GFED''' (Global Fire Emissions Database) – [[https://www.globalfiredata.org/ Global Fire Data]]
 +
Global dataset of biomass burning emissions.
 +
* '''GBIF''' – [[https://www.gbif.org/ GBIF]]
 +
Global biodiversity information facility dataset.
 +
* '''Open Buildings''' – [[https://github.com/microsoft/globalmlbuildingfootprints Microsoft Research]]
 +
Global building footprint detection dataset.
 +
* '''OpenAerialMap''' – [[https://www.openaerialmap.org/ OpenAerialMap]]
 +
Open-source aerial imagery dataset.
 +
* '''Major-Tom''' – [[https://philab.esa.int/hello-major-tom-esa-%CF%86-lab-releases-largest-ml-ready-sentinel-2-dataset-ever-published/ GitHub]]
 +
Large-scale remote sensing image classification dataset.
 +
* '''Google Satellite Embedding''' – [[https://developers.google.com/earth-engine/datasets/catalog/GOOGLE_SATELLITE_EMBEDDING_V1_ANNUAL Google Earth Engine]]
 +
Pre-trained embeddings for Google satellite imagery.
 +
* '''Dota''' – [[https://captain-whu.github.io/DOTA/ DOTA Website]]
 +
Large-scale dataset for object detection in aerial images.
 +
  
 
== Research Directions ==
 
== Research Directions ==
Line 128: Line 175:
 
* '''Ethics and Bias''': Investigating fairness and bias in global EO models trained on unevenly distributed data.
 
* '''Ethics and Bias''': Investigating fairness and bias in global EO models trained on unevenly distributed data.
 
* '''Edge Deployment''': Making these large foundation models deployable on resource-constrained platforms (e.g., for field use).
 
* '''Edge Deployment''': Making these large foundation models deployable on resource-constrained platforms (e.g., for field use).
 
##-----------------------------------------------------
 
 
Notes on
 
  '''EARTH EMBEDDINGS AS PRODUCTS: TAXONOMY, ECOSYSTEM, AND STANDARDIZED ACCESS'''
 
<pre>
 
Heng Fang† ∗  Adam J Stewart‡ ∗  Isaac Corley§ *  Xiao Xiang Zhu *  Hossein Azizpour†
 
† KTH Royal Institute of Technology, Stockholm, Sweden
 
‡ Chair of Data Science in Earth Observation, Technical University of Munich, Munich, Germany
 
§ Wherobots, San Antonio, USA
 
</pre>
 
arXiv:2601.13134v1 19 Jan 2026  [LINK](https://arxiv.org/abs/2601.13134)
 
 
Introduction
 
 
• A comprehensive survey that organizes existing geospatial embedding products into a structured taxonomy and provides a detailed metadata atlas (resolution, license, etc.).
 
• Unified Integration: implements standardized data loaders for these embeddings in [TorchGeo](https://www.osgeo.org/projects/torchgeo/) 
 
 
An overview landscape is proposed: a) analysis frameworks and tools; b) embedding data artifacts; c) downstream application value, specifically mapping tasks and retrieval tasks. Embeddings are differentiated as either location-typed, patch-typed, or pixel-typed. Details of existing products are shown.
 
 
🌍 1. Foundation Models for Earth Observation (EO)
 
 
These are the leading projects that aim to build general-purpose models capable of representing Earth from satellite imagery and other geospatial modalities.
 
 
🔧 Projects:
 
- Clay Foundation Model – [Hugging Face](https://huggingface.co/made-with-clay/Clay) 2024 
 
  *A multimodal foundation model for Earth using diverse data sources.*
 
 
- Major TOM – [AFrancis IGARSS 2024](https://huggingface.co/Major-TOM) 
 
  *Expandable datasets and models for global EO coverage.*
 
 
- Earth Index Embeddings – [Earth Genome](https://www.earthgenome.org/earth-index), 2025 
 
  *A large-scale embedding system built from Earth observation data.*
 
 
- Copernicus-Embed – [Zhu et al., AI4Copernicus Project](https://github.com/zhu-xlab/Copernicus-FM) 
 
  *Foundation model leveraging Copernicus Sentinel data.*
 
 
- Presto Embeddings – [NASA Harvest](https://nasaharvest.github.io/presto-embeddings/) 
 
  *Embedding framework for satellite time series and land use analysis.*
 
 
- Tessera Embeddings – [GeoTessera Docs](https://geotessera.readthedocs.io/en/latest/)  [REPO](https://github.com/ucam-eo/tessera)
 
  *pixel-based Temporal spectral embeddings for Earth representation.*
 
 
- Google Satellite Embedding (AlphaEarth) – [Google Earth Engine](https://developers.google.com/earth-engine/datasets/catalog/GOOGLE_SATELLITE_EMBEDDING_V1_ANNUAL)
 
  *An early-stage embedding model using Google's global satellite data.*
 
 
- OlmoEarth – [AllenAI, 2025](https://allenai.org/olmoearth) 
 
  *Latent image modeling approach for multimodal Earth observation.*
 
 
📚 Key Papers:
 
- XXZhu 2025 [LINK](https://www.nature.com/articles/s43247-025-03127-x) – “On the Foundations of Earth Foundation Models” 
 
- CFBrown 2025 [LINK](https://arxiv.org/abs/2507.22291) – “AlphaEarth Foundations” 
 
- KKlemmer 2023 [LINK](https://arxiv.org/abs/2311.17179) – “SatCLIP: Global Location Embeddings with Satellite Imagery”
 
 
---
 
 
📦 2. **Datasets**
 
 
Large-scale, open-access datasets play a central role in training and evaluating Earth foundation models.
 
 
🗃 Datasets:
 
- EuroSAT – [Zenodo](https://zenodo.org/records/7711810) 
 
  *Land use classification dataset using Sentinel-2 satellite data.*
 
 
- EuroCrops – [PMC 10495462](https://pmc.ncbi.nlm.nih.gov/articles/PMC10495462/) 
 
  *Crop type mapping dataset for Europe.*
 
 
- National Land Cover Database (NLCD) – Photogrammetric Engineering & Remote Sensing 2001 [LINK](https://www.ingentaconnect.com/content/asprs/pers/2004/00000070/00000007/art00005;jsessionid=2awgc0agiboi4.x-ic-live-02) 
 
  *USA land cover classes*
 
 
- SSL4EO-S12 – IEEE Geoscience and Remote Sensing  2023 [LINK](https://github.com/zhu-xlab/SSL4EO-S12) 
 
  *Multimodal, multitemporal dataset for self-supervised learning.*
 
 
- Copernicus-Pretrain – [IEEE Geoscience and Remote Sensing  2023](https://github.com/zhu-xlab/Copernicus-FM)
 
  *an extension of the SSL4EO-S12 dataset to all major Sentinel missions (S1-S5P)*
 
 
---
 
 
🧠 3. **Models & Methods**
 
 
These include both classical and cutting-edge machine learning approaches used in building Earth foundation models.
 
 
⚙ Core Methods:
 
- **SatCLIP** – AAAI 2025 etc [LINK](https://arxiv.org/abs/2311.17179)

  *Vision-language model for global location representations.*


- **MMEarth** – [ECCV 2024](https://arxiv.org/abs/2405.02771)

  *Multimodal pretext tasks for geospatial representation learning.*


- **ResNet** – [KHe IEEE/CV 2016](https://viso.ai/deep-learning/resnet-residual-neural-network/)

  *Baseline CNN architecture widely used in EO.*


- **ConvNeXt V2** – [Woo et al., IEEE/CVF 2023](https://github.com/facebookresearch/ConvNeXt-V2)

  *Efficient ConvNet architecture using masked autoencoders (MAE).*


- **DINO, DINOv2, DINOv3** – INRIA 2021–2023, META [LINK](https://dinov3.org/)

  *Vision transformers with self-supervised learning capabilities.*


- **MAE (Masked Autoencoders)** – IEEE/CVF 2021 [LINK](https://arxiv.org/abs/2111.06377)
 
  *Self-supervised learning for vision transformers.*
 
 
 
🧬 Distillation & Advanced Approaches:
 
- **Distillation methods** – Transfer knowledge from large models.
 
- **Neural plasticity-inspired models** – [ZXiong, arXiv 2024] 
 
  *Inspired by biological learning mechanisms.*
 
- **Multi-label guided soft contrastive learning** – [YWang, IEEE TGRS, 2024]
 
- **Barlow Twins** – [Zbontar et al., arXiv 2021] 
 
  *Method for learning representations without contrastive loss.*
 
- **Continual Barlow Twins** – [IEEE JSTARS, 2023] 
 
  *Extends Barlow Twins to continual learning in EO segmentation.*
 
 
---
 
 
🛠 4. **Tools & Benchmarks**
 
 
These are software systems and frameworks that support development, evaluation, or deployment of EO AI models.
 
 
🧰 Tools:
 
- **TorchGeo** – [AJStewart ACM 2025](https://www.osgeo.org/projects/torchgeo/)

  *PyTorch library for geospatial deep learning.*


- **NeuCo-Bench** – [RVinge, arXiv 2025](https://arxiv.org/html/2510.17914v1)

  *Benchmarking framework for neural embeddings in Earth observation.*


- **GeoINRID** – [GitHub: arjunarao619/GeoINRID](https://github.com/arjunarao619/GeoINRID)
 
  *Geospatial inference and representation learning toolkit.*
 
 
🏆 Challenges:
 
- **Embed2Scale Challenge** – [CVPR CAlbrecht 2025](https://research.ibm.com/publications/the-2025-cvpr-earthvision-data-challenge-by-embed2scale) 
 
  *Large-scale Earth vision challenge focused on scale-aware embeddings.*
 
 
- **TerraMind Blue-Sky Challenge** – [JJakubik, arXiv 2025]
 
  *Generative modeling for Earth observation.*
 
 
---
 
 
🧭 5. **Key Themes & Trends**
 
 
1. **Foundation Models**: TorchGeo now includes data loaders designed for search/retrieval (Clay, Major TOM, Earth Index), and for dense prediction tasks like land cover mapping (Copernicus, Presto, Tessera, Google). This enables fair, side-by-side benchmarking of different embedding models on the same downstream tasks, forming the basis for future experiments. Projects are encouraged to strengthen and improve explainability.
 
 
1.1 Major TOM Notes**  Major TOM embeddings are not (yet) really product-oriented and are aimed with a similar purpose to the MT Core datasets - to make it easier to experiment and benchmark model outputs (hence, unlike TESSERA and AEF which came a few months after, MT embeddings do not have consistent or aggregated temporal scope). We haven't had enough time to finish off the preprint, but my current plan is to provide a simple MT Embedding benchmark at this year's EGU and integrate that into the arxiv pre-print. --Miko
 
 
1.2 Earth Index / Earth Genome** Use the Earth Index application (earthindex.ai)  for non-technical users to use the embeddings we published on source.coop. Users of the web app (non-technical journalists, indigenous communities/allies, NGOs) have been our main focus. Users of the source.coop embeddings have generally been more technical folks interested in exploring/innovating in what's possible --BenStrong
 
 
1.3 Clay** Clay and Presto offer documented tutorials on generating new embeddings with their models. In CLAY, the encoder receives unmasked patches, latitude-longitude data, and timestep information. Notably, the last 2 embeddings from the encoder specifically represent the latitude-longitude and timestep embeddings.
 
 
 
2. **Self-Supervised Learning (SSL)**:


3. **Multimodal Integration**:


4. **Open Data & Tools**: Open-source projects (e.g., TorchGeo, Copernicus-Embed) and public datasets (EuroSAT, EuroCrops) are crucial for reproducibility and democratization of EO AI. Projects are encouraged to increase input data diversity and to adopt cloud-native data formats for geospatial data.


5. **Benchmarking**: Projects are encouraged to standardize benchmarking, for example with NeuCo-Bench and Embed2Scale.
 
 
---
 
 
📌 Research Directions
 
 
- **Unified Earth Foundation Models**:

- **Interpretability in EO AI**: Exploring how these embeddings can be interpreted by domain experts.

- **Ethics and Bias**: Investigating fairness and bias in global EO models trained on unevenly distributed data.

- **Edge Deployment**: Making these large foundation models deployable on resource-constrained platforms (e.g., for field use).
 
 
  
  
 
[[Category:TorchGeo]]
 
[[Category:TorchGeo]]

Latest revision as of 14:13, 15 June 2026


[arXiv:2601.13134v1] [cs.SE] 19 Jan 2026

Earth Embeddings as Products: Taxonomy, Ecosystem, and Standardized Access is a survey that organizes existing geospatial embedding products into a three-layer taxonomy: Data, Tools, and Value. The paper provides a detailed metadata atlas (resolution, license, etc.) and proposes a unified integration by implementing standardized data loaders for these embeddings in [TorchGeo].

An overview landscape is proposed comprising: a) analysis frameworks and tools; b) embedding data artifacts; c) downstream application value, specifically mapping tasks and retrieval tasks.

Embeddings are differentiated as either location-typed, patch-typed, or pixel-typed. Details of existing products are shown. "We extend TorchGeo with a unified API that standardizes the loading and querying of diverse embedding products."
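The three embedding types and the retrieval use case mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: `EmbeddingKind`, `EmbeddingRecord`, and `retrieve` are hypothetical names made up for this example and are not the TorchGeo API, and the vectors are toy values.

```python
import math
from dataclasses import dataclass
from enum import Enum


class EmbeddingKind(Enum):
    """The three embedding granularities named in the survey."""
    LOCATION = "location"  # one vector per coordinate
    PATCH = "patch"        # one vector per image tile
    PIXEL = "pixel"        # one vector per raster cell


@dataclass
class EmbeddingRecord:
    """Hypothetical record type for illustration; not a TorchGeo class."""
    product: str
    kind: EmbeddingKind
    lon: float
    lat: float
    vector: list


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query, records, k=1):
    """Return the k records whose vectors are most similar to the query."""
    ranked = sorted(
        records,
        key=lambda r: cosine_similarity(query, r.vector),
        reverse=True,
    )
    return ranked[:k]


# Toy patch embeddings; real products have hundreds of dimensions.
records = [
    EmbeddingRecord("demo", EmbeddingKind.PATCH, 11.57, 48.14, [1.0, 0.0, 0.0]),
    EmbeddingRecord("demo", EmbeddingKind.PATCH, 18.07, 59.33, [0.0, 1.0, 0.0]),
    EmbeddingRecord("demo", EmbeddingKind.PATCH, -98.49, 29.42, [0.7, 0.7, 0.0]),
]
best = retrieve([0.9, 0.1, 0.0], records, k=1)[0]
```

Retrieval tasks rank stored vectors against a query in this way, whereas mapping tasks would instead feed pixel-typed vectors to a per-pixel classifier.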

1. Foundation Models for Earth Observation (EO)

These are the leading projects that aim to build general-purpose models capable of representing Earth from satellite imagery and other geospatial modalities.

Projects

  • OlmoEarth – [AllenAI] (2025)
    Latent image modeling approach for multimodal Earth observation.
  • Major TOM – [MajorTOM] AFrancis IGARSS 2024
    Major TOM (Terrestrial Observation Metaset) is an open-source framework and large satellite dataset created by the European Space Agency's Φ-lab.
  • Clay Foundation Model – [HuggingFace] (2024)
    A multimodal foundation model for Earth using diverse data sources.
  • Earth Index Embeddings – [EarthGenome] (2025)
    A large-scale embedding system built from Earth observation data.
  • Copernicus-Embed – [LINK] Zhu et al., AI4Copernicus Project
    Foundation model leveraging Copernicus Sentinel data.
  • Presto Embeddings – [NASAHarvest]
    Embedding framework for satellite time series and land use analysis.
  • Tessera Embeddings – [GeoTessera] Docs / [REPO]
    Pixel-based temporal spectral embeddings for Earth representation.
  • Google Satellite Embedding (AlphaEarth) – [LINK] Google Earth Engine
    An early-stage embedding model using Google's global satellite data.

Key Papers

  • XXZhu 2025 [LINK] "On the Foundations of Earth Foundation Models" – Communications Earth & Environment
  • CFBrown 2025 [LINK] "AlphaEarth Foundations"
  • KKlemmer 2023 [LINK] "SatCLIP: Global Location Embeddings with Satellite Imagery"

2. Datasets

Large-scale, open-access datasets play a central role in training and evaluating Earth foundation models. See the list below for a representative sample.

3. Model Architectures for Embeddings

The generation of robust Earth observation embeddings relies on specific model families pre-trained on large-scale geospatial data. The following architectures are central to the TorchGeo ecosystem and recent foundation model research, categorized by their primary learning paradigm.

Vision-Language Models

These models align visual satellite imagery with textual or geographic metadata, creating unified embedding spaces for cross-modal retrieval.

  • SatCLIP – [[LINK]] (AAAI 2025)

A contrastively pre-trained model that generates global location representations by aligning satellite imagery with embeddings of geographic coordinates; first released as a preprint in 2023.

  • MMEarth – [SITE] [[arXiv]] [Datasets] (ECCV 2024)

A global dataset for multimodal geospatial representation learning. It comprises 12 modalities, both pixel-level and image-level, sampled at 1.2 million locations.
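The alignment idea behind the models in this subsection can be illustrated with a CLIP-style symmetric contrastive loss over paired embeddings from two encoders. This is a minimal sketch, not SatCLIP's published training code: the function names, the temperature of 0.07, and the toy two-dimensional vectors are all assumptions chosen for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cross_entropy_row(logits, target):
    """Negative log-softmax of the target entry (numerically stable)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[target]


def contrastive_loss(image_emb, other_emb, temperature=0.07):
    """Symmetric InfoNCE loss: pair i of one modality should match pair i of the other.

    The temperature value is an assumed default, not taken from any paper.
    """
    img = [normalize(v) for v in image_emb]
    oth = [normalize(v) for v in other_emb]
    n = len(img)
    # Pairwise cosine similarities, scaled by the temperature.
    logits = [
        [sum(a * b for a, b in zip(img[i], oth[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Image -> other direction: rows of the similarity matrix.
    loss_rows = sum(cross_entropy_row(logits[i], i) for i in range(n)) / n
    # Other -> image direction: columns of the similarity matrix.
    loss_cols = sum(
        cross_entropy_row([logits[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (loss_rows + loss_cols)


images = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_loss(images, [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_loss(images, [[0.0, 1.0], [1.0, 0.0]])
```

The loss is near zero when each image embedding matches its own partner and large when the partners are swapped, which is the signal that pulls the two embedding spaces together.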

Self-Supervised Vision Transformers

Transformers pre-trained via self-supervised objectives (e.g., DINO, MAE) are currently the state-of-the-art for extracting rich, transferable embeddings from remote sensing imagery.

  • DINO / DINOv2 / DINOv3 – [[LINK]] (Meta/INRIA, 2021–2025)

Vision transformers trained with self-distillation without labels. DINOv2, in particular, produces highly discriminative embeddings that capture fine-grained geospatial features without downstream fine-tuning.

  • MAE (Masked Autoencoders) – [[LINK]] (CVPR 2021)

A self-supervised encoder that learns robust image representations by reconstructing masked patches. When applied to satellite data, MAE embeddings capture global context and local texture effectively.
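The masking step that MAE relies on can be sketched as follows. This is an illustrative sketch only: the function name is hypothetical, the 75% mask ratio is the value commonly associated with MAE, and 196 patches assumes a 224×224 input split into 16×16 patches. The encoder would see only the visible indices, and the decoder would reconstruct the masked ones.

```python
import random


def random_masking(num_patches, mask_ratio=0.75, seed=0):
    """Split patch indices into a visible set and a masked set.

    A seeded generator is used here so that the split is reproducible.
    """
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)
    num_keep = int(num_patches * (1 - mask_ratio))
    visible = sorted(order[:num_keep])  # patches passed to the encoder
    masked = sorted(order[num_keep:])   # patches the decoder must reconstruct
    return visible, masked


visible, masked = random_masking(num_patches=196, mask_ratio=0.75)
```

For multispectral satellite inputs, the same index split could be applied per band group or per timestep, but that choice is model-specific and is not shown here.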

Efficient Convolutional Backbones

While Transformers dominate embedding research, optimized CNNs remain critical for efficient embedding extraction in resource-constrained environments or as feature extractors for hybrid models.

  • ConvNeXt V2 – [[LINK]] (Woo et al., CVPR 2023)

A modernized ConvNet that incorporates Transformer-like design principles (e.g., large kernel sizes, layer normalization). It serves as a high-performance, efficient backbone for generating spatial embeddings in TorchGeo.

  • ResNet – [[LINK]] (He et al., CVPR 2016)

The foundational residual network. While older, ResNet variants remain the standard baseline for comparing embedding quality in early-stage Earth Foundation Model development.

Distillation & Advanced Approaches

  • Distillation methods – Transfer knowledge from large models.
  • Neural plasticity-inspired models – TorchGeo_DOFA: Inspired by biological learning mechanisms.
  • Multi-label guided soft contrastive learning – YWang, IEEE TGRS, 2024.
  • Barlow Twins – Method for learning representations without contrastive loss.
  • Continual Barlow Twins – Extends Barlow Twins to continual learning in EO segmentation.
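To make "learning representations without contrastive loss" concrete, the Barlow Twins objective can be sketched in plain Python. The loss pushes the cross-correlation matrix between two augmented views toward the identity. This is a minimal sketch: the function names are hypothetical, the off-diagonal weight `lambd=5e-3` is an assumed default, and the inputs are toy batches rather than network outputs.

```python
import math


def standardize_columns(batch):
    """Give each embedding dimension zero mean and unit variance across the batch.

    Assumes no dimension is constant across the batch (std would be zero).
    """
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        for i in range(n):
            out[i][j] = (batch[i][j] - mean) / std
    return out


def barlow_twins_loss(z1, z2, lambd=5e-3):
    """Sum of (1 - C_ii)^2 on the diagonal plus lambd * C_ij^2 off the diagonal."""
    a, b = standardize_columns(z1), standardize_columns(z2)
    n, d = len(a), len(a[0])
    loss = 0.0
    for i in range(d):
        for j in range(d):
            # Cross-correlation between dimension i of view 1 and j of view 2.
            c = sum(a[k][i] * b[k][j] for k in range(n)) / n
            loss += (1 - c) ** 2 if i == j else lambd * c ** 2
    return loss


view = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
swapped = [[row[1], row[0]] for row in view]
same_loss = barlow_twins_loss(view, view)
swapped_loss = barlow_twins_loss(view, swapped)
```

Identical views with decorrelated dimensions give a loss of zero, while swapping the dimensions of one view is penalized on both the diagonal and off-diagonal terms.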

4. Tools & Benchmarks

These are software systems and frameworks that support development, evaluation, or deployment of EO AI models.

Tools

  • TorchGeo – [TorchGeo] [ACM] AJStewart et al.

PyTorch library for geospatial deep learning.

  • NeuCo-Bench – [LINK] RVinge, arXiv 2025

Benchmarking framework for neural embeddings in Earth observation.

  • GeoINRID – [LINK] GitHub: arjunarao619/GeoINRID

Geospatial inference and representation learning toolkit.

Challenges

  • Embed2Scale Challenge – [LINK] CVPR CAlbrecht 2025

Large-scale Earth vision challenge focused on scale-aware embeddings.

  • TerraMind Blue-Sky Challenge

Generative modeling for Earth observation.

5. Key Themes & Trends

  1. Foundation Models: TorchGeo now includes data loaders designed for search/retrieval (Clay, Major TOM, Earth Index), and for dense prediction tasks like land cover mapping (Copernicus, Presto, Tessera, Google). This enables fair, side-by-side benchmarking of different embedding models on the same downstream tasks, forming the basis for future experiments. Projects are encouraged to strengthen and improve explainability.
    • Major TOM Notes: Major TOM embeddings are not (yet) really product-oriented and are aimed with a similar purpose to the MT Core datasets - to make it easier to experiment and benchmark model outputs (hence, unlike TESSERA and AEF which came a few months after, MT embeddings do not have consistent or aggregated temporal scope). We haven't had enough time to finish off the preprint, but my current plan is to provide a simple MT Embedding benchmark at this year's EGU and integrate that into the arxiv pre-print. --Miko
    • Earth Index / Earth Genome: Use the Earth Index application (earthindex.ai) for non-technical users to use the embeddings we published on source.coop. Users of the web app (non-technical journalists, indigenous communities/allies, NGOs) have been our main focus. Users of the source.coop embeddings have generally been more technical folks interested in exploring/innovating in what's possible --BenStrong
    • Clay: Clay and Presto offer documented tutorials on generating new embeddings with their models. In CLAY, the encoder receives unmasked patches, latitude-longitude data, and timestep information. Notably, the last 2 embeddings from the encoder specifically represent the latitude-longitude and timestep embeddings.
  2. Self-Supervised Learning (SSL):
  3. Multimodal Integration:
  4. Open Data & Tools: Open-source projects (e.g., TorchGeo, Copernicus-Embed) and public datasets (EuroSAT, EuroCrops) are crucial for reproducibility and democratization of EO AI. Projects are encouraged to increase input data diversity and to adopt cloud-native data formats for geospatial data.
  5. Benchmarking: Projects are encouraged to standardize benchmarking, for example with NeuCo-Bench and Embed2Scale.
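The Clay note above says that the last two embeddings from the encoder represent latitude-longitude and timestep. A minimal sketch of separating those tokens from the patch tokens follows. It assumes the order given in the note (latitude-longitude first, then timestep) and uses mean pooling to form a single patch-level vector. Both the function names and the pooling choice are assumptions for illustration and are not taken from the Clay codebase.

```python
def split_encoder_tokens(tokens):
    """Separate patch tokens from the two trailing metadata tokens.

    Assumed layout: [patch_0, ..., patch_n, latlon, time].
    """
    if len(tokens) < 3:
        raise ValueError("expected at least one patch token plus two metadata tokens")
    patch_tokens = tokens[:-2]
    latlon_token, time_token = tokens[-2], tokens[-1]
    return patch_tokens, latlon_token, time_token


def mean_pool(vectors):
    """Average a list of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


# Toy encoder output: two patch tokens followed by the two metadata tokens.
tokens = [[1.0, 3.0], [3.0, 5.0], [9.0, 9.0], [7.0, 7.0]]
patches, latlon, timestep = split_encoder_tokens(tokens)
image_embedding = mean_pool(patches)
```

Pooling only the patch tokens keeps the image-level vector from being mixed with the metadata embeddings; whether that is desirable depends on the downstream task.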


Datasets Auto-Edit

  • EuroSAT – [Zenodo]

Land use classification dataset using Sentinel-2 satellite data.

  • EuroCrops – [PMC_10495462]

Crop type mapping dataset for Europe.

  • National Land Cover Database (NLCD) – [MRLC]

USA land cover classes.

  • SSL4EO-S12 – [GitHub]

Multimodal, multitemporal dataset for self-supervised learning.

  • Copernicus-Pretrain – [GitHub]

An extension of the SSL4EO-S12 dataset to all major Sentinel missions (S1-S5P).

  • BigEarthNet – [BigEarthNet]

Large-scale multi-label satellite image classification dataset.

  • RESISC45 – [IEEE DOI]

Remote sensing image classification dataset with 45 categories.

  • UC Merced – [UC Merced]

Aerial image dataset for land use classification.

  • Potsdam – [ISPRS]

Semantic segmentation dataset for urban areas from aerial imagery.

  • Vaihingen – [ISPRS]

Semantic segmentation dataset for urban areas from aerial imagery.

  • Inria Aerial Image Labeling – [Inria]

Aerial image segmentation dataset for building footprint extraction.

  • NAIP – [USGS EROS]

National Agriculture Imagery Program data for the USA.

  • Sentinel-2 – [Copernicus]

Multispectral imagery from the Sentinel-2 mission.

  • Landsat – [USGS Landsat]

Long-term archive of medium-resolution satellite imagery.

  • OpenStreetMap – [OpenStreetMap]

Collaborative project to create a free editable map of the world.

  • GFED (Global Fire Emissions Database) – [Global Fire Data]

Global dataset of biomass burning emissions.

  • GBIF – [GBIF]

Dataset from the Global Biodiversity Information Facility.

  • Open Buildings – [Microsoft Research]

Global building footprint detection dataset.

  • OpenAerialMap – [OpenAerialMap]

Open-source aerial imagery dataset.

  • Major TOM – [GitHub]

Large-scale, ML-ready Sentinel-2 dataset released by ESA Φ-lab.

  • Google Satellite Embedding – [Google Earth Engine]

Pre-trained embeddings for Google satellite imagery.

  • DOTA – [DOTA Website]

Large-scale dataset for object detection in aerial images.


Research Directions

  • Unified Earth Foundation Models:
  • Interpretability in EO AI: Exploring how these embeddings can be interpreted by domain experts.
  • Ethics and Bias: Investigating fairness and bias in global EO models trained on unevenly distributed data.
  • Edge Deployment: Making these large foundation models deployable on resource-constrained platforms (e.g., for field use).