Difference between revisions of "TorchGeo embeddings"

Revision as of 06:57, 13 June 2026

Earth Embeddings as Products: Taxonomy, Ecosystem, and Standardized Access is a comprehensive survey that organizes existing geospatial embedding products into a structured taxonomy and provides a detailed metadata atlas (resolution, license, etc.). It also proposes a unified integration by implementing standardized data loaders for these embeddings in TorchGeo.

An overview landscape is proposed comprising:

a) Analysis Frameworks & Tools
b) Embeddings data artifacts
c) Charting downstream application value, specifically mapping tasks and retrieval tasks.

Embeddings are differentiated as either location-typed, patch-typed, or pixel-typed. Details of existing products are shown below.

1. Foundation Models for Earth Observation (EO)

These are the leading projects that aim to build general-purpose models capable of representing Earth from satellite imagery and other geospatial modalities.

Projects

Clay Foundation Model – [Hugging Face] (2024)
A multimodal foundation model for Earth using diverse data sources.
Major TOM – [AFrancis IGARSS 2024]
Expandable datasets and models for global EO coverage.
Earth Index Embeddings – [Earth Genome] (2025)
A large-scale embedding system built from Earth observation data.
Copernicus-Embed – [Zhu et al., AI4Copernicus Project]
Foundation model leveraging Copernicus Sentinel data.
Presto Embeddings – [NASA Harvest]
Embedding framework for satellite time series and land use analysis.
Tessera Embeddings – [GeoTessera Docs] / [REPO]
Pixel-based temporal spectral embeddings for Earth representation.
Google Satellite Embedding (AlphaEarth) – [Google Earth Engine]
An early-stage embedding model using Google's global satellite data.
OlmoEarth – [AllenAI] (2025)
Latent image modeling approach for multimodal Earth observation.

Key Papers

XXZhu 2025 ["On the Foundations of Earth Foundation Models"] – Nature Computational Science
CFBrown 2025 ["AlphaEarth Foundations"]
KKlemmer 2023 ["SatCLIP: Global Location Embeddings with Satellite Imagery"]

2. Datasets

Large-scale, open-access datasets play a central role in training and evaluating Earth foundation models.

Datasets

EuroSAT – [Zenodo]
Land use classification dataset using Sentinel-2 satellite data.
EuroCrops – [PMC 10495462]
Crop type mapping dataset for Europe.
National Land Cover Database (NLCD) – [Photogrammetric Engineering & Remote Sensing] (2001)
USA land cover classes.
SSL4EO-S12 – [IEEE Geoscience and Remote Sensing] (2023)
Multimodal, multitemporal dataset for self-supervised learning.
Copernicus-Pretrain – [IEEE Geoscience and Remote Sensing] (2023)
An extension of the SSL4EO-S12 dataset to all major Sentinel missions (S1-S5P).

3. Models & Methods

These include both classical and cutting-edge machine learning approaches used in building Earth foundation models.

Core Methods

SatCLIP – [AAAI 2025] etc.
Vision-language model for global location representations.
MMEarth – [ECCV 2024]
Multimodal pretext tasks for geospatial representation learning.
ResNet – [KHe IEEE/CV 2016]
Baseline CNN architecture widely used in EO.
ConvNeXt V2 – [Woo et al., IEEE/CVF 2023]
Efficient ConvNet architecture using masked autoencoders (MAE).
DINO, DINOv2, DINOv3 – [INRIA 2021–2023, META]
Vision transformers with self-supervised learning capabilities.
MAE (Masked Autoencoders) – [IEEE/CVF 2021]
Self-supervised learning for vision transformers.

Distillation & Advanced Approaches

Distillation methods – Transfer knowledge from large models.
Neural plasticity-inspired models – [ZXiong, arXiv 2024]: Inspired by biological learning mechanisms.
Multi-label guided soft contrastive learning – [YWang, IEEE TGRS, 2024].
Barlow Twins – [Zbontar et al., arXiv 2021]: Method for learning representations without contrastive loss.
Continual Barlow Twins – [JSTARS, 2023]: Extends Barlow Twins to continual learning in EO segmentation.

4. Tools & Benchmarks

These are software systems and frameworks that support development, evaluation, or deployment of EO AI models.

Tools

TorchGeo – [AJStewart ACM 2025]
PyTorch library for geospatial deep learning.
NeuCo-Bench – [RVinge, arXiv 2025]
Benchmarking framework for neural embeddings in Earth observation.
GeoINRID – [GitHub: arjunarao619/GeoINRID]
Geospatial inference and representation learning toolkit.

Challenges

Embed2Scale Challenge – [CVPR CAlbrecht 2025]
Large-scale Earth vision challenge focused on scale-aware embeddings.
TerraMind Blue-Sky Challenge – [JJakubik, arXiv 2025]
Generative modeling for Earth observation.

5. Key Themes & Trends

Foundation Models: TorchGeo now includes data loaders designed for search/retrieval (Clay, Major TOM, Earth Index) and for dense prediction tasks such as land cover mapping (Copernicus, Presto, Tessera, Google). This enables fair, side-by-side benchmarking of different embedding models on the same downstream tasks and forms the basis for future experiments. Projects are encouraged to strengthen and improve explainability.
- Major TOM Notes: Major TOM embeddings are not (yet) really product-oriented and are aimed with a similar purpose to the MT Core datasets - to make it easier to experiment and benchmark model outputs (hence, unlike TESSERA and AEF which came a few months after, MT embeddings do not have consistent or aggregated temporal scope). We haven't had enough time to finish off the preprint, but my current plan is to provide a simple MT Embedding benchmark at this year's EGU and integrate that into the arxiv pre-print. --Miko
- Earth Index / Earth Genome: Use the Earth Index application (earthindex.ai) for non-technical users to use the embeddings we published on source.coop. Users of the web app (non-technical journalists, indigenous communities/allies, NGOs) have been our main focus. Users of the source.coop embeddings have generally been more technical folks interested in exploring/innovating in what's possible --BenStrong
- Clay: Clay and Presto offer documented tutorials on generating new embeddings with their models. In Clay, the encoder receives unmasked patches, latitude-longitude data, and timestep information. Notably, the last two embeddings output by the encoder are the latitude-longitude and timestep embeddings.
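
The Clay note above can be illustrated with a minimal sketch. It does not call Clay's actual API: the function name `split_encoder_output`, the token layout (batch, num_patches + 2, dim), the order of the two trailing tokens, the sizes, and the mean pooling are all assumptions made for illustration.

```python
import numpy as np


def split_encoder_output(tokens):
    """Split encoder tokens into patch tokens and the two trailing metadata tokens.

    Assumes tokens has shape (batch, num_patches + 2, dim) and that the last
    two tokens are the latitude-longitude and timestep embeddings, in that
    order. The order is an assumption, not taken from the Clay documentation.
    """
    if tokens.ndim != 3 or tokens.shape[1] < 3:
        raise ValueError("expected shape (batch, num_patches + 2, dim)")
    patch_tokens = tokens[:, :-2, :]
    latlon_token = tokens[:, -2, :]
    time_token = tokens[:, -1, :]
    return patch_tokens, latlon_token, time_token


# Hypothetical sizes; random values stand in for real encoder output.
batch, num_patches, dim = 2, 16, 768
encoded = np.random.default_rng(0).normal(size=(batch, num_patches + 2, dim))

patches, latlon, timestep = split_encoder_output(encoded)

# One possible image-level embedding: average over the patch tokens only.
image_embedding = patches.mean(axis=1)
```

Keeping the two metadata tokens out of the pooled vector is one design choice among several; whether that helps a given downstream task would need to be checked empirically.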

Self-Supervised Learning (SSL):

Multimodal Integration:

Open Data & Tools: Open-source projects (e.g., TorchGeo, Copernicus-Embed) and public datasets (EuroSAT, EuroCrops) are crucial for reproducibility and democratization of EO AI. Projects are encouraged to increase input data diversity and to adopt cloud-native data formats for geospatial data.

Benchmarking: Projects are encouraged to standardize their benchmarking. Existing benchmarks include NeuCo-Bench and Embed2Scale.

Research Directions

Unified Earth Foundation Models:
Interpretability in EO AI: Exploring how these embeddings can be interpreted by domain experts.
Ethics and Bias: Investigating fairness and bias in global EO models trained on unevenly distributed data.
Edge Deployment: Making these large foundation models deployable on resource-constrained platforms (e.g., for field use).

1. -----------------------------------------------------

Notes on

  EARTH EMBEDDINGS AS PRODUCTS: TAXONOMY, ECOSYSTEM, AND STANDARDIZED ACCESS

Heng Fang† ∗  Adam J Stewart‡ ∗  Isaac Corley§ *  Xiao Xiang Zhu *  Hossein Azizpour†
† KTH Royal Institute of Technology, Stockholm, Sweden
‡ Chair of Data Science in Earth Observation, Technical University of Munich, Munich, Germany
§ Wherobots, San Antonio, USA

arXiv:2601.13134v1 19 Jan 2026 [LINK](https://arxiv.org/abs/2601.13134)

Introduction

• A comprehensive survey that organizes existing geospatial embedding products into a structured taxonomy and provides a detailed metadata atlas (resolution, license, etc.).
• Unified Integration: implements standardized data loaders for these embeddings in [TorchGeo](https://www.osgeo.org/projects/torchgeo/).

An overview landscape is proposed: a) Analysis Frameworks & Tools; b) Embeddings data artifacts; c) charting downstream application value, specifically mapping tasks and retrieval tasks. Embeddings are differentiated as either location-typed, patch-typed, or pixel-typed. Details of existing products are shown below.
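
As a rough illustration of the three embedding types, the sketch below shows one plausible array layout for each. The dimension, coordinates, tile count, and grid size are hypothetical, and random values stand in for real embeddings; no specific product is assumed to use exactly these shapes.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64  # hypothetical embedding dimension

# Location-typed: one vector per (lat, lon) query point.
coords = np.array([[59.35, 18.07], [48.15, 11.57]])  # (N, 2), degrees
location_emb = rng.normal(size=(coords.shape[0], D))  # (N, D)

# Patch-typed: one vector per image tile.
num_patches = 5
patch_emb = rng.normal(size=(num_patches, D))  # (P, D)

# Pixel-typed: one vector per grid cell of a raster.
H, W = 8, 8
pixel_emb = rng.normal(size=(H, W, D))  # (H, W, D)

# Averaging a pixel-typed grid over space yields a single patch-like vector.
pooled = pixel_emb.mean(axis=(0, 1))  # (D,)
```

The last line shows why the types are related: a pixel-typed product can be pooled down to a patch-like vector, while the reverse is not possible without the original model.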

🌍 1. Foundation Models for Earth Observation (EO)

These are the leading projects that aim to build general-purpose models capable of representing Earth from satellite imagery and other geospatial modalities.

🔧 Projects:

- Clay Foundation Model – [Hugging Face](https://huggingface.co/made-with-clay/Clay) 2024

 *A multimodal foundation model for Earth using diverse data sources.*

- Major TOM – [AFrancis IGARSS 2024](https://huggingface.co/Major-TOM)

 *Expandable datasets and models for global EO coverage.*

- Earth Index Embeddings – [Earth Genome](https://www.earthgenome.org/earth-index), 2025

 *A large-scale embedding system built from Earth observation data.*

- Copernicus-Embed – [Zhu et al., AI4Copernicus Project](https://github.com/zhu-xlab/Copernicus-FM)

 *Foundation model leveraging Copernicus Sentinel data.*

- Presto Embeddings – [NASA Harvest](https://nasaharvest.github.io/presto-embeddings/)

 *Embedding framework for satellite time series and land use analysis.*

- Tessera Embeddings – [GeoTessera Docs](https://geotessera.readthedocs.io/en/latest/) [REPO](https://github.com/ucam-eo/tessera)

 *Pixel-based temporal spectral embeddings for Earth representation.*

- Google Satellite Embedding (AlphaEarth) – [Google Earth Engine](https://developers.google.com/earth-engine/datasets/catalog/GOOGLE_SATELLITE_EMBEDDING_V1_ANNUAL)

 *An early-stage embedding model using Google's global satellite data.*

- OlmoEarth – [AllenAI, 2025](https://allenai.org/olmoearth)

 *Latent image modeling approach for multimodal Earth observation.*

📚 Key Papers:

- XXZhu 2025 [LINK](https://www.nature.com/articles/s43247-025-03127-x) – “On the Foundations of Earth Foundation Models”
- CFBrown 2025 [LINK](https://arxiv.org/abs/2507.22291) – “AlphaEarth Foundations”
- KKlemmer 2023 [LINK](https://arxiv.org/abs/2311.17179) – “SatCLIP: Global Location Embeddings with Satellite Imagery”

---

📦 2. **Datasets**

Large-scale, open-access datasets play a central role in training and evaluating Earth foundation models.

🗃 Datasets:

- EuroSAT – [Zenodo](https://zenodo.org/records/7711810)

 *Land use classification dataset using Sentinel-2 satellite data.*

- EuroCrops – [PMC 10495462](https://pmc.ncbi.nlm.nih.gov/articles/PMC10495462/)

 *Crop type mapping dataset for Europe.*

- National Land Cover Database (NLCD) – Photogrammetric Engineering & Remote Sensing 2001 [LINK](https://www.ingentaconnect.com/content/asprs/pers/2004/00000070/00000007/art00005;jsessionid=2awgc0agiboi4.x-ic-live-02)

 *USA land cover classes*

- SSL4EO-S12 – IEEE Geoscience and Remote Sensing 2023 [LINK](https://github.com/zhu-xlab/SSL4EO-S12)

 *Multimodal, multitemporal dataset for self-supervised learning.*

- Copernicus-Pretrain – [IEEE Geoscience and Remote Sensing 2023](https://github.com/zhu-xlab/Copernicus-FM)

 *an extension of the SSL4EO-S12 dataset to all major Sentinel missions (S1-S5P)*

---

🧠 3. **Models & Methods**

These include both classical and cutting-edge machine learning approaches used in building Earth foundation models.

⚙ Core Methods:

- **SatCLIP** – AAAI 2025 etc. [LINK](https://arxiv.org/abs/2311.17179)

 *Vision-language model for global location representations.*

- **MMEarth** – [ECCV 2024](https://arxiv.org/abs/2405.02771)

 *Multimodal pretext tasks for geospatial representation learning.*

- **ResNet** – [KHe IEEE/CV 2016](https://viso.ai/deep-learning/resnet-residual-neural-network/)

 *Baseline CNN architecture widely used in EO.*

- **ConvNeXt V2** – [Woo et al., IEEE/CVF 2023](https://github.com/facebookresearch/ConvNeXt-V2)

 *Efficient ConvNet architecture using masked autoencoders (MAE).*

- **DINO, DINOv2, DINOv3** – INRIA 2021–2023, META [LINK](https://dinov3.org/)

 *Vision transformers with self-supervised learning capabilities.*

- **MAE (Masked Autoencoders)** – IEEE/CVF 2021 [LINK](https://arxiv.org/abs/2111.06377)

 *Self-supervised learning for vision transformers.*

🧬 Distillation & Advanced Approaches:

- **Distillation methods** – Transfer knowledge from large models.

- **Neural plasticity-inspired models** – [ZXiong, arXiv 2024]

 *Inspired by biological learning mechanisms.*

- **Multi-label guided soft contrastive learning** – [YWang, IEEE TGRS, 2024]

- **Barlow Twins** – [Zbontar et al., arXiv 2021]

 *Method for learning representations without contrastive loss.*

- **Continual Barlow Twins** – [IEEE JSTARS, 2023]

 *Extends Barlow Twins to continual learning in EO segmentation.*
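
To make the Barlow Twins entry above more concrete, here is a minimal NumPy sketch of its redundancy-reduction objective: the cross-correlation matrix between two views' embeddings is pushed toward the identity. The function name, the weight `lam`, and the random inputs are illustrative assumptions; this is not the reference implementation.

```python
import numpy as np


def barlow_twins_loss(z1, z2, lam=5e-3, eps=1e-12):
    """Barlow Twins-style loss for two (batch, dim) embedding matrices.

    lam weights the off-diagonal (redundancy) term; the default here is an
    assumed value chosen for illustration.
    """
    n = z1.shape[0]
    # Standardize every embedding dimension across the batch.
    z1n = (z1 - z1.mean(axis=0)) / (z1.std(axis=0) + eps)
    z2n = (z2 - z2.mean(axis=0)) / (z2.std(axis=0) + eps)
    # Cross-correlation matrix between the two views, shape (dim, dim).
    c = z1n.T @ z2n / n
    on_diag = np.sum((np.diag(c) - 1.0) ** 2)  # invariance term
    off_diag = np.sum(c ** 2) - np.sum(np.diag(c) ** 2)  # redundancy term
    return float(on_diag + lam * off_diag)


rng = np.random.default_rng(0)
z = rng.normal(size=(256, 8))
loss_same = barlow_twins_loss(z, z)  # identical views
loss_indep = barlow_twins_loss(z, rng.normal(size=(256, 8)))  # unrelated views
```

Identical views give a diagonal close to one, so `loss_same` is much smaller than `loss_indep`. No negative pairs are involved, which is the sense in which the method avoids a contrastive loss.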

---

🛠 4. **Tools & Benchmarks**

These are software systems and frameworks that support development, evaluation, or deployment of EO AI models.

🧰 Tools:

- **TorchGeo** – [AJStewart ACM 2025](https://www.osgeo.org/projects/torchgeo/)

 *PyTorch library for geospatial deep learning.*

- **NeuCo-Bench** – [RVinge, arXiv 2025](https://arxiv.org/html/2510.17914v1)

 *Benchmarking framework for neural embeddings in Earth observation.*

- **GeoINRID** – [GitHub: arjunarao619/GeoINRID](https://github.com/arjunarao619/GeoINRID)

 *Geospatial inference and representation learning toolkit.*

🏆 Challenges:

- **Embed2Scale Challenge** – [CVPR CAlbrecht 2025](https://research.ibm.com/publications/the-2025-cvpr-earthvision-data-challenge-by-embed2scale)

 *Large-scale Earth vision challenge focused on scale-aware embeddings.*

- **TerraMind Blue-Sky Challenge** – [JJakubik, arXiv 2025]

 *Generative modeling for Earth observation.*

---

🧭 5. **Key Themes & Trends**

1. **Foundation Models**: TorchGeo now includes data loaders designed for search/retrieval (Clay, Major TOM, Earth Index) and for dense prediction tasks such as land cover mapping (Copernicus, Presto, Tessera, Google). This enables fair, side-by-side benchmarking of different embedding models on the same downstream tasks and forms the basis for future experiments. Projects are encouraged to strengthen and improve explainability.

1.1 **Major TOM Notes**: Major TOM embeddings are not (yet) really product-oriented and are aimed with a similar purpose to the MT Core datasets - to make it easier to experiment and benchmark model outputs (hence, unlike TESSERA and AEF which came a few months after, MT embeddings do not have consistent or aggregated temporal scope). We haven't had enough time to finish off the preprint, but my current plan is to provide a simple MT Embedding benchmark at this year's EGU and integrate that into the arxiv pre-print. --Miko

1.2 **Earth Index / Earth Genome**: Use the Earth Index application (earthindex.ai) for non-technical users to use the embeddings we published on source.coop. Users of the web app (non-technical journalists, indigenous communities/allies, NGOs) have been our main focus. Users of the source.coop embeddings have generally been more technical folks interested in exploring/innovating in what's possible --BenStrong

1.3 **Clay**: Clay and Presto offer documented tutorials on generating new embeddings with their models. In Clay, the encoder receives unmasked patches, latitude-longitude data, and timestep information. Notably, the last two embeddings output by the encoder are the latitude-longitude and timestep embeddings.

2. **Self-Supervised Learning (SSL)**:

3. **Multimodal Integration**:

4. **Open Data & Tools**: Open-source projects (e.g., TorchGeo, Copernicus-Embed) and public datasets (EuroSAT, EuroCrops) are crucial for reproducibility and democratization of EO AI. Projects are encouraged to increase input data diversity and to adopt cloud-native data formats for geospatial data.

5. **Benchmarking**: Projects are encouraged to standardize their benchmarking. Existing benchmarks include NeuCo-Bench and Embed2Scale.
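
The idea of side-by-side benchmarking on the same downstream task can be sketched as follows. This does not use TorchGeo's loaders or any benchmark's real protocol: the product names, synthetic data, train/test split, and nearest-centroid probe are all assumptions chosen to keep the example self-contained.

```python
import numpy as np


def cosine_top_k(query, bank, k=3):
    """Retrieval: indices of the k bank vectors most similar to the query."""
    q = query / np.linalg.norm(query)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    return np.argsort(-(b @ q))[:k]


def nearest_centroid_accuracy(train_x, train_y, test_x, test_y):
    """Shared probe: assign each test vector to the closest class centroid."""
    classes = np.unique(train_y)
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(test_x[:, None, :] - centroids[None, :, :], axis=2)
    pred = classes[dists.argmin(axis=1)]
    return float((pred == test_y).mean())


def make_synthetic_product(dim, noise, seed):
    """Stand-in for one embedding product: 3 classes, 40 samples each."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(3), 40)
    centers = rng.normal(size=(3, dim)) * 3.0
    x = centers[labels] + noise * rng.normal(size=(labels.size, dim))
    return x, labels


# Two hypothetical products with different dimensions and noise levels.
products = {
    "product_a": make_synthetic_product(dim=32, noise=0.5, seed=0),
    "product_b": make_synthetic_product(dim=128, noise=2.0, seed=1),
}

# The same samples go to train and test for every product, so scores compare.
split = np.random.default_rng(42).permutation(120)
train_idx, test_idx = split[:80], split[80:]

results = {}
for name, (x, y) in products.items():
    results[name] = nearest_centroid_accuracy(
        x[train_idx], y[train_idx], x[test_idx], y[test_idx]
    )
```

Fixing the split and the probe while varying only the embedding source is what makes the scores comparable. `cosine_top_k` covers the retrieval side in the same spirit: one query and one similarity measure applied to each product's embedding bank.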

---

📌 Research Directions

- **Unified Earth Foundation Models**:
- **Interpretability in EO AI**: Exploring how these embeddings can be interpreted by domain experts.
- **Ethics and Bias**: Investigating fairness and bias in global EO models trained on unevenly distributed data.
- **Edge Deployment**: Making these large foundation models deployable on resource-constrained platforms (e.g., for field use).

@@ Line 1: / Line 1: @@
+{{Infobox Paper
+| title       = Earth Embeddings as Products: Taxonomy, Ecosystem, and Standardized Access
+| authors     = Heng Fang<ref>†</ref>, Adam J Stewart<ref>‡</ref>, Isaac Corley<ref>§</ref>, Xiao Xiang Zhu<ref>*</ref>, Hossein Azizpour<ref>†</ref>
+| affiliations =
+ * † KTH Royal Institute of Technology, Stockholm, Sweden
+ * ‡ Chair of Data Science in Earth Observation, Technical University of Munich, Munich, Germany
+ * § Wherobots, San Antonio, USA
+| arxiv       = [https://arxiv.org/abs/2601.13134 2601.13134v1] (19 Jan 2026)
+}}
+'''Earth Embeddings as Products: Taxonomy, Ecosystem, and Standardized Access''' is a comprehensive survey that organizes existing geospatial embedding products into a structured taxonomy and provides a detailed metadata atlas (resolution, license, etc.). It also proposes a unified integration by implementing standardized data loaders for these embeddings in [[TorchGeo]].
+An overview landscape is proposed comprising:
+* a) Analysis Frameworks & Tools
+* b) Embeddings data artifacts
+* c) Charting downstream application value, specifically mapping tasks and retrieval tasks.
+Embeddings are differentiated as either location-typed, patch-typed, or pixel-typed. Details of existing products are shown below.
+== 1. Foundation Models for Earth Observation (EO) ==
+These are the leading projects that aim to build general-purpose models capable of representing Earth from satellite imagery and other geospatial modalities.
+=== Projects ===
+* '''Clay Foundation Model''' – [https://huggingface.co/made-with-clay/Clay Hugging Face] (2024)
+*: A multimodal foundation model for Earth using diverse data sources.
+* '''Major TOM''' – [https://huggingface.co/Major-TOM AFrancis IGARSS 2024]
+*: Expandable datasets and models for global EO coverage.
+* '''Earth Index Embeddings''' – [https://www.earthgenome.org/earth-index Earth Genome] (2025)
+*: A large-scale embedding system built from Earth observation data.
+* '''Copernicus-Embed''' – [https://github.com/zhu-xlab/Copernicus-FM Zhu et al., AI4Copernicus Project]
+*: Foundation model leveraging Copernicus Sentinel data.
+* '''Presto Embeddings''' – [https://nasaharvest.github.io/presto-embeddings/ NASA Harvest]
+*: Embedding framework for satellite time series and land use analysis.
+* '''Tessera Embeddings''' – [https://geotessera.readthedocs.io/en/latest/ GeoTessera Docs] / [https://github.com/ucam-eo/tessera REPO]
+*: Pixel-based temporal spectral embeddings for Earth representation.
+* '''Google Satellite Embedding (AlphaEarth)''' – [https://developers.google.com/earth-engine/datasets/catalog/GOOGLE_SATELLITE_EMBEDDING_V1_ANNUAL Google Earth Engine]
+*: An early-stage embedding model using Google's global satellite data.
+* '''OlmoEarth''' – [https://allenai.org/olmoearth AllenAI] (2025)
+*: Latent image modeling approach for multimodal Earth observation.
+=== Key Papers ===
+* XXZhu 2025 [https://www.nature.com/articles/s43247-025-03127-x "On the Foundations of Earth Foundation Models"] – ''Nature Computational Science''
+* CFBrown 2025 [https://arxiv.org/abs/2507.22291 "AlphaEarth Foundations"]
+* KKlemmer 2023 [https://arxiv.org/abs/2311.17179 "SatCLIP: Global Location Embeddings with Satellite Imagery"]
+== 2. Datasets ==
+Large-scale, open-access datasets play a central role in training and evaluating Earth foundation models.
+=== Datasets ===
+* '''EuroSAT''' – [https://zenodo.org/records/7711810 Zenodo]
+*: Land use classification dataset using Sentinel-2 satellite data.
+* '''EuroCrops''' – [https://pmc.ncbi.nlm.nih.gov/articles/PMC10495462/ PMC 10495462]
+*: Crop type mapping dataset for Europe.
+* '''National Land Cover Database (NLCD)''' – [https://www.ingentaconnect.com/content/asprs/pers/2004/00000070/00000007/art00005;jsessionid=2awgc0agiboi4.x-ic-live-02 Photogrammetric Engineering & Remote Sensing] (2001)
+*: USA land cover classes.
+* '''SSL4EO-S12''' – [https://github.com/zhu-xlab/SSL4EO-S12 IEEE Geoscience and Remote Sensing] (2023)
+*: Multimodal, multitemporal dataset for self-supervised learning.
+* '''Copernicus-Pretrain''' – [https://github.com/zhu-xlab/Copernicus-FM IEEE Geoscience and Remote Sensing] (2023)
+*: An extension of the SSL4EO-S12 dataset to all major Sentinel missions (S1-S5P).
+== 3. Models & Methods ==
+These include both classical and cutting-edge machine learning approaches used in building Earth foundation models.
+=== Core Methods ===
+* '''SatCLIP''' – [https://arxiv.org/abs/2311.17179 AAAI 2025] etc.
+*: Vision-language model for global location representations.
+* '''MMEarth''' – [https://arxiv.org/abs/2405.02771 ECCV 2024]
+*: Multimodal pretext tasks for geospatial representation learning.
+* '''ResNet''' – [https://viso.ai/deep-learning/resnet-residual-neural-network/ KHe IEEE/CV 2016]
+*: Baseline CNN architecture widely used in EO.
+* '''ConvNeXt V2''' – [https://github.com/facebookresearch/ConvNeXt-V2 Woo et al., IEEE/CVF 2023]
+*: Efficient ConvNet architecture using masked autoencoders (MAE).
+* '''DINO, DINOv2, DINOv3''' – [https://dinov3.org/ INRIA 2021–2023, META]
+*: Vision transformers with self-supervised learning capabilities.
+* '''MAE (Masked Autoencoders)''' – [https://arxiv.org/abs/2111.06377 IEEE/CVF 2021]
+*: Self-supervised learning for vision transformers.
+=== Distillation & Advanced Approaches ===
+* '''Distillation methods''' – Transfer knowledge from large models.
+* '''Neural plasticity-inspired models''' – [https://arxiv.org/ ZXiong, arXiv 2024]: Inspired by biological learning mechanisms.
+* '''Multi-label guided soft contrastive learning''' – [https://ieeexplore.ieee.org/ YWang, IEEE TGRS, 2024].
+* '''Barlow Twins''' – [https://arxiv.org/ Zbontar et al., arXiv 2021]: Method for learning representations without contrastive loss.
+* '''Continual Barlow Twins''' – [https://ieeexplore.ieee.org/ IEEE JSTARS, 2023]: Extends Barlow Twins to continual learning in EO segmentation.
+== 4. Tools & Benchmarks ==
+These are software systems and frameworks that support development, evaluation, or deployment of EO AI models.
+=== Tools ===
+* '''TorchGeo''' – [https://www.osgeo.org/projects/torchgeo/ AJStewart ACM 2025]
+*: PyTorch library for geospatial deep learning.
+* '''NeuCo-Bench''' – [https://arxiv.org/html/2510.17914v1 RVinge, arXiv 2025]
+*: Benchmarking framework for neural embeddings in Earth observation.
+* '''GeoINRID''' – [https://github.com/arjunarao619/GeoINRID GitHub: arjunarao619/GeoINRID]
+*: Geospatial inference and representation learning toolkit.
+=== Challenges ===
+* '''Embed2Scale Challenge''' – [https://research.ibm.com/publications/the-2025-cvpr-earthvision-data-challenge-by-embed2scale CVPR CAlbrecht 2025]
+*: Large-scale Earth vision challenge focused on scale-aware embeddings.
+* '''TerraMind Blue-Sky Challenge''' – [https://arxiv.org/ JJakubik, arXiv 2025]
+*: Generative modeling for Earth observation.
+== 5. Key Themes & Trends ==
+# '''Foundation Models''': TorchGeo now includes data loaders designed for search/retrieval (Clay, Major TOM, Earth Index) and for dense prediction tasks such as land cover mapping (Copernicus, Presto, Tessera, Google). This enables fair, side-by-side benchmarking of different embedding models on the same downstream tasks and forms the basis for future experiments. Projects are encouraged to strengthen and improve explainability.
+#* '''Major TOM Notes''': Major TOM embeddings are not (yet) really product-oriented and are aimed with a similar purpose to the MT Core datasets - to make it easier to experiment and benchmark model outputs (hence, unlike TESSERA and AEF which came a few months after, MT embeddings do not have consistent or aggregated temporal scope). We haven't had enough time to finish off the preprint, but my current plan is to provide a simple MT Embedding benchmark at this year's EGU and integrate that into the arxiv pre-print. --Miko
+#* '''Earth Index / Earth Genome''': Use the Earth Index application (earthindex.ai) for non-technical users to use the embeddings we published on source.coop. Users of the web app (non-technical journalists, indigenous communities/allies, NGOs) have been our main focus. Users of the source.coop embeddings have generally been more technical folks interested in exploring/innovating in what's possible --BenStrong
+#* '''Clay''': Clay and Presto offer documented tutorials on generating new embeddings with their models. In Clay, the encoder receives unmasked patches, latitude-longitude data, and timestep information. Notably, the last two embeddings output by the encoder are the latitude-longitude and timestep embeddings.
+# '''Self-Supervised Learning (SSL)''':
+# '''Multimodal Integration''':
+# '''Open Data & Tools''': Open-source projects (e.g., TorchGeo, Copernicus-Embed) and public datasets (EuroSAT, EuroCrops) are crucial for reproducibility and democratization of EO AI. Projects are encouraged to increase input data diversity and to adopt cloud-native data formats for geospatial data.
+# '''Benchmarking''': Projects are encouraged to standardize their benchmarking. Existing benchmarks include NeuCo-Bench and Embed2Scale.
+== Research Directions ==
+* '''Unified Earth Foundation Models''':
+* '''Interpretability in EO AI''': Exploring how these embeddings can be interpreted by domain experts.
+* '''Ethics and Bias''': Investigating fairness and bias in global EO models trained on unevenly distributed data.
+* '''Edge Deployment''': Making these large foundation models deployable on resource-constrained platforms (e.g., for field use).
+##-----------------------------------------------------
 Notes on
     '''EARTH EMBEDDINGS AS PRODUCTS: TAXONOMY, ECOSYSTEM, AND STANDARDIZED ACCESS'''