Sprint bdfs25
Latest revision as of 14:11, 1 October 2025
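'''TorchGeo''', an OSGeo [https://www.osgeo.org/projects/torchgeo/ Project] [https://github.com/torchgeo/torchgeo CODE] [https://torchgeo.readthedocs.io/ DOCS]
Datasets [https://torchgeo.readthedocs.io/en/latest/api/datasets.html LINK], samplers, transforms, and pre-trained models for geospatial data.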
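To make the "samplers" idea concrete, here is a minimal sketch of random patch sampling over a dataset's spatial bounds. It is not TorchGeo's API: `BBox` and `random_patches` are hypothetical names used only for illustration, and the sketch assumes square patches in the same units as the bounds.

```python
import random
from typing import Iterator, NamedTuple


class BBox(NamedTuple):
    """Axis-aligned bounding box (hypothetical helper, not a TorchGeo type)."""
    minx: float
    miny: float
    maxx: float
    maxy: float


def random_patches(bounds: BBox, size: float, length: int, seed: int = 0) -> Iterator[BBox]:
    """Yield `length` square patches of side `size` lying fully inside `bounds`."""
    if size > bounds.maxx - bounds.minx or size > bounds.maxy - bounds.miny:
        raise ValueError("patch size exceeds dataset bounds")
    rng = random.Random(seed)  # seeded so that runs are reproducible
    for _ in range(length):
        # Draw the lower-left corner so the whole patch stays in bounds.
        minx = rng.uniform(bounds.minx, bounds.maxx - size)
        miny = rng.uniform(bounds.miny, bounds.maxy - size)
        yield BBox(minx, miny, minx + size, miny + size)
```

Each yielded box would then be used to read the matching window from every raster layer, which is what lets imagery and labels with different extents be sampled together.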
Additional Topics:
1. Workshops
Berkeley Climate AI Day LINK
The '''ml4earth.de''' [https://ml4earth.de/workshop_2025/ workshop] featured a keynote by Prof. Dr. Xiaoxiang Zhu of TUM, Germany, which is also home to the creator of TorchGeo, Adam Stewart, PhD [https://www.asg.ed.tum.de/sipeo/team/dr-adam-j-stewart/ POSTDOC]. The workshop covered applications of artificial intelligence (AI) in Earth observation, with a focus on machine learning (ML) approaches for remote sensing.
TerraBytes Canada [https://terrabytes-workshop.github.io/ LINK]
Geometa Lab [https://www.giswiki.ch/Agenda LINK]
2. PANGAEA project
[[File:Pangaea geofmbenchmark.png|thumb]]
A standardized evaluation protocol that covers a diverse set of datasets, tasks, resolutions, sensor modalities, and temporalities, establishing a robust and widely applicable benchmark for Geospatial Foundation Models (GFMs). [https://arxiv.org/abs/2412.04204v2 PDF]
https://bpa.st/BTQAE ## code updates
https://eotdl.com/blog/pangaea
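Segmentation benchmarks of this kind are commonly scored with mean intersection-over-union (mIoU). The sketch below shows that metric on flat lists of class labels. It is an illustration only, not code from pangaea-bench, and the function name `mean_iou` is hypothetical.

```python
from typing import Sequence


def mean_iou(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> float:
    """Mean intersection-over-union across classes, for flat label sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union:  # classes absent from both truth and prediction are skipped
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

Skipping classes that appear in neither sequence is one of several conventions; a benchmark may instead fix the class list, so scores are only comparable when the convention matches.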
3. Machine learning approaches
The PANGAEA [https://github.com/VMarsocci/pangaea-bench benchmark] shows that specialized machine learning models that are neither CNNs nor ViTs can perform better than various foundation models for remote sensing land use / land cover analysis and other specific classification tasks. Supervised learners such as XGBoost, CatBoost, or Random Forest variants can outperform ViT variants, CNNs, and related approaches on the six key use cases stated by FAST-EO: weather and climate disaster analysis, methane leak detection, forest above-ground biomass change, soil property estimation, semantic land cover change detection, and monitoring the expansion of mining fields into farmlands.
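Thoughts on Geospatial Foundation Models [https://arxiv.org/pdf/2405.04285 PDF] Zhu, Stewart, et al. 2024
This [https://huggingface.co/ibm-esa-geospatial/TerraMind-1.0-base TerraMind] mixed model is experimental.
[https://github.com/swiss-territorial-data-lab/proj-vit swiss-territorial-data-lab] [https://huggingface.co/datasets/heig-vd-geo/M3DRS DATA] https://stdl.ch/
A sparse matrix math tutorial [https://iclr-blogposts.github.io/2025/blog/sparse-autodiff/ LINK]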
A Geospatial Foundation Model Prithvi-EO-2.0
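Model building libraries: [https://github.com/opengeos/geoai/tree/main geoAI] by Qiusheng Wu (_giswqs_), UTenn [https://opengeoai.org/#statement-of-need DOCS]
Processing toolkit [https://github.com/nasa-nccs-hpda/pytorch-caney pytorch-caney] [https://nasa-nccs-hpda.github.io/pytorch-caney/latest/readme.html#objectives DOCS]
MOSAIKS CodeCapsule [https://codeocean.com/capsule/6456296/tree/v2 LINK]
Satlas [https://satlas-pretrain.allen.ai/ LINK], a platform for visualizing and downloading global geospatial data products generated by AI using satellite images. [https://satlas.allen.ai/map DEMO] Hacker News [https://news.ycombinator.com/item?id=37387556 REVIEW]
Deep learning and OBIA [https://arxiv.org/pdf/2408.01607? PDF] sponsored by State Key Labs [https://en.wikipedia.org/wiki/State_Key_Laboratories LINK]
GOOGLE_SATELLITE_EMBEDDING_V1 [https://deepmind.google/discover/blog/alphaearth-foundations-helps-map-our-planet-in-unprecedented-detail/ BLOG] [https://medium.com/google-earth/ai-powered-pixels-introducing-googles-satellite-embedding-dataset-31744c1f4650 MED] [https://developers.google.com/earth-engine/datasets/catalog/GOOGLE_SATELLITE_EMBEDDING_V1_ANNUAL LINK] related [https://christopherren.substack.com/p/embedding-fields-forever LINK]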
@inproceedings{brown2024learned,
  title={Learned embedding fields for multi-source, multi-temporal earth observation imagery},
  author={Brown, Christopher and Kazmierski, Michal and Rucklidge, William and Pasquarella, Valerie and Shelhamer, Evan},
  booktitle={ICLR Workshop on Machine Learning for Remote Sensing (ML4RS)},
  year={2024}
}
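A per-pixel embedding dataset like the one cited above is typically used by comparing embedding vectors, for example to find pixels that resemble a reference location. The following is a minimal sketch of that comparison with cosine similarity. The vectors are plain Python lists here; how the embeddings are actually fetched (for instance from Earth Engine) is outside the sketch, and `cosine_similarity` is a hypothetical helper name.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

If the stored vectors are already unit length, the normalization is redundant and the plain dot product gives the same ranking.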
4. Data availability
Promotion of public datasets for Earth observation research.
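Global Forest Watch edu [https://glad.umd.edu/projects/global-forest-watch LINK]
US West Coast -- team MSFT, Planet Labs and The Nature Conservancy .. [https://www.microsoft.com/en-us/research/wp-content/uploads/2025/03/Global-Renewables-Watch_Caleb-Robinson_2025.pdf REF] [https://reatlas42216storage.blob.core.windows.net/public/wind_all_2024q2_3_11_2025.gpkg DATA0]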
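The DATA0 file above has a `.gpkg` extension, which denotes a GeoPackage. A GeoPackage is an SQLite database whose layers are registered in the `gpkg_contents` table, so a downloaded copy can be inspected with the Python standard library alone. This is a minimal sketch: `list_gpkg_layers` is a hypothetical helper, and the layer names inside this particular file are not assumed.

```python
import sqlite3


def list_gpkg_layers(path: str) -> list[tuple[str, str]]:
    """Return (table_name, data_type) for each layer registered in a GeoPackage."""
    con = sqlite3.connect(path)
    try:
        rows = con.execute(
            "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
        ).fetchall()
    finally:
        con.close()
    return [(name, data_type) for name, data_type in rows]
```

For actual geometry access a library built on GDAL/OGR is the usual route; the query above only answers "what is in this file" before committing to a full read.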
TerraMesh, part of the FAST‑EO project funded by the European Space Agency (ESA) Φ‑Lab (contract #4000143501/23/I‑DT).
The FAST-EO project, officially launched on February 5, 2024, is an initiative funded by the European Space Agency (ESA) and led by the German Aerospace Center (DLR). Its primary goal is to advance AI Foundation Models (FMs) for Earth Observation (EO) by exploring large multimodal foundation models through unsupervised and self-supervised learning to address downstream tasks. The project involves a consortium of partners including Forschungszentrum Jülich, KP Labs, and IBM Research, who are collaborating on six key use cases: weather and climate disaster analysis, methane leak detection, forest above-ground biomass change, soil property estimation, semantic land cover change detection, and monitoring the expansion of mining fields into farmlands. The project is committed to open science, planning to release model weights, configurations, datasets, and source code under a free and permissive Apache-2 license, accessible via platforms like GitHub, HuggingFace, and the SpatioTemporal Asset Catalog (STAC).
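Other public training [https://x-ytong.github.io/project/Five-Billion-Pixels.html DATA] for the interior of China, by Prof. Dr. Xiaoxiang Zhu, who is based in Germany and heads the department that includes the TorchGeo project founder.
gdal containers [https://github.com/OSGeo/gdal/pkgs/container/gdal LINK]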
opendatacube containers LINK
http://dataspace.copernicus.eu/ https://eotdl.com
https://github.com/opendatacube
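[https://eumap.readthedocs.io/en/latest/ EUMap] client libraries
Open Earth Monitor [https://cordis.europa.eu/project/id/101059548/results EU] [https://www.youtube.com/@OpenGeoHubFoundation YouTube] Channel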
5. Geospatial data infrastructure
Standardized data formats and efficient data access mechanisms such as STAC (SpatioTemporal Asset Catalog) and OSGeoLive.
[https://openlandmap.org/ OpenLandMap]
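As an illustration of what a "standardized data format" looks like in practice, the sketch below builds a minimal STAC Item as a Python dictionary and checks for the top-level fields the STAC Item specification requires. The item id, coordinates, date, and asset URL are hypothetical placeholders, and `missing_item_fields` is an illustrative helper, not part of any STAC library.

```python
# Top-level fields required of every STAC Item by the specification.
REQUIRED_ITEM_FIELDS = (
    "type", "stac_version", "id", "geometry", "properties", "links", "assets",
)


def missing_item_fields(item: dict) -> list[str]:
    """List required STAC Item fields that are absent from `item`."""
    missing = [key for key in REQUIRED_ITEM_FIELDS if key not in item]
    props = item.get("properties")
    # `properties.datetime` must be present (its value may be null in some cases).
    if isinstance(props, dict) and "datetime" not in props:
        missing.append("properties.datetime")
    return missing


# Hypothetical example item; none of these values refer to a real scene.
example_item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-scene",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[7.0, 46.0], [8.0, 46.0], [8.0, 47.0], [7.0, 47.0], [7.0, 46.0]]],
    },
    "bbox": [7.0, 46.0, 8.0, 47.0],
    "properties": {"datetime": "2025-10-01T00:00:00Z"},
    "links": [],
    "assets": {
        "image": {
            "href": "https://example.com/example-scene.tif",
            "type": "image/tiff; application=geotiff",
        }
    },
}
```

Because an Item is a GeoJSON Feature, generic GeoJSON tools can read it, while the `assets` entries point to the actual data files. A full validator would check the published JSON Schema rather than key presence alone.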
Key points and takeaways:
'''Specialized ML models''': Traditional ML methods can outperform trendy AI approaches in remote sensing tasks.
'''Public data availability''': Public datasets are essential for Earth observation research, enabling collaboration and innovation.
'''Collaboration opportunities''': The conversation likely touched upon the potential for international collaborations across different countries and regions.
'''Geospatial data infrastructure''': Standardized data formats and efficient data access mechanisms are crucial for geospatial research.
Preparation for #bids25 Big Data from Space #osgeo + Pangeo code sprint
The participants' discussion is a preparation activity for an upcoming code sprint, where they will work together to develop innovative solutions using remote sensing and Earth observation datasets. A wiki page summarizing these ideas could be a valuable output from this collaboration.
A Path for Science‑ and Evidence‑based AI [https://understanding-ai-safety.org/ Policy]
Overall, the conversation highlights the intersection of geospatial technology, machine learning, and open science initiatives in Earth observation, emphasizing the importance of data availability, specialized ML models, and geospatial data infrastructure for advancing research and innovation in this field.

