TorchGeo Code Provenance


Code Provenance Review Process

OSGeo incubation graduation reference: https://www.osgeo.org/about/committees/incubation/graduation/

TorchGeo application page: https://trac.osgeo.org/osgeo/wiki/TorchGeo

Document Status

Version: 0.4
Status: Updated
Last Edited: 09 Jan 2026

Purpose

Show, to a "reasonable comfort level", that TorchGeo does not contain improperly contributed code and that all of the code is under the Project license.


Notice: It is not the goal to prove that every source file, and every contribution to those files, was contributed properly. The onus is not on the OSGeo Incubation Committee to prove there are no problems. However, the OSGeo Incubation Committee does need to ensure that OSGeo does not officially endorse code with provenance issues that could have been identified and corrected with reasonable effort.

TorchGeo Provenance Review

Project History

TorchGeo started as an internship project at Microsoft's AI for Good Lab in 2021, with continued support and funding from the University of Illinois Urbana-Champaign and the National Center for Supercomputing Applications. Since then, TorchGeo has been adopted by a diverse community of researchers from academia, industry, and government research institutions who have contributed to the success of the library. In 2025, we founded a new TorchGeo Organization to manage the TorchGeo Project, with the support of members from the above organizations.

Key milestones:

- 2021-05-11: Initial commit (2330aad) by Adam J. Stewart

- 2021-2024: Organic growth to 120+ contributors

- 2025: Formation of TorchGeo Organization with formal governance structure

- 2026: OSGeo incubation application submitted

Libraries

All dependencies are external (not bundled in source tree). All licenses are OSI-approved and compatible with MIT.

| Library | License | Compatible? | In Core? | Issues? | Ok? | Comments |
| --- | --- | --- | --- | --- | --- | --- |
| einops | MIT | Yes | External | - | Yes | Tensor operations |
| geopandas | BSD-3-Clause | Yes | External | - | Yes | Geospatial dataframes |
| jsonargparse | MIT | Yes | External | - | Yes | CLI argument parsing |
| lightly | MIT | Yes | External | - | Yes | Self-supervised learning |
| lightning | Apache-2.0 | Yes | External | - | Yes | Training framework |
| matplotlib | PSF | Yes | External | - | Yes | Plotting |
| numpy | BSD-3-Clause | Yes | External | - | Yes | Numerical computing |
| pandas | BSD-3-Clause | Yes | External | - | Yes | Data manipulation |
| pillow | HPND / MIT-CMU* | Yes | External | - | Yes | Image I/O |
| pyproj | MIT | Yes | External | - | Yes | Cartographic projections |
| rasterio | BSD-3-Clause | Yes | External | - | Yes | Geospatial raster I/O |
| segmentation-models-pytorch | MIT | Yes | External | - | Yes | Segmentation architectures |
| shapely | BSD-3-Clause | Yes | External | - | Yes | Geometric operations |
| timm | Apache-2.0 | Yes | External | - | Yes | Vision model zoo |
| torch | BSD-3-Clause | Yes | External | - | Yes | PyTorch core |
| torchmetrics | Apache-2.0 | Yes | External | - | Yes | Metrics computation |
| torchvision | BSD-3-Clause | Yes | External | - | Yes | Vision utilities |
| kornia | Apache-2.0 | Yes | External | - | Yes | Computer vision ops |

Copyright Holders

| Copyright Holder | Contact | Issues | Comments |
| --- | --- | --- | --- |
| TorchGeo Contributors | N/A (collective) | None | Per LICENSE file |

The MIT License file states: "Copyright (c) TorchGeo Contributors."

This is a collective copyright model: individual contributors retain copyright but license their contributions under MIT. Other historical copyright markings in the source tree include:

> git grep Copyright | grep -v TorchGeo
torchgeo/models/convlstm.py:# Copyright (c) 2017 Andrea Palazzi
torchgeo/models/copernicusfm.py:    Adapted from https://github.com/bwconrad/flexivit. Copyright (c) 2023 Ben Conrad.
torchgeo/models/copernicusfm.py:    Adapted from https://github.com/bwconrad/flexivit. Copyright (c) 2023 Ben Conrad.
torchgeo/models/copernicusfm.py:    Copyright (c) Microsoft Corporation.
torchgeo/models/earthloc.py:    Adapted from https://github.com/gmberton/EarthLoc. Copyright (c) 2024 Gabriele Berton
torchgeo/models/earthloc.py:    Adapted from https://github.com/gmberton/EarthLoc. Copyright (c) 2024 Gabriele Berton
torchgeo/models/earthloc.py:    Adapted from https://github.com/gmberton/EarthLoc. Copyright (c) 2024 Gabriele Berton
torchgeo/models/ltae.py:# Copyright (c) 2020 VSainteuf (Vivien Sainte Fare Garnot)
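
For reviewers who prefer a scripted check, the grep pipeline above can be approximated in Python. This is a minimal sketch, not the tooling used for this review: the function name `find_other_copyrights` is hypothetical, it only looks at `*.py` files (the grep above covers every tracked file), and it filters on the line content rather than on the full `path:line` output.

```python
from pathlib import Path


def find_other_copyrights(root, project="TorchGeo", pattern="*.py"):
    """Return (relative_path, line) pairs for copyright lines that do not name the project.

    Assumption of this sketch: only files matching `pattern` are scanned.
    """
    root = Path(root)
    hits = []
    for path in sorted(root.rglob(pattern)):
        text = path.read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            # Keep lines that mention "Copyright" but not the project name,
            # mirroring `grep Copyright | grep -v TorchGeo`.
            if "Copyright" in line and project not in line:
                hits.append((path.relative_to(root).as_posix(), line.strip()))
    return hits


if __name__ == "__main__":
    for rel_path, line in find_other_copyrights("."):
        print(f"{rel_path}: {line}")
```

Each reported line still needs a manual look to confirm the upstream license and attribution.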

Committers

Total human contributors: 122

Total commits: 2778 (excluding bots)

Top contributors (PSC members marked with *):

| Committer | Email | Affiliation | Agreement Signed? |
| --- | --- | --- | --- |
| Adam J. Stewart * | ajstewart426@gmail.com | TUM / UIUC | TBD |
| Caleb Robinson * | calebrob6@gmail.com | Microsoft AI for Good | TBD |
| Isaac Corley * | isaac.corley@proton.me | Wherobots | TBD |
| Nils Lehmann * | nils.lehmann24@gmail.com | TUM | TBD |
| Ashwin Nair * | ashnair0007@gmail.com | Space42 | TBD |
| Anthony Ortiz * | Anthony.Ortiz@microsoft.com | Microsoft | TBD |
| Robin Cole | robmarkcole@gmail.com | - | TBD |
| Yi Wang | wangyi111@users.noreply.github.com | - | TBD |
| Yi-Chia Chang | yichiac@users.noreply.github.com | - | TBD |
| Burak | burakekim@users.noreply.github.com | - | TBD |
| Blaz Rolih | blaz-r@users.noreply.github.com | - | TBD |
| Maciej Kilian | iejMac@users.noreply.github.com | - | TBD |
| Qiusheng Wu | giswqs@gmail.com | - | TBD |

Note: 122 total contributors. Full list available via git shortlog -sne --all.
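
The contributor and commit totals can be recomputed from the `git shortlog -sne --all` output mentioned above. The following is a rough sketch under stated assumptions: the helper names are hypothetical, and bots are detected with a simple "[bot]" marker in the name or email, which may not match how the totals in this document were derived.

```python
import re

# Each `git shortlog -sne` line looks like "   123\tName <email>".
_LINE = re.compile(r"^\s*(\d+)\s+(.*?)\s*<([^>]*)>\s*$")


def parse_shortlog(text):
    """Parse shortlog output into (commits, name, email) tuples."""
    entries = []
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            entries.append((int(match.group(1)), match.group(2), match.group(3)))
    return entries


def is_bot(name, email):
    """Heuristic (an assumption of this sketch): bot accounts carry a "[bot]" marker."""
    return "[bot]" in name or "[bot]" in email


def summarize(text):
    """Return (human_contributors, human_commits) for shortlog output."""
    humans = [entry for entry in parse_shortlog(text) if not is_bot(entry[1], entry[2])]
    return len(humans), sum(entry[0] for entry in humans)


if __name__ == "__main__":
    import subprocess

    output = subprocess.run(
        ["git", "shortlog", "-sne", "--all"],
        capture_output=True, text=True, check=True,
    ).stdout
    print(summarize(output))
```

A contributor who committed under several email addresses is counted once per address unless the repository maps them together (for example with a `.mailmap` file), so the result is an upper bound on distinct people.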

Source Files

Total Python files: 557

Files with copyright header: 557 (100%), with PR #3262 applied

Standard header format:

# Copyright (c) TorchGeo Contributors. All rights reserved.
# Licensed under the MIT License.

| File | License | Copyright Holder | Header? | Issues? | Ok? | Comments |
| --- | --- | --- | --- | --- | --- | --- |
| docs/user/metrics/update_timeline.py | MIT | TorchGeo Contributors | Yes | None | Yes | Fixed in PR #3262 |
| tests/data/landcoverai/split.py | MIT | TorchGeo Contributors | Yes | None | Yes | Fixed in PR #3262 |

Note: PR https://github.com/torchgeo/torchgeo/pull/3262 adds copyright headers to the 2 files that were missing them.
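
A header check of this kind is easy to automate. The sketch below is illustrative only and is not the tooling used in this review: the function names are hypothetical, and it assumes the two header lines appear consecutively within the first five lines of a file (to allow for a shebang or encoding line above them).

```python
from pathlib import Path

# The standard two-line header quoted in this document.
HEADER = (
    "# Copyright (c) TorchGeo Contributors. All rights reserved.",
    "# Licensed under the MIT License.",
)


def has_header(path, max_lines=5):
    """True if both header lines appear consecutively within the first `max_lines` lines."""
    with open(path, encoding="utf-8", errors="replace") as f:
        head = [f.readline().rstrip("\r\n") for _ in range(max_lines)]
    return any((head[i], head[i + 1]) == HEADER for i in range(len(head) - 1))


def files_missing_header(root):
    """Return sorted relative paths of *.py files under `root` that lack the header."""
    root = Path(root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*.py")
        if not has_header(path)
    )


if __name__ == "__main__":
    missing = files_missing_header(".")
    print(f"{len(missing)} file(s) missing the header")
    for rel_path in missing:
        print(rel_path)
```

Run from the repository root, an empty result would correspond to the 100% figure reported above.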

Further Issues

Resolved:

Pending:

  • Contributor agreement status needs confirmation from PSC members


TorchGeo Application Questionnaire

See the main application questionnaire for additional project details: https://trac.osgeo.org/osgeo/wiki/TorchGeo

Process

This Code Provenance Review followed the OSGeo Incubation process:

| Step | Status | Date | Notes |
| --- | --- | --- | --- |
| 1. Create Provenance Review Document | Complete | 06 Jan 2026 | Document created from Provenance Review Template |
| 2. Library/Component Review | Complete | 08 Jan 2026 | 18 dependencies reviewed, no issues found |
| 3. Code Copyright Review | Complete | 08 Jan 2026 | 557 files reviewed, no issues found |
| 4. Submit to Project Mentor | Complete | 08 Jan 2026 | Brian M Hamlin (Mentor) |

Review Summary

Components in source tree: None (all external dependencies)
Problematic licenses: None identified
License conflicts: None identified
Files reviewed: 557 Python files
Files with headers: 557 (100%)
External code derivations: 2 files (both MIT licensed, properly attributed)
Copyright model: Collective ("TorchGeo Contributors")

Outstanding Items

  • Pending merge: PR #3262 - adds copyright headers to 2 files
  • NEEDS PSC INPUT: CLA/DCO status confirmation. Does the project have a Contributor License Agreement (CLA) or Developer Certificate of Origin (DCO)?

ajstewart -- We used to have a CLA when we were under MSFT. We no longer have any CLA or DCO. We currently have no plans to add one. I know the Linux Foundation requires DCO. Does OSGeo have any requirements?

  • NEEDS PSC INPUT: Contributor agreement status for PSC members

Conclusion

The TorchGeo codebase passes the Code Provenance Review with no blocking issues identified. The project demonstrates good license hygiene and is ready for OSGeo incubation from a code provenance perspective.

Library/Component Review Process

Review completed: 08 Jan 2026

1. Components in Source Tree

Finding: None. TorchGeo does not bundle any third-party libraries in its source tree. Unlike projects such as GDAL (which includes copies of libtiff, libjpeg, etc.), TorchGeo treats all dependencies as external packages installed via pip/conda.

  • No vendor/ directory
  • No third_party/ directory
  • No external/ directory
  • No embedded library copies

This is the recommended approach and simplifies license management.
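
The directory checks listed above can be scripted. This is a minimal sketch with a hypothetical helper name; it only looks for the three directory names listed and cannot detect library code pasted into ordinary modules, which still requires manual review.

```python
from pathlib import Path

# Directory names that conventionally hold bundled third-party code.
VENDOR_DIR_NAMES = frozenset({"vendor", "third_party", "external"})


def find_vendored_dirs(root, names=VENDOR_DIR_NAMES):
    """Return sorted relative paths of directories under `root` whose name is in `names`."""
    root = Path(root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_dir()
        and path.name in names
        # Skip anything inside the repository's .git metadata directory.
        and ".git" not in path.relative_to(root).parts
    )


if __name__ == "__main__":
    found = find_vendored_dirs(".")
    print(found if found else "No vendor/, third_party/, or external/ directories found")
```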

2. External Dependencies

Total required dependencies: 18

All licenses are OSI-approved and compatible with MIT.

| Package | License | Compatible? | Potentially Problematic? |
| --- | --- | --- | --- |
| einops | MIT | Yes | No |
| geopandas | BSD-3-Clause | Yes | No |
| jsonargparse | MIT | Yes | No |
| kornia | Apache-2.0 | Yes | No |
| lightly | MIT | Yes | No |
| lightning | Apache-2.0 | Yes | No |
| matplotlib | PSF (BSD-style) | Yes | No |
| numpy | BSD-3-Clause | Yes | No |
| pandas | BSD-3-Clause | Yes | No |
| pillow | HPND / MIT-CMU* | Yes | No |
| pyproj | MIT | Yes | No |
| rasterio | BSD-3-Clause | Yes | No |
| segmentation-models-pytorch | MIT | Yes | No |
| shapely | BSD-3-Clause | Yes | No |
| timm | Apache-2.0 | Yes | No |
| torch | BSD-3-Clause | Yes | No |
| torchmetrics | Apache-2.0 | Yes | No |
| torchvision | BSD-3-Clause | Yes | No |

3. License Distribution

| License Type | Count | Notes |
| --- | --- | --- |
| MIT | 5 | Permissive, fully compatible |
| BSD-3-Clause | 7 | Permissive, fully compatible |
| Apache-2.0 | 4 | Permissive, fully compatible |
| PSF | 1 | Python Software Foundation, BSD-style |
| HPND / MIT-CMU* | 1 | Historical Permission Notice and Disclaimer, permissive |
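
The distribution can be cross-checked by tallying the licenses listed for the 18 required dependencies in section 2. The snippet below is a small sketch for that purpose; the mapping is transcribed from the dependency table in this document (with the footnote asterisk on pillow dropped), and the function name is hypothetical.

```python
from collections import Counter

# Transcribed from the External Dependencies table in this document.
REQUIRED_LICENSES = {
    "einops": "MIT",
    "geopandas": "BSD-3-Clause",
    "jsonargparse": "MIT",
    "kornia": "Apache-2.0",
    "lightly": "MIT",
    "lightning": "Apache-2.0",
    "matplotlib": "PSF",
    "numpy": "BSD-3-Clause",
    "pandas": "BSD-3-Clause",
    "pillow": "HPND / MIT-CMU",
    "pyproj": "MIT",
    "rasterio": "BSD-3-Clause",
    "segmentation-models-pytorch": "MIT",
    "shapely": "BSD-3-Clause",
    "timm": "Apache-2.0",
    "torch": "BSD-3-Clause",
    "torchmetrics": "Apache-2.0",
    "torchvision": "BSD-3-Clause",
}


def tally_licenses(licenses):
    """Count how many packages use each license string."""
    return Counter(licenses.values())


if __name__ == "__main__":
    counts = tally_licenses(REQUIRED_LICENSES)
    for license_name, count in counts.most_common():
        print(f"{license_name}: {count}")
    print(f"Total: {sum(counts.values())}")
```

The per-license counts should sum to the total number of required dependencies; a mismatch points to a transcription error in one of the tables.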

4. Potentially Problematic Licenses

Finding: None identified

  • No GPL or LGPL dependencies (no copyleft)
  • No AGPL dependencies
  • No proprietary or non-free dependencies
  • No "viral" license concerns
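
A simple screen for copyleft or unrecognized licenses might look like the sketch below. It is an illustration under assumptions, not a legal determination: the allowlist reflects only the license names that appear in this document, the function name is hypothetical, and any string it cannot classify is reported as "unknown" for manual review.

```python
# Permissive license names as they are written in this document (an assumption
# of this sketch, not an authoritative list).
PERMISSIVE = {
    "MIT", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0", "PSF", "HPND", "MIT-CMU",
}
# "GPL" also matches LGPL and AGPL identifiers.
COPYLEFT_MARKERS = ("GPL",)


def classify_license(expression):
    """Classify a license string as "copyleft", "permissive", or "unknown".

    Strings such as "HPND / MIT-CMU*" are split on "/" and every part must be
    on the permissive allowlist for the result to be "permissive".
    """
    parts = [part.strip().rstrip("*").strip() for part in expression.split("/")]
    if any(marker in part.upper() for part in parts for marker in COPYLEFT_MARKERS):
        return "copyleft"
    if all(part in PERMISSIVE for part in parts):
        return "permissive"
    return "unknown"


if __name__ == "__main__":
    for name in ("MIT", "HPND / MIT-CMU*", "LGPL-2.1", "Proprietary"):
        print(f"{name}: {classify_license(name)}")
```

Requiring every part of a combined license string to be permissive is the conservative choice: a single unrecognized component is enough to send the package to manual review.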

5. License Conflicts

Finding: None identified. All dependency licenses are permissive and compatible with each other and with TorchGeo's MIT license. There are no conflicts analogous to GDAL's use of GPL'ed GRASS libraries.

6. Optional Dependencies

TorchGeo has optional dependencies for the datasets extra. All use permissive licenses:

| Package | License |
| --- | --- |
| h5py | BSD-3-Clause |
| laspy | BSD-2-Clause |
| netcdf4 | MIT |
| opencv-python | Apache-2.0 |
| pycocotools | BSD-2-Clause |
| rioxarray | Apache-2.0 |
| scikit-image | BSD-3-Clause |
| scipy | BSD-3-Clause |
| xarray | Apache-2.0 |

7. Conclusion

Overall assessment: No library/component issues identified

  • No bundled code in source tree
  • All 18 required dependencies use permissive OSI-approved licenses
  • No GPL/copyleft dependencies
  • No license conflicts
  • Clean dependency model suitable for OSGeo incubation

Code Copyright Review

Review completed: 08 Jan 2026
Reviewer: Brian M Hamlin (Mentor), with assistance from automated tooling
Scope: All 557 Python source files in the TorchGeo repository

1. License Headers

Total Python files: 557
Files with standard header: 557 (100%)
Files missing header: 0 (after PR #3262)

Standard header format:

# Copyright (c) TorchGeo Contributors. All rights reserved.
# Licensed under the MIT License.

Finding: All source files include the standard license header. Two utility scripts (docs/user/metrics/update_timeline.py and tests/data/landcoverai/split.py) were identified as missing headers during this review and fixed via PR #3262.

2. License Consistency

Finding: All files are under the MIT License. No files use a different or conflicting license.

  • No GPL, LGPL, or copyleft licenses found
  • No Apache-only or BSD-only files found
  • All code consistently uses MIT

3. Code Origins

Two files contain code derived from external sources. Both are properly attributed, and both original sources use the MIT License, which is compatible:

| File | Source | Source License | Compatible? | Attribution in File? |
| --- | --- | --- | --- | --- |
| torchgeo/models/btc.py | blaz-r/BTC-change-detection | MIT | Yes | Yes |
| torchgeo/models/croma.py | antofuller/CROMA | MIT | Yes | Yes |

Finding: Both external code sources are MIT licensed, which is fully compatible with TorchGeo's MIT license. Attribution is properly documented in the source file headers.

4. Vendored/Bundled Code

Finding: No vendored or bundled third-party code exists in the source tree. All dependencies are external and installed via pip/conda.

  • No vendor/, third_party/, or external/ directories
  • No embedded copies of libraries (unlike GDAL which bundles libtiff, libjpeg, etc.)
  • All 18 dependencies are fetched externally at install time

5. Copyright Holders

| Copyright Holder | Type | Notes |
| --- | --- | --- |
| TorchGeo Contributors | Collective | Single copyright holder for all files |

Finding: The project uses a collective copyright model. All files use "Copyright (c) TorchGeo Contributors" rather than individual contributor names.

Implications for relicensing:

  • Simplified model - no need to contact 122 individual contributors
  • Contributors implicitly agree to collective copyright by submitting PRs
  • NEEDS CLARIFICATION: Does the project have a Contributor License Agreement (CLA) or Developer Certificate of Origin (DCO)?

6. Outstanding Issues

Resolved:

  • Copyright headers added to 2 missing files: PR #3262

Pending:

  • NEEDS PSC INPUT: Confirm whether a CLA/DCO is in place or planned
  • NEEDS PSC INPUT: Contributor agreement status for PSC members (currently all marked TBD)

7. Conclusion

Overall assessment: No provenance issues identified. The TorchGeo codebase demonstrates good copyright hygiene:

  • 100% of source files have proper license headers
  • All code is consistently MIT licensed
  • External code derivations are properly attributed and license-compatible
  • There is no vendored code to complicate the license picture
  • Collective copyright model simplifies governance

The codebase is ready for OSGeo incubation from a code provenance perspective, pending confirmation of contributor agreement procedures from the PSC.