TorchGeo Code Provenance
Code Provenance Review Process
TorchGeo project. Reference: https://www.osgeo.org/about/committees/incubation/graduation/
Document Status
| Version: | 0.2 |
| Status: | Updated |
| Last Edited: | 08 Jan 2026 |
Purpose
Show to "reasonable comfort level" that TorchGeo does not have improperly contributed code, and that the code is all under the Project license.
Notice: It is not the goal to be able to prove that every source file, and every contribution to those files, was contributed properly. The onus is not on the OSGeo Incubation Committee to prove there are no problems. However, the OSGeo Incubation Committee does need to ensure that OSGeo does not officially endorse code with provenance issues that could have been identified and corrected with a reasonable effort.
TorchGeo Provenance Review
Project History
TorchGeo started as an internship project at Microsoft's AI for Good Lab in 2021, with continued support and funding from the University of Illinois Urbana-Champaign and the National Center for Supercomputing Applications. Since then, TorchGeo has been adopted by a diverse community of researchers from academia, industry, and government research institutions who have contributed to the success of the library. In 2025, we founded a new TorchGeo Organization to manage the TorchGeo Project, with the support of members from the above organizations.
Key milestones:
- 2021-05-11: Initial commit (2330aad) by Adam J. Stewart
- 2021-2024: Organic growth to 120+ contributors
- 2025: Formation of TorchGeo Organization with formal governance structure
- 2026: OSGeo incubation application submitted
Libraries
All dependencies are external (not bundled in source tree). All licenses are OSI-approved and compatible with MIT.
| Library | License | Compatible? | In Core? | Issues? | Ok? | Comments |
|---|---|---|---|---|---|---|
| einops | MIT | Yes | External | - | Yes | Tensor operations |
| geopandas | BSD-3-Clause | Yes | External | - | Yes | Geospatial dataframes |
| jsonargparse | MIT | Yes | External | - | Yes | CLI argument parsing |
| kornia | Apache-2.0 | Yes | External | - | Yes | Computer vision ops |
| lightly | MIT | Yes | External | - | Yes | Self-supervised learning |
| lightning | Apache-2.0 | Yes | External | - | Yes | Training framework |
| matplotlib | PSF | Yes | External | - | Yes | Plotting |
| numpy | BSD-3-Clause | Yes | External | - | Yes | Numerical computing |
| pandas | BSD-3-Clause | Yes | External | - | Yes | Data manipulation |
| pillow | HPND | Yes | External | - | Yes | Image I/O |
| pyproj | MIT | Yes | External | - | Yes | Cartographic projections |
| rasterio | BSD-3-Clause | Yes | External | - | Yes | Geospatial raster I/O |
| segmentation-models-pytorch | MIT | Yes | External | - | Yes | Segmentation architectures |
| shapely | BSD-3-Clause | Yes | External | - | Yes | Geometric operations |
| timm | Apache-2.0 | Yes | External | - | Yes | Vision model zoo |
| torch | BSD-3-Clause | Yes | External | - | Yes | PyTorch core |
| torchmetrics | Apache-2.0 | Yes | External | - | Yes | Metrics computation |
| torchvision | BSD-3-Clause | Yes | External | - | Yes | Vision utilities |
Copyright Holders
| Copyright Holder | Contact | Issues | Comments |
|---|---|---|---|
| TorchGeo Contributors | N/A (collective) | None | Per LICENSE file |
The MIT License file states: "Copyright (c) TorchGeo Contributors."
This is a collective copyright model - individual contributors retain copyright but license under MIT.
Committers
Total human contributors: 122
Total commits: 2778 (excluding bots)
Top contributors (PSC members marked with *):
| Committer | Email | Affiliation | Agreement Signed? |
|---|---|---|---|
| Adam J. Stewart * | ajstewart426@gmail.com | TUM / UIUC | TBD |
| Caleb Robinson * | calebrob6@gmail.com | Microsoft AI for Good | TBD |
| Isaac Corley * | isaac.corley@proton.me | Wherobots | TBD |
| Nils Lehmann * | nils.lehmann24@gmail.com | TUM | TBD |
| Ashwin Nair * | ashnair0007@gmail.com | Space42 | TBD |
| Anthony Ortiz * | Anthony.Ortiz@microsoft.com | Microsoft | TBD |
| Robin Cole | robmarkcole@gmail.com | - | TBD |
| Yi Wang | wangyi111@users.noreply.github.com | - | TBD |
| Yi-Chia Chang | yichiac@users.noreply.github.com | - | TBD |
| Burak | burakekim@users.noreply.github.com | - | TBD |
| Blaz Rolih | blaz-r@users.noreply.github.com | - | TBD |
| Maciej Kilian | iejMac@users.noreply.github.com | - | TBD |
| Qiusheng Wu | giswqs@gmail.com | - | TBD |
Note: 122 total contributors. Full list available via git shortlog -sne --all.
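The contributor and commit totals above can be recomputed from the output of git shortlog -sne --all. The following is a minimal sketch, assuming the usual "count, tab, Name <email>" line layout of that command. The function names, the "[bot]" marker heuristic, and the sample input are illustrative assumptions, not part of the TorchGeo tooling, and the sample is not real project data.

```python
import re

# Each line of `git shortlog -sne` is expected to look like "   123\tName <email>".
SHORTLOG_LINE = re.compile(r"^\s*(\d+)\s+(.*?)\s+<([^>]*)>\s*$")


def parse_shortlog(text):
    """Return a list of (commits, name, email) tuples, most commits first."""
    rows = []
    for line in text.splitlines():
        match = SHORTLOG_LINE.match(line)
        if match:
            commits, name, email = match.groups()
            rows.append((int(commits), name, email))
    return sorted(rows, key=lambda row: row[0], reverse=True)


def exclude_bots(rows, markers=("[bot]",)):
    """Drop entries whose name or email contains a bot marker (a heuristic)."""
    return [
        row for row in rows
        if not any(marker in row[1] or marker in row[2] for marker in markers)
    ]


# Hypothetical sample input, not real TorchGeo statistics.
sample = (
    "    12\tAlice Example <alice@example.org>\n"
    "     7\tdependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>\n"
    "     3\tBob Example <bob@example.org>\n"
)
humans = exclude_bots(parse_shortlog(sample))
print("human contributors:", len(humans))
print("human commits:", sum(row[0] for row in humans))
```

In practice the input text could be captured with subprocess.run(["git", "shortlog", "-sne", "--all"], capture_output=True, text=True).stdout, run from inside a clone of the repository. Contributors who commit under several names or email addresses would be counted more than once unless a .mailmap file is in place.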
Source Files
Total Python files: 557
Files with copyright header: 557 (100%) - after PR #3262 merged
Standard header format:
    # Copyright (c) TorchGeo Contributors. All rights reserved.
    # Licensed under the MIT License.
| File | License | Copyright Holder | Header? | Issues? | Ok? | Comments |
|---|---|---|---|---|---|---|
| docs/user/metrics/update_timeline.py | MIT | TorchGeo Contributors | Yes | None | Yes | Fixed in PR #3262 |
| tests/data/landcoverai/split.py | MIT | TorchGeo Contributors | Yes | None | Yes | Fixed in PR #3262 |
Note: PR https://github.com/torchgeo/torchgeo/pull/3262 adds copyright headers to the 2 files that were missing them.
Further Issues
Resolved:
- Copyright headers added to 2 missing files via PR #3262: https://github.com/torchgeo/torchgeo/pull/3262
Pending:
- Contributor agreement status needs confirmation from PSC members
TorchGeo Application Questionnaire
https://trac.osgeo.org/osgeo/wiki/TorchGeo
Process
The Code Provenance Review is separated into two sections, each with a slightly different process:
- Create a Provenance Review Document by copying the Provenance Review Template.
- Complete the Library/Component Review Process.
- Complete the Code Copyright Review.
- Submit the completed Provenance Review Document to the Project Mentor for review.
Library/Component Review Process
Review completed: 08 Jan 2026
1. Components in Source Tree
Finding: None
TorchGeo does not bundle any third-party libraries in its source tree. Unlike projects like GDAL (which includes copies of libtiff, libjpeg, etc.), TorchGeo treats all dependencies as external packages installed via pip/conda.
- No vendor/ directory
- No third_party/ directory
- No external/ directory
- No embedded library copies
This is the recommended approach and simplifies license management.
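The directory checks above could be repeated with a short script. This is a minimal sketch, assuming that bundled code would live in a directory with one of the three names listed. The function name is hypothetical, and the demonstration runs against a temporary directory rather than the TorchGeo repository. A name-based scan cannot detect library copies placed under other names, so it supports rather than replaces manual inspection.

```python
import tempfile
from pathlib import Path

# Directory names taken from the checklist above.
VENDOR_DIR_NAMES = {"vendor", "third_party", "external"}


def find_vendored_dirs(repo_root):
    """Return sorted relative paths of directories whose name suggests bundled code."""
    root = Path(repo_root)
    hits = []
    for path in root.rglob("*"):
        if not path.is_dir():
            continue
        rel = path.relative_to(root)
        if ".git" in rel.parts:
            continue  # skip git internals
        if path.name in VENDOR_DIR_NAMES:
            hits.append(rel.as_posix())
    return sorted(hits)


# Demonstration on a throwaway directory tree (hypothetical layout).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "pkg" / "third_party").mkdir(parents=True)
    (Path(tmp) / "pkg" / "models").mkdir()
    demo_hits = find_vendored_dirs(tmp)
print(demo_hits)
```

An empty list from find_vendored_dirs on a repository checkout would be consistent with the finding above.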
2. External Dependencies
Total required dependencies: 18
All licenses are OSI-approved and compatible with MIT.
| Package | License | Compatible? | Potentially Problematic? |
|---|---|---|---|
| einops | MIT | Yes | No |
| geopandas | BSD-3-Clause | Yes | No |
| jsonargparse | MIT | Yes | No |
| kornia | Apache-2.0 | Yes | No |
| lightly | MIT | Yes | No |
| lightning | Apache-2.0 | Yes | No |
| matplotlib | PSF (BSD-style) | Yes | No |
| numpy | BSD-3-Clause | Yes | No |
| pandas | BSD-3-Clause | Yes | No |
| pillow | HPND (permissive) | Yes | No |
| pyproj | MIT | Yes | No |
| rasterio | BSD-3-Clause | Yes | No |
| segmentation-models-pytorch | MIT | Yes | No |
| shapely | BSD-3-Clause | Yes | No |
| timm | Apache-2.0 | Yes | No |
| torch | BSD-3-Clause | Yes | No |
| torchmetrics | Apache-2.0 | Yes | No |
| torchvision | BSD-3-Clause | Yes | No |
3. License Distribution
| License Type | Count | Notes |
|---|---|---|
| MIT | 5 | Permissive, fully compatible |
| BSD-3-Clause | 7 | Permissive, fully compatible |
| Apache-2.0 | 4 | Permissive, fully compatible |
| PSF | 1 | Python Software Foundation, BSD-style |
| HPND | 1 | Historical Permission Notice and Disclaimer, permissive |
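The distribution can be cross-checked against the dependency table in section 2. The sketch below transcribes that table into a dictionary and tallies it; the variable names are illustrative, and the short identifiers "PSF" and "HPND" stand in for the table's "PSF (BSD-style)" and "HPND (permissive)" entries.

```python
from collections import Counter

# Required dependencies and licenses, transcribed from the table in section 2.
REQUIRED_LICENSES = {
    "einops": "MIT",
    "geopandas": "BSD-3-Clause",
    "jsonargparse": "MIT",
    "kornia": "Apache-2.0",
    "lightly": "MIT",
    "lightning": "Apache-2.0",
    "matplotlib": "PSF",
    "numpy": "BSD-3-Clause",
    "pandas": "BSD-3-Clause",
    "pillow": "HPND",
    "pyproj": "MIT",
    "rasterio": "BSD-3-Clause",
    "segmentation-models-pytorch": "MIT",
    "shapely": "BSD-3-Clause",
    "timm": "Apache-2.0",
    "torch": "BSD-3-Clause",
    "torchmetrics": "Apache-2.0",
    "torchvision": "BSD-3-Clause",
}

# Count how many packages fall under each license.
license_counts = Counter(REQUIRED_LICENSES.values())
for license_id, count in license_counts.most_common():
    print(f"{license_id}: {count}")
print("total:", sum(license_counts.values()))
```

The per-license counts should sum to the number of required dependencies; a mismatch would indicate a transcription error in one of the two tables.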
4. Potentially Problematic Licenses
Finding: None identified
- No GPL or LGPL dependencies (no copyleft)
- No AGPL dependencies
- No proprietary or non-free dependencies
- No "viral" license concerns
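A check of this kind could be sketched as a filter over a package-to-license mapping. The prefix list below is an illustrative assumption and is not exhaustive (other copyleft licenses exist), and the package named hypothetical-pkg is invented for the demonstration; it is not a TorchGeo dependency.

```python
# Illustrative, not exhaustive: identifier prefixes treated as copyleft here.
COPYLEFT_PREFIXES = ("GPL", "LGPL", "AGPL")


def flag_copyleft(licenses):
    """Return {package: license} for licenses that start with a copyleft prefix."""
    return {
        package: license_id
        for package, license_id in licenses.items()
        if license_id.upper().startswith(COPYLEFT_PREFIXES)
    }


# Two real entries from the dependency table plus one invented entry.
demo_licenses = {
    "rasterio": "BSD-3-Clause",
    "shapely": "BSD-3-Clause",
    "hypothetical-pkg": "GPL-3.0-only",  # invented for illustration
}
flagged = flag_copyleft(demo_licenses)
print(flagged)
```

An empty result for the project's actual dependency list would be consistent with the finding above, although license strings in package metadata are not always reliable and may need manual confirmation.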
5. License Conflicts
Finding: None identified
All dependency licenses are permissive and compatible with each other and with TorchGeo's MIT license. There are no conflicts analogous to GDAL's use of GPL'ed GRASS libraries.
6. Optional Dependencies
TorchGeo has optional dependencies for the datasets extra. All use permissive licenses:
| Package | License |
|---|---|
| h5py | BSD-3-Clause |
| laspy | BSD-2-Clause |
| netcdf4 | MIT |
| opencv-python | Apache-2.0 |
| pycocotools | BSD-2-Clause |
| rioxarray | Apache-2.0 |
| scikit-image | BSD-3-Clause |
| scipy | BSD-3-Clause |
| xarray | Apache-2.0 |
7. Conclusion
Overall assessment: No library/component issues identified
- No bundled code in source tree
- All 18 required dependencies use permissive OSI-approved licenses
- No GPL/copyleft dependencies
- No license conflicts
- Clean dependency model suitable for OSGeo incubation
Code Copyright Review
Review completed: 08 Jan 2026
Reviewer: Brian M Hamlin (Mentor), with assistance from automated tooling
Scope: All 557 Python source files in the TorchGeo repository
1. License Headers
| Total Python files: | 557 |
| Files with standard header: | 557 (100%) |
| Files missing header: | 0 (after PR #3262) |
Standard header format:
    # Copyright (c) TorchGeo Contributors. All rights reserved.
    # Licensed under the MIT License.
Finding: All source files include the standard license header. Two utility scripts (docs/user/metrics/update_timeline.py and tests/data/landcoverai/split.py) were identified as missing headers during this review and fixed via PR #3262.
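An automated pass of the kind used to assist this review might look like the sketch below. It is an assumption about how such tooling could work, not a description of the tooling actually used; the function names and the five-line search window are hypothetical. As the FAQ in this document notes, a script does not replace a human reading of each header, since notes about the origin of code may be phrased in ways a script would not recognise.

```python
from pathlib import Path

# The two-line standard header quoted above.
HEADER_LINES = (
    "# Copyright (c) TorchGeo Contributors. All rights reserved.",
    "# Licensed under the MIT License.",
)


def has_standard_header(source, max_lines=5):
    """True if both header lines appear, adjacent and in order, near the top.

    The small window allows for a shebang or encoding line before the header.
    """
    head = [line.rstrip() for line in source.splitlines()[:max_lines]]
    for i in range(len(head) - 1):
        if (head[i], head[i + 1]) == HEADER_LINES:
            return True
    return False


def files_missing_header(repo_root):
    """Return sorted relative paths of .py files without the standard header."""
    root = Path(repo_root)
    missing = []
    for path in root.rglob("*.py"):
        text = path.read_text(encoding="utf-8", errors="replace")
        if not has_standard_header(text):
            missing.append(path.relative_to(root).as_posix())
    return sorted(missing)


# Small in-memory examples.
with_header = "\n".join(HEADER_LINES) + "\n\nimport os\n"
without_header = "import os\n"
print(has_standard_header(with_header), has_standard_header(without_header))
```

Calling files_missing_header on a repository checkout would list candidate files for a reviewer to inspect by hand.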
2. License Consistency
Finding: All files are under the MIT License. No files use a different or conflicting license.
- No GPL, LGPL, or copyleft licenses found
- No Apache-only or BSD-only files found
- All code consistently uses MIT
3. Code Origins
Two files contain code derived from external sources. Both are properly attributed and both original sources use MIT License (compatible):
| File | Source | Source License | Compatible? | Attribution in File? |
|---|---|---|---|---|
| torchgeo/models/btc.py | blaz-r/BTC-change-detection | MIT | Yes | Yes |
| torchgeo/models/croma.py | antofuller/CROMA | MIT | Yes | Yes |
Finding: Both external code sources are MIT licensed, which is fully compatible with TorchGeo's MIT license. Attribution is properly documented in the source file headers.
4. Vendored/Bundled Code
Finding: No vendored or bundled third-party code exists in the source tree. All dependencies are external and installed via pip/conda.
- No vendor/, third_party/, or external/ directories
- No embedded copies of libraries (unlike GDAL, which bundles libtiff, libjpeg, etc.)
- All 18 dependencies are fetched externally at install time
5. Copyright Holders
| Copyright Holder | Type | Notes |
|---|---|---|
| TorchGeo Contributors | Collective | Single copyright holder for all files |
Finding: The project uses a collective copyright model. All files use "Copyright (c) TorchGeo Contributors" rather than individual contributor names.
Implications for relicensing:
- Simplified model - no need to contact 122 individual contributors
- Contributors implicitly agree to collective copyright by submitting PRs
- NEEDS CLARIFICATION: Does the project have a Contributor License Agreement (CLA) or Developer Certificate of Origin (DCO)?
6. Outstanding Issues
Resolved:
- Copyright headers added to 2 missing files: PR #3262
Pending:
- NEEDS PSC INPUT: Confirm whether a CLA/DCO is in place or planned
- NEEDS PSC INPUT: Contributor agreement status for PSC members (currently all marked TBD)
7. Conclusion
Overall assessment: No provenance issues identified
The TorchGeo codebase demonstrates good copyright hygiene:
- 100% of source files have proper license headers
- All code is consistently MIT licensed
- External code derivations are properly attributed and license-compatible
- No vendored code complicates the license picture
- Collective copyright model simplifies governance
The codebase is ready for OSGeo incubation from a code provenance perspective, pending confirmation of contributor agreement procedures from the PSC.
Frequently Asked Questions
- Do we need to ensure there are copyright and license headers in every file?
- Projects are expected to embed a copyright and license header in every source file (.cpp and .h for instance). Machine-generated files (such as lex or yacc output) do not require a header if it is inconvenient to embed one, but then an effort should be made to embed a header in the original source file (i.e. .l or .y). There is no need to embed copyright and license headers in build files (i.e. Makefiles, Visual Studio project files), nor in small documentation fragments (like READMEs). Copyright/license headers should be included in substantial bodies of documentation if it can be done in such a way that it doesn't interfere with the document.
- Do I really need to embed the whole license agreement in every file?
- No, it is sufficient to refer to a single copy of the license agreement for the project, but each file should include an indication of what the license is, and the location of the full license document. For example:
/*
 * Copyright (C) 2004-2005 Autodesk, Inc.
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of version 2.1 of the GNU Lesser
 * General Public License as published by the Free Software Foundation.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
 */
or:
 * COPYRIGHT: (C) 2001 by the GRASS Development Team
 *
 * This program is free software under the GNU General Public
 * License (>=v2). Read the file COPYING that comes with GRASS
 * for details.
- Can I just write a script to check everything?
- No. The point of a human review is to ensure that human eyes have looked at the header of each file, looking for any anomalies. For instance it is not uncommon to have additional messages written in headers about the original source of particular code, but not necessarily in a way that a script could recognise.
- But I just did a review similar to this a few months ago, do I really need to do it again?
- If you believe you have already met or exceeded the rigor described in this review document, then you can present the results of your previous analysis along with an explanation of why you think it still applies. You will need to manually review additions since that earlier review. It will be up to the Incubator (and ultimately the foundation board) to determine if they are satisfied with any review.
- Do I need to track down every contributor and ask for permission to use their code?
- No. As long as code was knowingly provided by contributors to the project while it was under a clear licensing policy then it is reasonable to assume the project licensing arrangements are accepted unless there is some reason to have doubts. If code was brought into the project from outside sources under ambiguous circumstances then it may be necessary to contact the copyright holder for a legal release, or at least a statement of permission.
- Some project code was adapted from other sources with ambiguous copyrights, what do I do?
- First, document such concerns in the review document and within the source files themselves. Second, try to obtain a release to use the provided code. An email statement of permission should be sufficient, but pertinent details of such an email should be included in the source file in question (e.g. date, identification of releasing person, text of the release). If no release is possible, but the code isn't very necessary, consider removing or replacing it to simplify things. Otherwise the issue will need to be discussed in the incubator to assess risk.
- Should I be looking for patent problems?
- No. However, if the code review turns up notes about patent issues - or if there are known patent issues for the project, then these should be noted in the review document. Generally speaking a proper patent review is very difficult and may increase, rather than decrease, legal liability.
- Should I be looking for trademark conflicts?
- Possibly. It may be prudent to check for trademark conflicts for the project name, or the name of public packages within the project. If issues are found they should be noted in the review document.