Difference between revisions of "TorchGeo Code Provenance"

From OSGeo
 
(64 intermediate revisions by 2 users not shown)
Line 3: Line 3:
 
TorchGeo project  
 
TorchGeo project  
 
'''refer : https://www.osgeo.org/about/committees/incubation/graduation/'''
 
'''refer : https://www.osgeo.org/about/committees/incubation/graduation/'''
 +
 +
'''Incubation application questionnaire : https://trac.osgeo.org/osgeo/wiki/TorchGeo'''
  
 
==Document Status==
 
==Document Status==
  
'''Version:''' 0.1
+
{| class="wikitable"
 
+
|-
'''Status:''' init
+
| '''Version:''' || 0.5
 
+
|-
'''Last Edited:''' 06 Jan 2026
+
| '''Status:''' || Updated
 +
|-
 +
| '''Last Edited:''' || 15 Jan 2026
 +
|}
  
 
==Purpose==
 
==Purpose==
  
Show to reasonable comfort level that TorchGeo  
+
Show to "reasonable comfort level" that TorchGeo  
 
does not have improperly contributed code, and that the code is all under the Project license.
 
does not have improperly contributed code, and that the code is all under the Project license.
 
  
 
Notice: ''It is not the goal to be able to prove that every source file, and every contribution to those files, was contributed properly. The onus is not on the OSGeo Incubation Committee to prove there are no problems. However, the OSGeo Incubation Committee does need to ensure that OSGeo does not officially endorse code with provenance issues that could have been identified and corrected with a reasonable effort.''
 
Notice: ''It is not the goal to be able to prove that every source file, and every contribution to those files, was contributed properly. The onus is not on the OSGeo Incubation Committee to prove there are no problems. However, the OSGeo Incubation Committee does need to ensure that OSGeo does not officially endorse code with provenance issues that could have been identified and corrected with a reasonable effort.''
 
 
  
 
=TorchGeo Provenance Review=
 
=TorchGeo Provenance Review=
Line 26: Line 28:
 
== Project History ==
 
== Project History ==
  
''Summarise the history of the source code, including discussion about code ownership and license over time.''
+
TorchGeo started as an internship project at Microsoft's AI for Good Lab in 2021, with continued support and funding from the University of Illinois Urbana-Champaign and the National Center for Supercomputing Applications. Since then, TorchGeo has been adopted by a diverse community of researchers from academia, industry, and government research institutions who have contributed to the success of the library. In 2025, we founded a new TorchGeo Organization to manage the TorchGeo Project, with the support of members from the above organizations.
  
==Libraries==
+
'''Key milestones''':
  
''List or reference a table of included libraries and/or components. Include:''
+
- '''2021-05-11''': Initial commit (2330aad) by Adam J. Stewart
* ''Library Name''
 
* ''License''
 
* ''Whether the library licence is compatible with the project''
 
* ''Whether the library is compiled into the project's releases, or required as an external dependency''
 
* ''Whether the library is included in the project's code repository''
 
* ''Issue? (link to issue if one exists)''
 
* ''Ok? (Yes/No/Blank - blank implies not checked yet)''
 
* ''Comment(s)''
 
  
{| class="wikitable" border="1"
+
- '''2021-2024''': Organic growth to 120+ contributors
!  Library !! Licence !! Compatible? !! Included in core? !! Issue(s)? !! Ok? !! Comments
 
|-
 
| Library || Licence || yes || yes|| n/a || yes ||
 
|}
 
  
==Source Files==
+
- '''2025''': Formation of TorchGeo Organization with formal governance structure
  
Currently, 2 of 557 Python files lack a copyright line:
+
- '''2026''': OSGeo incubation application submitted
  
./docs/user/metrics/update_timeline.py  ;  ./tests/data/landcoverai/split.py
+
==Libraries==
 
 
  $ find . -name "*.py" -exec grep -Li "Copyright" {} \;
 
 
 
 
 
''List of source files, including code, documentation and data, including:''
 
* ''File name''
 
* ''License''
 
* ''Copyright Holder (if known)''
 
* ''License header included?''
 
* ''Issue? (link to issue if one exists)''
 
* ''Ok? (Yes/No/Blank - blank implies not checked yet)''
 
* ''Comment(s)''
 
  
 +
All dependencies are external (not bundled in source tree). All licenses are OSI-approved and compatible with MIT.
  
 
{| class="wikitable" border="1"
 
{| class="wikitable" border="1"
!  File !! Licence !! Copyright Holder !! Licenced? !! Issue(s)? !! Ok? !! Comments
 
 
|-
 
|-
| File || Licence || <org/person> || yes|| n/a || yes ||  
+
! Library !! License !! Compatible? !! In Core? !! Issues? !! Ok? !! Comments
 +
|-
 +
| einops || MIT || Yes || External || - || Yes || Tensor operations
 +
|-
 +
| geopandas || BSD-3-Clause || Yes || External || - || Yes || Geospatial dataframes
 +
|-
 +
| jsonargparse || MIT || Yes || External || - || Yes || CLI argument parsing
 +
|-
 +
| lightly || MIT || Yes || External || - || Yes || Self-supervised learning
 +
|-
 +
| lightning || Apache-2.0 || Yes || External || - || Yes || Training framework
 +
|-
 +
| matplotlib || PSF || Yes || External || - || Yes || Plotting
 +
|-
 +
| numpy || BSD-3-Clause || Yes || External || - || Yes || Numerical computing
 +
|-
 +
| pandas || BSD-3-Clause || Yes || External || - || Yes || Data manipulation
 +
|-
 +
| pillow || HPND [https://github.com/python-pillow/Pillow/issues/1507 *] /  MIT-CMU[https://github.com/python-pillow/Pillow/pull/8460 *] || Yes || External || - || Yes || Image I/O
 +
|-
 +
| pyproj || MIT || Yes || External || - || Yes || Cartographic projections
 +
|-
 +
| rasterio || BSD-3-Clause || Yes || External || - || Yes || Geospatial raster I/O
 +
|-
 +
| segmentation-models-pytorch || MIT || Yes || External || - || Yes || Segmentation architectures
 +
|-
 +
| shapely || BSD-3-Clause || Yes || External || - || Yes || Geometric operations
 +
|-
 +
| timm || Apache-2.0 || Yes || External || - || Yes || Vision model zoo
 +
|-
 +
| torch || BSD-3-Clause || Yes || External || - || Yes || PyTorch core
 +
|-
 +
| torchmetrics || Apache-2.0 || Yes || External || - || Yes || Metrics computation
 +
|-
 +
| torchvision || BSD-3-Clause || Yes || External || - || Yes || Vision utilities
 +
|-
 +
| *kornia || Apache-2.0 || Yes || External || - || Yes || Computer vision ops
 
|}
 
|}
  
 
==Copyright Holders==
 
==Copyright Holders==
 
''List or reference all copyright holders of the project's source files. These are all the people that would need to be contacted if the license of the project is to be changed. List:''
 
 
* ''Organisation or person''
 
* ''Contact details for Organisation or person''
 
* ''Issue? (link to issue if one exists)''
 
* ''Comments''
 
 
 
{| class="wikitable" border="1"
 
{| class="wikitable" border="1"
!  Copyright Holder !! Contact details !! Issue? !! Comments
 
 
|-
 
|-
| <org/person> || ... || ||
+
! Copyright Holder !! Contact !! Issues !! Comments
 +
|-
 +
| TorchGeo Contributors || N/A (collective) || None || Per LICENSE file
 
|}
 
|}
  
== Committers ==
+
The MIT License file states: "Copyright (c) TorchGeo Contributors."
''List or reference a table of committers, including:''
 
* ''Committer name''
 
* ''Commit username in code repository''
 
* ''Committer's employer(s) (if applicable)''
 
* ''Committer (or employer) has signed contribution agreement? (yes/no)''
 
<pre>
 
TorchGeo Repository Statistics
 
==============================================================
 
commit 6995642a4b03ae03834f599dd58a47a04fc91608
 
Author: Isaac Corley <isaac.corley@proton.me>
 
Date:  Sat Jan 3 16:40:34 2026 -0600
 
  
-
+
This is a collective copyright model: individual contributors retain copyright but license their contributions under MIT. Other copyright notices that remain in the source tree include:
commit 2330aad537f09ae93e2fa3283d1c25a3e0230703
 
Author: Adam J. Stewart <ajstewart426@gmail.com>
 
Date:   Tue May 11 14:46:50 2021 -0500
 
  
    Initial commit
+
<pre>
------------------------------------------------
+
> git grep Copyright | grep -v TorchGeo
Total Commits: 3357
+
torchgeo/models/convlstm.py:# Copyright (c) 2017 Andrea Palazzi
 +
torchgeo/models/copernicusfm.py:    Adapted from https://github.com/bwconrad/flexivit. Copyright (c) 2023 Ben Conrad.
 +
torchgeo/models/copernicusfm.py:    Adapted from https://github.com/bwconrad/flexivit. Copyright (c) 2023 Ben Conrad.
 +
torchgeo/models/copernicusfm.py:    Copyright (c) Microsoft Corporation.
 +
torchgeo/models/earthloc.py:    Adapted from https://github.com/gmberton/EarthLoc. Copyright (c) 2024 Gabriele Berton
 +
torchgeo/models/earthloc.py:    Adapted from https://github.com/gmberton/EarthLoc. Copyright (c) 2024 Gabriele Berton
 +
torchgeo/models/earthloc.py:    Adapted from https://github.com/gmberton/EarthLoc. Copyright (c) 2024 Gabriele Berton
 +
torchgeo/models/ltae.py:# Copyright (c) 2020 VSainteuf (Vivien Sainte Fare Garnot)
 +
</pre>
  
Contributors:
+
== Committers ==
  1068  Adam J. Stewart
+
'''Total human contributors:''' 122
  947  dependabot[bot]
 
  217  Caleb Robinson
 
  104  Isaac Corley
 
  101  Nils Lehmann
 
    77  Ashwin Nair
 
    30  isaac
 
    17  Robin Cole
 
    15  Yi Wang
 
    12  Yi-Chia Chang
 
    10  Burak
 
    9  Maciej Kilian
 
    6  Blaž Rolih
 
    5  Amri Rasyidi,Qiusheng Wu,Ritwik Gupta,Sieger Falkena
 
    4  Copilot,Daniele Rege Cambrin,Harald Kristen,gatienc
 
    3  Adrian Tofting,George Huber,Hitesh Tolani,Luca Colomba,Pablo Mandiola,Zhuo Zheng
 
    2  Blanchon,Buğra,Can Michael Hucko,Data Philosopher,Favyen Bastani,Gaétan Bahl,Janne Mäyrä,Jingtong
 
    2  Kaustav Mukherjee,SpontaneousDuck,Wei Ji,Yang Mu,estherrolf,remtav,shreyakannan1205
 
    1  Adrian Boguszewski,Al-Ekram Elahee Hridoy,Aleksey Morozov,Alexandre Lacoste,Alhassan Sahad,Arianna Sole
 
    1  Bianca Zadrozny,Boe Zhou,Byeong-Hyeok Yu,Codruț Diaconu,Daiki Kimura,David Meaux,Dimitris Mantas
 
    1  Dominik K,Dylan Stewart,Ethan White (he/him),Federico Bessi,Forrest Williams,Georgia Manou,Giacomo Caria
 
    1  Heng Fang,HeshamHM28,Ikko Eltociear Ashimine,Inderpreet Singh Parmar,JDilger,Jakub,Kaleb Crans,Kamil Raczycki
 
    1  Keenan Eves,Kendall Smith,Kiran S,Konrad Heidler,Konstantin Klemmer,Krishnagopal Halder
 
    1  Laurențiu Nicola,Leonard Waldmann,Mathias Baumgartinger, Matt Gibson,Mauricio Cordeiro,Mehmet Cem Gunturkun
 
    1  MengluTao,Michael Nolde,Michiaki Tatsubori,Nassim AIT ALI BRAHAM,Night-dev,Odd Eirik Igland,Ohad Nir
 
    1  Oluwabusayo Jacobs,Piero Toffanin,Piper Wolters,Pitchaporn Likitpanjamanon,Reslan Tinawi,Rijul Dahiya,Rémi Cresson
 
    1  Rémi Tavon,Saumya Sinha,Shradha Sehgal,Slawek Bierwiaczonek,Tarandeep Singh,Taylor Denouden,Teo Cherici
 
    1  Veronika Buchauer,XShadow,nands,iejMac,kaybe20, lhackel-tub,lns-lns,lucastao,mdchuc,nsutezo,ntw-au,tritolol
 
 
 
File Changes Summary:
 
Added lines: 215314
 
Removed lines: 101198
 
Total lines changed: 114116
 
 
 
Primarily in Python : 593 dot-py files in 1283 directories
 
  
Most Changed Files:
+
'''Total commits:''' 2778 (excluding bots)
  2769
 
    435 requirements/required.txt
 
    206 .github/workflows/tests.yaml
 
    179 requirements/style.txt
 
    166 torchgeo/datasets/__init__.py
 
    156 setup.cfg
 
    156 pyproject.toml
 
    140 requirements/datasets.txt
 
    137 docs/api/datasets.rst
 
    136 torchgeo/datasets/geo.py
 
  
</pre>
+
'''Top contributors''' (PSC members marked with *):
  
 
{| class="wikitable" border="1"
 
{| class="wikitable" border="1"
!  Committer !! Username !! Affiliation !! Contribution Agreement Signed? !! Ok?
 
 
|-
 
|-
| Name || Username || Company || Yes || Yes
+
! Committer !! Email !! Affiliation
 +
|-
 +
| Adam J. Stewart * || ajstewart426@gmail.com || TUM / UIUC
 +
|-
 +
| Caleb Robinson * || calebrob6@gmail.com || AI for Good Msft
 +
|-
 +
| Isaac Corley * || isaac.corley@proton.me || Wherobots
 +
|-
 +
| Nils Lehmann * || nils.lehmann24@gmail.com || TUM
 +
|-
 +
| Ashwin Nair * || ashnair0007@gmail.com || Space42
 +
|-
 +
| Anthony Ortiz * || Anthony.Ortiz@microsoft.com || Microsoft
 +
|-
 +
| Robin Cole || robmarkcole@gmail.com || EarthDaily
 +
|-
 +
| Yi Wang || wangyi111@users.noreply.github.com || TUM
 +
|-
 +
| Yi-Chia Chang || yichiac@users.noreply.github.com || UIUC
 +
|-
 +
| Burak Ekim || burakekim@users.noreply.github.com || Bundeswehr Munich
 +
|-
 +
| Blaz Rolih || blaz-r@users.noreply.github.com || ULjubljana, Slovenia
 +
|-
 +
| Maciej Kilian || iejMac@users.noreply.github.com || Stability AI
 +
|-
 +
| Qiusheng Wu || giswqs@gmail.com || UTenn Knoxville
 
|}
 
|}
  
==Further Issues ==
+
'''Note:''' 122 total contributors. Full list available via <code>git shortlog -sne --all</code>.
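
The contributor and commit totals above can be recomputed from the output of <code>git shortlog -sne --all</code>. The following is a minimal sketch, not the tooling used for this review. It assumes each output line has the form <code>count, whitespace, name, &lt;email&gt;</code>, and it treats only names ending in <code>[bot]</code> (such as <code>dependabot[bot]</code>) as bots. The helper names and the placeholder e-mail address for the bot are hypothetical.

```python
import re

# One shortlog line: "<count><whitespace><name> <email>"
SHORTLOG_LINE = re.compile(r"^\s*(\d+)\s+(.*?)\s*<([^>]*)>\s*$")


def parse_shortlog(output):
    """Parse `git shortlog -sne` text into (commits, name, email) tuples."""
    entries = []
    for line in output.splitlines():
        match = SHORTLOG_LINE.match(line)
        if match:
            entries.append((int(match.group(1)), match.group(2), match.group(3)))
    return entries


def is_bot(name):
    """Assumption: only names ending in "[bot]" are automated accounts."""
    return name.endswith("[bot]")


def summarize(entries):
    """Return (number of human contributors, commits made by humans)."""
    humans = [entry for entry in entries if not is_bot(entry[1])]
    return len(humans), sum(commits for commits, _, _ in humans)


if __name__ == "__main__":
    # Illustrative input only; the bot address is a placeholder.
    sample = (
        "  1068\tAdam J. Stewart <ajstewart426@gmail.com>\n"
        "   947\tdependabot[bot] <dependabot@example.invalid>\n"
        "   217\tCaleb Robinson <calebrob6@gmail.com>\n"
    )
    contributors, commits = summarize(parse_shortlog(sample))
    print(f"{contributors} human contributors, {commits} commits")
```

In practice the text would come from running the <code>git shortlog</code> command in a checkout of the repository. One person committing under several names or addresses is counted once per identity.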
''List or reference a list of any outstanding issues not already listed above. Typically link to issues in project's issue tracker.''
 
  
 +
==Source Files==
 +
'''Total Python files:''' 557
  
 +
'''Files with copyright header:''' 557 (100%) - ''after PR #3262 merged''
  
 +
'''Standard header format:'''
 +
# Copyright (c) TorchGeo Contributors. All rights reserved.
 +
# Licensed under the MIT License.
  
==
+
{| class="wikitable" border="1"
==Process==
+
|-
 +
! File !! License !! Copyright Holder !! Header? !! Issues? !! Ok? !! Comments
 +
|-
 +
| docs/user/metrics/update_timeline.py || MIT || TorchGeo Contributors || Yes || None || Yes || Fixed in PR #3262
 +
|-
 +
| tests/data/landcoverai/split.py || MIT || TorchGeo Contributors || Yes || None || Yes || Fixed in PR #3262
 +
|}
  
The Code Provenance Review is separated into two sections, each with a slightly different process:
+
'''Note:''' PR https://github.com/torchgeo/torchgeo/pull/3262 adds copyright headers to the 2 files that were missing them.
# Create a Provenance Review Document by copying the [[Provenance Review Template]].
 
# Complete the Library/Component Review Process.
 
# Complete the Code Copyright Review.
 
# Submit the completed Provenance Review Document to the Project Mentor for review.
 
  
==Library/Component Review Process==
+
==Further Issues ==
 +
'''Resolved:'''
 +
* Copyright headers added to 2 missing files via PR #3262: https://github.com/torchgeo/torchgeo/pull/3262
  
For the overall project, document potential Library/Component issues by:
+
* OSGeo does not require a CLA. We ask projects to think about what they want during incubation.
# Listing any external components that are included "in the source tree" for the project. For instance, GDAL includes a copy of libtiff, libjpeg, etc in the source tree. It is necessary to be able to identify things like that in the source tree that are under their own distinct license, and are not explicitly vetted by the project team. If practical, it is desirable to remove these internal components and treat them as external dependencies. If kept internal, notes on the licenses of the components should be documented using the Provenance Review Template.
 
# Listing all external dependencies with:
 
## Potentially problematic license terms (all non-free libraries, for instance).
 
## All libraries with licenses that might conflict (for instance GDAL's use of GPL'ed GRASS libraries in the non-GPL GDAL).
 
  
==Code Copyright Review==
+
* Contributor agreement confirmed
  
For each project source file, perform the following:
+
'''Pending:'''
# Does the file include the license information? If not, add it if there is no ambiguity about whether the standard project license applies. If that is not obvious, make notes in the review document.
 
# Is the file under the normal project license? If not, make notes in the review document.
 
# Is there anything obviously unusual about the origin of the code? Does this pose any conflicts? Is the issue properly described in the source file? For instance, in GDAL, the gdal/port/cpl_strtod.cpp file is closely derived from external code that was placed in the public domain. cpl_strtod.cpp is placed under the normal GDAL MIT/X license, but detailed notes are kept in the header text on its origin, the fact that it was public domain, and therefore that it is acceptable to relicense it. Oddities should be noted in the source file itself and in the review document.
 
# Maintain a list of all copyright holders identified in the Provenance Review Document. This list is essentially everyone who would need to agree to relicense the project. It may be desirable to seek copyright assignment to a "project lead", or to the foundation to reduce the number of copyright holders for the project, although this is not required.
 
  
==Frequently Asked Questions==
+
== TorchGeo Application Questionnaire ==
 +
See the main application questionnaire for additional project details:
 +
https://trac.osgeo.org/osgeo/wiki/TorchGeo
  
;Do we need to ensure there is a copyright and licence headers in every file?
+
==Process==
:Projects are expected to embed a copyright and license header in every source file (.cpp and .h for instance). Machine-generated files (such as lex or yacc output) do not require a header if it is inconvenient to embed one, but then an effort should be made to embed a header in the original source file (i.e. .l or .y). There is no need to embed copyright and license headers in build files (i.e. Makefiles, Visual Studio project files), nor in small documentation fragments (like READMEs). Copyright/license headers should be included in substantial bodies of documentation if it can be done in such a way that it doesn't interfere with the document.
 
  
;Do I really need to embed the whole license agreement in every file?
+
'''This Code Provenance Review followed the OSGeo Incubation process:'''
:No, it is sufficient to refer to a single copy of the license agreement for the project, but each file should include an indication of what the license is, and the location of the full license document. For example:
+
{| class="wikitable"
  /*
+
|-
  * Copyright (C) 2004-2005  Autodesk, Inc.
+
! Step !! Status !! Date !! Notes
  *
+
|-
  * This library is free software; you can redistribute it and/or
+
| 1. Create Provenance Review Document || Complete || 06 Jan 2026 || Document created from [https://wiki.osgeo.org/wiki/Provenance_Review_Template Provenance Review Template]
  * modify it under the terms of version 2.1 of the GNU Lesser
+
|-
  * General Public License as published by the Free Software Foundation.
+
| 2. Library/Component Review || Complete || 08 Jan 2026 || 18 dependencies reviewed, no issues found
  * 
+
|-
  * This library is distributed in the hope that it will be useful,
+
| 3. Code Copyright Review || Complete || 08 Jan 2026 || 557 files reviewed, no issues found
  * but WITHOUT ANY WARRANTY; without even the implied warranty of
+
|-
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+
| 4. Submit to Project Mentor || Complete || 08 Jan 2026 || Brian M Hamlin (Mentor)
  * Lesser General Public License for more details.
+
|}
  *
+
=== Review Summary ===
  * You should have received a copy of the GNU Lesser General Public
+
{| class="wikitable"
  * License along with this library; if not, write to the Free Software
+
|-
  * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+
| '''Components in source tree:''' || None (all external dependencies)
  */
+
|-
 +
| '''Problematic licenses:''' || None identified
 +
|-
 +
| '''License conflicts:''' || None identified
 +
|-
 +
| '''Files reviewed:''' || 557 Python files
 +
|-
 +
| '''Files with headers:''' || 557 (100%)
 +
|-
 +
| '''External code derivations:''' || 2 files (both MIT licensed, properly attributed)
 +
|-
 +
| '''Copyright model:''' || Collective ("TorchGeo Contributors")
 +
|}
 +
=== Outstanding Items ===
 +
* '''Pending merge:''' [https://github.com/torchgeo/torchgeo/pull/3262 PR #3262] - adds copyright headers to 2 files
 +
* '''Resolved:''' CLA/DCO status confirmation. OSGeo does not require a CLA. OSGeo asks projects to think about what they want during incubation.
 +
''ajstewart'' -- We used to have a CLA when we were under MSFT. We no longer have any CLA or DCO. We currently have no plans to add one. I know the Linux Foundation requires DCO. Does OSGeo have any requirements?
 +
* '''Resolved:''' a PSC Members Agreement is not required by OSGeo
  
or:
+
=== Conclusion ===
 +
'''The TorchGeo codebase passes the Code Provenance Review''' with no blocking issues identified. The project demonstrates good license hygiene and is ready for OSGeo incubation from a code provenance perspective.
  
  * COPYRIGHT:    (C) 2001 by the GRASS Development Team
+
==Library/Component Review Process==
  *
 
  *              This program is free software under the GNU General Public
 
  *              License (>=v2). Read the file COPYING that comes with GRASS
 
  *              for details.
 
  
;Can I just write a script to check everything?
+
'''Review completed:''' 08 Jan 2026
:No. The point of a human review is to ensure that human eyes have looked at the header of each file, looking for any anomalies. For instance it is not uncommon to have additional messages written in headers about the original source of particular code, but not necessarily in a way that a script could recognise.
+
=== 1. Components in Source Tree ===
 +
'''Finding:''' '''None'''
 +
TorchGeo does '''not''' bundle any third-party libraries in its source tree. Unlike projects like GDAL (which includes copies of libtiff, libjpeg, etc.), TorchGeo treats all dependencies as external packages installed via pip/conda.
 +
* No <code>vendor/</code> directory
 +
* No <code>third_party/</code> directory 
 +
* No <code>external/</code> directory
 +
* No embedded library copies
 +
This is the recommended approach and simplifies license management.
 +
=== 2. External Dependencies ===
 +
'''Total required dependencies:''' 18
 +
'''All licenses are OSI-approved and compatible with MIT.'''
 +
{| class="wikitable"
 +
|-
 +
! Package !! License !! Compatible? !! Potentially Problematic?
 +
|-
 +
| einops || MIT || Yes || No
 +
|-
 +
| geopandas || BSD-3-Clause || Yes || No
 +
|-
 +
| jsonargparse || MIT || Yes || No
 +
|-
 +
| kornia || Apache-2.0 || Yes || No
 +
|-
 +
| lightly || MIT || Yes || No
 +
|-
 +
| lightning || Apache-2.0 || Yes || No
 +
|-
 +
| matplotlib || PSF (BSD-style) || Yes || No
 +
|-
 +
| numpy || BSD-3-Clause || Yes || No
 +
|-
 +
| pandas || BSD-3-Clause || Yes || No
 +
|-
 +
| pillow || HPND / MIT-CMU[https://github.com/python-pillow/Pillow/pull/8460 *] || Yes || No
 +
|-
 +
| pyproj || MIT || Yes || No
 +
|-
 +
| rasterio || BSD-3-Clause || Yes || No
 +
|-
 +
| segmentation-models-pytorch || MIT || Yes || No
 +
|-
 +
| shapely || BSD-3-Clause || Yes || No
 +
|-
 +
| timm || Apache-2.0 || Yes || No
 +
|-
 +
| torch || BSD-3-Clause || Yes || No
 +
|-
 +
| torchmetrics || Apache-2.0 || Yes || No
 +
|-
 +
| torchvision || BSD-3-Clause || Yes || No
 +
|}
  
; But I just did a review similar to this a few months ago, do I really need to do it again?
+
=== 3. License Distribution ===
: If you believe you have already met or exceeded the rigor described in this review document, then you can present the results of your previous analysis along with an explanation of why you think it still applies. You will need to manually review additions since that earlier review. It will be up to the Incubator (and ultimately the foundation board) to determine if they are satisfied with any review.
+
{| class="wikitable"
 +
|-
 +
! License Type !! Count !! Notes
 +
|-
 +
| MIT || 5 || Permissive, fully compatible
 +
|-
 +
| BSD-3-Clause || 7 || Permissive, fully compatible
 +
|-
 +
| Apache-2.0 || 4 || Permissive, fully compatible
 +
|-
 +
| PSF || 1 || Python Software Foundation, BSD-style
 +
|-
 +
| HPND / MIT-CMU[https://github.com/python-pillow/Pillow/pull/8460 *] || 1 || Historical Permission Notice and Disclaimer, permissive
 +
|}
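
The per-license counts can be cross-checked against the package table in section 2 with a short tally. The following is a minimal sketch under stated assumptions: the mapping is transcribed by hand from that table (pillow is recorded under the combined label used there), and the name <code>tally_licenses</code> is hypothetical.

```python
from collections import Counter

# License per required dependency, transcribed from the section 2 table.
REQUIRED_LICENSES = {
    "einops": "MIT",
    "geopandas": "BSD-3-Clause",
    "jsonargparse": "MIT",
    "kornia": "Apache-2.0",
    "lightly": "MIT",
    "lightning": "Apache-2.0",
    "matplotlib": "PSF",
    "numpy": "BSD-3-Clause",
    "pandas": "BSD-3-Clause",
    "pillow": "HPND / MIT-CMU",
    "pyproj": "MIT",
    "rasterio": "BSD-3-Clause",
    "segmentation-models-pytorch": "MIT",
    "shapely": "BSD-3-Clause",
    "timm": "Apache-2.0",
    "torch": "BSD-3-Clause",
    "torchmetrics": "Apache-2.0",
    "torchvision": "BSD-3-Clause",
}


def tally_licenses(licenses):
    """Count how many packages use each license."""
    return Counter(licenses.values())


if __name__ == "__main__":
    counts = tally_licenses(REQUIRED_LICENSES)
    for name, count in sorted(counts.items()):
        print(f"{name}: {count}")
    print("total:", sum(counts.values()))
```

Because the mapping is keyed by package name, the total equals the number of rows transcribed, so a mismatch with the stated dependency count points to a missing or duplicated row.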
  
;Do I need to track down every contributor and ask for permission to use their code?
+
=== 4. Potentially Problematic Licenses ===
:No. As long as code was knowingly provided by contributors to the project while it was under a clear licensing policy then it is reasonable to assume the project licensing arrangements are accepted unless there is some reason to have doubts. If code was brought into the project from outside sources under ambiguous circumstances then it may be necessary to contact the copyright holder for a legal release, or at least a statement of permission.
+
'''Finding:''' '''None identified'''
 +
* No GPL or LGPL dependencies (no copyleft)
 +
* No AGPL dependencies
 +
* No proprietary or non-free dependencies
 +
* No "viral" license concerns
 +
=== 5. License Conflicts ===
 +
'''Finding:''' '''None identified'''
 +
All dependency licenses are permissive and compatible with each other and with TorchGeo's MIT license. There are no conflicts analogous to GDAL's use of GPL'ed GRASS libraries.
 +
=== 6. Optional Dependencies ===
 +
TorchGeo has optional dependencies for the <code>datasets</code> extra. All use permissive licenses:
 +
{| class="wikitable"
 +
|-
 +
! Package !! License
 +
|-
 +
| h5py || BSD-3-Clause
 +
|-
 +
| laspy || BSD-2-Clause
 +
|-
 +
| netcdf4 || MIT
 +
|-
 +
| opencv-python || Apache-2.0
 +
|-
 +
| pycocotools || BSD-2-Clause
 +
|-
 +
| rioxarray || Apache-2.0
 +
|-
 +
| scikit-image || BSD-3-Clause
 +
|-
 +
| scipy || BSD-3-Clause
 +
|-
 +
| xarray || Apache-2.0
 +
|}
 +
=== 7. Conclusion ===
 +
'''Overall assessment:''' No library/component issues identified
 +
* No bundled code in source tree
 +
* All 18 required dependencies use permissive OSI-approved licenses
 +
* No GPL/copyleft dependencies
 +
* No license conflicts
 +
* Clean dependency model suitable for OSGeo incubation
  
;Some project code was adapted from other sources with ambiguous copyrights, what do I do?
+
== Code Copyright Review ==
:First, document such concerns in the review document and within the source files themselves. Second, try to obtain a release to use the provided code. An email statement of permission should be sufficient, but pertinent details of such an email should be included in the source file in question (e.g. date, identification of releasing person, text of the release). If no release is possible, but the code isn't very necessary, consider removing or replacing it to simplify things. Otherwise the issue will need to be discussed in the incubator to assess risk.
+
'''Review completed:''' 08 Jan 2026
 
+
'''Reviewer:''' Brian M Hamlin (Mentor), with assistance from automated tooling
; Should I be looking for patent problems?
+
'''Scope:''' All 557 Python source files in the TorchGeo repository
: No. However, if the code review turns up notes about patent issues - or if there are known patent issues for the project, then these should be noted in the review document. Generally speaking a proper patent review is very difficult and may increase, rather than decrease, legal liability.
+
=== 1. License Headers ===
 +
{| class="wikitable"
 +
|-
 +
| '''Total Python files:''' || 557
 +
|-
 +
| '''Files with standard header:''' || 557 (100%)
 +
|-
 +
| '''Files missing header:''' || 0 (after [https://github.com/torchgeo/torchgeo/pull/3262 PR #3262])
 +
|}
 +
'''Standard header format:'''
 +
# Copyright (c) TorchGeo Contributors. All rights reserved.
 +
# Licensed under the MIT License.
 +
'''Finding:''' All source files include the standard license header. Two utility scripts (<code>docs/user/metrics/update_timeline.py</code> and <code>tests/data/landcoverai/split.py</code>) were identified as missing headers during this review and fixed via [https://github.com/torchgeo/torchgeo/pull/3262 PR #3262].
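
A first-pass check for the standard header can be scripted. The following is a minimal sketch, not the tooling used for this review. It assumes the two header lines appear consecutively within the first five lines of a file (which leaves room for a shebang or encoding line), and the function names are hypothetical. As the review template notes, a script does not replace a human reading each header for unusual attribution notes.

```python
from pathlib import Path

# The two-line standard header quoted above.
STANDARD_HEADER = (
    "# Copyright (c) TorchGeo Contributors. All rights reserved.",
    "# Licensed under the MIT License.",
)


def has_standard_header(text, max_lines=5):
    """True if both header lines appear consecutively near the top of text."""
    head = [line.strip() for line in text.splitlines()[:max_lines]]
    try:
        first = head.index(STANDARD_HEADER[0])
    except ValueError:
        return False
    return first + 1 < len(head) and head[first + 1] == STANDARD_HEADER[1]


def files_missing_header(root):
    """Return sorted paths of .py files under root without the header."""
    missing = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if not has_standard_header(text):
            missing.append(str(path))
    return missing


if __name__ == "__main__":
    # Run from the root of a repository checkout.
    for name in files_missing_header("."):
        print(name)
```

Files flagged by such a script still need manual inspection, since a file may carry a different but legitimate notice.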
 +
=== 2. License Consistency ===
 +
'''Finding:''' All files are under the '''MIT License'''. No files use a different or conflicting license.
 +
* No GPL, LGPL, or copyleft licenses found
 +
* No Apache-only or BSD-only files found
 +
* All code consistently uses MIT
 +
=== 3. Code Origins ===
 +
Two files contain code derived from external sources. Both are properly attributed and both original sources use MIT License (compatible):
 +
{| class="wikitable"
 +
|-
 +
! File !! Source !! Source License !! Compatible? !! Attribution in File?
 +
|-
 +
| <code>torchgeo/models/btc.py</code> || [https://github.com/blaz-r/BTC-change-detection blaz-r/BTC-change-detection] || MIT || Yes || Yes
 +
|-
 +
| <code>torchgeo/models/croma.py</code> || [https://github.com/antofuller/CROMA antofuller/CROMA] || MIT || Yes || Yes
 +
|}
 +
'''Finding:''' Both external code sources are MIT licensed, which is fully compatible with TorchGeo's MIT license. Attribution is properly documented in the source file headers.
 +
=== 4. Vendored/Bundled Code ===
 +
'''Finding:''' No vendored or bundled third-party code exists in the source tree. All dependencies are external and installed via pip/conda.
 +
* No <code>vendor/</code>, <code>third_party/</code>, or <code>external/</code> directories
 +
* No embedded copies of libraries (unlike GDAL which bundles libtiff, libjpeg, etc.)
 +
* All 18 dependencies are fetched externally at install time
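
The directory check above can be reproduced with a short scan. The following is a minimal sketch that looks only for the three directory names listed (<code>vendor</code>, <code>third_party</code>, <code>external</code>); the function name is hypothetical, and an empty result does not by itself rule out copied code kept under other names.

```python
import os

# Directory names conventionally used for bundled third-party code.
VENDOR_DIR_NAMES = {"vendor", "third_party", "external"}


def find_vendor_dirs(root):
    """Return sorted relative paths of directories named like vendored code."""
    hits = []
    for dirpath, dirnames, _filenames in os.walk(root):
        # Do not descend into version-control metadata.
        dirnames[:] = [name for name in dirnames if name != ".git"]
        for name in dirnames:
            if name in VENDOR_DIR_NAMES:
                full_path = os.path.join(dirpath, name)
                hits.append(os.path.relpath(full_path, root))
    return sorted(hits)


if __name__ == "__main__":
    # Run from the root of a repository checkout.
    found = find_vendor_dirs(".")
    print(found if found else "no vendor-style directories found")
```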
 +
=== 5. Copyright Holders ===
 +
{| class="wikitable"
 +
|-
 +
! Copyright Holder !! Type !! Notes
 +
|-
 +
| TorchGeo Contributors || Collective || Single copyright holder for all files
 +
|}
 +
'''Finding:''' The project uses a '''collective copyright model'''. All files use "Copyright (c) TorchGeo Contributors" rather than individual contributor names.
 +
'''Implications for relicensing:'''
 +
* Simplified model - no need to contact 122 individual contributors
 +
* Contributors implicitly agree to collective copyright by submitting PRs
 +
* '''Resolved:''' A Contributor License Agreement (CLA) or Developer Certificate of Origin (DCO) is not required for incubation.
  
; Should I be looking for trademark conflicts?
+
=== 6. Outstanding Issues ===
: Possibly. It may be prudent to check for trademark conflicts for the project name, or the name of public packages within the project. If issues are found they should be noted in the review document.
+
'''Resolved:'''
 +
* Copyright headers added to 2 missing files: [https://github.com/torchgeo/torchgeo/pull/3262 PR #3262]
 +
* Confirm whether a CLA/DCO is in place or planned
 +
* Contributor agreement status for PSC members
  
 +
'''Pending:'''
  
[[Category: Incubation]]
+
=== 7. Conclusion ===
 +
'''Overall assessment:''' No provenance issues identified.
 +
The TorchGeo codebase demonstrates good copyright hygiene:
 +
* 100% of source files have proper license headers
 +
* All code is consistently MIT licensed
 +
* External code derivations are properly attributed and license-compatible
 +
* No vendored code complicates the license picture
 +
* Collective copyright model simplifies governance
 +
The codebase is ready for OSGeo incubation from a code provenance perspective, pending confirmation of contributor agreement procedures from the PSC.

Latest revision as of 20:02, 20 January 2026

Code Provenance Review Process

TorchGeo project refer : https://www.osgeo.org/about/committees/incubation/graduation/

Incubation application questionnaire : https://trac.osgeo.org/osgeo/wiki/TorchGeo

Document Status

Version: 0.5
Status: Updated
Last Edited: 15 Jan 2026

Purpose

Show to "reasonable comfort level" that TorchGeo does not have improperly contributed code, and that the code is all under the Project license.

Notice: It is not the goal to be able to prove that every source file, and every contribution to those files, was contributed properly. The onus is not on the OSGeo Incubation Committee to prove there are no problems. However, the OSGeo Incubation Committee does need to ensure that OSGeo does not officially endorse code with provenance issues that could have been identified and corrected with a reasonable effort.

TorchGeo Provenance Review

Project History

TorchGeo started as an internship project at Microsoft's AI for Good Lab in 2021, with continued support and funding from the University of Illinois Urbana-Champaign and the National Center for Supercomputing Applications. Since then, TorchGeo has been adopted by a diverse community of researchers from academia, industry, and government research institutions who have contributed to the success of the library. In 2025, we founded a new TorchGeo Organization to manage the TorchGeo Project, with the support of members from the above organizations.

Key milestones:

- 2021-05-11: Initial commit (2330aad) by Adam J. Stewart

- 2021-2024: Organic growth to 120+ contributors

- 2025: Formation of TorchGeo Organization with formal governance structure

- 2026: OSGeo incubation application submitted

Libraries

All dependencies are external (not bundled in source tree). All licenses are OSI-approved and compatible with MIT.

{| class="wikitable"
! Library !! License !! Compatible? !! In Core? !! Issues? !! Ok? !! Comments
|-
| einops || MIT || Yes || External || - || Yes || Tensor operations
|-
| geopandas || BSD-3-Clause || Yes || External || - || Yes || Geospatial dataframes
|-
| jsonargparse || MIT || Yes || External || - || Yes || CLI argument parsing
|-
| lightly || MIT || Yes || External || - || Yes || Self-supervised learning
|-
| lightning || Apache-2.0 || Yes || External || - || Yes || Training framework
|-
| matplotlib || PSF || Yes || External || - || Yes || Plotting
|-
| numpy || BSD-3-Clause || Yes || External || - || Yes || Numerical computing
|-
| pandas || BSD-3-Clause || Yes || External || - || Yes || Data manipulation
|-
| pillow || HPND / MIT-CMU* || Yes || External || - || Yes || Image I/O
|-
| pyproj || MIT || Yes || External || - || Yes || Cartographic projections
|-
| rasterio || BSD-3-Clause || Yes || External || - || Yes || Geospatial raster I/O
|-
| segmentation-models-pytorch || MIT || Yes || External || - || Yes || Segmentation architectures
|-
| shapely || BSD-3-Clause || Yes || External || - || Yes || Geometric operations
|-
| timm || Apache-2.0 || Yes || External || - || Yes || Vision model zoo
|-
| torch || BSD-3-Clause || Yes || External || - || Yes || PyTorch core
|-
| torchmetrics || Apache-2.0 || Yes || External || - || Yes || Metrics computation
|-
| torchvision || BSD-3-Clause || Yes || External || - || Yes || Vision utilities
|-
| *kornia || Apache-2.0 || Yes || External || - || Yes || Computer vision ops
|}

Copyright Holders

{| class="wikitable"
! Copyright Holder !! Contact !! Issues !! Comments
|-
| TorchGeo Contributors || N/A (collective) || None || Per LICENSE file
|}

The MIT License file states: "Copyright (c) TorchGeo Contributors."

This is a collective copyright model: individual contributors retain copyright in their contributions but license them under MIT. A search of the source tree shows the following other, historical copyright markings:

> git grep Copyright | grep -v TorchGeo
torchgeo/models/convlstm.py:# Copyright (c) 2017 Andrea Palazzi
torchgeo/models/copernicusfm.py:    Adapted from https://github.com/bwconrad/flexivit. Copyright (c) 2023 Ben Conrad.
torchgeo/models/copernicusfm.py:    Adapted from https://github.com/bwconrad/flexivit. Copyright (c) 2023 Ben Conrad.
torchgeo/models/copernicusfm.py:    Copyright (c) Microsoft Corporation.
torchgeo/models/earthloc.py:    Adapted from https://github.com/gmberton/EarthLoc. Copyright (c) 2024 Gabriele Berton
torchgeo/models/earthloc.py:    Adapted from https://github.com/gmberton/EarthLoc. Copyright (c) 2024 Gabriele Berton
torchgeo/models/earthloc.py:    Adapted from https://github.com/gmberton/EarthLoc. Copyright (c) 2024 Gabriele Berton
torchgeo/models/ltae.py:# Copyright (c) 2020 VSainteuf (Vivien Sainte Fare Garnot)

Committers

Total human contributors: 122

Total commits: 2778 (excluding bots)

Top contributors (PSC members marked with *):

{| class="wikitable"
! Committer !! Email !! Affiliation
|-
| Adam J. Stewart * || ajstewart426@gmail.com || TUM / UIUC
|-
| Caleb Robinson * || calebrob6@gmail.com || AI for Good Msft
|-
| Isaac Corley * || isaac.corley@proton.me || Wherobots
|-
| Nils Lehmann * || nils.lehmann24@gmail.com || TUM
|-
| Ashwin Nair * || ashnair0007@gmail.com || Space42
|-
| Anthony Ortiz * || Anthony.Ortiz@microsoft.com || Microsoft
|-
| Robin Cole || robmarkcole@gmail.com || EarthDaily
|-
| Yi Wang || wangyi111@users.noreply.github.com || TUM
|-
| Yi-Chia Chang || yichiac@users.noreply.github.com || UIUC
|-
| Burak Ekim || burakekim@users.noreply.github.com || Bundeswehr Munich
|-
| Blaz Rolih || blaz-r@users.noreply.github.com || ULjubljana, Slovenia
|-
| Maciej Kilian || iejMac@users.noreply.github.com || Stability AI
|-
| Qiusheng Wu || giswqs@gmail.com || UTenn Knoxville
|}

Note: 122 total contributors. Full list available via git shortlog -sne --all.
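The contributor and commit totals above can be recomputed from the shortlog output. The following is a minimal sketch, not the procedure used for this review: it assumes bots are recognizable by a "[bot]" marker in the name or email (the review does not say how bots were excluded), and the function names are illustrative.

```python
import re

# One shortlog line looks like: "   123\tName <email>"
LINE = re.compile(r"^\s*(\d+)\s+(.*?)\s*<([^>]*)>\s*$")


def parse_shortlog(text: str) -> list[tuple[int, str, str]]:
    """Parse `git shortlog -sne` output into (commits, name, email) rows."""
    rows = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            rows.append((int(match.group(1)), match.group(2), match.group(3)))
    return rows


def summarize(rows: list[tuple[int, str, str]], bot_marker: str = "[bot]") -> tuple[int, int]:
    """Return (human contributor count, human commit count).

    Assumption: an entry is a bot if bot_marker appears in its name or email.
    """
    humans = [r for r in rows if bot_marker not in r[1] and bot_marker not in r[2]]
    return len(humans), sum(r[0] for r in humans)
```

To use it, pass the text printed by git shortlog -sne --all to parse_shortlog and then call summarize on the result. Note that shortlog counts distinct name/email pairs, so a person who committed under several addresses is counted more than once unless a .mailmap file merges them.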

Source Files

Total Python files: 557

Files with copyright header: 557 (100%) - after PR #3262 merged

Standard header format:

# Copyright (c) TorchGeo Contributors. All rights reserved.
# Licensed under the MIT License.

{| class="wikitable"
! File !! License !! Copyright Holder !! Header? !! Issues? !! Ok? !! Comments
|-
| docs/user/metrics/update_timeline.py || MIT || TorchGeo Contributors || Yes || None || Yes || Fixed in PR #3262
|-
| tests/data/landcoverai/split.py || MIT || TorchGeo Contributors || Yes || None || Yes || Fixed in PR #3262
|}

Note: PR https://github.com/torchgeo/torchgeo/pull/3262 adds copyright headers to the 2 files that were missing them.

Further Issues

Resolved:

  • OSGeo does not require a CLA. We ask projects to think about what they want during incubation.
  • Contributor agreement confirmed

Pending:

TorchGeo Application Questionnaire

See the main application questionnaire for additional project details: https://trac.osgeo.org/osgeo/wiki/TorchGeo

Process

This Code Provenance Review followed the OSGeo Incubation process:

{| class="wikitable"
! Step !! Status !! Date !! Notes
|-
| 1. Create Provenance Review Document || Complete || 06 Jan 2026 || Document created from Provenance Review Template
|-
| 2. Library/Component Review || Complete || 08 Jan 2026 || 18 dependencies reviewed, no issues found
|-
| 3. Code Copyright Review || Complete || 08 Jan 2026 || 557 files reviewed, no issues found
|-
| 4. Submit to Project Mentor || Complete || 08 Jan 2026 || Brian M Hamlin (Mentor)
|}

Review Summary

Components in source tree: None (all external dependencies)
Problematic licenses: None identified
License conflicts: None identified
Files reviewed: 557 Python files
Files with headers: 557 (100%)
External code derivations: 2 files (both MIT licensed, properly attributed)
Copyright model: Collective ("TorchGeo Contributors")

Outstanding Items

  • Pending merge: PR #3262 - adds copyright headers to 2 files
  • Resolved: CLA/DCO status confirmation. OSGeo does not require a CLA. OSGeo asks projects to think about what they want during incubation.

ajstewart -- We used to have a CLA when we were under MSFT. We no longer have any CLA or DCO. We currently have no plans to add one. I know the Linux Foundation requires DCO. Does OSGeo have any requirements?

  • Resolved: a PSC Members Agreement is not required by OSGeo

Conclusion

The TorchGeo codebase passes the Code Provenance Review with no blocking issues identified. The project demonstrates good license hygiene and is ready for OSGeo incubation from a code provenance perspective.

Library/Component Review Process

Review completed: 08 Jan 2026

1. Components in Source Tree

Finding: None

TorchGeo does not bundle any third-party libraries in its source tree. Unlike projects such as GDAL (which includes copies of libtiff, libjpeg, etc.), TorchGeo treats all dependencies as external packages installed via pip/conda.

  • No vendor/ directory
  • No third_party/ directory
  • No external/ directory
  • No embedded library copies

This is the recommended approach and simplifies license management.
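The directory checks listed above can be repeated with a short script. This is a sketch under stated assumptions rather than the tooling used in this review: the three directory names come from the list above, hidden directories such as .git are skipped, and the function name is illustrative.

```python
from pathlib import Path

# Directory names that conventionally hold bundled third-party code.
VENDOR_DIR_NAMES = {"vendor", "third_party", "external"}


def find_vendored_dirs(root: str) -> list[str]:
    """Return repo-relative paths of directories whose name suggests vendoring."""
    root_path = Path(root)
    hits = []
    for path in root_path.rglob("*"):
        if not path.is_dir() or path.name not in VENDOR_DIR_NAMES:
            continue
        rel = path.relative_to(root_path)
        # Skip anything inside hidden directories such as .git.
        if any(part.startswith(".") for part in rel.parts):
            continue
        hits.append(rel.as_posix())
    return sorted(hits)
```

An empty result for the repository root is consistent with the first three bullets. A name-based scan cannot detect library copies embedded under other directory names, so the fourth bullet still needs manual inspection.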

2. External Dependencies

Total required dependencies: 18

All licenses are OSI-approved and compatible with MIT.

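{| class="wikitable"
! Package !! License !! Compatible? !! Potentially Problematic?
|-
| einops || MIT || Yes || No
|-
| geopandas || BSD-3-Clause || Yes || No
|-
| jsonargparse || MIT || Yes || No
|-
| kornia || Apache-2.0 || Yes || No
|-
| lightly || MIT || Yes || No
|-
| lightning || Apache-2.0 || Yes || No
|-
| matplotlib || PSF (BSD-style) || Yes || No
|-
| numpy || BSD-3-Clause || Yes || No
|-
| pandas || BSD-3-Clause || Yes || No
|-
| pillow || HPND / MIT-CMU* || Yes || No
|-
| pyproj || MIT || Yes || No
|-
| rasterio || BSD-3-Clause || Yes || No
|-
| segmentation-models-pytorch || MIT || Yes || No
|-
| shapely || BSD-3-Clause || Yes || No
|-
| timm || Apache-2.0 || Yes || No
|-
| torch || BSD-3-Clause || Yes || No
|-
| torchmetrics || Apache-2.0 || Yes || No
|-
| torchvision || BSD-3-Clause || Yes || No
|}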

3. License Distribution

{| class="wikitable"
! License Type !! Count !! Notes
|-
| MIT || 5 || Permissive, fully compatible
|-
| BSD-3-Clause || 7 || Permissive, fully compatible
|-
| Apache-2.0 || 4 || Permissive, fully compatible
|-
| PSF || 1 || Python Software Foundation, BSD-style
|-
| HPND / MIT-CMU* || 1 || Historical Permission Notice and Disclaimer, permissive
|}
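The per-license counts can be recomputed from the package table in section 2, which is a useful cross-check that they sum to the 18 required dependencies. A minimal sketch, with the mapping transcribed by hand from that table (variable and function names are illustrative):

```python
from collections import Counter

# Package -> license, transcribed from the required-dependency table.
REQUIRED = {
    "einops": "MIT",
    "geopandas": "BSD-3-Clause",
    "jsonargparse": "MIT",
    "kornia": "Apache-2.0",
    "lightly": "MIT",
    "lightning": "Apache-2.0",
    "matplotlib": "PSF",
    "numpy": "BSD-3-Clause",
    "pandas": "BSD-3-Clause",
    "pillow": "HPND / MIT-CMU",
    "pyproj": "MIT",
    "rasterio": "BSD-3-Clause",
    "segmentation-models-pytorch": "MIT",
    "shapely": "BSD-3-Clause",
    "timm": "Apache-2.0",
    "torch": "BSD-3-Clause",
    "torchmetrics": "Apache-2.0",
    "torchvision": "BSD-3-Clause",
}


def license_counts(packages: dict[str, str]) -> Counter:
    """Count how many packages fall under each license string."""
    return Counter(packages.values())


counts = license_counts(REQUIRED)
for license_name, n in counts.most_common():
    print(f"{license_name}: {n}")
print("total:", sum(counts.values()))
```

Keeping the mapping in one place means the tally and the total are derived from the same data, so they cannot drift apart when a dependency is added or removed.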

4. Potentially Problematic Licenses

Finding: None identified

  • No GPL or LGPL dependencies (no copyleft)
  • No AGPL dependencies
  • No proprietary or non-free dependencies
  • No "viral" license concerns

5. License Conflicts

Finding: None identified

All dependency licenses are permissive and compatible with each other and with TorchGeo's MIT license. There are no conflicts analogous to GDAL's use of GPL'ed GRASS libraries.

6. Optional Dependencies

TorchGeo has optional dependencies for the datasets extra. All use permissive licenses:

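{| class="wikitable"
! Package !! License
|-
| h5py || BSD-3-Clause
|-
| laspy || BSD-2-Clause
|-
| netcdf4 || MIT
|-
| opencv-python || Apache-2.0
|-
| pycocotools || BSD-2-Clause
|-
| rioxarray || Apache-2.0
|-
| scikit-image || BSD-3-Clause
|-
| scipy || BSD-3-Clause
|-
| xarray || Apache-2.0
|}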

7. Conclusion

Overall assessment: No library/component issues identified

  • No bundled code in source tree
  • All 18 required dependencies use permissive OSI-approved licenses
  • No GPL/copyleft dependencies
  • No license conflicts
  • Clean dependency model suitable for OSGeo incubation

Code Copyright Review

Review completed: 08 Jan 2026
Reviewer: Brian M Hamlin (Mentor), with assistance from automated tooling
Scope: All 557 Python source files in the TorchGeo repository

1. License Headers

Total Python files: 557
Files with standard header: 557 (100%)
Files missing header: 0 (after PR #3262)

Standard header format:

# Copyright (c) TorchGeo Contributors. All rights reserved.
# Licensed under the MIT License.

Finding: All source files include the standard license header. Two utility scripts (docs/user/metrics/update_timeline.py and tests/data/landcoverai/split.py) were identified as missing headers during this review and fixed via PR #3262.
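A header check of this kind is easy to automate. The sketch below is not the tooling used for this review; it assumes the two header lines appear within the first five lines of each file (leaving room for a shebang or encoding line), and the function name is illustrative.

```python
from pathlib import Path

# Standard two-line header quoted in this review.
HEADER = (
    "# Copyright (c) TorchGeo Contributors. All rights reserved.",
    "# Licensed under the MIT License.",
)


def missing_headers(root: str) -> list[str]:
    """Return paths of .py files under root that lack the standard header.

    Assumption: both header lines sit within the first five lines of the file.
    """
    missing = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        head = [line.strip() for line in text.splitlines()[:5]]
        if not all(line in head for line in HEADER):
            missing.append(path.as_posix())
    return missing
```

Calling missing_headers on a checkout of the repository lists every Python file without the header; an empty list corresponds to the 100% figure reported above.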

2. License Consistency

Finding: All files are under the MIT License. No files use a different or conflicting license.

  • No GPL, LGPL, or copyleft licenses found
  • No Apache-only or BSD-only files found
  • All code consistently uses MIT

3. Code Origins

Two files contain code derived from external sources. Both are properly attributed and both original sources use MIT License (compatible):

{| class="wikitable"
! File !! Source !! Source License !! Compatible? !! Attribution in File?
|-
| torchgeo/models/btc.py || blaz-r/BTC-change-detection || MIT || Yes || Yes
|-
| torchgeo/models/croma.py || antofuller/CROMA || MIT || Yes || Yes
|}

Finding: Both external code sources are MIT licensed, which is fully compatible with TorchGeo's MIT license. Attribution is properly documented in the source file headers.

4. Vendored/Bundled Code

Finding: No vendored or bundled third-party code exists in the source tree. All dependencies are external and installed via pip/conda.

  • No vendor/, third_party/, or external/ directories
  • No embedded copies of libraries (unlike GDAL which bundles libtiff, libjpeg, etc.)
  • All 18 dependencies are fetched externally at install time

5. Copyright Holders

{| class="wikitable"
! Copyright Holder !! Type !! Notes
|-
| TorchGeo Contributors || Collective || Single copyright holder for all files
|}

Finding: The project uses a collective copyright model. All files use "Copyright (c) TorchGeo Contributors" rather than individual contributor names.

Implications for relicensing:

  • Simplified model - no need to contact 122 individual contributors
  • Contributors implicitly agree to collective copyright by submitting PRs
  • Resolved: the project's Contributor License Agreement (CLA) / Developer Certificate of Origin (DCO) status is satisfactory; neither is required for incubation.

6. Outstanding Issues

Resolved:

  • Copyright headers added to 2 missing files: PR #3262
  • Confirm whether a CLA/DCO is in place or planned
  • Contributor agreement status for PSC members

Pending:

7. Conclusion

Overall assessment: No provenance issues identified. The TorchGeo codebase demonstrates good copyright hygiene:

  • 100% of source files have proper license headers
  • All code is consistently MIT licensed
  • External code derivations are properly attributed and license-compatible
  • No vendored code complicates the license picture
  • Collective copyright model simplifies governance

The codebase is ready for OSGeo incubation from a code provenance perspective, pending confirmation of contributor agreement procedures from the PSC.