TorchGeo Code Provenance

From OSGeo
  

Revision as of 17:12, 6 January 2026

Code Provenance Review Process

TorchGeo project. Reference: https://www.osgeo.org/about/committees/incubation/graduation/

Document Status

Version: 0.1

Status: init

Last Edited: 06 Jan 2026

Purpose

Show, to a reasonable level of comfort, that TorchGeo does not contain improperly contributed code and that all of its code is under the project license.


Notice: The goal is not to prove that every source file, and every contribution to those files, was contributed properly. The onus is not on the OSGeo Incubation Committee to prove there are no problems. However, the OSGeo Incubation Committee does need to ensure that OSGeo does not officially endorse code with provenance issues that could have been identified and corrected with reasonable effort.


TorchGeo Provenance Review

Project History

Summarise the history of the source code, including discussion about code ownership and license over time.

Libraries

List or reference a table of included libraries and/or components. Include:

  • Library Name
  • License
  • Whether the library licence is compatible with the project
  • Whether the library is compiled into the project's releases, or required as an external dependency
  • Whether the library is included in the project's code repository
  • Issue? (link to the issue if one exists)
  • Ok? (Yes/No/Blank; blank implies not checked yet)
  • Comment(s)

Library | Licence | Compatible? | Included in core? | Issue(s)? | Ok? | Comments
Library | Licence | yes | yes | n/a | yes |
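The licence columns can be drafted before the manual check. The sketch below is a hypothetical helper, not an existing TorchGeo or OSGeo tool: it takes a mapping of library name to recorded licence, plus the set of libraries copied into the repository, and emits rows in the table layout above. The allowlist of compatible licences is an illustrative assumption; anything outside it is left blank ("not checked yet") for a reviewer to decide, rather than being marked incompatible.

```python
# Hypothetical helper: draft rows for the Libraries table.
# ASSUMED_COMPATIBLE is an illustrative assumption, not an OSGeo policy.
from dataclasses import dataclass

ASSUMED_COMPATIBLE = {"MIT", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0", "ISC"}


@dataclass
class LibraryRow:
    name: str
    licence: str
    compatible: str  # "yes" or "" (blank = not checked yet)
    in_core: str  # "yes" if the library is copied into the repository
    issue: str = "n/a"
    ok: str = ""  # left blank: a reviewer fills this in
    comments: str = ""


def draft_rows(libraries: dict[str, str], vendored: set[str]) -> list[LibraryRow]:
    """Build draft rows; licences outside the allowlist are left for human review."""
    rows = []
    for name, licence in sorted(libraries.items()):
        rows.append(
            LibraryRow(
                name=name,
                licence=licence,
                compatible="yes" if licence in ASSUMED_COMPATIBLE else "",
                in_core="yes" if name in vendored else "no",
            )
        )
    return rows


def to_text(rows: list[LibraryRow]) -> str:
    """Render rows in the pipe-separated layout used by the table above."""
    lines = ["Library | Licence | Compatible? | Included in core? | Issue(s)? | Ok? | Comments"]
    for r in rows:
        lines.append(" | ".join([r.name, r.licence, r.compatible, r.in_core, r.issue, r.ok, r.comments]))
    return "\n".join(lines)
```

For example, with the placeholder names `libA` (MIT, vendored) and `libB` (LGPL-2.1), `print(to_text(draft_rows({"libA": "MIT", "libB": "LGPL-2.1"}, vendored={"libA"})))` prints a header plus one row per library, with libB's Compatible? cell left blank. Leaving unknowns blank keeps the script from making licence judgements it cannot make.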

Source Files

List of source files, including code, documentation and data, including:

  • File name
  • License
  • Copyright Holder (if known)
  • License header included?
  • Issue? (link to the issue if one exists)
  • Ok? (Yes/No/Blank; blank implies not checked yet)
  • Comment(s)


File | Licence | Copyright Holder | License header? | Issue(s)? | Ok? | Comments
File | Licence | <org/person> | yes | n/a | yes |
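A script cannot replace looking at each header (see the FAQ below), but it can pre-fill the header column of this table and point the reviewer at files that lack one. This hypothetical sketch assumes the header appears within the first ten lines of each `.py` file and contains the string `Licensed under the MIT License`; both the window and the marker are assumptions to adjust to whatever header the project actually uses.

```python
# Hypothetical pre-screen for the Source Files table; every file still needs human eyes.
from pathlib import Path

# Assumptions: the header sits in the first HEADER_LINES lines and contains MARKER.
HEADER_LINES = 10
MARKER = "Licensed under the MIT License"


def has_licence_header(path: Path, marker: str = MARKER, n: int = HEADER_LINES) -> bool:
    """Return True if `marker` appears in the first `n` lines of the file."""
    with path.open(encoding="utf-8", errors="replace") as fh:
        for i, line in enumerate(fh):
            if i >= n:
                break
            if marker in line:
                return True
    return False


def scan(root: str, pattern: str = "*.py") -> list[tuple[str, str]]:
    """Return (relative path, "yes"/"no") pairs, one per matching file."""
    base = Path(root)
    rows = []
    for path in sorted(base.rglob(pattern)):
        if path.is_file():
            status = "yes" if has_licence_header(path) else "no"
            rows.append((path.relative_to(base).as_posix(), status))
    return rows
```

Files reported as "no" are the ones to open first; files reported as "yes" still need their headers read for notes about unusual origins.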

Copyright Holders

List or reference all copyright holders of the project's source files. These are all the people that would need to be contacted if the license of the project is to be changed. List:

  • Organisation or person
  • Contact details for Organisation or person
  • Issue? (link to the issue if one exists)
  • Comments

Copyright Holder | Contact details | Issue? | Comments
<org/person> | ... | |
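A first list of holders can be gathered from the `Copyright` lines already present in file headers. The sketch below is hypothetical: its regular expression is an assumption that handles forms such as `Copyright (c) <holder>` and `COPYRIGHT: (C) <holder>`, and it will miss holders named only in prose, so contact details and unusual cases still have to be filled in by hand.

```python
# Hypothetical helper: collect holder text from "Copyright" lines in file headers
# to seed the Copyright Holders table. The regex is an assumption.
import re
from collections import Counter
from pathlib import Path

COPYRIGHT_RE = re.compile(r"copyright\s*:?\s*(?:\(c\)|©)?\s*(.+)", re.IGNORECASE)


def holders_in_text(text: str, max_lines: int = 20) -> list[str]:
    """Return the text after each 'Copyright' marker in the first `max_lines` lines."""
    found = []
    for line in text.splitlines()[:max_lines]:
        match = COPYRIGHT_RE.search(line)
        if match:
            found.append(match.group(1).strip().rstrip("."))
    return found


def count_holders(root: str, pattern: str = "*.py") -> Counter:
    """Count how many files name each distinct holder string."""
    counts: Counter = Counter()
    for path in Path(root).rglob(pattern):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="replace")
            counts.update(set(holders_in_text(text)))
    return counts
```

The raw strings include years and phrases such as "All rights reserved", so entries referring to the same holder still need to be merged by hand.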

Committers

List or reference a table of committers, including:

  • Committer name
  • Commit username in code repository
  • Committer's employer(s) (if applicable)
  • Committer (or employer) has signed contribution agreement? (yes/no)

TorchGeo Repository Statistics

==================================================

commit 6995642a4b03ae03834f599dd58a47a04fc91608
Author: Isaac Corley <isaac.corley@proton.me>
Date:   Sat Jan 3 16:40:34 2026 -0600

-

commit 2330aad537f09ae93e2fa3283d1c25a3e0230703
Author: Adam J. Stewart <ajstewart426@gmail.com>
Date:   Tue May 11 14:46:50 2021 -0500

   Initial commit

Total Commits: 3357

Contributors:

 1068  Adam J. Stewart
  947  dependabot[bot]
  217  Caleb Robinson
  104  Isaac Corley
  101  Nils Lehmann
   77  Ashwin Nair
   30  isaac
   17  Robin Cole
   15  Yi Wang
   12  Yi-Chia Chang
   10  Burak
    9  Maciej Kilian
    6  Blaž Rolih
    5  Amri Rasyidi,Qiusheng Wu,Ritwik Gupta,Sieger Falkena
    4  Copilot,Daniele Rege Cambrin,Harald Kristen,gatienc
    3  Adrian Tofting,George Huber,Hitesh Tolani,Luca Colomba,Pablo Mandiola,Zhuo Zheng
    2  Blanchon,Buğra,Can Michael Hucko,Data Philosopher,Favyen Bastani,Gaétan Bahl,Janne Mäyrä,Jingtong
    2  Kaustav Mukherjee,SpontaneousDuck,Wei Ji,Yang Mu,estherrolf,remtav,shreyakannan1205
    1  Adrian Boguszewski,Al-Ekram Elahee Hridoy,Aleksey Morozov,Alexandre Lacoste,Alhassan Sahad,Arianna Sole
    1  Bianca Zadrozny,Boe Zhou,Byeong-Hyeok Yu,Codruț Diaconu,Daiki Kimura,David Meaux,Dimitris Mantas
    1  Dominik K,Dylan Stewart,Ethan White (he/him),Federico Bessi,Forrest Williams,Georgia Manou,Giacomo Caria
    1  Heng Fang,HeshamHM28,Ikko Eltociear Ashimine,Inderpreet Singh Parmar,JDilger,Jakub,Kaleb Crans,Kamil Raczycki
    1  Keenan Eves,Kendall Smith,Kiran S,Konrad Heidler,Konstantin Klemmer,Krishnagopal Halder
    1  Laurențiu Nicola,Leonard Waldmann,Mathias Baumgartinger,Matt Gibson,Mauricio Cordeiro,Mehmet Cem Gunturkun
    1  MengluTao,Michael Nolde,Michiaki Tatsubori,Nassim AIT ALI BRAHAM,Night-dev,Odd Eirik Igland,Ohad Nir
    1  Oluwabusayo Jacobs,Piero Toffanin,Piper Wolters,Pitchaporn Likitpanjamanon,Reslan Tinawi,Rijul Dahiya,Rémi Cresson
    1  Rémi Tavon,Saumya Sinha,Shradha Sehgal,Slawek Bierwiaczonek,Tarandeep Singh,Taylor Denouden,Teo Cherici
    1  Veronika Buchauer,XShadow,nands,iejMac,kaybe20,lhackel-tub,lns-lns,lucastao,mdchuc,nsutezo,ntw-au,tritolol
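The list above has the shape of `git shortlog -s -n HEAD` output (a count, a tab, then an author name), with names that share a count joined on one line. The sketch below assumes that input format. It also shows a hypothetical alias map for folding several author names into one person before filling in the committer table; for example, whether `isaac` and `Isaac Corley` are the same committer is something to confirm, not assume. Git's `.mailmap` file is the usual place to record confirmed mappings so that `git shortlog` merges them itself.

```python
# Sketch: parse `git shortlog -s -n HEAD` output, optionally merge aliases,
# and group names by commit count as in the contributor list above.
from collections import defaultdict


def parse_shortlog(text: str) -> dict[str, int]:
    """Map author name -> commit count from 'count<TAB>name' lines."""
    counts: dict[str, int] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        count, name = line.split(None, 1)
        counts[name] = counts.get(name, 0) + int(count)
    return counts


def merge_aliases(counts: dict[str, int], aliases: dict[str, str]) -> dict[str, int]:
    """Fold alias names into a canonical name; `aliases` maps alias -> canonical (hypothetical)."""
    merged: dict[str, int] = defaultdict(int)
    for name, n in counts.items():
        merged[aliases.get(name, name)] += n
    return dict(merged)


def group_by_count(counts: dict[str, int]) -> list[tuple[int, list[str]]]:
    """Return (count, sorted names) pairs, highest count first."""
    groups: dict[int, list[str]] = defaultdict(list)
    for name, n in counts.items():
        groups[n].append(name)
    return [(n, sorted(groups[n])) for n in sorted(groups, reverse=True)]
```

`HEAD` is passed explicitly because `git shortlog` reads a log from standard input when no revision is given and standard input is not a terminal, as is the case when it is run from a script.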

File Changes Summary:

Added lines: 215314

Removed lines: 101198

Net change (added minus removed): 114116

Primarily Python: 593 .py files in 1283 directories
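The three line counts are consistent only if the last is a net figure: 215314 - 101198 = 114116, whereas the total number of lines touched would be 215314 + 101198 = 316512. A minimal sketch, assuming the counts come from `git log --numstat --pretty=format:` output (the generating script is not shown), computes both figures. Binary files, which `--numstat` reports as `-`, are skipped.

```python
# Sketch: line totals from `git log --numstat --pretty=format:` output.
# Each non-blank line is "added<TAB>removed<TAB>path"; binary files show "-".
def line_totals(numstat: str) -> dict[str, int]:
    added = removed = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) < 3 or parts[0] == "-" or parts[1] == "-":
            continue  # blank separator line or binary file
        added += int(parts[0])
        removed += int(parts[1])
    return {
        "added": added,
        "removed": removed,
        "net": added - removed,  # the kind of figure reported as 114116 above
        "total_changed": added + removed,  # every added or removed line
    }
```

Reporting both `net` and `total_changed` avoids the ambiguity of a single "lines changed" figure.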

Most Changed Files:

  2769 
   435 requirements/required.txt
   206 .github/workflows/tests.yaml
   179 requirements/style.txt
   166 torchgeo/datasets/__init__.py
   156 setup.cfg
   156 pyproject.toml
   140 requirements/datasets.txt
   137 docs/api/datasets.rst
   136 torchgeo/datasets/geo.py
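The first entry in the list has a count (2769) but no file name. If the list was produced by counting lines of `git log --name-only --pretty=format:` output (an assumption, since the script is not shown), the blank lines separating commits would be counted as one unnamed entry, which may explain that row. The sketch below skips blank lines before counting.

```python
# Sketch: count how many commits touch each file, from
# `git log --name-only --pretty=format:` output. Blank separator lines are
# skipped; counting them would produce an unnamed entry.
from collections import Counter


def most_changed(name_only: str, top: int = 10) -> list[tuple[int, str]]:
    """Return (count, path) pairs for the `top` most frequently listed paths."""
    counts = Counter(line.strip() for line in name_only.splitlines() if line.strip())
    return [(n, path) for path, n in counts.most_common(top)]
```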


Committer | Username | Affiliation | Contribution Agreement Signed? | Ok?
Name | Username | Company | Yes | Yes

Further Issues

List or reference a list of any outstanding issues not already listed above. Typically link to issues in project's issue tracker.




Process

The Code Provenance Review has two parts, the Library/Component Review and the Code Copyright Review, each with a slightly different process. The overall steps are:

  1. Create a Provenance Review Document by copying the Provenance Review Template.
  2. Complete the Library/Component Review Process.
  3. Complete the Code Copyright Review.
  4. Submit the completed Provenance Review Document to the Project Mentor for review.

Library/Component Review Process

For the overall project, document potential Library/Component issues by:

  1. Listing any external components that are included "in the source tree" for the project. For instance, GDAL includes copies of libtiff, libjpeg, etc. in its source tree. It is necessary to identify such components, which are under their own distinct licenses and have not been explicitly vetted by the project team. If practical, it is desirable to remove these internal components and treat them as external dependencies. If they are kept internal, their licenses should be documented using the Provenance Review Template.
  2. Listing all external dependencies with:
    1. Potentially problematic license terms (all non-free libraries, for instance).
    2. All libraries with licenses that might conflict (for instance GDAL's use of GPL'ed GRASS libraries in the non-GPL GDAL).
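Candidates for step 1 can be found by looking for licence files below the top level of the source tree, since vendored components are often copied in together with their own `LICENSE` or `COPYING` file. The sketch is a hypothetical helper; the file names it checks are assumptions, and a component copied without its licence file will not be found, so it only narrows down where to look.

```python
# Hypothetical helper: flag directories below the repository root that carry
# their own licence file, a common sign of a vendored ("in the source tree") component.
from pathlib import Path

# Assumed file-name prefixes, compared case-insensitively.
LICENCE_NAMES = ("LICENSE", "LICENCE", "COPYING", "NOTICE")


def vendored_candidates(root: str) -> list[str]:
    """Return sorted relative directories (not the root) containing a licence-like file."""
    base = Path(root)
    found = set()
    for path in base.rglob("*"):
        if (
            path.is_file()
            and path.name.upper().startswith(LICENCE_NAMES)
            and path.parent != base  # the project's own top-level licence is expected
        ):
            found.add(path.parent.relative_to(base).as_posix())
    return sorted(found)
```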

Code Copyright Review

For each project source file, perform the following:

  1. Does the file include the license information? If not, add it if there is no ambiguity about whether the standard project license applies. If that is not obvious, make notes in the review document.
  2. Is the file under the normal project license? If not, make notes in the review document.
  3. Is there anything obviously unusual about the origin of the code? Does this pose any conflicts? Is the issue properly described in the source file? For instance, in GDAL, the gdal/port/cpl_strtod.cpp file is closely derived from external code that was placed in the public domain. cpl_strtod.cpp is placed under the normal GDAL MIT/X license, but its header text keeps detailed notes on its origin, recording that the original code was in the public domain and that it is therefore OK to relicense it. Oddities should be noted both in the source file itself and in the review document.
  4. Maintain a list of all copyright holders identified in the Provenance Review Document. This list is essentially everyone who would need to agree to relicense the project. It may be desirable to seek copyright assignment to a "project lead", or to the foundation to reduce the number of copyright holders for the project, although this is not required.
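For step 3, a keyword search can decide which headers to read first. As the FAQ notes, this does not replace reading every header. The phrase list in this hypothetical sketch is an assumption and will miss notes worded differently; a note like the one on `cpl_strtod.cpp` described in step 3 is the kind of text it is meant to surface.

```python
# Hypothetical helper: surface header lines that mention an outside origin so a
# reviewer reads those files first. The phrase list is an assumption; it only
# orders the manual review and does not replace it.
import re
from pathlib import Path

ORIGIN_PHRASES = re.compile(
    r"adapted from|derived from|based on|copied from|ported from|modified from|public domain",
    re.IGNORECASE,
)


def origin_notes(root: str, pattern: str = "*.py", max_lines: int = 30) -> dict[str, list[str]]:
    """Map relative file path -> header lines that mention a possible outside origin."""
    base = Path(root)
    notes: dict[str, list[str]] = {}
    for path in sorted(base.rglob(pattern)):
        if not path.is_file():
            continue
        head = path.read_text(encoding="utf-8", errors="replace").splitlines()[:max_lines]
        hits = [line.strip() for line in head if ORIGIN_PHRASES.search(line)]
        if hits:
            notes[path.relative_to(base).as_posix()] = hits
    return notes
```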

Frequently Asked Questions

Do we need to ensure there are copyright and licence headers in every file?
Projects are expected to embed a copyright and license header in every source file (.cpp and .h, for instance). Machine-generated files (such as lex or yacc output) do not require a header if it is inconvenient to embed one, but then an effort should be made to embed a header in the original source file (i.e. the .l or .y file). There is no need to embed copyright and license headers in build files (e.g. Makefiles, Visual Studio project files), nor in small documentation fragments (like READMEs). Copyright/license headers should be included in substantial bodies of documentation if this can be done without interfering with the document.
Do I really need to embed the whole license agreement in every file?
No, it is sufficient to refer to a single copy of the license agreement for the project, but each file should include an indication of what the license is, and the location of the full license document. For example:
 /*
  * Copyright (C) 2004-2005  Autodesk, Inc.
  * 
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of version 2.1 of the GNU Lesser
  * General Public License as published by the Free Software Foundation.
  *  
  * This library is distributed in the hope that it will be useful,
  * but WITHOUT ANY WARRANTY; without even the implied warranty of
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
  * Lesser General Public License for more details.
  * 
  * You should have received a copy of the GNU Lesser General Public
  * License along with this library; if not, write to the Free Software
  * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
  */

or:

 * COPYRIGHT:    (C) 2001 by the GRASS Development Team
 *
 *               This program is free software under the GNU General Public
 *               License (>=v2). Read the file COPYING that comes with GRASS
 *               for details.
Can I just write a script to check everything?
No. The point of a human review is to ensure that human eyes have looked at the header of each file, looking for any anomalies. For instance it is not uncommon to have additional messages written in headers about the original source of particular code, but not necessarily in a way that a script could recognise.
But I just did a review similar to this a few months ago, do I really need to do it again?
If you believe you have already met or exceeded the rigor described in this review document, then you can present the results of your previous analysis along with an explanation of why you think it still applies. You will need to manually review additions since that earlier review. It will be up to the Incubator (and ultimately the foundation board) to determine if they are satisfied with any review.
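When reusing an earlier review, the files that need fresh review can be listed from git. This sketch assumes the earlier review can be tied to a specific commit (written here as the placeholder `<earlier-review-commit>`) and parses the output of `git diff --name-status <earlier-review-commit> HEAD`, where each line is a status letter followed by tab-separated paths; rename (`R`) and copy (`C`) lines carry a similarity score and list the old path before the new one.

```python
# Sketch: files a follow-up review should look at, from
# `git diff --name-status <earlier-review-commit> HEAD` output.
# <earlier-review-commit> is a placeholder for the commit the earlier review covered.
def files_to_rereview(name_status: str) -> dict[str, list[str]]:
    result: dict[str, list[str]] = {"added": [], "modified": [], "renamed_or_copied": []}
    for line in name_status.splitlines():
        if not line.strip():
            continue
        parts = line.split("\t")
        status = parts[0]
        if status == "A":
            result["added"].append(parts[1])
        elif status == "M":
            result["modified"].append(parts[1])
        elif status.startswith(("R", "C")):
            # e.g. "R087<TAB>old<TAB>new": review the file under its new path.
            result["renamed_or_copied"].append(parts[2])
        # Deleted files ("D") add no new content, so they are not listed.
    return result
```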
Do I need to track down every contributor and ask for permission to use their code?
No. As long as code was knowingly provided by contributors to the project while it was under a clear licensing policy then it is reasonable to assume the project licensing arrangements are accepted unless there is some reason to have doubts. If code was brought into the project from outside sources under ambiguous circumstances then it may be necessary to contact the copyright holder for a legal release, or at least a statement of permission.
Some project code was adapted from other sources with ambiguous copyrights, what do I do?
First, document such concerns in the review document and within the source files themselves. Second, try to obtain a release to use the provided code. An email statement of permission should be sufficient, but the pertinent details of such an email should be included in the source file in question (e.g. date, identification of the releasing person, text of the release). If no release is possible and the code is not essential, consider removing or replacing it to simplify things. Otherwise the issue will need to be discussed in the incubator to assess risk.
Should I be looking for patent problems?
No. However, if the code review turns up notes about patent issues, or if there are known patent issues for the project, these should be noted in the review document. Generally speaking, a proper patent review is very difficult and may increase, rather than decrease, legal liability.
Should I be looking for trademark conflicts?
Possibly. It may be prudent to check for trademark conflicts for the project name, or the name of public packages within the project. If issues are found they should be noted in the review document.