REUSE and Git* Forges

REUSE is great for handling complex dependencies with many licenses. But GitLab and friends expect a LICENSE file to extract license information from.

  1. Is there a preferred way to handle this and either mark the ‘main’ license from Git* forges, and/or point to the LICENSES/ directory?

  2. Are there plans to have these Git* forges understand REUSE?

Maybe I should have mentioned @max.mehl.

  1. Is there a preferred way to handle this and either mark the ‘main’ license from Git* forges, and/or point to the LICENSES/ directory?

The concept of main licenses does not really fit in REUSE’s world, see a
recent discussion in this issue ^1.

That said, you can of course keep a LICENSE or COPYING file at the root of
your repo. It will be ignored by REUSE (and its tool). Please note,
however, that this “main license” can be misleading, so use it carefully.

  2. Are there plans to have these Git* forges understand REUSE?

That would be great, but unfortunately the response from GitHub and GitLab
has been weak, partly because of public pressure, partly because concrete
solutions are missing. See e.g. ^2.

It would be great if people could integrate the LICENSES directory as a
basis for display in Free Software like Gitea. Unfortunately, the REUSE
project and the FSFE currently don’t have specific plans and resources
for this.

I feel you, Max.

Right, in my case it does make sense, since the code is AGPL-3.0-or-later and the docs are LAL-1.3, with CC0-1.0 for “insignificant files”. So here the ‘main’ license would be the one for the code. I guess I could create a LICENSE file that tells licensee to treat the repository as AGPL. That would require figuring out how licensee does its matching.

I was thinking of something like this, when it makes sense to do so, or maybe something that points to the LICENSES directory – which would certainly require a patch to licensee.

Maybe starting a discussion here on how that could work can be useful.

Also, NGI Assure might be a good way to keep this development going.

After a few tests I found that adding <!-- License: AGPL-3.0-or-later --> in a README.md makes licensee match the license, but only if targeted at the file:

bundle exec licensee detect ~/path/to/src/README.md
License:        AGPL-3.0
Matched files:  README.md
README.md:
  Content hash:  6d1a07884aaae3bf44ac968de6454b4700f7521f
  Confidence:    90.00%
  Matcher:       Licensee::Matchers::Reference
  License:       AGPL-3.0
  Closest non-matching licenses:
    MIT similarity:         30.60%
    Unlicense similarity:   28.95%
    PostgreSQL similarity:  27.55%

But when directed at the enclosing folder, licensee does not detect anything. Using a LICENSE or LICENSE.txt file instead does not work in the file-matching case either. Weird.

Weirder: when using a LICENSE.spdx file to force the SPDX matcher, MIT is correctly detected when targeting the file, but variants of *GPL* are not. :thinking:

@how thanks for bringing it up here, you got me thinking…

Having recently used GitHub Insights and having experience using software packages, it is clear to me that the default is to have a single license per ‘artifact’, whether that is a git repository (or rather its contents) or a software package. Many software solutions rely on this assumption, like Snyk (also used for LF Insights), GitHub and GitLab, and probably more. They use it to provide insight into the licenses of software projects, their dependencies and their transitive dependencies.

So, as @max.mehl experiences, trying to instill the idea that more licenses might be involved is rowing against the tide. More and more software solutions assume a single license, so changing that base assumption will only get more difficult. If we accept this as the default, there is practical value in providing a single license per ‘artifact’, because it works well with existing software. There is also the theoretical value of having a neat summary of how to deal with the software.

Now to REUSE: is there a way to deal with this? Like @max.mehl, I too believe a simple ‘select top license’ is not the right way. In a code repository you might have your own code combined with adopted code, plus some documentation to go with it: say GPL-3.0-or-later code together with Apache-2.0 licensed code, and some documentation licensed GFDL-1.3-or-later. It would make sense to select GPL-3.0-or-later as the ‘main’ license (ignoring the adopted code, which has a compatible license, and ignoring the documentation license) and put it in the LICENSE file. Not because it is the all-encompassing truth, but because it is a convenient summary for practical use.

I can see REUSE doing 2 things:

  1. Make sure that the license used in LICENSE is actually represented in the code, and not an entirely different license. (I think it would already be difficult to apply an entirely different license, but I’m not entirely sure how all edge-cases are currently handled).
  2. Offer a prompt to the user to select one of the found licenses and apply it as the main LICENSE. This would be similar to adding a license property to the library file, like a package.json or setup.py.
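
The two ideas above could be sketched roughly as follows. This is not part of the REUSE tool: the function names `found_licenses` and `apply_main_license` are hypothetical, and the scan is a simplification that only looks for `SPDX-License-Identifier:` tags holding a single identifier. It ignores SPDX expressions (`AND`, `OR`), `.license` sidecar files and any other way REUSE can attach licensing information.

```python
import re
import shutil
from pathlib import Path

# Matches a header tag such as "SPDX-License-Identifier: MIT".
# Simplification: captures one identifier only, not full SPDX expressions.
TAG = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)")


def found_licenses(root: Path) -> set[str]:
    """Collect the license identifiers declared in file headers under root."""
    ids: set[str] = set()
    for path in root.rglob("*"):
        parts = path.relative_to(root).parts
        # Skip directories, the license texts themselves and git internals.
        if not path.is_file() or "LICENSES" in parts or ".git" in parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        ids.update(TAG.findall(text))
    return ids


def apply_main_license(root: Path, main: str) -> Path:
    """Copy LICENSES/<main>.txt to LICENSE, but only if <main> is actually used."""
    used = found_licenses(root)
    if main not in used:
        raise ValueError(
            f"{main} is not declared in any file header; found: {sorted(used)}"
        )
    target = root / "LICENSE"
    shutil.copyfile(root / "LICENSES" / f"{main}.txt", target)
    return target
```

A prompt as in point 2 would then only need to present `sorted(found_licenses(root))` and pass the user's choice to `apply_main_license`. Refusing identifiers that no file declares covers point 1.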

But to keep things simple and prevent feature creep, REUSE could also just explain in words that there is value in having a LICENSE file, while stating that it is out of scope for the REUSE tool.

Thank you @nicorikken for sharing your thoughts. To be clear, I was not suggesting that REUSE should adopt the one-license style, but that there might be a way, given the current situation, to trick the licensee software into matching one license or another instead of having the repository appear with no licensing information at all.

The best course of action would be to patch the licensee software to use the LICENSES directory – it currently seems to support the LICENSE either as file or directory. It would then be possible for forges to adapt their code to say: “This is REUSE-compliant software” with a link to the LICENSES directory or something more useful than the current situation.
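
To illustrate what a forge could show, here is a minimal sketch that builds a one-line notice from the LICENSES directory. It assumes the REUSE convention that each license text is stored as `LICENSES/<SPDX identifier>.<extension>`; the function name `license_summary` and the wording of the notice are made up for this example, and nothing here reflects how licensee or any forge actually works. It also does not check REUSE compliance, it only lists what the directory contains.

```python
from pathlib import Path


def license_summary(root: Path) -> str:
    """Build a short notice listing the licenses found in LICENSES/."""
    licenses_dir = root / "LICENSES"
    if not licenses_dir.is_dir():
        return "No LICENSES directory found"
    # The identifier is the file name without its final extension,
    # e.g. "AGPL-3.0-or-later.txt" gives "AGPL-3.0-or-later".
    # Assumes every license file has an extension such as ".txt".
    ids = sorted(p.stem for p in licenses_dir.iterdir() if p.is_file())
    if not ids:
        return "LICENSES directory is empty"
    return "Licenses in LICENSES/: " + ", ".join(ids)
```

For the repository described earlier in this thread, the notice would list AGPL-3.0-or-later, CC0-1.0 and LAL-1.3 side by side instead of showing no licensing information at all.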

This is a service run by Free Software Foundation Europe (FSFE).