Properly attributing open source library authorship

hubertlepicki · July 27, 2020, 8:23am

I have a question about what’s best to do, and I suspect it may be better answered by the individual library authors (in this case @josevalim, @mcrumm, @archdragon, @chrismccord and others). I could have asked privately, but the answer could be useful to others in the future, so I’m asking here.

I really like the setup of the infrastructure (the tests, assets, etc.) in Phoenix LiveDashboard. If I wanted to use that code in my own library (open source or otherwise), one with a completely different UI and logic, I could do one of several things:

1) Start fresh and copy over the files, retaining the copyright notice. I think that’s the minimum required by the MIT license.
2) Fork the repository (or copy it over retaining the Git history), make a commit that removes everything except the bits I need, and retain the copyright notice.
3) Do what 2) does, but squash the whole history into a single commit.
4) Add the attribution and copyright notice not in the README (or not only there) but in the copied files themselves.
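For option 4, a per-file notice can be added mechanically. The sketch below is hypothetical: the helper names and the header wording are illustrative only, and the copyright line is a placeholder that you would copy verbatim from the upstream LICENSE file. It uses `#` comments, which suit Elixir `.ex`/`.exs` files; copied JavaScript assets would need `comment="//"` instead.

```python
from pathlib import Path


def attribution_header(project: str, url: str, copyright_line: str, comment: str = "#") -> str:
    """Build a comment block crediting the upstream project (hypothetical wording)."""
    lines = [
        f"Portions of this file are derived from {project} ({url}).",
        copyright_line,  # copy this verbatim from the upstream LICENSE
        "See LICENSE for the original MIT license notice.",
    ]
    # One comment line per entry, followed by a blank line before the code.
    return "".join(f"{comment} {line}\n" for line in lines) + "\n"


def prepend_attribution(path: Path, header: str) -> bool:
    """Prepend header to the file at path; return False if it is already there."""
    text = path.read_text(encoding="utf-8")
    if text.startswith(header):
        return False  # already attributed, leave the file untouched
    path.write_text(header + text, encoding="utf-8")
    return True
```

Because `prepend_attribution` returns `False` when the header is already present, re-running it over all the copied files is harmless.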

My main concern with doing 1 is that the copyright notice usually doesn’t include the full list of contributors, yet they hold copyright over their contributions and arguably deserve attribution. The Git commit history preserves that.

But then, if someone looked at a library that’s a fork with hundreds of commits from @josevalim, while he actually has nothing to do with it and never even considered it, people might start poking him in GitHub issues to answer questions as if he were the main author, which surely isn’t desirable either.

How do you think one should proceed when you want to take just some bits of a library (or libraries) and use them as the setup/infrastructure for your own library, rather than forking the library’s core functionality?

Gazler · July 27, 2020, 9:05am

If the project were an actual fork to take LiveDashboard in a different direction, then I’d fork it; however, that isn’t the case here.

If it were one of my projects you wanted to take code from, I’d prefer option 1, with a mention in the README of the project the files were copied from. Use the copied code as the base commit and reference the source project and commit SHA in the commit message. This way, if anyone uses git blame or git bisect, they have a path to follow to locate the original source of the code once they reach the end of the trail in your new project.
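Recording the source project and commit SHA in the base commit could look like this. It is a minimal sketch: `import_commit_message` is a hypothetical helper, and the message layout is just one possible convention. The result can be saved to a file and committed with `git commit -F <file>`.

```python
def import_commit_message(project: str, repo_url: str, sha: str, paths: list[str]) -> str:
    """Compose a base-commit message that records where the copied code came from."""
    # Reject anything that is not a hex commit SHA, so the trail stays followable.
    if not sha or any(c not in "0123456789abcdef" for c in sha.lower()):
        raise ValueError(f"not a hex commit SHA: {sha!r}")
    body = "\n".join(f"  - {p}" for p in sorted(paths))
    return (
        f"Import files from {project}\n\n"
        f"Copied from {repo_url} at commit {sha}.\n"
        f"Files:\n{body}\n"
    )
```

Someone who runs git blame and lands on this commit then has the upstream URL and exact SHA to continue from.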

shanesveller · July 27, 2020, 10:37am

A curveball suggestion: you can use git-filter-branch to winnow the history of the existing repo down to just the files you care about. Once it completes, those files will have their full history and authorship intact, with all other content omitted. Commit IDs will change, but messages and order should not. That detail does undermine the specific benefit @gazler mentioned, correlating history across repos. Perhaps add an empty commit at the end with a message indicating the upstream commit it originated from.
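A sketch of the winnowing step, written as Python that only builds the argument lists so they can be reviewed before running. It assumes the common `--index-filter` recipe of emptying the index and restoring the kept paths from `$GIT_COMMIT` (which filter-branch sets to the commit being rewritten); the helper names are hypothetical. Run it only in a throwaway clone, since it rewrites every ref. Git’s own documentation now points to the separate git filter-repo tool (e.g. its `--path` option) as the preferred alternative to filter-branch.

```python
import shlex


def filter_branch_keep_paths(paths: list[str]) -> list[str]:
    """Build argv for `git filter-branch` that keeps only `paths` in every commit."""
    if not paths:
        raise ValueError("need at least one path to keep")
    quoted = " ".join(shlex.quote(p) for p in paths)
    # Empty the index, then restore just the wanted paths from the original commit.
    index_filter = (
        "git rm --cached -r -q --ignore-unmatch -- . && "
        f'git reset -q "$GIT_COMMIT" -- {quoted}'
    )
    # --prune-empty drops commits that no longer change any kept path.
    return ["git", "filter-branch", "--prune-empty", "--index-filter", index_filter, "--", "--all"]


def provenance_commit_argv(repo_url: str, sha: str) -> list[str]:
    """Build argv for an empty commit recording the upstream origin."""
    return ["git", "commit", "--allow-empty", "-m",
            f"Extracted from {repo_url} at upstream commit {sha}"]
```

In a fresh clone, `subprocess.run(filter_branch_keep_paths([...]), check=True)` performs the rewrite, and `subprocess.run(provenance_commit_argv(url, sha), check=True)` then adds the empty provenance commit on top of the current branch.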

Then you could use those as the base commits for your new repo. If you want to use this with one of the generators, a non-obvious trick is to do the above, then move that .git directory into the generated project and carefully stage the new content, reconciling the existing and new history in one place.
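The generator trick could be scripted roughly as follows, assuming the filtered clone and the freshly generated project (for example one created with `mix new`) sit in two separate directories. `adopt_history` is a hypothetical helper; it only moves the `.git` directory and deliberately leaves the staging to you.

```python
import shutil
from pathlib import Path


def adopt_history(filtered_repo: Path, generated_project: Path) -> Path:
    """Move the .git directory of filtered_repo into generated_project."""
    src = filtered_repo / ".git"
    dst = generated_project / ".git"
    if not src.is_dir():
        raise FileNotFoundError(f"no .git directory in {filtered_repo}")
    if dst.exists():
        # Never clobber an existing repository in the generated project.
        raise FileExistsError(f"{generated_project} already has a .git; refusing to overwrite")
    shutil.move(str(src), str(dst))
    return dst
```

After the move, `git status` in the generated project shows files from the filtered history that the generator did not produce as deleted, and the generated files as modified or untracked, so `git add -p` and `git add <path>` can be used to stage the new content piece by piece.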

hauleth · July 27, 2020, 6:15pm

is what you are legally obliged to do. Beyond that, it’s a question of morality.

About copying vs. forking: it depends on what you want to achieve. If it is meant to be a soft fork that will be merged into the mainline one day, then I would use a GitHub fork. If it is meant to be an implementation with a different API and a few other features, go with a hard fork.

For an example of a project that hard-forked to add a few bells and whistles while still keeping a lot of the upstream patches, see NeoVim vs Vim. Of course, you do not need to retain the whole history in your fork; you are free to do whatever you want, as long as you credit the original authors (at least legally).

sb8244 · July 27, 2020, 8:23pm

I think it’s important not to have the original author’s commits in this case. The copyright notice must stay, but the commits may imply that they were made for your project, which would mislead people. “Oh neat, José worked on this library!” would be incorrect.

That leads me to favor 1 or 3 (which is essentially 1 with more steps?). I wouldn’t apply the copyright notice only to the specific files, because then it isn’t clear what the licensing of the entire project is.