A merge conflict represents drift between main and a release branch: a difference that is captured neither in a debian/patch nor in the upstream source code.

Our model of merging upstream source code into our release series branches is problematic. It involves carrying an entire copy of the upstream source code in every release branch, which inevitably drifts due to human error, and results in routine unreviewable merges during every release.

An alternative

Imagine for a moment a different release model. Under this model, the upstream branch would contain no debian/ directory (just like it is today). Downstream series release branches, however, would contain only a debian/ directory. A release would happen by generating an orig.tar.gz directly from the tagged upstream release. To build a downstream package, one would cherry-pick the downstream debian/-only release branch onto the tagged upstream release, then sbuild the result.
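
The model depends on downstream branches really containing nothing but debian/. A minimal sketch of how that could be checked, assuming it is run inside the repository; the function name branch_is_debian_only is hypothetical and not part of any existing tooling:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the tree at the given ref has a single
# top-level entry, and that entry is a directory named "debian".
branch_is_debian_only() {
    local ref="$1" listing
    # Without -r, ls-tree lists only the top-level entries of the tree.
    # Each line has the form: <mode> <type> <object><TAB><name>
    listing="$(git ls-tree "$ref")" || return 2
    [ "$(printf '%s\n' "$listing" | awk '{print $2, $4}')" = "tree debian" ]
}
```

Calling branch_is_debian_only on a series branch before tagging a downstream release would catch upstream files that were committed there by accident.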

Benefits

  • No longer required to review multi-thousand-line code changes that are effectively unreviewable - no more "did my copy of the tooling do the same thing as yours? If yes, then I guess I have to assume it is correct"
  • We wouldn't have to sync changes (patches, changelog) between multiple branches
  • Easily build a downstream package based on any commit in main without snapshots
  • Single source of truth for every file: drift & merge conflict not possible since merging changes to files is never required.
  • Series release branches don't require incremental snapshots to add patches that can be built/tested
  • Less error-prone: can't "forget to sync changelog to branch Y, which got flagged during SRU review"

This model would support:

  • new series releases
  • existing series point releases
  • mid-cycle devel releases from arbitrary commits on main
  • downstream hot fixes based on any combination of:
    • upstream fix (YY.Q.N upstream release)
    • downstream-only patch
    • cherry-picked upstream commit into downstream

Building and publishing the deb

An example script that does this: https://github.com/holmanb/uss-tableflip/commit/266cac5e03317e22098ca4c8da166045367d305a

If we structure our repositories such that downstream release branches contain only a debian/ directory which can be rebased on the upstream main branch, then generating a .deb package could be as simple as:

Generate .orig.tar.gz once for all series releases:

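# Remove untracked and ignored files (for the later sbuild; git archive reads the commit, not the working tree)
git clean -dx --force
git archive --format=tar.gz -9 --output="../cloud-init_${UPSTREAM_RELEASE_VERSION}.orig.tar.gz" "$UPSTREAM_RELEASE_COMMITTISH"
# Upload this to GitHub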
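
The tarball name above is built from the upstream version only. As a sketch of how that name relates to a full package version, the helpers below strip the optional epoch and Debian revision, following the Debian version syntax [epoch:]upstream_version[-debian_revision]. The function names and the example version 24.1-0ubuntu1 are illustrative assumptions:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the upstream part of a Debian package version.
upstream_version() {
    local v="$1"
    v="${v#*:}"    # drop the epoch ("1:"), if present
    v="${v%-*}"    # drop the revision (everything after the last hyphen), if present
    printf '%s\n' "$v"
}

# Hypothetical helper: print the orig tarball file name for a package version.
orig_tarball_name() {
    local pkg="$1" version="$2"
    printf '%s_%s.orig.tar.gz\n' "$pkg" "$(upstream_version "$version")"
}

# Example: orig_tarball_name cloud-init 24.1-0ubuntu1
```

Because the revision is not part of the name, every downstream upload against the same upstream release shares one orig tarball.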

Generate binary package once for each series release:

git checkout "$UPSTREAM_RELEASE_COMMITTISH"
# Apply every commit on the debian/-only branch that is not already in HEAD
git cherry-pick "..$DOWNSTREAM_RELEASE_COMMITTISH"
sbuild --dist="$RELEASE_SERIES" --arch=amd64 --arch-all .
# dput to Launchpad
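
After the cherry-pick, the tree should differ from the upstream release only under debian/. One way this could be verified before building, as a sketch that assumes it is run from the top of the repository; the function name only_debian_differs is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when HEAD and the given upstream ref differ
# only in paths under debian/. Run from the repository root, since the "."
# pathspec is relative to the current directory.
only_debian_differs() {
    local upstream_ref="$1"
    # --quiet exits 1 if any difference remains once debian/ is excluded.
    git diff --quiet "$upstream_ref" HEAD -- . ':(exclude)debian'
}

# Example: only_debian_differs "$UPSTREAM_RELEASE_COMMITTISH" && sbuild ...
```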

Note that this doesn't require direct dependency on dpkg-dev or devscripts. Perhaps more interesting is that this does not require use of uss-tableflip's homegrown build-package, get-orig-tarball, or our in-tree tools/make-tarball (of which many projects apparently have their own copy which gets called by uss-tableflip tooling!). That is almost 800 lines of bash code that we just might not need. What does this tooling buy us that is worth maintaining that much bash that in theory gets run ~4 times per year (8 if you count reviewers duplicating the release process as their "review")?

Example of building a deb from the tip of main

Consider the following example of building a deb package from the tip of main. It uses an example branch named debian, which contains only a debian/ directory.

$ git checkout debian
Switched to branch 'debian'
Your branch is up to date with 'origin/debian'.
$ sed -i '1s/.*/cloud-init (24.1) noble; urgency=medium/' debian/changelog 
$ git commit -m "example release" debian/changelog 
[debian fa618dd81] example release
 1 file changed, 1 insertion(+), 1 deletion(-)
$ git checkout main
Switched to branch 'main'
Your branch is up to date with 'upstream/main'.
$ git clean -dx --force 
$ git archive --format=tar.gz -9 --output=../cloud-init_24.1.orig.tar.gz HEAD
$ git cherry-pick ..debian > /dev/null
$ sbuild --dist=noble --arch=amd64  --arch-all .
$ ls -1 ../cloud-init_24.1*
../cloud-init_24.1_all.deb
../cloud-init_24.1_amd64-2024-02-14T14:37:46Z.build
../cloud-init_24.1_amd64.build
../cloud-init_24.1_amd64.buildinfo
../cloud-init_24.1_amd64.changes
../cloud-init_24.1_amd64_translations.tar.gz
../cloud-init_24.1.debian.tar.xz
../cloud-init_24.1.dsc
../cloud-init_24.1.orig.tar.gz
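
A variation on the example above, sketched under the assumption that the build starts from a throwaway clone instead of a personal working tree. A fresh clone has no untracked files to clean and cannot contain a tag that exists only locally. Both function names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the tag exists on the remote.
# With --exit-code, ls-remote exits 2 when no matching ref is found.
tag_exists_on_remote() {
    local remote="$1" tag="$2"
    git ls-remote --exit-code --tags "$remote" "refs/tags/$tag" >/dev/null
}

# Hypothetical helper: clone the remote into a fresh temporary directory,
# check out the given tag or commit, and print the path of the clone.
fresh_clone_at() {
    local remote="$1" ref="$2" dir
    dir="$(mktemp -d)" || return 1
    git clone --quiet "$remote" "$dir/src" || return 1
    git -C "$dir/src" checkout --quiet --detach "$ref" || return 1
    printf '%s\n' "$dir/src"
}

# Example (the remote URL is a placeholder):
#   tag_exists_on_remote "$UPSTREAM_URL" 24.1 && cd "$(fresh_clone_at "$UPSTREAM_URL" 24.1)"
```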

New Release

  1. Generate upstream changelog
  2. Tag upstream release (24.1)
  3. Tag downstream releases (ubuntu/noble-24.1)
  4. Generate orig.tar.gz
  5. Upload orig.tar.gz to GitHub
  6. sbuild deb
  7. dput deb
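
As a rough illustration of how the steps above map onto the commands shown earlier, the sketch below only prints a plan and runs nothing. The function name release_plan, the choice to tag the current HEAD as the upstream release, and the downstream branch name passed in are assumptions; the upload and dput steps are left as comments because their exact invocation is not specified here:

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print the commands for one series release.
# Arguments: upstream version, Ubuntu series, downstream debian/-only ref.
release_plan() {
    local version="$1" series="$2" downstream_ref="$3"
    cat <<EOF
# step 1: generate the upstream changelog (manual)
git tag ${version}
git tag ubuntu/${series}-${version} ${downstream_ref}
git archive --format=tar.gz -9 --output=../cloud-init_${version}.orig.tar.gz ${version}
# step 5: upload ../cloud-init_${version}.orig.tar.gz to GitHub
git checkout ${version}
git cherry-pick ..ubuntu/${series}-${version}
sbuild --dist=${series} --arch=amd64 --arch-all .
# step 7: dput to Launchpad
EOF
}

# Example: release_plan 24.1 noble ubuntu/noble
```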

State Transitions

The cloud-init team does many different things that fall into the category of "do a release". This means that there is a flow between states that sometimes has different entry points.

Current state | Next state | Description
New upstream release | New downstream release | Quarterly upstream release -> Downstream release
Upstream point release | New downstream release | We need to fix something in the upstream release before downstream was released
Upstream point release | Hotfix "bump" downstream release (no downstream changes) | We already released downstream, but we need to re-release with a fix in upstream
Upstream point release | Hotfix "fix" downstream release (downstream changes) | We already released downstream, but we need to re-release downstream based on an upstream fix with an additional downstream patch
New downstream release | Hotfix "fix" downstream release | We already released downstream, but we need to re-release with a new patch for an Ubuntu-only issue
Any | New series | A new Ubuntu release has arrived: 24.{04,10}
New series | New downstream release | First SRU release into the new series
New downstream release | Daily build | Not really a "release", but we need to support building "daily" packages from tip of main

Hotfix Releases

Hotfix branches would only be required when we need to cherry-pick an upstream commit from main into a downstream release. A fix that is expected to be a long-lived delta from upstream would be built from a patch in the downstream release packaging branch (a new downstream packaging tag against the original upstream release), and a new downstream release based on an upstream point release would just get built using the new point release on main.
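
The three cases in the paragraph above can be summarised as a small lookup. This is only a sketch: the function name hotfix_plan and the scenario keywords are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print whether a hotfix branch is needed for a scenario.
hotfix_plan() {
    case "$1" in
        cherry-pick)
            echo "hotfix branch: cherry-pick the upstream commit into the downstream release" ;;
        downstream-patch)
            echo "no hotfix branch: add a patch on the downstream release packaging branch and tag it against the original upstream release" ;;
        point-release)
            echo "no hotfix branch: build against the new upstream point release" ;;
        *)
            echo "unknown scenario: $1" >&2
            return 1 ;;
    esac
}

# Example: hotfix_plan cherry-pick
```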

Daily builds

Daily builds for each series would get built from the downstream release branches, not hotfix release branches.

@holmanb
Author

holmanb commented Feb 21, 2024

@TheRealFalcon Thanks for taking a look at this and sharing feedback. I have some iteration to do per your comments, but I have a few initial responses to comments and questions.

We'll have that as long as we have hotfix branches, and this proposal wouldn't remove the need for hotfix branches, correct?

Correct. I think my comments about changelog/patch syncing were a bit too optimistic and hand-wavy. I'll update this document to better describe what I envision for a hotfix scenario.

Is the git clean necessary for making the tarball?

I don't believe so. If I recall correctly, this was for using sbuild to build in the local directory.

joking aside, I don't think that these scripts are a problem. I think the problem is that we learned to do releases using them without having any understanding of the underlying commands they are calling. Then the scripts had bugs or shortcomings and we had no idea what to do when or why they were coded that way.

We agree that relying on tooling that you don't understand is a problem, but I think I take a firmer stance on the tooling being a problem. Understanding what the scripts are doing is challenging, in part because they are bash, in part because every time we get around to fixing an issue the fix goes mostly untested for long enough to forget what the problem/solution even was, and in part because they are untested so I have little confidence that what they are supposed to do is what they actually do.

I think that new_upstream_snapshot.py was a big step forward in terms of more maintainable and understandable tooling, and I still very much think that it has a place in our packaging toolbox. I really don't want to maintain a pile of bash scripts, and that means that I really don't want to use them either, since I end up maintaining (and therefore attempting to understand) what I use.

If we documented a more "raw" process, and then said "btw, these scripts can help you accomplish that", I think things would be better.

I put together this document to demonstrate just how little we actually "need" to build a deb that we could release. Maybe such a document is what this will turn into, if this proposal is rejected.

updating the changelog for upstream snapshots (we'll still need to record those in the changelog...it just won't require any source merging)

For releases all we should need is a single line that says * New upstream release. For devel releases all we should need is * Development release or similar. The version string will contain the release number. Why would we need anything else?

but I suspect we'll still have times when we'll be unable to reproduce the orig tarball exactly as it was uploaded (e.g., somebody passed a different compression number or tags have changed etc) and this is useful for those cases.

Is this really a use case we want to support? If we don't even know how to re-create a tarball, do we really want to push a release from it? This "convenience" might save you from having to burn an extra release number, sure. However, doing this prevents us from reproducing the package from the source. Bumping the version number is just as much a fix for this "problem" as syncing tarballs, and doing that by default would actually force us to follow a release process that maps source to release.

Since this "problem" will only happen when someone has both a) not followed release process and b) will possibly ruin our ability to build from source, I'm not convinced it's something we even want to support.

build-package

I personally prefer keeping the temporary directory. Running an errant git clean -fdx on the wrong branch (like your current debian branch (ask me how I know 😉 )) can wipe out files you don't want removed.

Sure, agreed.

Sidenote: it might also be best for us to release from temporary clones of upstream rather than personal development directories. This would help a lot with reproducibility - such as by preventing us from accidentally building from a local tag that never gets pushed (I've made that mistake a couple of times, and I assume I'm not alone). This should also help prevent issues with not knowing how to re-create a tarball mentioned above, except in the case that someone overwrote a release tag in upstream or in the case that someone deviated from the release process.

I think it could be as simple as debuild -s -S -nc before the sbuild assuming you have an orig tarball in the right place.

+1

@TheRealFalcon

TheRealFalcon commented Feb 22, 2024

I think that new_upstream_snapshot.py was a big step forward in terms of more maintainable and understandable tooling, and I still very much think that it has a place in our packaging toolbox. I really don't want to maintain a pile of bash scripts, and that means that I really don't want to use them either, since I end up maintaining (and therefore attempting to understand) what I use.

I generally agree, but I also think new_upstream_snapshot.py (and its predecessor) is unique in how many use cases it is trying to cover. Scripts like make-tarball, get-orig-tarball, and build-package are conceptually much simpler and less prone to errors.

For releases all we should need is a single line that says * New upstream release. For devel releases all we should need is * Development release or similar. The version string will contain the release number. Why would we need anything else?

Because the changelog needs to accurately track everything that has changed. Were you around for the old days of putting everything in d/changelog? It's a lot simpler now, but we'll still need the reference to the upstream release/commit and a link to the upstream changelog when we do an upstream snapshot. With the new repository structure, we'll remove the need for the random merge-conflict upstream snapshots during development, but every release still needs its own upstream snapshot. I don't foresee the changelog entries for that changing at all.

Is this really a use case we want to support? If we don't even know how to re-create a tarball, do we really want to push a release from it? This "convenience" might save you from having to burn an extra release number, sure. However, doing this prevents us from reproducing the package from the source. Bumping the version number is just as much a fix for this "problem" as syncing tarballs, and doing that by default would actually force us to follow a release process that maps source to release.

I don't think we really have a choice. Bumping the version number isn't a solution. If we release 24.1 and then make a downstream change that we need to release separately, we must build it against the 24.1 orig tarball. We can't dput without it, and we can't bump an upstream number for a downstream change. In practice, I think this will be very rare given how rarely we bump downstream without an upstream release and how it won't be possible to carry upstream source changes into downstream branches. I don't think the script would need to be part of our standard release process, but I wouldn't want to nuke it either.

it might also be best for us to release from temporary clones of upstream rather than personal development directories.

+1. I've also considered the idea of having the debian directories in a separate repo entirely, but there may be some additional tradeoffs there.

@holmanb
Author

holmanb commented Feb 27, 2024

Were you around for the old days of putting everything in d/changelog?

I think that all of the releases that I've done included everything in the changelog. That was a very recent change. You and I pushed for this together at a sprint not very long ago.

Because the changelog needs to accurately track everything that has changed.

I'm aware of the requirement, but I still fail to understand why that requirement is interpreted as "we need a release/commit and a link to an upstream changelog".

It's a lot simpler now, but we'll still need the reference to the upstream release/commit and a link to the upstream changelog when we do an upstream snapshot.

Why? Some deb packages that I've looked at don't include this extra information. Are they doing it wrong?

This seems unnecessary. Upstream releases will be found here. The updated changelog is in the upstream release tarball. Why do we need a link / hash / release? The information required to get to an upstream changelog is in the upstream version of the release. All one will need to do is check the upstream release number, then grab and unpack the corresponding upstream tarball from the release location. A line such as * New upstream version should tell the changelog reader what they need to know to get to the upstream changelog - which in my opinion should fill the requirement of "accurately tracking everything that has changed".

I might be wrong - probably am. But if so, then I want to know why other packages are held to a different standard than cloud-init. This "requirement" is still duplicating information that doesn't appear necessary (and some packages clearly get away with it). If we can just "do less", that's less burden than building and maintaining automated tooling, right?

Is this really a use case we want to support? If we don't even know how to re-create a tarball, do we really want to push a release from it? This "convenience" might save you from having to burn an extra release number, sure. However, doing this prevents us from reproducing the package from the source. Bumping the version number is just as much a fix for this "problem" as syncing tarballs, and doing that by default would actually force us to follow a release process that maps source to release.

I don't think we really have a choice. Bumping the version number isn't a solution. If we release 24.1 and then make a downstream change that we need to release separately, we must build it against the 24.1 orig tarball. We can't dput without it, and we can't bump an upstream number for a downstream change.

Fair, I guess we might need it sometimes. I'm still not convinced that automatically grabbing it as part of the workflow is the best. Having automation for a release manager to break reproducibility automatically by default just doesn't seem right.

In practice, I think this will be very rare given how rarely we bump downstream without an upstream release and how it won't be possible to carry upstream source changes into downstream branches. I don't think the script would need to be part of our standard release process, but I wouldn't want to nuke it either.

+1

+1. I've also considered the idea of having the debian directories in a separate repo entirely, but there may be some additional tradeoffs there.

Same, and agreed. It would force a stronger separation between upstream and downstream, but I don't see a ton of benefit otherwise.

@TheRealFalcon

I'm aware of the requirement, but I still fail to understand why that requirement is interpreted as "we need a release/commit and a link to an upstream changelog".

It's what Robbie agreed to. Obviously he's not the only voice that matters, but if you think we need less, it's worth asking Christian and/or the SRU reviewers in the server team. I can't find a hard and fast rule about what exactly is required in the changelog for upstream snapshots.

Some deb packages that I've looked at don't include this extra information. Are they doing it wrong?

Got examples? I'm curious what they look like.

If we can just "do less", that's less burden than building and maintaining automated tooling, right?

Yes, but I also want our changelog to be helpful. "Go find the upstream version on your own and then look at that version's ChangeLog" doesn't seem incredibly intuitive to me...though probably something your average person inspecting a debian/changelog would be able to do. On the other hand, I don't think a link/commit hash is really much more tooling than one to add * New upstream version. At the end of the day, I don't have super strong opinions, but I'd lean towards keeping it.

@holmanb
Author

holmanb commented Feb 27, 2024

Some deb packages that I've looked at don't include this extra information. Are they doing it wrong?

Got examples? I'm curious what they look like.

I bumped into this one while filing a bug report recently. I've seen others but don't recall at the moment.

At the end of the day, I don't have super strong opinions, but I'd lean towards keeping it.

+1
