@mara004
Last active July 31, 2024 20:53
Parse pdfbox versions
# SPDX-FileCopyrightText: 2024 geisserml <[email protected]>
# SPDX-License-Identifier: Apache-2.0

import re
from datetime import datetime
from urllib.request import urlopen
from packaging.version import Version as PypaVersion

PB_RELEASE_URL = "https://archive.apache.org/dist/pdfbox/"
PB_DISTS_RE = r'<a href="([\d\.]+.+?)/">.+</a>\s+([\d\-]+ [\d:]+)'
PB_DATE_FMT = r"%Y-%m-%d %H:%M"


class PdfboxVersion (PypaVersion):

    def __init__(self, version, date):
        super().__init__(version)
        self.date = date
        # prioritize date over pre-release tags because pdfbox uses them inconsistently, and pre-releases will not get backports
        # (indices 0, 1 are epoch and release, the rest follows)
        self._key = (*self._key[:2], date, *self._key[2:])

    def __repr__(self):
        return f"PdfboxVersion({super().__str__()!r}, {self.date!r})"

    def __str__(self):
        return f"{super().__str__():<10} {self.date}"


content = urlopen(PB_RELEASE_URL).read().decode("utf-8")
results = [PdfboxVersion(m.group(1), datetime.strptime(m.group(2), PB_DATE_FMT)) for m in re.finditer(PB_DISTS_RE, content)]
results.sort()

if __name__ == "__main__":
    print(*results, sep="\n")

mara004 commented Jul 27, 2024

Another problem: If a backport were made to a minor release series, like so,

a.b.0  2024-07-01
a.c.0  2024-07-02
a.b.1  2024-07-03

then sorting by date would produce the wrong order (a.b.1 would land after a.c.0).
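To make the problem concrete, here is a minimal sketch. The versions 1.1.0, 1.2.0 and 1.1.1 are made-up placeholders standing in for a.b.0, a.c.0 and a.b.1 (not real pdfbox releases), and `release_tuple` is a hypothetical helper that only handles plain numeric versions.

```python
from datetime import datetime

# Hypothetical releases as (version, date); 1.1.1 is a backport made after 1.2.0.
releases = [
    ("1.1.0", datetime(2024, 7, 1)),
    ("1.2.0", datetime(2024, 7, 2)),
    ("1.1.1", datetime(2024, 7, 3)),
]


def release_tuple(version):
    # Plain numeric versions only; pre-release tags are not handled here.
    return tuple(int(part) for part in version.split("."))


# Sorting by date interleaves the two release series.
by_date = [v for v, _ in sorted(releases, key=lambda r: r[1])]
# Sorting by the numeric release keeps each series together.
by_version = [v for v, _ in sorted(releases, key=lambda r: release_tuple(r[0]))]

print(by_date)     # ['1.1.0', '1.2.0', '1.1.1']
print(by_version)  # ['1.1.0', '1.1.1', '1.2.0']
```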

So sorting just by version might be better after all; we'd just need to resolve the v3 RC/alpha situation somehow:

[3.0.0-RC1/]     2021-04-01 21:18
[3.0.0-alpha2/]  2021-09-11 11:54
[3.0.0-alpha3/]  2022-06-17 12:27
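The catch with a plain version sort is that packaging follows PEP 440, where alpha pre-releases order before release candidates no matter when they were published. A small sketch using the three directory entries listed above (the variable names are just for illustration):

```python
from datetime import datetime
from packaging.version import Version

# The three v3 pre-release entries with their listing dates.
entries = [
    ("3.0.0-RC1", datetime(2021, 4, 1, 21, 18)),
    ("3.0.0-alpha2", datetime(2021, 9, 11, 11, 54)),
    ("3.0.0-alpha3", datetime(2022, 6, 17, 12, 27)),
]

# PEP 440 ordering: alpha < rc, so RC1 ends up last.
by_version = [name for name, _ in sorted(entries, key=lambda e: Version(e[0]))]
# Chronological ordering: RC1 was actually published first.
by_date = [name for name, _ in sorted(entries, key=lambda e: e[1])]

print(by_version)  # ['3.0.0-alpha2', '3.0.0-alpha3', '3.0.0-RC1']
print(by_date)     # ['3.0.0-RC1', '3.0.0-alpha2', '3.0.0-alpha3']
```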


mara004 commented Jul 27, 2024

Updated the code yet again, this time by inheriting from packaging's Version class and hooking the date into the compare key.
While that addresses the above issue, it might be a bit wonky, because the compare key is technically private API...
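If relying on the private key feels too fragile, one possible alternative is a plain sort key built only from public properties of packaging's Version (`epoch`, `release`, `pre`). This is just a sketch, not a drop-in replacement for the class: `sort_key` is a hypothetical helper, the 1.x entries are made-up placeholders, and post/dev/local segments as well as packaging's trailing-zero normalization of the release are ignored.

```python
from datetime import datetime
from packaging.version import Version


def sort_key(item):
    # item is a (version string, datetime) pair; sort_key is a hypothetical helper.
    version, date = Version(item[0]), item[1]
    # Epoch and release come first, then the date, then the pre-release tag
    # as a tie-breaker (final releases sort after pre-releases).
    return (version.epoch, version.release, date,
            version.pre is None, version.pre or ("", 0))


entries = [
    ("3.0.0-alpha3", datetime(2022, 6, 17, 12, 27)),
    ("1.1.1", datetime(2024, 7, 3)),  # placeholder backport
    ("3.0.0-RC1", datetime(2021, 4, 1, 21, 18)),
    ("1.2.0", datetime(2024, 7, 2)),  # placeholder
    ("3.0.0-alpha2", datetime(2021, 9, 11, 11, 54)),
    ("1.1.0", datetime(2024, 7, 1)),  # placeholder
]

ordered = [name for name, _ in sorted(entries, key=sort_key)]
print(*ordered, sep="\n")
```

With this key, the backport stays within its own release series, and the v3 pre-releases come out in chronological order because they all share the release (3, 0, 0).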
