Parse pdfbox versions
Gist mara004/881d0c5a99b8444fd5d1d21a333b70f8, last active July 31, 2024 20:53
# SPDX-FileCopyrightText: 2024 geisserml <[email protected]>
# SPDX-License-Identifier: Apache-2.0

import re
from datetime import datetime
from urllib.request import urlopen

from packaging.version import Version as PypaVersion

PB_RELEASE_URL = "https://archive.apache.org/dist/pdfbox/"
# group 1: release directory name (the version), group 2: its timestamp in the listing
PB_DISTS_RE = r'<a href="([\d\.]+.+?)/">.+</a>\s+([\d\-]+ [\d:]+)'
PB_DATE_FMT = r"%Y-%m-%d %H:%M"


class PdfboxVersion(PypaVersion):

    def __init__(self, version, date):
        super().__init__(version)
        self.date = date
        # prioritize date over pre-release tags because pdfbox uses them inconsistently,
        # and pre-releases will not get backports
        # (indices 0, 1 are epoch and release, the rest follows)
        # NOTE: _key is private packaging API and may change in future packaging releases
        self._key = (*self._key[:2], date, *self._key[2:])

    def __repr__(self):
        return f"PdfboxVersion({super().__str__()!r}, {self.date!r})"

    def __str__(self):
        # left-align the normalized version so the dates line up in a column
        return f"{super().__str__():<10} {self.date}"


# close the HTTP response once the listing has been read
with urlopen(PB_RELEASE_URL) as response:
    content = response.read().decode("utf-8")

results = [
    PdfboxVersion(m.group(1), datetime.strptime(m.group(2), PB_DATE_FMT))
    for m in re.finditer(PB_DISTS_RE, content)
]
results.sort()

if __name__ == "__main__":
    print(*results, sep="\n")
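To check what PB_DISTS_RE picks up without hitting the network, the sketch below runs the same pattern and date format over a small excerpt of an Apache-style directory listing. The SAMPLE markup is made up and only assumed to resemble the real archive page; the two v3 dates are the ones quoted later in this thread.

```python
import re
from datetime import datetime

# Same pattern and date format as in the script above.
PB_DISTS_RE = r'<a href="([\d\.]+.+?)/">.+</a>\s+([\d\-]+ [\d:]+)'
PB_DATE_FMT = r"%Y-%m-%d %H:%M"

# Hypothetical listing excerpt; the real page's markup may differ.
SAMPLE = """\
<a href="/dist/">Parent Directory</a>                             -
<a href="3.0.0-RC1/">3.0.0-RC1/</a>            2021-04-01 21:18    -
<a href="3.0.0-alpha2/">3.0.0-alpha2/</a>         2021-09-11 11:54    -
<a href="KEYS">KEYS</a>                  2024-07-01 10:00   12K
"""

# Collect (directory name, timestamp) pairs, as the script does before
# wrapping them in PdfboxVersion.
matches = [
    (m.group(1), datetime.strptime(m.group(2), PB_DATE_FMT))
    for m in re.finditer(PB_DISTS_RE, SAMPLE)
]

for version, date in matches:
    print(version, date)
```

The parent-directory link and the KEYS file are skipped because group 1 has to start with `[\d\.]+`, so only directories whose names begin with a digit or dot are treated as releases.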
Another problem: if a backport were made to an older minor release series, like so:

    a.b.0  2024-07-01
    a.c.0  2024-07-02
    a.b.1  2024-07-03

then ordering by date would put a.b.1 after a.c.0, which is the wrong order.
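A minimal illustration of that pitfall, using made-up 9.x numbers in place of a.b.0 / a.c.0 / a.b.1 (they are not real pdfbox releases), and a hypothetical release_tuple helper that only understands plain x.y.z strings:

```python
from datetime import datetime

# Hypothetical releases standing in for a.b.0 / a.c.0 / a.b.1 above;
# the numbers are made up for illustration.
releases = [
    ("9.1.0", datetime(2024, 7, 1)),
    ("9.2.0", datetime(2024, 7, 2)),
    ("9.1.1", datetime(2024, 7, 3)),  # backport to the older 9.1 series
]


def release_tuple(version: str) -> tuple[int, ...]:
    """Turn a plain 'x.y.z' string into a tuple of ints so '9.10' > '9.2'."""
    return tuple(int(part) for part in version.split("."))


# Sorting purely by date puts the 9.1.1 backport after 9.2.0.
by_date = [v for v, _ in sorted(releases, key=lambda r: r[1])]

# Sorting by the numeric release tuple keeps the backport inside its series.
by_version = [v for v, _ in sorted(releases, key=lambda r: release_tuple(r[0]))]

print(by_date)     # ['9.1.0', '9.2.0', '9.1.1']
print(by_version)  # ['9.1.0', '9.1.1', '9.2.0']
```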
So sorting by version alone might be better after all; we'd just need to resolve the v3 RC/alpha situation somehow. PEP 440 ranks rc above alpha, yet RC1 was published before alpha2 and alpha3:
[3.0.0-RC1/] 2021-04-01 21:18
[3.0.0-alpha2/] 2021-09-11 11:54
[3.0.0-alpha3/] 2022-06-17 12:27
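The conflict can be reproduced with the three directory dates listed above. Rather than depending on packaging, this sketch uses a deliberately simplified stand-in for PEP 440 pre-release ordering (alpha < beta < rc); PRE_RANK and pre_key are hypothetical names, and the parser only handles -alphaN, -betaN and -RCN suffixes.

```python
from datetime import datetime

# The three v3 pre-releases listed above, with their directory dates.
prereleases = {
    "3.0.0-RC1": datetime(2021, 4, 1, 21, 18),
    "3.0.0-alpha2": datetime(2021, 9, 11, 11, 54),
    "3.0.0-alpha3": datetime(2022, 6, 17, 12, 27),
}

# Simplified PEP 440-style ranking of pre-release labels: alpha < beta < rc.
# (Illustrative only; packaging.version implements the complete rules.)
PRE_RANK = {"alpha": 0, "beta": 1, "rc": 2}


def pre_key(version: str) -> tuple[int, int]:
    """Map e.g. '3.0.0-alpha2' to (0, 2) and '3.0.0-RC1' to (2, 1)."""
    label = version.split("-", 1)[1].lower()  # 'alpha2', 'rc1'
    name = label.rstrip("0123456789")         # 'alpha', 'rc'
    number = int(label[len(name):])           # 2, 1
    return PRE_RANK[name], number


# All three share release 3.0.0, so only the pre-release part matters here.
by_version = sorted(prereleases, key=pre_key)
by_date = sorted(prereleases, key=prereleases.get)

print(by_version)  # ['3.0.0-alpha2', '3.0.0-alpha3', '3.0.0-RC1']
print(by_date)     # ['3.0.0-RC1', '3.0.0-alpha2', '3.0.0-alpha3']
```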
Updated the code yet again, this time by inheriting from packaging's Version class and hooking the date into the comparison key, right after the release numbers.
While that addresses both issues above, it might be a bit wonky, because _key is technically private API...
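If relying on the private _key is a concern, one possible alternative (not what the script does) is to leave Version untouched and pass an explicit sort key instead. This standard-library sketch uses a hypothetical sort_key helper that compares the numeric release first and the date second, mirroring where the subclass splices in the date. It only parses x.y.z with an optional -LABELn suffix, and it breaks ties by the raw string rather than by packaging's pre/post/dev fields. With packaging available, the public Version.epoch and Version.release properties could replace the regex.

```python
import re
from datetime import datetime


def sort_key(version: str, date: datetime) -> tuple:
    """Hypothetical helper: order by numeric release, then by date.

    Only handles strings like '3.0.0' or '3.0.0-RC1'; the version string
    itself is a last-resort tie-breaker so the order is deterministic.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    release = tuple(int(part) for part in match.group().split("."))
    return (release, date, version)


# Made-up 9.x backport scenario plus the v3 pre-release dates quoted above.
releases = [
    ("9.2.0", datetime(2024, 7, 2)),
    ("3.0.0-alpha3", datetime(2022, 6, 17, 12, 27)),
    ("9.1.1", datetime(2024, 7, 3)),
    ("3.0.0-RC1", datetime(2021, 4, 1, 21, 18)),
    ("9.1.0", datetime(2024, 7, 1)),
    ("3.0.0-alpha2", datetime(2021, 9, 11, 11, 54)),
]

ordered = [v for v, _ in sorted(releases, key=lambda r: sort_key(*r))]
print(ordered)
# ['3.0.0-RC1', '3.0.0-alpha2', '3.0.0-alpha3', '9.1.0', '9.1.1', '9.2.0']
```

The backport stays inside its 9.1 series, while the three 3.0.0 pre-releases fall back to date order because they share the same release tuple.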
I updated the code above; should work now.
Output as of 2024-07-27