Last active July 31, 2024 20:53
Parse pdfbox versions
# SPDX-FileCopyrightText: 2024 geisserml <[email protected]>
# SPDX-License-Identifier: Apache-2.0

import re
from datetime import datetime
from urllib.request import urlopen
from packaging.version import Version as PypaVersion

PB_RELEASE_URL = "https://archive.apache.org/dist/pdfbox/"
PB_DISTS_RE = r'<a href="([\d\.]+.+?)/">.+</a>\s+([\d\-]+ [\d:]+)'
PB_DATE_FMT = r"%Y-%m-%d %H:%M"


class PdfboxVersion(PypaVersion):

    def __init__(self, version, date):
        super().__init__(version)
        self.date = date
        # prioritize date over pre-release tags because pdfbox uses them inconsistently, and pre-releases will not get backports
        # (indices 0, 1 are epoch and release, the rest follows)
        self._key = (*self._key[:2], date, *self._key[2:])

    def __repr__(self):
        return f"PdfboxVersion({super().__str__()!r}, {self.date!r})"

    def __str__(self):
        return f"{super().__str__():<10} {self.date}"


content = urlopen(PB_RELEASE_URL).read().decode("utf-8")
results = [PdfboxVersion(m.group(1), datetime.strptime(m.group(2), PB_DATE_FMT)) for m in re.finditer(PB_DISTS_RE, content)]
results.sort()

if __name__ == "__main__":
    print(*results, sep="\n")
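To check what the regular expression extracts without touching the network, here is a minimal sketch that runs it against a hand-written sample. The sample lines only approximate the layout of the Apache directory listing (an assumption; the real page may differ in whitespace and surrounding markup), the `KEYS` row is made up, and `SAMPLE_LISTING` and `parse_listing` are illustrative names that are not part of the script.

```python
import re
from datetime import datetime

PB_DISTS_RE = r'<a href="([\d\.]+.+?)/">.+</a>\s+([\d\-]+ [\d:]+)'
PB_DATE_FMT = r"%Y-%m-%d %H:%M"

# Hand-written approximation of the index page (assumption, not a real download).
SAMPLE_LISTING = """\
<a href="1.8.16/">1.8.16/</a>          2022-06-17 12:27    -
<a href="3.0.0-RC1/">3.0.0-RC1/</a>       2021-04-01 21:18    -
<a href="KEYS">KEYS</a>             2024-01-01 00:00  100K
"""


def parse_listing(html):
    """Return (version string, datetime) pairs for each release directory."""
    return [
        (m.group(1), datetime.strptime(m.group(2), PB_DATE_FMT))
        for m in re.finditer(PB_DISTS_RE, html)
    ]


pairs = parse_listing(SAMPLE_LISTING)
for name, date in pairs:
    print(name, date)
```

Only links whose target starts with digits or dots are picked up, so the non-version row is skipped.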
I updated the code above; should work now.
Output as of 2024-07-27:
1
2010-03-29 10:12:00 1.1.0
2010-06-28 14:18:00 1.2.0
2010-07-09 10:13:00 1.2.1
2010-10-25 10:49:00 1.3.1
2010-12-20 10:05:00 1.4.0
2011-03-03 08:50:00 1.5.0
2011-07-01 19:20:00 1.6.0
2012-05-28 20:13:00 1.7.0
2012-07-24 20:53:00 1.7.1
2013-03-22 22:24:00 1.8.0
2013-04-10 15:43:00 1.8.1
2013-06-01 21:57:00 1.8.2
2013-11-28 20:30:00 1.8.3
2014-01-30 18:40:00 1.8.4
2014-05-01 18:59:00 1.8.5
2014-06-22 13:40:00 1.8.6
2015-10-14 16:26:00 1.8.7
2015-10-14 16:26:00 1.8.8
2015-10-14 16:26:00 1.8.9
2015-10-14 16:26:00 1.8.10
2016-01-17 21:55:00 1.8.11
2016-04-25 17:02:00 1.8.12
2017-10-04 11:08:00 1.8.13
2018-05-04 15:48:00 1.8.14
2018-06-28 19:24:00 1.8.15
2022-06-17 12:27:00 1.8.16
2022-09-15 17:13:00 1.8.17
2
2015-10-18 20:57:00 2.0.0rc1
2015-11-21 18:57:00 2.0.0rc2
2016-01-14 20:55:00 2.0.0rc3
2016-03-18 12:02:00 2.0.0
2016-04-25 17:23:00 2.0.1
2016-06-09 17:51:00 2.0.2
2016-09-17 09:15:00 2.0.3
2016-12-15 18:02:00 2.0.4
2017-06-26 17:52:00 2.0.5
2017-06-26 17:52:00 2.0.6
2017-10-04 11:08:00 2.0.7
2017-11-02 20:53:00 2.0.8
2018-05-04 15:48:00 2.0.9
2018-06-21 20:04:00 2.0.10
2018-06-28 19:38:00 2.0.11
2018-10-04 18:43:00 2.0.12
2018-11-30 22:31:00 2.0.13
2019-02-28 17:28:00 2.0.14
2019-04-11 15:36:00 2.0.15
2019-06-27 18:20:00 2.0.16
2019-09-20 18:36:00 2.0.17
2019-12-23 18:33:00 2.0.18
2020-02-23 17:50:00 2.0.19
2020-06-07 16:08:00 2.0.20
2020-11-05 18:56:00 2.0.21
2020-12-19 18:33:00 2.0.22
2021-03-18 21:25:00 2.0.23
2021-06-10 17:57:00 2.0.24
2021-12-16 20:50:00 2.0.25
2022-06-17 12:27:00 2.0.26
2022-09-29 15:52:00 2.0.27
2023-04-13 14:37:00 2.0.28
2023-07-01 17:00:00 2.0.29
2023-11-05 11:10:00 2.0.30
2024-03-24 18:04:00 2.0.31
2024-07-24 15:41:00 2.0.32
3
2021-04-01 21:18:00 3.0.0rc1
2021-09-11 11:54:00 3.0.0a2
2022-06-17 12:27:00 3.0.0a3
2023-07-14 06:07:00 3.0.0b1
2023-08-18 04:31:00 3.0.0
2023-11-30 18:47:00 3.0.1
2024-03-14 20:33:00 3.0.2
(datetime.datetime(2024, 3, 14, 20, 33), <Version('3.0.2')>)
Another problem: If a backport were made to a minor release series, like so,
a.b.0 2024-07-01
a.c.0 2024-07-02
a.b.1 2024-07-03
then the above would produce the wrong order.
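The scenario can be reproduced with a small sketch. The concrete numbers (1.1.0, 1.2.0, 1.1.1) are hypothetical stand-ins for a.b.0, a.c.0 and a.b.1, and versions are modelled as plain integer tuples rather than `packaging` objects to keep the example stdlib-only.

```python
from datetime import date

# Hypothetical releases standing in for a.b.0, a.c.0 and a.b.1.
releases = [
    ((1, 1, 0), date(2024, 7, 1)),  # a.b.0
    ((1, 2, 0), date(2024, 7, 2)),  # a.c.0
    ((1, 1, 1), date(2024, 7, 3)),  # a.b.1, a backport published last
]


def fmt(version):
    return ".".join(map(str, version))


# Major version + date: the backport ends up after the newer minor series.
by_date = sorted(releases, key=lambda r: (r[0][0], r[1]))
# Plain version order: the backport stays inside its own series.
by_version = sorted(releases, key=lambda r: r[0])

print([fmt(v) for v, _ in by_date])
print([fmt(v) for v, _ in by_version])
```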
So sorting just by version might be better after all. We'd just need to resolve the v3 RC/alpha situation somehow:
[3.0.0-RC1/] 2021-04-01 21:18
[3.0.0-alpha2/] 2021-09-11 11:54
[3.0.0-alpha3/] 2022-06-17 12:27
Updated the code yet again, this time by inheriting from packaging's Version class and hooking the date into the compare key.
While that addresses the above issue, it might be a bit wonky, because `_key` is technically private API and could change between packaging releases...
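For comparison, the same ordering idea (release numbers first, then date) can be expressed as an ordinary sort key without touching any private attribute. This is only a sketch: `Release` is a hypothetical stand-in rather than packaging's `Version` class, and unlike the subclass above it ignores the pre-release tag entirely instead of keeping it as a tie-breaker. The dates are the v3 entries from the listing.

```python
from datetime import datetime
from typing import NamedTuple, Optional, Tuple


class Release(NamedTuple):
    # Stand-in for a parsed version; not packaging's Version class.
    release: Tuple[int, ...]        # numeric part, e.g. (3, 0, 0)
    pre: Optional[Tuple[str, int]]  # pre-release tag, e.g. ("rc", 1), or None
    date: datetime


def sort_key(r):
    # Release numbers first, then date; the pre-release tag is ignored.
    return (r.release, r.date)


v3 = [
    Release((3, 0, 0), None, datetime(2023, 8, 18, 4, 31)),
    Release((3, 0, 0), ("a", 3), datetime(2022, 6, 17, 12, 27)),
    Release((3, 0, 0), ("rc", 1), datetime(2021, 4, 1, 21, 18)),
    Release((3, 0, 0), ("a", 2), datetime(2021, 9, 11, 11, 54)),
    Release((3, 0, 1), None, datetime(2023, 11, 30, 18, 47)),
]

ordered = sorted(v3, key=sort_key)
for r in ordered:
    tag = "" if r.pre is None else f"{r.pre[0]}{r.pre[1]}"
    print(".".join(map(str, r.release)) + tag, r.date)
```

With real `packaging` objects, a similar key could presumably be built from the public `epoch` and `release` properties, at the cost of passing `key=` to every sort instead of relying on the comparison operators.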
Turns out sorting by major version + date alone is not sufficient.
If multiple releases were made on the same day and there is a transition from 1 -> 2 digits (e.g. 1.8.9 and 1.8.10, both dated 2015-10-14 16:26), then the ordering goes amiss.
On the other hand, sorting by version alone also tends to go wrong where pre-release tags are involved (e.g. 3.0.0-RC1 was published before 3.0.0-alpha2).
So I suppose we need a combination of both (something like: pre-sort by version, then sort by date).
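One way to read "pre-sort by version, then sort by date" is as two stable sort passes. The sketch below is an interpretation, not the script's actual code: it uses the 1.8.x timestamps from the listing, and `numeric` is a hypothetical helper that only understands plain dotted integers. All entries share major version 1, so the second pass sorts by date alone.

```python
from datetime import datetime

same_day = datetime(2015, 10, 14, 16, 26)
releases = [
    ("1.8.10", same_day),
    ("1.8.9", same_day),
    ("1.8.11", datetime(2016, 1, 17, 21, 55)),
    ("1.8.7", same_day),
    ("1.8.8", same_day),
    ("1.8.6", datetime(2014, 6, 22, 13, 40)),
]


def numeric(version):
    # "1.8.10" -> (1, 8, 10); only handles plain dotted integers.
    return tuple(int(part) for part in version.split("."))


# Pass 1: pre-sort by version number.
releases.sort(key=lambda r: numeric(r[0]))
# Pass 2: sort by date; list.sort is stable, so ties keep the version order.
releases.sort(key=lambda r: r[1])

print([name for name, _ in releases])
```

Because the second pass is stable, the four releases sharing one timestamp stay in numeric order, which a string comparison of "1.8.10" and "1.8.9" would not give.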