Last active
June 30, 2023 07:40
LinkedIn: Extract datetime from post URL.
import datetime
from urllib.parse import urlparse


def linkedin_post_datetime(url: str) -> datetime.datetime:
    """
    Convert a LinkedIn post URL to the datetime when the post was created.

    You can get the post URL by clicking the triple dots in the upper right corner of the post
    and then selecting "Copy link to post".

    Returns a timezone-aware datetime in UTC.

    Example usage:
        linkedin_post_datetime("https://www.linkedin.com/posts/tomas789_netzero-renewables-gas-activity-7077221689796784128-dafy")

    Original Gist URL: https://gist.github.com/tomas789/8fb07dd214430b954080766a1430c9b4
    Author: [@tomas789](https://github.com/tomas789)
    """
    # Parse URL
    url_parts = urlparse(url)

    # Extract the 19-digit post id from the URL path ("/posts/<username>_<slug>-<post_id>-<suffix>")
    _, _, name = tuple(url_parts.path.split("/"))
    username, post_name = tuple(name.split("_", maxsplit=1))
    slug, post_id, _ = tuple(post_name.rsplit("-", maxsplit=2))

    # Convert the post id to binary, take the first 41 bits and convert them back to an int.
    # The result is treated as a Unix timestamp in milliseconds.
    post_timestamp = int(f"{int(post_id):b}"[:41], 2)

    # Convert the timestamp to a timezone-aware datetime in UTC.
    # Passing tz= avoids interpreting the timestamp in the machine's local timezone first.
    post_datetime = datetime.datetime.fromtimestamp(post_timestamp / 1000, tz=datetime.timezone.utc)
    return post_datetime
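
The "first 41 bits" step can also be written as a bit shift instead of string slicing. The sketch below is a minimal, hedged variant and not part of the original gist: the helper names `extract_post_id` and `post_id_to_datetime` are hypothetical, and it assumes the first run of 19 digits in the URL is the post ID.

```python
import datetime
import re


def extract_post_id(url: str) -> int:
    """Hypothetical helper: return the first 19-digit run found in the URL."""
    match = re.search(r"\d{19}", url)
    if match is None:
        raise ValueError("no 19-digit post id found in URL")
    return int(match.group(0))


def post_id_to_datetime(post_id: int) -> datetime.datetime:
    """Hypothetical helper: read the leading 41 bits of the id as Unix milliseconds."""
    # Dropping everything after the first 41 bits is the same as
    # slicing the binary string with [:41].
    shift = max(post_id.bit_length() - 41, 0)
    timestamp_ms = post_id >> shift
    return datetime.datetime.fromtimestamp(timestamp_ms / 1000, tz=datetime.timezone.utc)


if __name__ == "__main__":
    url = "https://www.linkedin.com/posts/tomas789_netzero-renewables-gas-activity-7077221689796784128-dafy"
    print(post_id_to_datetime(extract_post_id(url)).isoformat())
```

For the example URL this yields a datetime on 2023-06-21 (UTC). The regex-based extraction is more tolerant of trailing slashes or query strings than splitting the path, at the cost of relying on the 19-digit assumption.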