Spotify Private API exploration: /v2/recently_played
sunflsks/a03419a68bb091f84a0c10bd1659679e, last active August 12, 2024 19:37
#!/usr/bin/env python3
'''
While working on my custom scrobbler program that uses the Spotify API, I noticed a limitation that many
others have also come across
(https://stackoverflow.com/questions/73240867/is-there-a-way-to-retrieve-more-than-50-recently-played-tracks-using-spotipy,
https://stackoverflow.com/questions/74190136/is-there-a-way-to-get-my-full-listening-history-from-the-spotify-api):
the public recently_played API seems to be limited to the 50 most recent tracks, no matter what. However, the Spotify
app itself has a built-in Listening History feature that uses a private API to go further back in time. Curious (and with
some time to kill), I decided to look into this API and see what it provides.

The API (endpoint shown below) is not a normal API in the sense of returning JSON models that the frontend then
parses and displays as it sees fit. Instead, it seems to be Spotify's own implementation of server-driven UI:
the UI elements themselves are sent over the wire and presented (semi-)directly to the user. This allows for a lot
of flexibility in how the UI is drawn (dividers between dates, rows, what information is presented for each song, etc.).
A really interesting design pattern that I had no idea about until approximately 3 hours ago!

However, this makes things more difficult for me, as the underlying data itself is not sent through the API, only the
elements deemed fit to be displayed to the user. This means that only the date is sent, not the actual timestamp of each
song (the Listening History pane only shows the YYYY-MM-DD date). To implement paging, a timestamp (of what I assume is
the last played song sent) is appended to the JSON; this timestamp is then used in the next request, and so on.
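
The paging scheme described above amounts to following a cursor until it stops producing new values. Here is a minimal sketch. It assumes each page carries its cursor at custom.timestamp, as in the example response below. `iter_pages` and `fetch_page` are hypothetical names; `get_api` (defined further down) has the right shape to be passed in as `fetch_page`.

```python
from typing import Callable, Iterator


def iter_pages(fetch_page: Callable[[int], dict], start: int = 0) -> Iterator[dict]:
    # fetch_page is a hypothetical stand-in for the HTTP call: it takes a
    # cursor and returns the decoded JSON page.
    seen = set()
    page = fetch_page(start)
    while True:
        cursor = page["custom"]["timestamp"]
        # Stop once the API hands back a cursor that was already followed,
        # the same stop condition main() uses.
        if cursor in seen:
            return
        seen.add(cursor)
        yield page
        page = fetch_page(cursor)
```

A generator keeps the paging logic separate from what is done with each page (printing, de-duplicating, storing).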

Example response:
{
  "title": "Recently played",
  "body": [
    ...
    {
      "id": "2024-08-12-spotify:track:1uTeYqdZf9oYwgkhE0hlf0-spotify:track:1uTeYqdZf9oYwgkhE0hlf0",
      "component": {
        "id": "listeninghistory:trackRow",
        "category": "row"
      },
      "text": {
        "title": "I Don't Miss You at All"
      },
      "images": {
        "main": {
          "uri": "https://i.scdn.co/image/ab67616d00001e02500f0405c1d3feb14d62849c",
          "placeholder": "track"
        }
      },
      "custom": {
        "has_play_context": false,
        "artists": [
          {
            "uri": "spotify:artist:37M5pPGs6V1fchFJSgCguX",
            "name": "FINNEAS"
          }
        ]
      },
      "logging": {
        "ubi:specification_id": "mobile-listening-history",
        "ubi:app": "music",
        "ubi:impression": false,
        "ubi:specification_version": "8.1.0",
        "ubi:path": [
          {
            "name": "container"
          },
          {
            "name": "contextless_item",
            "id": "play-entity-2024-08-12",
            "uri": "spotify:track:1uTeYqdZf9oYwgkhE0hlf0"
          }
        ],
        "ubi:generator_version": "11.0.1"
      },
      "metadata": {
        "creator_name": "FINNEAS",
        "sectionId": "section-header-2024-08-12",
        "album_uri": "spotify:album:2b7DunZFOVCs0QgiTI1FJW"
      },
      "events": {
        "rightAccessoryClick": {
          "name": "contextMenu",
          "data": {
            "uri": "spotify:track:1uTeYqdZf9oYwgkhE0hlf0"
          }
        },
        "click": {
          "name": "playFromContext",
          "data": {
            "uri": "spotify:track:1uTeYqdZf9oYwgkhE0hlf0",
            "player": {
              "context": {
                "pages": [
                  {
                    "tracks": [
                      {
                        "uri": "spotify:track:1uTeYqdZf9oYwgkhE0hlf0"
                      }
                    ]
                  }
                ],
                "uri": "spotify:track:1uTeYqdZf9oYwgkhE0hlf0"
              },
              "options": {
                "skip_to": {
                  "track_uri": "spotify:track:1uTeYqdZf9oYwgkhE0hlf0"
                }
              }
            }
          }
        }
      }
    },
    ...
  ],
  "custom": {
    "last_component_had_play_context": false,
    "timestamp": 1723334400
  }
}
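
Here is a rough sketch of pulling the useful fields out of a page like the one above. It is based on this single sample only, so every key path is an assumption that could break if Spotify changes the UI payload. `track_rows` is a hypothetical helper. Unlike `songs_from_json` below, it reads the date from `metadata.sectionId` instead of regex-matching the element `id`.

```python
def track_rows(page: dict) -> list[dict]:
    rows = []
    for element in page.get("body", []):
        # Dividers and other UI components carry a different component id.
        if element.get("component", {}).get("id") != "listeninghistory:trackRow":
            continue
        metadata = element.get("metadata", {})
        click_data = element.get("events", {}).get("click", {}).get("data", {})
        rows.append({
            "title": element["text"]["title"],
            "artists": [a["name"] for a in element.get("custom", {}).get("artists", [])],
            "track_uri": click_data.get("uri"),
            "album_uri": metadata.get("album_uri"),
            # sectionId looks like "section-header-2024-08-12" in the sample.
            "date": metadata.get("sectionId", "").removeprefix("section-header-"),
        })
    return rows
```

`str.removeprefix` needs Python 3.9 or newer.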

There are different component IDs (e.g. listeninghistory:trackRow) for each kind of UI element (dividers, etc.).
There also seem to be a number of parameters that can be passed to the API call; it might be interesting to look
further into these and see what they mean.
Example: /listening-history/v2/mobile/0?type=merged&last_component_had_play_context=false&client-timezone=America%2FChicago
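
Rather than hand-encoding the query string, you can build the parameters with the standard library, which also percent-encodes the timezone correctly ("/" becomes %2F). This sketch assumes these three parameters are all that is needed; what each one actually controls is still unknown. `history_url` is a hypothetical name, and the base URL is copied from the script below.

```python
from urllib.parse import urlencode

API_URL = "https://spclient.wg.spotify.com/listening-history/v2/mobile"


def history_url(cursor: int, tz_name: str = "America/Chicago") -> str:
    # Parameter names are copied from the intercepted request; their meaning is a guess.
    params = {
        "type": "merged",
        "last_component_had_play_context": "false",
        "client-timezone": tz_name,
    }
    # urlencode() quotes "/" as %2F, so "America/Chicago" becomes "America%2FChicago".
    return f"{API_URL}/{cursor}?{urlencode(params)}"
```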

LIMITATIONS:
- This API also seems to be limited to the past 90 days, a far cry from being able to fetch a user's entire history.
- Since it's a private API, both a client token (which seems to be static, or at least to persist for more than 3 hours)
  and an access token are required. I'm not sure whether the normal OAuth tokens can be used for this; I just intercepted
  the HTTP requests from an Android emulator running Spotify to get these values.
- Again, there are no proper timestamps for each song played, only the date. A deal-breaker :(
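
The paging cursor can at least be decoded as a Unix timestamp. In the sample response, 1723334400 decodes to exactly 2024-08-11 00:00:00 UTC. From this one sample, that hints the cursor may be a day boundary rather than the play time of a particular song. The sketch below uses hypothetical helper names. It decodes a cursor and checks it against the roughly 90-day window noted above, so a client could stop paging early.

```python
from datetime import datetime, timedelta, timezone

# Observed limit of the listening-history endpoint; not documented by Spotify.
HISTORY_WINDOW = timedelta(days=90)


def cursor_to_utc(cursor: int) -> datetime:
    # Assumes the cursor is a Unix timestamp in seconds.
    return datetime.fromtimestamp(cursor, tz=timezone.utc)


def within_window(cursor: int, now: datetime) -> bool:
    # True while the cursor is no older than HISTORY_WINDOW relative to `now`
    # (`now` must be timezone-aware).
    return now - cursor_to_utc(cursor) <= HISTORY_WINDOW


print(cursor_to_utc(1723334400).isoformat())  # 2024-08-11T00:00:00+00:00
```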
'''
import re
import time
import datetime

import requests
from requests import Response

# Both tokens were captured by intercepting the Android app's HTTP traffic (see LIMITATIONS above).
AUTH_TOKEN = ''
CLIENT_TOKEN = ''
API_URL = 'https://spclient.wg.spotify.com/listening-history/v2/mobile'

# Paging cursors already requested, and (song name, played date) pairs found so far.
timestamps = set()
all_songs_played = set()
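

def get_api(timestamp: int) -> dict:
    """Fetch one page of listening history, starting from the given paging cursor."""
    headers: dict = {
        "Authorization": f"Bearer {AUTH_TOKEN}",
        "client-token": CLIENT_TOKEN,
        "App-Platform": "Android",
    }
    api_url: str = f"{API_URL}/{timestamp}"
    r: Response = requests.get(api_url, headers=headers, timeout=30)
    # Fail loudly on HTTP errors (e.g. an expired token) instead of parsing an error body.
    r.raise_for_status()
    return r.json()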


def songs_from_json(input_dict: dict) -> tuple[int, list]:
    """Return the next paging cursor and a list of (song name, played date) tuples."""
    timestamp: int = input_dict["custom"]["timestamp"]
    song_list: list = []
    for element in input_dict["body"]:
        # Skip dividers, section headers and any other non-track components.
        if element.get("component", {}).get("id") != "listeninghistory:trackRow":
            continue
        song_name: str = element["text"]["title"]
        # Only the YYYY-MM-DD date is available; it is embedded in the element id.
        played_date: str = re.search(r"\d{4}-\d{2}-\d{2}", element["id"]).group(0)
        song_list.append((song_name, played_date))
    return (timestamp, song_list)


def main():
    timestamp, songs_played = songs_from_json(get_api(0))

    # Follow the paging cursor until the API hands back one that was already requested.
    while timestamp not in timestamps:
        timestamps.add(timestamp)
        for song_name, played_date in songs_played:
            print(f"Found song {song_name}, played at {played_date}")
        # A set of (name, date) pairs collapses repeat plays of a song on the same day.
        all_songs_played.update(songs_played)
        time.sleep(3)  # Don't know what sort of voodoo rate-limiting or blacklisting they may do; playing it safe.
        timestamp, songs_played = songs_from_json(get_api(timestamp))

    # A cursor of 0 is not a real point in time; discard() (unlike remove()) won't raise if it never showed up.
    timestamps.discard(0)
    timediff: int = int(time.time()) - min(timestamps)
    delta: datetime.timedelta = datetime.timedelta(seconds=timediff)
    print(f"Found a total of {len(all_songs_played)} songs over a time range of {delta}")


if __name__ == "__main__":
    main()