@lemon24
Last active November 25, 2023 11:50
{
"cells": [
{
"cell_type": "markdown",
"id": "f94ee186-033a-4a77-ac04-c99bd2ecf11e",
"metadata": {},
"source": [
"# Feed scoring algorithm (and how I consume feeds)\n",
"\n",
"This is an attempt to use the metrics added in\n",
"[lemon24/reader#254](https://github.com/lemon24/reader/issues/254)\n",
"\"Am I interacting with this feed?\"\n",
"to determine a feed \"usefulness\" score\n",
"based on how many entries I mark as read / important / don't care.\n",
"\n",
"The main use case for this is to unsubscribe from \"low value\" feeds.\n",
"Of course, this can be approximated quite well by sorting by unread / unimportant\n",
"and skipping the few exceptions \"by hand\",\n",
"but I was curious if it can be done in code\n",
"(presumably, this logic would be part of a plugin).\n",
"\n",
"The second use case (for this notebook) is to find gaps in the *reader* API\n",
"that would prevent someone from implementing this on their own."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "6367e289-b7cd-40c0-a493-5fd8da860691",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"pd.options.display.float_format = '{:.2f}'.format\n",
"pd.options.display.max_rows = 1000"
]
},
{
"cell_type": "markdown",
"id": "82b68656-9038-45cc-a9ed-9700fc16c214",
"metadata": {},
"source": [
"## Entry data\n",
"\n",
"We care about:\n",
"\n",
"* entries in a certain time period\n",
"* entries marked as read/important in that period (usually means I'm going through a feed's older entries)\n",
"\n",
"We also need to exclude some entries that would skew the results."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "39f36660-128d-4cfb-8a87-b309e14abeca",
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"3985"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from datetime import datetime, timezone\n",
"from reader import make_reader\n",
"from reader._storage import convert_timestamp\n",
"\n",
"reader = make_reader('db.sqlite') # reader.sqlite.2023-11-14\n",
"\n",
"start = datetime(2022, 11, 14, tzinfo=timezone.utc)\n",
"# start = datetime(2023, 8, 14, tzinfo=timezone.utc)\n",
"end = datetime(2023, 11, 14, tzinfo=timezone.utc)\n",
"\n",
"entries = []\n",
"for e in reader.get_entries():\n",
" date = e.published or e.updated or e.added\n",
" if not (\n",
" # keep entries from (start, end); \n",
" # similar to, but not exactly like #314 \"Get entries added before\"\n",
" start < date < end\n",
"\n",
" # also keep entries read / marked as important in (start, end);\n",
" # similar to, but not exactly like #294 \"Sort by recently interacted with\"\n",
" or e.read and e.read_modified and start < e.read_modified < end\n",
" or e.important and e.important_modified and start < e.important_modified < end\n",
" ):\n",
" continue\n",
"\n",
" # exclude mark_as_read entries\n",
" if e.read and not e.read_modified:\n",
" continue\n",
"\n",
" # exclude old entries from newly-added feeds;\n",
" # neither of the attributes are exposed on Entry ಠ_ಠ\n",
" recent_sort = reader._storage.get_entry_recent_sort(e.resource_id)\n",
" first_updated_epoch = convert_timestamp(\n",
" list(reader._storage.get_db().execute(\n",
" \"select first_updated_epoch from entries where (feed, id) = (?, ?)\",\n",
" e.resource_id\n",
" ))[0][0]\n",
" )\n",
" if recent_sort != first_updated_epoch:\n",
" continue\n",
"\n",
" entries.append(e)\n",
"\n",
"len(entries)"
]
},
{
"cell_type": "markdown",
"id": "fb3ee675-7024-40dd-b1ba-250fcfc3e5c8",
"metadata": {},
"source": [
"We build a dataframe and derive some additional metrics from the raw data."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "785d0d2d-8c6b-499e-9275-774bf06570b9",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>date</th>\n",
" <th>read</th>\n",
" <th>unread</th>\n",
" <th>important</th>\n",
" <th>unimportant</th>\n",
" <th>read_after</th>\n",
" <th>important_after</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2023-11-13 00:00:00+00:00</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>0 days 00:25:14.015914</td>\n",
" <td>NaT</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2023-11-13 20:10:14+00:00</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>NaT</td>\n",
" <td>NaT</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2023-11-12 21:40:51+00:00</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>NaT</td>\n",
" <td>NaT</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2023-11-13 15:40:52+00:00</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>0 days 00:29:16.151919</td>\n",
" <td>NaT</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2023-11-13 09:00:00+00:00</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>NaT</td>\n",
" <td>NaT</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" date read unread important unimportant \\\n",
"0 2023-11-13 00:00:00+00:00 True False False False \n",
"1 2023-11-13 20:10:14+00:00 False True False True \n",
"2 2023-11-12 21:40:51+00:00 False True False False \n",
"3 2023-11-13 15:40:52+00:00 True False False False \n",
"4 2023-11-13 09:00:00+00:00 False True False True \n",
"\n",
" read_after important_after \n",
"0 0 days 00:25:14.015914 NaT \n",
"1 NaT NaT \n",
"2 NaT NaT \n",
"3 0 days 00:29:16.151919 NaT \n",
"4 NaT NaT "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"attrs = 'feed_url published updated added read read_modified important important_modified'.split()\n",
"df = pd.DataFrame({a: getattr(e, a) for a in attrs} for e in entries)\n",
"\n",
"df['date'] = df.published.combine_first(df.updated).combine_first(df.added)\n",
"\n",
"# for convenience\n",
"df['unread'] = ~df.read\n",
"# important is ternary (bool|None)\n",
"df['unimportant'] = df.important == False\n",
"df['important'] = df.important == True\n",
"\n",
"df['read_after'] = df.read_modified - df.added\n",
"df.loc[df.unread, 'read_after'] = None\n",
"df['important_after'] = df.important_modified - df.added\n",
"df.loc[~df.important, 'important_after'] = None\n",
"\n",
"entries_df = df.reindex(['feed_url', 'date', 'read', 'unread', 'important', 'unimportant', 'read_after', 'important_after'], axis=1)\n",
"\n",
"entries_df.drop(['feed_url'], axis=1).head()"
]
},
{
"cell_type": "markdown",
"id": "d20f0957-93e8-4614-8251-35013fed0530",
"metadata": {},
"source": [
"`read_after` / `important_after` aren't used,\n",
"but I thought some stats may be interesting."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "b80cc159-b6d3-486d-99c5-60ba66477c57",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"count 1924\n",
"mean 3 days 06:31:43.507435104\n",
"std 15 days 20:02:14.124536110\n",
"min 0 days 00:00:00.009099\n",
"25% 0 days 02:15:42.020138500\n",
"50% 0 days 04:41:16.205114\n",
"75% 0 days 13:57:01.578489750\n",
"max 236 days 12:12:38.108927\n",
"Name: read_after, dtype: object"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"entries_df.read_after.describe()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "4191d50d-29de-4b72-ab30-84f9f822b426",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"count 31\n",
"mean 11 days 04:52:19.160200451\n",
"std 37 days 08:40:34.612888804\n",
"min 0 days 00:00:02.896490\n",
"25% 0 days 04:01:16.004153\n",
"50% 0 days 10:25:41.507134\n",
"75% 1 days 04:04:09.284485500\n",
"max 170 days 03:27:52.418325\n",
"Name: important_after, dtype: object"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"entries_df.important_after.describe()"
]
},
{
"cell_type": "markdown",
"id": "48c00c5f-26c8-43d2-9be9-ba3b136f3f59",
"metadata": {},
"source": [
"## Feed data\n",
"\n",
"First, some helpers."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "6b93be26-998a-445a-8eee-2f5ea983f18a",
"metadata": {},
"outputs": [],
"source": [
"from functools import cache\n",
"\n",
"@cache\n",
"def get_title(url):\n",
" feed = reader.get_feed(url)\n",
" return feed.user_title or feed.title\n",
"\n",
"@cache\n",
"def get_tags(url):\n",
" return '-'.join(\n",
" t for t in reader.get_tag_keys(url)\n",
" if not (t.startswith('.') or t in {'main', 'reader-related'})\n",
" )\n",
"\n",
"@cache\n",
"def get_counts(url):\n",
" return reader.get_entry_counts(feed=url)\n"
]
},
{
"cell_type": "markdown",
"id": "d9d5cc19-4dc1-4ab1-8abf-963ffcecee73",
"metadata": {},
"source": [
"We sum the various entry counts into a dataframe.\n",
"\n",
"We also get some all-time counts,\n",
"since we care if a feed has important entries,\n",
"even if none were in *period*."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "ed2a930b-b64a-4de7-9fbe-24e86adbd8e5",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>tags</th>\n",
" <th>total</th>\n",
" <th>read</th>\n",
" <th>unread</th>\n",
" <th>important</th>\n",
" <th>unimportant</th>\n",
" <th>total_all</th>\n",
" <th>read_all</th>\n",
" <th>important_all</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>116</th>\n",
" <td>tech</td>\n",
" <td>10</td>\n",
" <td>9</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>364</td>\n",
" <td>363</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>117</th>\n",
" <td>webcomic</td>\n",
" <td>10</td>\n",
" <td>10</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>37</td>\n",
" <td>37</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>118</th>\n",
" <td>webcomic</td>\n",
" <td>158</td>\n",
" <td>154</td>\n",
" <td>4</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>918</td>\n",
" <td>914</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>119</th>\n",
" <td>podcast</td>\n",
" <td>29</td>\n",
" <td>1</td>\n",
" <td>28</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>185</td>\n",
" <td>28</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>120</th>\n",
" <td>tech</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>21</td>\n",
" <td>16</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" tags total read unread important unimportant total_all \\\n",
"116 tech 10 9 1 0 0 364 \n",
"117 webcomic 10 10 0 0 0 37 \n",
"118 webcomic 158 154 4 0 1 918 \n",
"119 podcast 29 1 28 0 1 185 \n",
"120 tech 2 1 1 0 0 21 \n",
"\n",
" read_all important_all \n",
"116 363 0 \n",
"117 37 2 \n",
"118 914 1 \n",
"119 28 0 \n",
"120 16 0 "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = entries_df.groupby('feed_url').aggregate({\n",
" 'date': 'count',\n",
" 'read': 'sum',\n",
" 'unread': 'sum',\n",
" 'important': 'sum',\n",
" 'unimportant': 'sum',\n",
" # 'read_after': 'mean',\n",
" # 'important_after': 'mean',\n",
"}).rename(columns={'date': 'total'})\n",
"\n",
"df.insert(0, 'feed', df.index.map(get_title))\n",
"df.insert(1, 'tags', df.index.map(get_tags))\n",
"df['total_all'] = df.index.map(lambda u: get_counts(u).total)\n",
"df['read_all'] = df.index.map(lambda u: get_counts(u).read)\n",
"df['important_all'] = df.index.map(lambda u: get_counts(u).important)\n",
"\n",
"feeds_df = df.reset_index(drop=True)\n",
"\n",
"feeds_df.drop(['feed'], axis=1).tail()"
]
},
{
"cell_type": "markdown",
"id": "c1d6d7c5-4fe7-43c3-8e22-50877f9323bf",
"metadata": {},
"source": [
"## Scoring\n",
"\n",
"Here are some relative criteria for scoring:\n",
"\n",
"* important is high even if not read\n",
"* many read of many is higher than one read of few (e.g. 14/17 vs 1/4)\n",
"* none read is much less than one read\n",
"* many unimportant is higher than one read\n",
"* many unimportant is roughly the same as few read\n",
"* many unimportant of many is higher than one unimportant of few\n",
"* important at any point in the past is higher than none important\n",
"* also count read/important in the interval regardless of added\n",
"\n",
"I'm not sure I managed to fulfill all of them,\n",
"but the resulting order seems fine.\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "0a95596a-2965-45ec-aeee-5f435336bb51",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# some helpers\n",
"\n",
"def apply_score(df, fn, **kwargs):\n",
" \"\"\"Add all *_s columns into a score column and return as new df.\"\"\"\n",
" df = df.copy()\n",
" fn(df, **kwargs)\n",
" if 'score' not in df:\n",
" score_cols = df[[c for c in df.columns if c.endswith('_s')]]\n",
" df.insert(len(df.columns) - len(score_cols.columns), 'score', score_cols.sum(axis=1))\n",
" df.sort_values('score', inplace=True, ascending=False)\n",
" df.reset_index(inplace=True, drop=True)\n",
" df.index = df.index.astype(int)\n",
" return df\n",
"\n",
"def plot_percentiles(df, *, size=4):\n",
" s = df.total\n",
" plt.figure(figsize=(size, size))\n",
" xs = range(1, 101)\n",
" ys = [s.head(int(x/100 * len(s))).sum() / s.sum() * 100 for x in xs]\n",
" plt.title('percentiles')\n",
" plt.xlabel('feeds %')\n",
" plt.ylabel('entries %')\n",
" plt.plot(xs, ys)\n",
" plt.grid()\n",
" plt.show()\n",
"\n",
"def plot_breakdown(df, title='score breakdown'):\n",
" plt.figure(figsize=(8, 4))\n",
" plt.title(title)\n",
" plt.xlabel('feeds')\n",
" plt.ylabel('score')\n",
" # plt.plot(df.score, label='score')\n",
" cols = [c for c in df.columns if c.endswith('_s')]\n",
" for col in cols:\n",
" plt.plot(df[col], label=col.removesuffix('_s'))\n",
" plt.legend()\n",
" # plt.yscale('symlog')\n",
" plt.show()"
]
},
{
"cell_type": "markdown",
"id": "62108f72-d087-4f55-8a2a-c48c0017e444",
"metadata": {},
"source": [
"In the end, I only managed to come up with a single strategy:\n",
"\n",
"* `read`, `unread` and `unimportant` are used as-is, with some weights applied.\n",
"* For `important` and `important_old` (added before *period*),\n",
" I scaled the number of entries by a \"how important is important\" factor\n",
" that uses a logarithm to give disproportionately more importance\n",
" to feeds with lots of important entries;\n",
" the logarithm bases were chosen by hand (tweaked until they looked right).\n",
"\n",
"I hope this works for others as well, but it needs to be backtested.\n",
"\n",
"Note that in the code below, the order is reversed (a bigger score means a lower-value feed).\n",
"This can be fixed, and the score normalized to [0, 1], trivially."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "63e4c664-a51d-4fbe-af55-514c60d0fa40",
"metadata": {},
"outputs": [],
"source": [
"def entry_counting(df):\n",
" # read matters more than unread\n",
" df['read_s'] = df.read * -2\n",
" df['unread_s'] = df.unread\n",
" # unimportant matters as much as read, and more than unread\n",
" # (but, note a lot of unimportant entries are also unread)\n",
" df['unimportant_s'] = df.unimportant * 2\n",
" # for my data, log2(88) -> 6.5\n",
" important_factor = np.log2(df.total.sum() / df.important.sum())\n",
" df['important_s'] = df.important * - important_factor\n",
" # for my data, log10(104) -> 2\n",
" important_all_factor = np.log10(df.total_all.sum() / df.important_all.sum())\n",
" df['important_old_s'] = (df.important_all - df.important) * - important_all_factor\n",
"\n",
"df = apply_score(feeds_df, entry_counting)\n",
"# unused\n",
"df.drop(['read_all'], axis=1, inplace=True)\n",
"\n",
"df.drop(['feed'], axis=1, inplace=True)"
]
},
{
"cell_type": "markdown",
"id": "5876293f-5b37-4d9c-b4b6-a3deaee4c104",
"metadata": {},
"source": [
"Here's what the distribution looks like: \n",
"\n",
"* 5% of feeds account for 30% of entries\n",
"* 20% of feeds account for 50% of entries\n",
" \n",
"This is expected/desired (noisier feeds are lower value)."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "788b2c27-a308-4132-bd5e-96739f64ae0c",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 400x400 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plot_percentiles(df)"
]
},
{
"cell_type": "markdown",
"id": "74f3fe00-0d3c-4004-9797-deb9b445ec25",
"metadata": {},
"source": [
"Here's what the score is composed of for each feed\n",
"(note the interesting ones at the ends)."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "e94d8084-36a3-4637-b5ab-f1bd1a12d579",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 800x400 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plot_breakdown(df)"
]
},
{
"cell_type": "markdown",
"id": "6ba3e161-8cd8-49c1-891d-9e36d5620bc5",
"metadata": {},
"source": [
"Here are some annotated top talkers (\"low value\"):\n",
"\n",
"* `1 webcomic` and `7 podcast` are feeds I deliberately started marking as unimportant / ignoring\n",
" * unsubscribe\n",
"* `0` and `2 money-podcast` are high-volume, current events feeds; possible approaches:\n",
" * unsubscribe\n",
" * exclude/include from the default page using a tag\n",
" * automatically mark as don't care after a short time [lemon24/reader#312](https://github.com/lemon24/reader/issues/312)\n",
" * use ML to guess which one I won't care about? :))\n",
"* `3 corp-tech` and `5 corp-tech` are corporate blogs\n",
" * similar treatment to the current event feeds\n",
"* `6 tech` is a feed I don't really care about\n",
" * unsubscribe\n",
"* `9 podcast-python` is a weekly podcast I don't really listen to (I look at the titles sometimes)\n",
" * similar treatment to the current event feeds\n",
"* `4 links-tech` is a feed of curated links, some current events; many more read, but higher (highest?) volume\n",
" * similar treatment to the current event feeds\n",
"* `8 podcast` is a weekly podcast I'm slowly making my way through; keep\n",
"\n",
"With a 3-month period instead, the results are almost identical (only 2 feeds differ)."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "5eda180d-264c-496f-8a54-021439026457",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>tags</th>\n",
" <th>total</th>\n",
" <th>read</th>\n",
" <th>unread</th>\n",
" <th>important</th>\n",
" <th>unimportant</th>\n",
" <th>total_all</th>\n",
" <th>important_all</th>\n",
" <th>score</th>\n",
" <th>read_s</th>\n",
" <th>unread_s</th>\n",
" <th>unimportant_s</th>\n",
" <th>important_s</th>\n",
" <th>important_old_s</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td></td>\n",
" <td>245</td>\n",
" <td>76</td>\n",
" <td>169</td>\n",
" <td>0</td>\n",
" <td>195</td>\n",
" <td>636</td>\n",
" <td>0</td>\n",
" <td>407.00</td>\n",
" <td>-152</td>\n",
" <td>169</td>\n",
" <td>390</td>\n",
" <td>-0.00</td>\n",
" <td>-0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>webcomic</td>\n",
" <td>155</td>\n",
" <td>43</td>\n",
" <td>112</td>\n",
" <td>0</td>\n",
" <td>149</td>\n",
" <td>1286</td>\n",
" <td>0</td>\n",
" <td>324.00</td>\n",
" <td>-86</td>\n",
" <td>112</td>\n",
" <td>298</td>\n",
" <td>-0.00</td>\n",
" <td>-0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>money-podcast</td>\n",
" <td>205</td>\n",
" <td>36</td>\n",
" <td>169</td>\n",
" <td>0</td>\n",
" <td>91</td>\n",
" <td>465</td>\n",
" <td>0</td>\n",
" <td>279.00</td>\n",
" <td>-72</td>\n",
" <td>169</td>\n",
" <td>182</td>\n",
" <td>-0.00</td>\n",
" <td>-0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>corp-tech</td>\n",
" <td>104</td>\n",
" <td>17</td>\n",
" <td>87</td>\n",
" <td>0</td>\n",
" <td>63</td>\n",
" <td>656</td>\n",
" <td>0</td>\n",
" <td>179.00</td>\n",
" <td>-34</td>\n",
" <td>87</td>\n",
" <td>126</td>\n",
" <td>-0.00</td>\n",
" <td>-0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>links-tech</td>\n",
" <td>355</td>\n",
" <td>121</td>\n",
" <td>234</td>\n",
" <td>1</td>\n",
" <td>72</td>\n",
" <td>827</td>\n",
" <td>11</td>\n",
" <td>108.42</td>\n",
" <td>-242</td>\n",
" <td>234</td>\n",
" <td>144</td>\n",
" <td>-7.01</td>\n",
" <td>-20.58</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>corp-tech</td>\n",
" <td>72</td>\n",
" <td>10</td>\n",
" <td>62</td>\n",
" <td>0</td>\n",
" <td>30</td>\n",
" <td>249</td>\n",
" <td>1</td>\n",
" <td>99.94</td>\n",
" <td>-20</td>\n",
" <td>62</td>\n",
" <td>60</td>\n",
" <td>-0.00</td>\n",
" <td>-2.06</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>tech</td>\n",
" <td>36</td>\n",
" <td>10</td>\n",
" <td>26</td>\n",
" <td>0</td>\n",
" <td>35</td>\n",
" <td>141</td>\n",
" <td>0</td>\n",
" <td>76.00</td>\n",
" <td>-20</td>\n",
" <td>26</td>\n",
" <td>70</td>\n",
" <td>-0.00</td>\n",
" <td>-0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>podcast</td>\n",
" <td>61</td>\n",
" <td>0</td>\n",
" <td>61</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>154</td>\n",
" <td>0</td>\n",
" <td>61.00</td>\n",
" <td>0</td>\n",
" <td>61</td>\n",
" <td>0</td>\n",
" <td>-0.00</td>\n",
" <td>-0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>podcast</td>\n",
" <td>51</td>\n",
" <td>3</td>\n",
" <td>48</td>\n",
" <td>0</td>\n",
" <td>4</td>\n",
" <td>622</td>\n",
" <td>0</td>\n",
" <td>50.00</td>\n",
" <td>-6</td>\n",
" <td>48</td>\n",
" <td>8</td>\n",
" <td>-0.00</td>\n",
" <td>-0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>podcast-python</td>\n",
" <td>49</td>\n",
" <td>2</td>\n",
" <td>47</td>\n",
" <td>0</td>\n",
" <td>4</td>\n",
" <td>440</td>\n",
" <td>1</td>\n",
" <td>48.94</td>\n",
" <td>-4</td>\n",
" <td>47</td>\n",
" <td>8</td>\n",
" <td>-0.00</td>\n",
" <td>-2.06</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" tags total read unread important unimportant total_all \\\n",
"0 245 76 169 0 195 636 \n",
"1 webcomic 155 43 112 0 149 1286 \n",
"2 money-podcast 205 36 169 0 91 465 \n",
"3 corp-tech 104 17 87 0 63 656 \n",
"4 links-tech 355 121 234 1 72 827 \n",
"5 corp-tech 72 10 62 0 30 249 \n",
"6 tech 36 10 26 0 35 141 \n",
"7 podcast 61 0 61 0 0 154 \n",
"8 podcast 51 3 48 0 4 622 \n",
"9 podcast-python 49 2 47 0 4 440 \n",
"\n",
" important_all score read_s unread_s unimportant_s important_s \\\n",
"0 0 407.00 -152 169 390 -0.00 \n",
"1 0 324.00 -86 112 298 -0.00 \n",
"2 0 279.00 -72 169 182 -0.00 \n",
"3 0 179.00 -34 87 126 -0.00 \n",
"4 11 108.42 -242 234 144 -7.01 \n",
"5 1 99.94 -20 62 60 -0.00 \n",
"6 0 76.00 -20 26 70 -0.00 \n",
"7 0 61.00 0 61 0 -0.00 \n",
"8 0 50.00 -6 48 8 -0.00 \n",
"9 1 48.94 -4 47 8 -0.00 \n",
"\n",
" important_old_s \n",
"0 -0.00 \n",
"1 -0.00 \n",
"2 -0.00 \n",
"3 -0.00 \n",
"4 -20.58 \n",
"5 -2.06 \n",
"6 -0.00 \n",
"7 -0.00 \n",
"8 -0.00 \n",
"9 -2.06 "
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head(10)"
]
},
{
"cell_type": "markdown",
"id": "4ed50165-3865-4633-aed3-93112440b837",
"metadata": {},
"source": [
"Highest value feeds look like this:\n",
"\n",
"* high read ratio, regardless of volume\n",
"* some with lots of important entries\n",
"\n",
"Special mentions:\n",
"\n",
"* all the `twitter` ones are defunct (TODO: entirely exclude feeds that no longer get updates and have a low unread count)\n",
" * `114 self-twitter` is kept for archival purposes\n",
" * `116 twitter` and `120 twitter` are kept because I want to keep the few important entries; fix: [lemon24/reader#230](https://github.com/lemon24/reader/issues/290)\n",
"* `119 tech` added lots of old entries; I marked them as read so they don't spam the \"recent\" view; fix: [lemon24/reader#305](https://github.com/lemon24/reader/issues/305)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "d16a2f5e-27cd-4a17-bdbb-56d6f456d53d",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>tags</th>\n",
" <th>total</th>\n",
" <th>read</th>\n",
" <th>unread</th>\n",
" <th>important</th>\n",
" <th>unimportant</th>\n",
" <th>total_all</th>\n",
" <th>important_all</th>\n",
" <th>score</th>\n",
" <th>read_s</th>\n",
" <th>unread_s</th>\n",
" <th>unimportant_s</th>\n",
" <th>important_s</th>\n",
" <th>important_old_s</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>106</th>\n",
" <td>python</td>\n",
" <td>32</td>\n",
" <td>19</td>\n",
" <td>13</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>88</td>\n",
" <td>0</td>\n",
" <td>-25.00</td>\n",
" <td>-38</td>\n",
" <td>13</td>\n",
" <td>0</td>\n",
" <td>-0.00</td>\n",
" <td>-0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>107</th>\n",
" <td>tech</td>\n",
" <td>6</td>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>31</td>\n",
" <td>6</td>\n",
" <td>-25.24</td>\n",
" <td>-6</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>-14.01</td>\n",
" <td>-8.23</td>\n",
" </tr>\n",
" <tr>\n",
" <th>108</th>\n",
" <td>python</td>\n",
" <td>11</td>\n",
" <td>4</td>\n",
" <td>7</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>48</td>\n",
" <td>7</td>\n",
" <td>-25.30</td>\n",
" <td>-8</td>\n",
" <td>7</td>\n",
" <td>0</td>\n",
" <td>-14.01</td>\n",
" <td>-10.29</td>\n",
" </tr>\n",
" <tr>\n",
" <th>109</th>\n",
" <td>podcast</td>\n",
" <td>40</td>\n",
" <td>31</td>\n",
" <td>9</td>\n",
" <td>0</td>\n",
" <td>8</td>\n",
" <td>120</td>\n",
" <td>0</td>\n",
" <td>-37.00</td>\n",
" <td>-62</td>\n",
" <td>9</td>\n",
" <td>16</td>\n",
" <td>-0.00</td>\n",
" <td>-0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>110</th>\n",
" <td>tech</td>\n",
" <td>4</td>\n",
" <td>4</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>401</td>\n",
" <td>15</td>\n",
" <td>-38.86</td>\n",
" <td>-8</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>-0.00</td>\n",
" <td>-30.86</td>\n",
" </tr>\n",
" <tr>\n",
" <th>111</th>\n",
" <td>webcomic</td>\n",
" <td>22</td>\n",
" <td>21</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>143</td>\n",
" <td>0</td>\n",
" <td>-41.00</td>\n",
" <td>-42</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>-0.00</td>\n",
" <td>-0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>112</th>\n",
" <td></td>\n",
" <td>34</td>\n",
" <td>27</td>\n",
" <td>7</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>179</td>\n",
" <td>0</td>\n",
" <td>-43.00</td>\n",
" <td>-54</td>\n",
" <td>7</td>\n",
" <td>4</td>\n",
" <td>-0.00</td>\n",
" <td>-0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>113</th>\n",
" <td>podcast</td>\n",
" <td>25</td>\n",
" <td>25</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>115</td>\n",
" <td>0</td>\n",
" <td>-50.00</td>\n",
" <td>-50</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>-0.00</td>\n",
" <td>-0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>114</th>\n",
" <td>self-twitter</td>\n",
" <td>52</td>\n",
" <td>51</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>57</td>\n",
" <td>0</td>\n",
" <td>-101.00</td>\n",
" <td>-102</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>-0.00</td>\n",
" <td>-0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>115</th>\n",
" <td>tech</td>\n",
" <td>144</td>\n",
" <td>93</td>\n",
" <td>51</td>\n",
" <td>6</td>\n",
" <td>1</td>\n",
" <td>182</td>\n",
" <td>6</td>\n",
" <td>-175.04</td>\n",
" <td>-186</td>\n",
" <td>51</td>\n",
" <td>2</td>\n",
" <td>-42.04</td>\n",
" <td>-0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>116</th>\n",
" <td>twitter</td>\n",
" <td>129</td>\n",
" <td>117</td>\n",
" <td>12</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>828</td>\n",
" <td>4</td>\n",
" <td>-238.13</td>\n",
" <td>-234</td>\n",
" <td>12</td>\n",
" <td>2</td>\n",
" <td>-14.01</td>\n",
" <td>-4.12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>117</th>\n",
" <td>webcomic</td>\n",
" <td>158</td>\n",
" <td>154</td>\n",
" <td>4</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>918</td>\n",
" <td>1</td>\n",
" <td>-304.06</td>\n",
" <td>-308</td>\n",
" <td>4</td>\n",
" <td>2</td>\n",
" <td>-0.00</td>\n",
" <td>-2.06</td>\n",
" </tr>\n",
" <tr>\n",
" <th>118</th>\n",
" <td>self-tech</td>\n",
" <td>173</td>\n",
" <td>171</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>507</td>\n",
" <td>0</td>\n",
" <td>-340.00</td>\n",
" <td>-342</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>-0.00</td>\n",
" <td>-0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>119</th>\n",
" <td>tech</td>\n",
" <td>202</td>\n",
" <td>201</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>258</td>\n",
" <td>1</td>\n",
" <td>-408.01</td>\n",
" <td>-402</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>-7.01</td>\n",
" <td>-0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>120</th>\n",
" <td>twitter</td>\n",
" <td>313</td>\n",
" <td>283</td>\n",
" <td>30</td>\n",
" <td>0</td>\n",
" <td>6</td>\n",
" <td>926</td>\n",
" <td>1</td>\n",
" <td>-526.06</td>\n",
" <td>-566</td>\n",
" <td>30</td>\n",
" <td>12</td>\n",
" <td>-0.00</td>\n",
" <td>-2.06</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" tags total read unread important unimportant total_all \\\n",
"106 python 32 19 13 0 0 88 \n",
"107 tech 6 3 3 2 0 31 \n",
"108 python 11 4 7 2 0 48 \n",
"109 podcast 40 31 9 0 8 120 \n",
"110 tech 4 4 0 0 0 401 \n",
"111 webcomic 22 21 1 0 0 143 \n",
"112 34 27 7 0 2 179 \n",
"113 podcast 25 25 0 0 0 115 \n",
"114 self-twitter 52 51 1 0 0 57 \n",
"115 tech 144 93 51 6 1 182 \n",
"116 twitter 129 117 12 2 1 828 \n",
"117 webcomic 158 154 4 0 1 918 \n",
"118 self-tech 173 171 2 0 0 507 \n",
"119 tech 202 201 1 1 0 258 \n",
"120 twitter 313 283 30 0 6 926 \n",
"\n",
" important_all score read_s unread_s unimportant_s important_s \\\n",
"106 0 -25.00 -38 13 0 -0.00 \n",
"107 6 -25.24 -6 3 0 -14.01 \n",
"108 7 -25.30 -8 7 0 -14.01 \n",
"109 0 -37.00 -62 9 16 -0.00 \n",
"110 15 -38.86 -8 0 0 -0.00 \n",
"111 0 -41.00 -42 1 0 -0.00 \n",
"112 0 -43.00 -54 7 4 -0.00 \n",
"113 0 -50.00 -50 0 0 -0.00 \n",
"114 0 -101.00 -102 1 0 -0.00 \n",
"115 6 -175.04 -186 51 2 -42.04 \n",
"116 4 -238.13 -234 12 2 -14.01 \n",
"117 1 -304.06 -308 4 2 -0.00 \n",
"118 0 -340.00 -342 2 0 -0.00 \n",
"119 1 -408.01 -402 1 0 -7.01 \n",
"120 1 -526.06 -566 30 12 -0.00 \n",
"\n",
" important_old_s \n",
"106 -0.00 \n",
"107 -8.23 \n",
"108 -10.29 \n",
"109 -0.00 \n",
"110 -30.86 \n",
"111 -0.00 \n",
"112 -0.00 \n",
"113 -0.00 \n",
"114 -0.00 \n",
"115 -0.00 \n",
"116 -4.12 \n",
"117 -2.06 \n",
"118 -0.00 \n",
"119 -0.00 \n",
"120 -2.06 "
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.tail(15)"
]
},
{
"cell_type": "markdown",
"id": "870d20df-41db-4ffa-bc16-bb4d51053f67",
"metadata": {},
"source": [
"> Of course, this can be approximated quite well by sorting by unread / unimportant\n",
"\n",
"This seems mostly true (although for a 3-month period, only 21 of the top 30 feeds were the same when sorting by unread or unimportant)."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "c98ff6e5-fba5-486c-b0f3-81848ffb68f0",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"common top 10 feeds compared with\n",
" unread: 7\n",
" unimportant: 7\n",
" unread or unimportant: 8\n",
"common top 20 feeds compared with\n",
" unread: 16\n",
" unimportant: 13\n",
" unread or unimportant: 18\n",
"common top 30 feeds compared with\n",
" unread: 23\n",
" unimportant: 18\n",
" unread or unimportant: 25\n"
]
}
],
"source": [
"by_unread = df.sort_values('unread', ascending=False)\n",
"by_unimportant = df.sort_values('unimportant', ascending=False)\n",
"\n",
"for n in 10, 20, 30:\n",
" print(\"common top\", n, \"feeds compared with\")\n",
" score = set(df.head(n).index)\n",
" unread = set(by_unread.head(n).index)\n",
" unimportant = set(by_unimportant.head(n).index)\n",
" print(\" unread:\", len(score & unread))\n",
" print(\" unimportant:\", len(score & unimportant))\n",
" print(\" unread or unimportant:\", len(score & (unread | unimportant)))\n"
]
},
{
"cell_type": "markdown",
"id": "7b8edb23-beaa-4b95-a770-7744a1162908",
"metadata": {},
"source": [
"The feeds in the top 30 by score that are not in the top 30 by unread or unimportant are lower-volume ones that are mostly unread."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "2655d000-8ca8-486f-bd9b-5aa3a991d557",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>tags</th>\n",
" <th>total</th>\n",
" <th>read</th>\n",
" <th>unread</th>\n",
" <th>important</th>\n",
" <th>unimportant</th>\n",
" <th>total_all</th>\n",
" <th>important_all</th>\n",
" <th>score</th>\n",
" <th>read_s</th>\n",
" <th>unread_s</th>\n",
" <th>unimportant_s</th>\n",
" <th>important_s</th>\n",
" <th>important_old_s</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>podcast-python</td>\n",
" <td>15</td>\n",
" <td>0</td>\n",
" <td>15</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>209</td>\n",
" <td>0</td>\n",
" <td>17.00</td>\n",
" <td>0</td>\n",
" <td>15</td>\n",
" <td>2</td>\n",
" <td>-0.00</td>\n",
" <td>-0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>podcast</td>\n",
" <td>14</td>\n",
" <td>0</td>\n",
" <td>14</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>146</td>\n",
" <td>0</td>\n",
" <td>16.00</td>\n",
" <td>0</td>\n",
" <td>14</td>\n",
" <td>2</td>\n",
" <td>-0.00</td>\n",
" <td>-0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>python</td>\n",
" <td>18</td>\n",
" <td>1</td>\n",
" <td>17</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>89</td>\n",
" <td>0</td>\n",
" <td>15.00</td>\n",
" <td>-2</td>\n",
" <td>17</td>\n",
" <td>0</td>\n",
" <td>-0.00</td>\n",
" <td>-0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>tech</td>\n",
" <td>13</td>\n",
" <td>1</td>\n",
" <td>12</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>70</td>\n",
" <td>0</td>\n",
" <td>12.00</td>\n",
" <td>-2</td>\n",
" <td>12</td>\n",
" <td>2</td>\n",
" <td>-0.00</td>\n",
" <td>-0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>podcast</td>\n",
" <td>8</td>\n",
" <td>0</td>\n",
" <td>8</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>22</td>\n",
" <td>0</td>\n",
" <td>10.00</td>\n",
" <td>0</td>\n",
" <td>8</td>\n",
" <td>2</td>\n",
" <td>-0.00</td>\n",
" <td>-0.00</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" tags total read unread important unimportant total_all \\\n",
"22 podcast-python 15 0 15 0 1 209 \n",
"23 podcast 14 0 14 0 1 146 \n",
"24 python 18 1 17 0 0 89 \n",
"27 tech 13 1 12 0 1 70 \n",
"28 podcast 8 0 8 0 1 22 \n",
"\n",
" important_all score read_s unread_s unimportant_s important_s \\\n",
"22 0 17.00 0 15 2 -0.00 \n",
"23 0 16.00 0 14 2 -0.00 \n",
"24 0 15.00 -2 17 0 -0.00 \n",
"27 0 12.00 -2 12 2 -0.00 \n",
"28 0 10.00 0 8 2 -0.00 \n",
"\n",
" important_old_s \n",
"22 -0.00 \n",
"23 -0.00 \n",
"24 -0.00 \n",
"27 -0.00 \n",
"28 -0.00 "
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
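"# score/unread/unimportant are the top-30 index-label sets left by the loop above; select by label\n",
"df.loc[sorted(score - (unread | unimportant))]"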
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}