@marethyu
Last active May 3, 2025 10:02
Bunpro to Anki

Bunpro Anki Decks (v1)

For Bunpro users who want to review grammar using Anki's SRS instead of Bunpro's built-in SRS.

Deck Download Links

  • N1 (Updated: 2021-06-21)
  • N2 (Updated: 2021-06-21)
  • N3 (Updated: 2021-06-21)
  • N4 (Updated: 2021-06-21)
  • N5 (Updated: 2021-06-21)
  • All in One (Updated: 2021-06-21)

The JSON file containing the data used to create the decks above can be found here (sentence audio).

UPDATE: Unfortunately, the download links above are now dead and I have no way to recover the original files. I plan to create new grammar decks in the future, but in the meantime you can try the fixed all-in-one deck generously shared with me by Keith Ng.

Some Screenshots

Card front

[screenshot: front1]

Card back

[screenshot: back1]

Card back without Japanese explanation

[screenshot: back2]

Card back with Japanese explanation

[screenshot: back3]

How to Study with Bunpro Decks

TODO

Bunpro Cloze Fields Explained

This section is for anyone who wants to edit fields or create their own cards.

  • Sentence - Example sentence demonstrating a specific grammar point, usually equipped with one or more cloze deletions on specific parts of the sentence. Written in Japanese with no furigana. I recommend selecting sentences that contain no more than one unknown word.
  • SentenceTranslation - English translation of the example sentence.
  • SentenceAudio - Japanese audio recording of the example sentence.
  • SentenceNuance - Nuances of the example sentence. It can explain things like an alternative way to understand the sentence, other unknown grammar, etc.
  • Grammar - The specific grammar point.
  • GramMeaning - English meaning of the grammar point.
  • GramMeaningJP - Japanese meaning/explanation of the grammar point. (Not provided by default. You can find good Japanese explanations in 1, 2, 3, and more.)
  • GrammarStructure - Explains how to use the grammar point with nouns, verbs, adjectives, etc.
  • GrammarNuance - Nuances of the grammar point itself.
  • SupplementalLinks - List of online resources (like websites) for learning more about the grammar point.
  • OfflineResources - List of offline resources (like textbooks) for learning more about the grammar point.
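
For reference, here is how one filled note might look with the Bunpro Cloze model defined in anki.py below. This is only a sketch: all field values are invented, and it assumes the BunproCloze model object is already in scope (e.g. run inside anki.py after the model definition).

# Sketch: a hypothetical note using the Bunpro Cloze model from anki.py.
# Field order must match the model's field list; all values are made up.
from genanki import Note

note = Note(
    model=BunproCloze,  # the Model defined in anki.py
    fields=[
        '日本語を{{c1::勉強して}}います。',  # Sentence (with cloze deletion)
        'I am studying Japanese.',           # SentenceTranslation
        '[sound:1.mp3]',                     # SentenceAudio
        '',                                  # SentenceNuance
        'ている',                            # Grammar
        'Ongoing action',                    # GramMeaning
        '',                                  # GramMeaningJP (empty by default)
        'Verb[て] + いる',                   # GrammarStructure
        '',                                  # GrammarNuance
        '<ul><li>...</li></ul>',             # SupplementalLinks
        '',                                  # OfflineResources
    ],
    tags=['Bunpro', 'Bunpro-N5'],
)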

How to Use the Scripts

Make sure you have Python 3.9.1 or above installed.

The following Python packages are used:

  • beautifulsoup4 4.9.3
  • genanki 0.11.0
  • requests 2.25.1

Instructions

  1. In the directory where you downloaded the Python scripts, create the following folders: bunpro-sentence-audio, decks, and json.

  2. Create a new Bunpro account and log in; a free trial should be enabled. Now create the cookies.json file in the json folder with the contents below, using your favourite browser's dev tools to retrieve the cookie values for the ... sections of the file.

{
  "_ga": "...",
  "_grammar_app_session": ...",
  "ahoy_visitor": "...",
  "__stripe_sid": "...",
  "ahoy_visit": "...",
  "__stripe_mid": "...",
  "1P_JAR": "..."
}
  3. Run bunpro.py. It will take a while; you should see some info printed in the console window.

  4. Open dl_audio.py and set the F_NAME variable to the newly generated XXX-bunpro.jp.all.grammar.json file. Then set OUT_DIR to the full (or relative) path of the bunpro-sentence-audio folder. Now run the script. It will also take a while.

  5. Finally, open anki.py and set JSON_F_NAME to the same XXX-bunpro.jp.all.grammar.json file. Run the script. You should get Anki decks in the decks directory.
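
Before running the full scrape, it may be worth sanity-checking your cookies with a short snippet like the one below. This is not one of the gist's scripts; it simply requests the first grammar point the same way bunpro.py does and reports the status code.

# check_cookies.py: a hypothetical sanity check, not part of the original scripts.
import json

import requests

with open('json/cookies.json', 'r', encoding='utf-8') as f:
    cookies = json.load(f)

# Grammar point 1 is assumed to exist; bunpro.py uses the same URL scheme.
r = requests.get('https://bunpro.jp/grammar_points/1', cookies=cookies)
print(f'Status: {r.status_code}')
print('Cookies look OK' if r.status_code == 200 else 'Check your cookies.json')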

Suggestions and Bug Reports

If you find any bugs in the Anki decks, can't get the scripts working, or have any suggestions, please don't hesitate to contact me! You can comment below or shoot me an email.


Modified: 2024/08/31

# anki.py: builds the Anki decks (and bundles the sentence audio) from the scraped JSON.
import html
import json
import os
import random
import sys

from genanki import Model, Note, Deck, Package

JSON_F_NAME = 'json/2021-06-16-17-37-48-bunpro.jp.all.grammar.json'
OUT_DIR = 'decks'
MODEL_ID = 1231960757

FRONT_HTML = """
<script>
var colors = ['#FAEBD7', '#7FFFD4', '#FFD700', '#90EE90', '#00FA9A', '#40E0D0', '#87CEEB', '#98FB98', '#AFEEEE'];
var random_color = colors[Math.floor(Math.random() * colors.length)];
document.getElementById('front').parentNode.parentNode.style.backgroundColor = random_color;
</script>
<div id="front">
{{cloze:Sentence}}
{{#SentenceTranslation}}
<details>
<summary>Hint</summary>
{{SentenceTranslation}}
</details>
{{/SentenceTranslation}}
</div>
"""
BACK_HTML = """
<div class="center-text">
{{cloze:Sentence}} <br>
{{#SentenceAudio}}
{{SentenceAudio}} <br>
{{/SentenceAudio}}
{{#SentenceTranslation}}
<details>
<summary>Translation</summary>
{{SentenceTranslation}}
</details>
{{/SentenceTranslation}}
{{#SentenceNuance}}
{{SentenceNuance}} <br>
{{/SentenceNuance}}
</div>
<hr>
Grammar: {{Grammar}} <br>
{{^GramMeaningJP}}
Meaning: <br>
<div class="center-text">{{GramMeaning}}<br></div>
{{/GramMeaningJP}}
{{#GramMeaningJP}}
意味: <br>
<div class="center-text">{{GramMeaningJP}}<br></div>
<details>
<summary>英語</summary>
<div class="center-text">{{GramMeaning}}<br></div>
</details>
{{/GramMeaningJP}}
Structure: <br>
<div class="center-text">{{GrammarStructure}}<br></div>
{{#GrammarNuance}}
Nuance: <br>
<div class="center-text">{{GrammarNuance}}<br></div>
{{/GrammarNuance}}
<details>
<summary>Supplemental Links</summary>
{{SupplementalLinks}}
</details>
{{#OfflineResources}}
<details>
<summary>Offline Resources</summary>
{{OfflineResources}}
</details>
{{/OfflineResources}}
"""
CSS = """
.card {
font-family: arial;
font-size: 20px;
}
#front {
text-align: center;
}
.center-text {
text-align: center;
}
.cloze {
font-weight: bold;
color: red;
}
"""
with open(JSON_F_NAME, 'r', encoding='utf-8') as f:
    data = json.load(f)

BunproCloze = Model(
    model_id=MODEL_ID,
    name='Bunpro Cloze',
    fields=[
        {'name': 'Sentence'},
        {'name': 'SentenceTranslation'},
        {'name': 'SentenceAudio'},
        {'name': 'SentenceNuance'},
        {'name': 'Grammar'},
        {'name': 'GramMeaning'},
        {'name': 'GramMeaningJP'},
        {'name': 'GrammarStructure'},
        {'name': 'GrammarNuance'},
        {'name': 'SupplementalLinks'},
        {'name': 'OfflineResources'}
    ],
    templates=[
        {
            'name': 'Cloze',
            'qfmt': FRONT_HTML,
            'afmt': BACK_HTML
        }
    ],
    css=CSS,
    model_type=Model.CLOZE
)
def publish_deck(pkg_name, deck_name, jlpt_level=None):
    deck_id = random.randrange(1 << 30, 1 << 31)
    deck = Deck(deck_id=deck_id, name=deck_name, description='<div style="text-align:center;">Grammar deck adapted from <a href="https://bunpro.jp">bunpro.jp</a></div>')
    package = Package(deck)
    for i in range(len(data)):
        if jlpt_level is None or data[i]['jlpt_level'] == jlpt_level:
            gram = html.escape(data[i]['jp_meaning'])
            meaning = '<br>'.join([html.escape(x) for x in data[i]['eng_meanings']])
            structure = '<br>'.join([html.escape(x) for x in data[i]['structure']])
            gram_nuance = '' if 'nuances' not in data[i] else html.escape(data[i]['nuances'])
            suppl_links = '<ul>' + ''.join([f'<li>{html.escape(x["description"])}: <a href="{x["link"]["url"]}">{html.escape(x["link"]["name"])}</a></li>' for x in data[i]['more_info']['supplemental-links']]) + '</ul>'
            offline_rsc = '' if not data[i]['more_info']['offline-resources'] else '<ul>' + ''.join(['<li>' + html.escape(x) + '</li>' for x in data[i]['more_info']['offline-resources']]) + '</ul>'
            for sentence in data[i]['example_sentences']:
                bun = sentence['sentence']
                translation = html.escape(sentence['translation'])
                sent_nuance = '' if 'nuance' not in sentence else html.escape(sentence['nuance'])
                audio = ''
                if 'audio_file' in sentence:
                    fname = sentence['audio_file']
                    audio_file = os.path.basename(fname)
                    if not os.path.isfile(fname):
                        print(f'{audio_file} does not exist...bye!')
                        sys.exit(1)
                    audio = f'[sound:{audio_file}]'
                    package.media_files.append(fname)
                # Dedupe the cloze list first: a repeated cloze string would be
                # wrapped twice, producing nested markers like {{c1::{{c1::X}}}}
                # (a bug reported in the comments below).
                for cloze in dict.fromkeys(sentence['cloze']):
                    bun = bun.replace(cloze, '{{c1::' + cloze + '}}')
                # The empty string is GramMeaningJP, which is left blank by default.
                note = Note(model=BunproCloze, fields=[bun, translation, audio, sent_nuance, gram, meaning, '', structure, gram_nuance, suppl_links, offline_rsc], tags=['Bunpro', 'Bunpro-' + data[i]['jlpt_level']])
                deck.add_note(note)
            print(f'Added note ({i})')
    pkg_name = pkg_name + '.apkg'
    abs_pkg_name = os.path.join(OUT_DIR, pkg_name)
    package.write_to_file(abs_pkg_name)
    print(f'{pkg_name} created!')

publish_deck('Bunpro_N5', 'Bunpro N5', 'N5')
publish_deck('Bunpro_N4', 'Bunpro N4', 'N4')
publish_deck('Bunpro_N3', 'Bunpro N3', 'N3')
publish_deck('Bunpro_N2', 'Bunpro N2', 'N2')
publish_deck('Bunpro_N1', 'Bunpro N1', 'N1')
publish_deck('Bunpro_All', 'Bunpro N5-N1 (all)')

# bunpro.py: scrapes every grammar point from bunpro.jp and saves the data as JSON.
import json
import os.path
import re
import sys
from datetime import datetime

import requests
from bs4 import BeautifulSoup

F_NAME = 'json/bunpro.jp.all.grammar.json'
BASE_URL = 'https://bunpro.jp/grammar_points/'
END = 866  # highest grammar point number (2021-06-07)
with open('json/cookies.json', 'r', encoding='utf-8') as f:
    cookies = json.load(f)

def jlpt_level(txt):
    if 'N1' in txt:
        return 'N1'
    elif 'N2' in txt:
        return 'N2'
    elif 'N3' in txt:
        return 'N3'
    elif 'N4' in txt:
        return 'N4'
    return 'N5'

def find_clozes(sentence_html):
    # The cloze targets are the <strong> parts of the sentence; parenthesized
    # text (readings) is stripped out.
    clozes = []
    for cloze in sentence_html.find_all('strong'):
        clozes.append(re.sub(r'\(.*?\)', '', cloze.text))
    return clozes
def extract_info(g_num, content):
    dct = {}
    soup = BeautifulSoup(content, 'html.parser')
    dct['g_num'] = g_num
    dct['jlpt_level'] = jlpt_level(soup.select_one('.header__lesson-progress').text)
    gpoint_meaning = soup.select_one('.grammar-point__meaning')
    dct['jp_meaning'] = gpoint_meaning.find('h2', class_='meaning__japanese').text
    dct['eng_meanings'] = []
    eng_meanings = gpoint_meaning.find_all('div')
    for meaning in eng_meanings:
        dct['eng_meanings'].append(meaning.text)
    dct['structure'] = []
    divs = soup.select_one('div.col-xs-12.col-sm-8 > div').find_all('div')
    for div in divs:
        dct['structure'].append(div.text)
    nuances = soup.select_one('.grammar-point__nuance')
    if nuances is not None:
        dct['nuances'] = nuances.text
    dct['example_sentences'] = []
    gpoint_examples = soup.select_one('.grammar-point__example-sentences')
    ex_sentences = gpoint_examples.find_all('div', {'class': 'example-sentence__holder'})
    for sentence in ex_sentences:
        blob = {}
        blob['sentence'] = sentence.find('div', {'class': 'btn copy-japanese--btn'})['data-clipboard-text']
        blob['translation'] = sentence.select_one('.example-sentence-english').text
        nuance = sentence.select_one('.example-sentence-nuance')
        if nuance is not None:
            blob['nuance'] = nuance.text
        blob['cloze'] = find_clozes(sentence.find(class_='example-sentence japanese-example-sentence'))
        audio_holder = sentence.select_one('div.audio-holder.lazy-load-audio')
        if audio_holder is not None:
            blob['audio_link'] = audio_holder['data-audio-src']
        dct['example_sentences'].append(blob)
    dct['more_info'] = {'supplemental-links': [], 'offline-resources': []}
    more = soup.select_one('body > div.container.container--main.mobile-no-padding > div.section > section > div > div.grammar-point__info.more')
    suppl_lnks = more.select_one('.supplemental-links')
    descs = suppl_lnks.find_all('div', class_='supplemental-link__description')
    for desc in descs:
        n = re.search(r'\d+', desc['id']).group()
        link = suppl_lnks.find('div', id=f'supplemental_link_{n}_link')
        link_name = link.text
        link_url = link.find('a', {'class': 'supplemental-link__link'})['href']
        dct['more_info']['supplemental-links'].append({'description': desc.text, 'link': {'name': link_name, 'url': link_url}})
    slct = more.select_one('.offline-resources')
    if slct is not None:
        offl_rscs = slct.find_all('div')
        for rsc in offl_rscs:
            dct['more_info']['offline-resources'].append(rsc.text)
    return dct
data = []
start = 1
if not os.path.isfile(F_NAME):
    with open(F_NAME, 'w', encoding='utf-8') as f:
        json.dump([], f, ensure_ascii=False)
else:
    # Resume from where the previous run left off.
    with open(F_NAME, 'r', encoding='utf-8') as f:
        data = json.load(f)
    start = 1 if len(data) == 0 else data[-1]['g_num'] + 1

def flush_to_file():
    with open(F_NAME, 'w', encoding='utf-8') as f:
        json.dump(data, f, ensure_ascii=False)

for g_num in range(start, END + 1):
    if g_num == 699:  # Not done yet (2021-06-07)
        print('Skipping 699...')
        continue
    d = None
    tries = 1
    max_tries = 5
    while tries <= max_tries and d is None:
        try:
            url = BASE_URL + str(g_num)
            print(f'Retrieving data from {url}')
            r = requests.get(url, stream=True, cookies=cookies)
            if r.status_code == 404:
                print(f'Grammar point {g_num} is probably nonexistent, skipping to the next one...')
                tries = max_tries + 1
                continue
            if r.status_code != 200:
                raise Exception(f'Bad status code ({r.status_code})')
            d = extract_info(g_num, r.content)
        except Exception as e:
            print(e)
            if tries == max_tries:
                print(f'Failed to retrieve g_num {g_num} after {max_tries} attempts, terminating')
                sys.exit(1)
            tries += 1
            print(f'Retrying (Attempt {tries})')
    if d is None:
        # 404'd grammar point; nothing to append.
        continue
    data.append(d)
    if g_num % 7 == 0:
        print('Flushing to the file...')
        flush_to_file()

flush_to_file()
now = datetime.now()
os.rename(F_NAME, f'json/{now.strftime("%Y-%m-%d-%H-%M-%S-")}bunpro.jp.all.grammar.json')

# dl_audio.py: downloads the example sentence audio referenced in the scraped JSON.
import json
import os
import sys  # was missing; needed for sys.exit below

import requests

F_NAME = 'json/2021-06-16-17-37-48-bunpro.jp.all.grammar.json'
OUT_DIR = 'C:\\Users\\Jimmy\\Downloads\\bunpro\\bunpro-sentence-audio'

with open(F_NAME, 'r', encoding='utf-8') as f:
    data = json.load(f)

cnt = 1
with requests.Session() as req:
    for i in range(len(data)):
        for sentence in data[i]['example_sentences']:
            if 'audio_link' not in sentence:
                continue
            done = False
            tries = 1
            max_tries = 5
            while tries <= max_tries and not done:
                try:
                    url = sentence['audio_link']
                    bfname = str(cnt) + '.mp3'
                    fname = os.path.join(OUT_DIR, bfname)
                    sentence['audio_file'] = fname
                    if os.path.isfile(fname):
                        print(f'{bfname} exists! Skipping to the next one ({cnt})')
                        cnt += 1
                        done = True
                        continue
                    print(f'Downloading {url}')
                    r = req.get(url, stream=True)
                    if r.status_code == 200:
                        with open(fname, 'wb') as f:
                            f.write(r.content)
                        print(f'Completed downloading {bfname}')
                        cnt += 1
                        done = True
                    else:
                        raise Exception(f'Bad status code ({r.status_code})')
                except Exception as e:
                    print(e)
                    if tries == max_tries:
                        # was: the message referenced g_num, which is undefined in this script
                        print(f'Failed to download {url} after {max_tries} attempts, terminating')
                        sys.exit(1)
                    tries += 1
                    print(f'Retrying (Attempt {tries})')

# Persist the audio_file paths back into the JSON for anki.py to use.
with open(F_NAME, 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False)
@Paulo-Nunes commented Jul 23, 2021

Full disclosure, I have not used the decks you provide yet, but from what I see they might be what I am looking for. I am studying Tobira and there is no official grammar deck, so Bunpro was an option, but I use Anki and don't want to have to use another program.
From my preliminary once-over of a few cards I have these comments:

  • OfflineResources is nice, I will use it for sure, but Anki tags would also be appreciated. I am currently adding lesson tags to the notes for Tobira.
  • The colors: I will probably be changing those background colors. It's my preference, but they are too bright and I would rather have classic black/white.
  • There are bound to be errors; so far I found that the grammar point Verb[stem] + はじめる lists 'Tobira : Page 18' as the OfflineResource, but it's actually page 19 in the book.

For anyone who comes across this and wants tags too, this is how I am doing it (there are probably better ways):
Note: I am using Anki 2.1.43 and have some plugins that might affect this, notably "Advanced Browser". I also added all of your N1-N5 decks as subdecks of a "Bunpro" parent deck for easier maintenance.

  1. First I flip to the page in the book where the grammar begins.
  2. Then I search using "deck:Bunpro" AND "Tobira \: Page 17<" (the final < matches the HTML </li> tag, so I don't match "page 179", for example).
  3. Then it's as simple as selecting all and adding tags to the selected cards. I use nested tags, piggybacking off another deck's tags, and use Tobira::01::Grammar as my tag.
  4. Repeat for all pages. You could easily change the search to match all the pages of a chapter in one command, but I think you get the point.

Keep up the good work.

Edit: I have found over 200 cards with a bad cloze setup. An example is: ミカサさんも来る{{c1::{{c1::だろう}}}}?
The simplest way I found to locate these is to search for }}} in Anki's browser.
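
One way to repair such notes in bulk is sketched below, assuming you can round-trip the Sentence field through a script (e.g. via an Anki add-on or a text export/import); the regex collapses the doubled markers described above.

# Sketch: collapse accidentally nested cloze markers like {{c1::{{c1::X}}}}.
import re

NESTED = re.compile(r'\{\{c(\d+)::\{\{c\1::(.*?)\}\}\}\}')

def collapse_nested_clozes(text):
    # Apply repeatedly in case of deeper nesting.
    while NESTED.search(text):
        text = NESTED.sub(r'{{c\1::\2}}', text)
    return text

print(collapse_nested_clozes('ミカサさんも来る{{c1::{{c1::だろう}}}}?'))
# -> ミカサさんも来る{{c1::だろう}}?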

@UnconsciousPebble

The Mega links to the individual decks and the JSON files are down (the full download still works).

@WhoMI7 commented Sep 4, 2023

Furigana support pls?

@danpaldev

This no longer works, such a shame. Bunpro's SSR is absolute trash.

@SumZbrod

How do I get cookies.json?

@Fun-Ken commented Feb 19, 2024

Hi,

I'm trying to retrieve the correct cookies but I can't find some of them:

  • _ga
  • ahoy_visitor
  • ahoy_visit
  • 1P_JAR

Since this script is quite old, it's possible that Bunpro changed something on their end. Or maybe I'm just being stupid 😀

Thanks for your awesome decks, still using them!

@marethyu (Author)

I am writing this comment for reference. It seems like Bunpro now offers an easy way to mine grammar points so that I don't have to scrape webpages anymore. JSON data for grammar point N can be accessed at https://bunpro.jp/_next/data/HATb4gJIYJjmxWTIrNTvA/en/grammar_points/N.json.
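
For illustration, fetching one grammar point from that endpoint might look like the snippet below. Note that the HATb4gJIYJjmxWTIrNTvA path segment is a Next.js build id that changes with each deployment, so treat it as a placeholder:

# Sketch: fetch grammar point N as JSON; the build id is a stale placeholder.
import requests

BUILD_ID = 'HATb4gJIYJjmxWTIrNTvA'  # taken from the URL above; will go stale
N = 1  # arbitrary grammar point number
url = f'https://bunpro.jp/_next/data/{BUILD_ID}/en/grammar_points/{N}.json'
r = requests.get(url)
print(r.json() if r.status_code == 200 else f'Got status {r.status_code}; the build id has probably changed')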

@Fun-Ken commented Sep 3, 2024

Thank you for the update! Keith Ng's deck works like a charm; I had no problem updating my old deck (All in One), just import it and it works! By the way, not all the links are dead: N5 and All in One are still working.

@TaakoMagnusen

The All-in-One deck is a great collection of resources, but it's unusable due to the ordering. As an example, in the N5 deck the first 20 cards are just だ and です. This seems like way too many for the same grammar point.
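
If the ordering bothers you, one workaround is to cap how many example sentences each grammar point contributes when the decks are built. Below is a sketch against the JSON structure used by anki.py; MAX_SENTENCES is a made-up knob:

# Sketch: limit notes per grammar point when building decks.
MAX_SENTENCES = 5  # hypothetical cap; tune to taste

def capped_sentences(entry):
    # Keep at most MAX_SENTENCES example sentences for one grammar point.
    # In publish_deck (anki.py), iterate over capped_sentences(data[i])
    # instead of data[i]['example_sentences'].
    return entry['example_sentences'][:MAX_SENTENCES]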
