@abikoushi
Last active November 10, 2025 01:42
Simple plot of bigram frequencies
library(dplyr)
library(ggplot2)
# For the source text, see the following page:
# browseURL("https://rekihaku.pref.hyogo.lg.jp/curator/20165/")
# Read the text; scan() splits on whitespace, so join the tokens into one string
wazauta = paste(scan("wazauta.txt", what = character()), collapse = "")
# Split the string into single characters
unigram = strsplit(wazauta, "")[[1]]
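# Number of characters per column in the plot
nline = 20L
df_bi = data.frame(unigram = unigram) %>%
  # pos: position in the text; nextchar: the following character (NA for the last one)
  mutate(pos = row_number(), nextchar = lead(unigram)) %>%
  # freq: how often the bigram (unigram, nextchar) occurs in the whole text
  group_by(unigram, nextchar) %>%
  mutate(freq = n()) %>%
  ungroup() %>%
  # Grid layout: x is the column index, y the row within the column.
  # pos is 1-based, so subtract 1 to put exactly nline characters in every column.
  mutate(x = (pos - 1) %/% nline, y = (pos - 1) %% nline)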
head(df_bi)
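# Draw each character at its grid position, coloured by bigram frequency.
# Both axes are reversed so the text reads top to bottom, with columns running right to left.
# "Osaka" is a macOS font; substitute a font with Japanese glyphs on other systems.
p = ggplot(df_bi, aes(x = x, y = y)) +
  geom_text(aes(label = unigram, colour = freq), family = "Osaka") +
  scale_color_continuous() +
  scale_y_reverse() + scale_x_reverse() +
  theme_minimal(20)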
print(p)
ggsave(plot=p, filename = "wazauta.png", width = 9, height = 9)
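
The bigram counting step can be tried without wazauta.txt. The following is a minimal base-R sketch, not part of the original gist: the toy string `text` and the names `chars`, `bigrams`, and `bigram_freq` are made up for illustration. It pairs each character with its successor and tabulates the pairs, which is the same quantity the `group_by(unigram, nextchar)` step attaches to every character as `freq`.

```r
# Toy string standing in for the contents of wazauta.txt (an assumption)
text <- "abcabcab"

# Split into single characters
chars <- strsplit(text, "")[[1]]

# Pair each character with its successor; the last character has none
bigrams <- paste0(head(chars, -1), tail(chars, -1))

# Count how often each bigram occurs
bigram_freq <- table(bigrams)
print(sort(bigram_freq, decreasing = TRUE))
```

Unlike the dplyr version, this sketch drops the final character, since it starts no bigram; the gist keeps it with `nextchar` set to `NA`.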