@HelloWorld017
Created September 26, 2017 14:07
Crawl hanja from the Naver dictionary and render them as a styled HTML page
// HTTP client preconfigured for the Naver hanja dictionary.
const axios = require('axios').create({
    headers: {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'
    },
    responseType: 'text',
    baseURL: 'http://hanja.naver.com'
});
const cheerio = require('cheerio');
const fs = require('fs');
// Request the search page for a query.
const finder = (query) => axios.get('/search', {
    params: {query}
});
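// Extract the hanja and the text that accompanies each character from a page.
const parser = async (html) => {
    const $ = cheerio.load(html);

    // A search-results page carries this marker comment; follow the first word link.
    if(html.includes('<!-- 검색결과 -->')) {
        const href = $('.result_chn_words dl dd a[href*="word"]').attr('href');
        if(!href) {
            // No word link found: return an empty entry instead of requesting `undefined`.
            return {text: [], hanja: []};
        }

        const req = await axios.get(href);
        return parser(req.data);
    }

    const lis = $('dd.t_letter li').toArray();
    const hanjaMap = lis.map((v) => $(v).children('.hanja').text().trim());

    // Drop the child elements so only each li's own text remains, then bold the reading.
    const textMap = lis.map((v) => $(v).children().remove().end().text().trim()).map((v) => {
        return v.replace(/^([가-힣]+) ([가-힣](?:\([가-힣]\))?).*$/, (m, p1, p2) => `${p1} <b>${p2}</b>`);
    });

    return {
        text: textMap,
        hanja: hanjaMap
    };
};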
// Search for a word and parse the page that comes back.
const handleHanja = async (query) => {
    const html = await finder(query);
    return parser(html.data);
};
// setTimeout takes milliseconds, so the argument is named accordingly.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
// Words to look up, one per line.
const hanjaList = `각주구검
견토지쟁
구우일모
귤화위지
계륵
과유불급
괄목상대
관포지교
교언영색
구밀복검
간담상조
금의야행
권토중래
기우
낭중지추
다기망양
단장
당랑거철
대기만성
동병상련
등용문
강노지말
마부작침
망양지탄
맥수지탄
맹모단기
맹모삼천
모순
미생지신
백년하청
백미
백아절현
사족
삼고초려
삼인성호
새옹지마
순망치한
양두구육
어부지리
온고지신
와신상담
우공이산
금란지교
전전긍긍
전화위복`.split('\n').map((v) => v.trim()).filter((v) => v !== '');
let html = `
<style>
.collection {
column-count: 3;
}
.hanja {
font-family: SpoqaHanSansJP, NanumBarunGothic;
display: flex;
padding: 10px;
height: 100px;
break-inside: avoid-column;
-webkit-column-break-inside: avoid;
align-items: center;
}
b {
color: #ff6d00;
}
.word {
font-size: 1.3rem;
display: flex;
align-items: center;
width: 100px;
}
.hanja:nth-child(2n) {
background: #f1f2f3;
}
.hanja:nth-child(2n + 1) {
background: #fafbfc;
}
</style>
<div class="collection">
`;
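(async () => {
    for(const hanjaName of hanjaList) {
        const {text, hanja} = await handleHanja(hanjaName);
        html += `
    <div class="hanja">
        <div class="word">
            ${hanja.join('')}
        </div>
        <div class="explanation">
            ${text.join('<br>')}
        </div>
    </div>
`;
        // Pause 100 ms between requests to avoid hammering the server.
        await sleep(100);
    }

    html += '</div>';
    fs.writeFileSync('./hanja.html', html);
})().catch((err) => {
    // Report request or file-system failures instead of leaving the rejection unhandled.
    console.error(err);
    process.exitCode = 1;
});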
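
The regular expression inside `parser` is the least obvious part of the script, so here is a minimal sketch that isolates it and runs without any network access. The helper name `highlightReading` and the sample strings are assumptions made for illustration; they are hand-written inputs in the "meaning reading" shape the pattern expects, not text captured from the dictionary site.

```javascript
// Same pattern as in parser(): a Hangul word, a space, then one Hangul syllable
// optionally followed by an alternate syllable in parentheses.
const GLOSS_PATTERN = /^([가-힣]+) ([가-힣](?:\([가-힣]\))?).*$/;

// highlightReading is a hypothetical helper name, not part of the gist.
const highlightReading = (gloss) =>
    gloss.replace(GLOSS_PATTERN, (m, p1, p2) => `${p1} <b>${p2}</b>`);

// Hand-written sample inputs (assumed shapes, not real scraped data).
console.log(highlightReading('새길 각'));      // 새길 <b>각</b>
console.log(highlightReading('아닐 불(부)'));  // 아닐 <b>불(부)</b>
console.log(highlightReading('no match'));     // no match
```

Strings that do not match are returned unchanged. Because the first group stops at the first space, an input such as `말 이을 이` becomes `말 <b>이</b>`: the second word is treated as the reading and the rest is dropped.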