May 22, 2023 at 12:34:10 PM
system:
You are ChatGPT, a large language model trained by OpenAI, based on the GPT-4 architecture. Knowledge cutoff: 2021-09 Current date: 2023-05-22
Prompt:
<table>
<thead>
<tr>
<th align="left">ISO 693-3</th>
<th align="left">Language Name</th>
<th align="left">Language Family</th>
<th align="left">Subgrouping</th>
<th align="left">ASR</th>
<th align="left">TTS</th>
<th align="left">LID</th>
<th align="left"></th>
</tr>
</thead>
<tbody><tr>
<td align="left">aaa</td>
<td align="left">Ghotuo</td>
<td align="left">Niger-Congo</td>
<td align="left">Atlantic-Congo</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">✔️</td>
<td align="left"><a href="https://www.ethnologue.com/language/aaa">Ethnologue</a></td>
</tr>
<tr>
<td align="left">aab</td>
<td align="left">Arum</td>
<td align="left">Niger-Congo</td>
<td align="left">Atlantic-Congo</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">✔️</td>
<td align="left"><a href="https://www.ethnologue.com/language/aab">Ethnologue</a></td>
</tr>
Write JavaScript to turn this table into newline-delimited JSON objects
ChatGPT:
Here's a JavaScript function which can parse HTML tables into newline-delimited JSON objects. This script uses the browser's built-in DOMParser to parse the HTML, and then it walks the DOM to build up the JSON objects:
function tableToJson(table) {
var parser = new DOMParser();
var doc = parser.parseFromString(table, "text/html");
var rows = doc.querySelectorAll("tr");
var result = [];
// Use the header row to get the keys
var keys = Array.from(rows[0].querySelectorAll("th")).map(th => th.textContent.trim());
// Iterate over the rest of the rows
for (var i = 1; i < rows.length; i++) {
var obj = {};
var cols = rows[i].querySelectorAll("td");
// Map each cell in the row to a key from the header row
for (var j = 0; j < cols.length; j++) {
if (cols[j].textContent.trim() === '✔️') {
obj[keys[j]] = true;
} else if (cols[j].querySelector('a')) {
obj[keys[j]] = cols[j].querySelector('a').href;
} else {
obj[keys[j]] = cols[j].textContent.trim() || null;
}
}
result.push(obj);
}
// Convert the result array to newline-delimited JSON
return result.map(JSON.stringify).join('\n');
}
You can use this function like this:
var table = `...`; // Paste your table here
console.log(tableToJson(table));
Replace '...'
with your table's HTML. This script will output a string where each line is a separate JSON object. Each JSON object corresponds to a row in the table. Note that this script assumes that the first row of the table contains the headers.
Prompt:
ASR and TTS and LID should always be boolean false or true
ChatGPT:
Okay, let's adjust the JavaScript function to ensure that ASR, TTS, and LID fields are always either true
or false
. The presence of the ✔️ character will be translated to true
and its absence will be translated to false
.
function tableToJson(table) {
var parser = new DOMParser();
var doc = parser.parseFromString(table, "text/html");
var rows = doc.querySelectorAll("tr");
var result = [];
// Use the header row to get the keys
var keys = Array.from(rows[0].querySelectorAll("th")).map(th => th.textContent.trim());
// Iterate over the rest of the rows
for (var i = 1; i < rows.length; i++) {
var obj = {};
var cols = rows[i].querySelectorAll("td");
// Map each cell in the row to a key from the header row
for (var j = 0; j < cols.length; j++) {
switch(keys[j]) {
case 'ASR':
case 'TTS':
case 'LID':
obj[keys[j]] = cols[j].textContent.trim() === '✔️';
break;
case '':
obj[keys[j]] = cols[j].querySelector('a').href;
break;
default:
obj[keys[j]] = cols[j].textContent.trim() || null;
break;
}
}
result.push(obj);
}
// Convert the result array to newline-delimited JSON
return result.map(JSON.stringify).join('\n');
}
Now, the columns labeled 'ASR', 'TTS', and 'LID' are guaranteed to be boolean values. If the cell contains a ✔️, the corresponding field in the JSON object will be true
, otherwise false
.
Prompt:
Make the input to that function a DOM table object
ChatGPT:
You're right, the input would be more appropriately a DOM table object, not an HTML string. Here's a revised version of the function:
function tableToJson(tableElement) {
var rows = tableElement.querySelectorAll("tr");
var result = [];
// Use the header row to get the keys
var keys = Array.from(rows[0].querySelectorAll("th")).map(th => th.textContent.trim());
// Iterate over the rest of the rows
for (var i = 1; i < rows.length; i++) {
var obj = {};
var cols = rows[i].querySelectorAll("td");
// Map each cell in the row to a key from the header row
for (var j = 0; j < cols.length; j++) {
switch(keys[j]) {
case 'ASR':
case 'TTS':
case 'LID':
obj[keys[j]] = cols[j].textContent.trim() === '✔️';
break;
case '':
obj[keys[j]] = cols[j].querySelector('a').href;
break;
default:
obj[keys[j]] = cols[j].textContent.trim() || null;
break;
}
}
result.push(obj);
}
// Convert the result array to newline-delimited JSON
return result.map(JSON.stringify).join('\n');
}
You can use this function like this:
var tableElement = document.querySelector("table"); // Select the table in your DOM
console.log(tableToJson(tableElement));
This script will output a string where each line is a separate JSON object. Each JSON object corresponds to a row in the table. Note that this script assumes that the first row of the table contains the headers.
Prompt:
Uncaught TypeError: cols[j].querySelector(...) is null
ChatGPT:
I apologize for the error, it seems that there is an issue when the table cell does not contain a link. Let's revise the code to handle this case:
function tableToJson(tableElement) {
var rows = tableElement.querySelectorAll("tr");
var result = [];
// Use the header row to get the keys
var keys = Array.from(rows[0].querySelectorAll("th")).map(th => th.textContent.trim());
// Iterate over the rest of the rows
for (var i = 1; i < rows.length; i++) {
var obj = {};
var cols = rows[i].querySelectorAll("td");
// Map each cell in the row to a key from the header row
for (var j = 0; j < cols.length; j++) {
switch(keys[j]) {
case 'ASR':
case 'TTS':
case 'LID':
obj[keys[j]] = cols[j].textContent.trim() === '✔️';
break;
case '':
var link = cols[j].querySelector('a');
obj[keys[j]] = link ? link.href : null;
break;
default:
obj[keys[j]] = cols[j].textContent.trim() || null;
break;
}
}
result.push(obj);
}
// Convert the result array to newline-delimited JSON
return result.map(JSON.stringify).join('\n');
}
With this adjustment, the function now checks whether a link exists in the last column before attempting to access its href
attribute. If no link exists, it assigns null
to the corresponding field in the JSON object.
Here's what I used this for: https://gist.github.com/simonw/63aa33ec827b093f9c6a2797df950aba?permalink_comment_id=4576083#gistcomment-4576083
Also: https://simonwillison.net/2023/May/22/mms-language-coverage-in-datasette-lite/