Skip to content

Instantly share code, notes, and snippets.

@simonw
Created May 22, 2023 20:16
Show Gist options
  • Save simonw/ac671a9bdbda9f29dc5256511497bc1d to your computer and use it in GitHub Desktop.
Save simonw/ac671a9bdbda9f29dc5256511497bc1d to your computer and use it in GitHub Desktop.

HTML Table to JSON

May 22, 2023 at 12:34:10 PM

system:

You are ChatGPT, a large language model trained by OpenAI, based on the GPT-4 architecture. Knowledge cutoff: 2021-09 Current date: 2023-05-22


Prompt:

<table>
<thead>
<tr>
<th align="left">ISO 693-3</th>
<th align="left">Language Name</th>
<th align="left">Language Family</th>
<th align="left">Subgrouping</th>
<th align="left">ASR</th>
<th align="left">TTS</th>
<th align="left">LID</th>
<th align="left"></th>
</tr>
</thead>
<tbody><tr>
<td align="left">aaa</td>
<td align="left">Ghotuo</td>
<td align="left">Niger-Congo</td>
<td align="left">Atlantic-Congo</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">✔️</td>
<td align="left"><a href="https://www.ethnologue.com/language/aaa">Ethnologue</a></td>
</tr>
<tr>
<td align="left">aab</td>
<td align="left">Arum</td>
<td align="left">Niger-Congo</td>
<td align="left">Atlantic-Congo</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">✔️</td>
<td align="left"><a href="https://www.ethnologue.com/language/aab">Ethnologue</a></td>
</tr>

Write JavaScript to turn this table into newline-delimited JSON objects


ChatGPT:

Here's a JavaScript function which can parse HTML tables into newline-delimited JSON objects. This script uses the browser's built-in DOMParser to parse the HTML, and then it walks the DOM to build up the JSON objects:

function tableToJson(table) {
    var parser = new DOMParser();
    var doc = parser.parseFromString(table, "text/html");
    var rows = doc.querySelectorAll("tr");
    var result = [];

    // Use the header row to get the keys
    var keys = Array.from(rows[0].querySelectorAll("th")).map(th => th.textContent.trim());

    // Iterate over the rest of the rows
    for (var i = 1; i < rows.length; i++) {
        var obj = {};
        var cols = rows[i].querySelectorAll("td");

        // Map each cell in the row to a key from the header row
        for (var j = 0; j < cols.length; j++) {
            if (cols[j].textContent.trim() === '✔️') {
                obj[keys[j]] = true;
            } else if (cols[j].querySelector('a')) {
                obj[keys[j]] = cols[j].querySelector('a').href;
            } else {
                obj[keys[j]] = cols[j].textContent.trim() || null;
            }
        }

        result.push(obj);
    }

    // Convert the result array to newline-delimited JSON
    return result.map(JSON.stringify).join('\n');
}

You can use this function like this:

var table = `...`;  // Paste your table here
console.log(tableToJson(table));

Replace '...' with your table's HTML. This script will output a string where each line is a separate JSON object. Each JSON object corresponds to a row in the table. Note that this script assumes that the first row of the table contains the headers.


Prompt:

ASR and TTS and LID should always be boolean false or true


ChatGPT:

Okay, let's adjust the JavaScript function to ensure that ASR, TTS, and LID fields are always either true or false. The presence of the ✔️ character will be translated to true and its absence will be translated to false.

function tableToJson(table) {
    var parser = new DOMParser();
    var doc = parser.parseFromString(table, "text/html");
    var rows = doc.querySelectorAll("tr");
    var result = [];

    // Use the header row to get the keys
    var keys = Array.from(rows[0].querySelectorAll("th")).map(th => th.textContent.trim());

    // Iterate over the rest of the rows
    for (var i = 1; i < rows.length; i++) {
        var obj = {};
        var cols = rows[i].querySelectorAll("td");

        // Map each cell in the row to a key from the header row
        for (var j = 0; j < cols.length; j++) {
            switch(keys[j]) {
                case 'ASR':
                case 'TTS':
                case 'LID':
                    obj[keys[j]] = cols[j].textContent.trim() === '✔️';
                    break;
                case '':
                    obj[keys[j]] = cols[j].querySelector('a').href;
                    break;
                default:
                    obj[keys[j]] = cols[j].textContent.trim() || null;
                    break;
            }
        }

        result.push(obj);
    }

    // Convert the result array to newline-delimited JSON
    return result.map(JSON.stringify).join('\n');
}

Now, the columns labeled 'ASR', 'TTS', and 'LID' are guaranteed to be boolean values. If the cell contains a ✔️, the corresponding field in the JSON object will be true, otherwise false.


Prompt:

Make the input to that function a DOM table object


ChatGPT:

You're right, the input would be more appropriately a DOM table object, not an HTML string. Here's a revised version of the function:

function tableToJson(tableElement) {
    var rows = tableElement.querySelectorAll("tr");
    var result = [];

    // Use the header row to get the keys
    var keys = Array.from(rows[0].querySelectorAll("th")).map(th => th.textContent.trim());

    // Iterate over the rest of the rows
    for (var i = 1; i < rows.length; i++) {
        var obj = {};
        var cols = rows[i].querySelectorAll("td");

        // Map each cell in the row to a key from the header row
        for (var j = 0; j < cols.length; j++) {
            switch(keys[j]) {
                case 'ASR':
                case 'TTS':
                case 'LID':
                    obj[keys[j]] = cols[j].textContent.trim() === '✔️';
                    break;
                case '':
                    obj[keys[j]] = cols[j].querySelector('a').href;
                    break;
                default:
                    obj[keys[j]] = cols[j].textContent.trim() || null;
                    break;
            }
        }

        result.push(obj);
    }

    // Convert the result array to newline-delimited JSON
    return result.map(JSON.stringify).join('\n');
}

You can use this function like this:

var tableElement = document.querySelector("table"); // Select the table in your DOM
console.log(tableToJson(tableElement));

This script will output a string where each line is a separate JSON object. Each JSON object corresponds to a row in the table. Note that this script assumes that the first row of the table contains the headers.


Prompt:

Uncaught TypeError: cols[j].querySelector(...) is null


ChatGPT:

I apologize for the error, it seems that there is an issue when the table cell does not contain a link. Let's revise the code to handle this case:

function tableToJson(tableElement) {
    var rows = tableElement.querySelectorAll("tr");
    var result = [];

    // Use the header row to get the keys
    var keys = Array.from(rows[0].querySelectorAll("th")).map(th => th.textContent.trim());

    // Iterate over the rest of the rows
    for (var i = 1; i < rows.length; i++) {
        var obj = {};
        var cols = rows[i].querySelectorAll("td");

        // Map each cell in the row to a key from the header row
        for (var j = 0; j < cols.length; j++) {
            switch(keys[j]) {
                case 'ASR':
                case 'TTS':
                case 'LID':
                    obj[keys[j]] = cols[j].textContent.trim() === '✔️';
                    break;
                case '':
                    var link = cols[j].querySelector('a');
                    obj[keys[j]] = link ? link.href : null;
                    break;
                default:
                    obj[keys[j]] = cols[j].textContent.trim() || null;
                    break;
            }
        }

        result.push(obj);
    }

    // Convert the result array to newline-delimited JSON
    return result.map(JSON.stringify).join('\n');
}

With this adjustment, the function now checks whether a link exists in the last column before attempting to access its href attribute. If no link exists, it assigns null to the corresponding field in the JSON object.


@simonw
Copy link
Author

simonw commented May 22, 2023

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment