Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save ranchodeluxe/4f22478885b5d9d04557184b88f826ee to your computer and use it in GitHub Desktop.
Save ranchodeluxe/4f22478885b5d9d04557184b88f826ee to your computer and use it in GitHub Desktop.
openAI ETL
import fs from 'fs';
import OpenAI from 'openai';
const openaiClient = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
});
const main = async () => {
// Get filename from command line arguments
const args = process.argv.slice(2);
if (args.length === 0) {
console.error('Error: Please provide a filename as a command line argument');
console.error('Usage: node taskconvertdemo.js <filename>');
process.exit(1);
}
const userProvidedFileName = args[0];
console.log(`Hello world! This is a task conversion demo. Run timestamp ${new Date().toISOString()}`);
console.log(`Processing file: ${userProvidedFileName}`);
// Run the main function and wait for it to complete
const convertedOutput = await convertTaskFile(userProvidedFileName);
console.log(JSON.stringify(convertedOutput, null, 2));
};
////////////////////////////////////////////////////////////////////////////////////////////////////
// Function to convert task file
// This function reads the file, parses the JSON, extracts the category from the filename,
// and converts the tasks from the old format to the new format using AI.
////////////////////////////////////////////////////////////////////////////////////////////////////
const convertTaskFile = async (taskFileName: string) => {
let userProvidedFileContent: string;
try {
userProvidedFileContent = fs.readFileSync(taskFileName, 'utf8');
} catch (error) {
console.error(`Error reading file "${taskFileName}":`, error);
process.exit(1);
}
// Parse the input JSON
let sourceData;
try {
sourceData = JSON.parse(userProvidedFileContent);
} catch (error) {
console.error(`Error parsing JSON from file "${taskFileName}":`, error);
process.exit(1);
}
// Extract category from filename using regex
const categoryMatch = taskFileName.match(/-task-category-(.+)\.json$/);
const category = categoryMatch ? categoryMatch[1] : "unknown";
// Convert PWItem array to tasks structure
const tasks: { [key: string]: { id: number; name: string; category: string } } = {};
if (sourceData.PWItem && Array.isArray(sourceData.PWItem)) {
for (const [index, taskObjBefore] of sourceData.PWItem.entries()) {
if (taskObjBefore.PWItemID && taskObjBefore.Name) {
// Create sanitized key from name (alphanumerics only) + "Task"
const sanitizedKey = taskObjBefore.Name.replace(/[^a-zA-Z0-9]/g, '') + 'Task';
let taskObjAfter = {
id: taskObjBefore.PWItemID,
name: taskObjBefore.Name,
description: taskObjBefore.Description || '',
category: category
} as any;
const taskObjConvertedWithAI = await convertTaskWithAI(taskObjBefore);
Object.assign(taskObjAfter, taskObjConvertedWithAI);
console.log(`Converted task: ${sanitizedKey} with ID: ${taskObjBefore.PWItemID}`);
tasks[sanitizedKey] = taskObjAfter;
}
};
}
const convertedOutput = {
[`${category.charAt(0).toUpperCase() + category.slice(1)}Tasks`]: tasks
} as any;
console.log(`Converted ${Object.keys(tasks).length} tasks for category: ${category}`);
return convertedOutput;
}
////////////////////////////////////////////////////////////////////////////////////////////////////
// Function to convert a single task object using AI
// This function sends the task object to the AI model for conversion and returns the converted task.
////////////////////////////////////////////////////////////////////////////////////////////////////
const convertTaskWithAI = async (taskObjBefore: any): Promise<any> => {
const messages = [
{
role: "system",
content: `The current time is ${new Date().toISOString()}.`
},
{
role: "developer",
content: `
You are Kai, a data assistant in the real estate sector. Your primary task is to help
administrators at property management companies build a large, complicated data structure called a
"Template".
The Template is a general-purpose form that the property management company uses to collect and
organize information about a property, its spaces, and all repair, renovation, and maintenance
work that it might need -- including contractor labor and purchases of materials, appliances,
tools, etc. The Template is an enormous JSON object, intended to capture all possible repair and
renovation work on all possible properties.
The Template is designed and edited by an office-based user in the property management company
whom we call the "Administrator" or "Admin". The Admin then deploys the Template to the
company's field workers, whom we call "Construction Managers". A Construction Manager's job is to
draft a plan for renovating or remodeling a property, including exact measurements, materials,
and labor needed; this plan is called a "Scan" (because it's a detailed examination of the
property). The process of creating a Scan is called "Walking" the property, and the act of
Walking the property is called a "Walkthrough".
(Though it's not relevant to the process of designing a Template, it's worth noting that
the Scan is then used to generate a "Scope of Work" (SOW) that the company uses to actually
perform the work on the property, such as hiring contractors, ordering materials, etc.)
The structure of the Template is hierarchical. The terms and verbiage used in this hierarchy
have evolved in their meaning over time, and their words don't necessarily mean what they
would normally mean in plain English. Starting with the Template itself as the top-level
object, the hierarchy is as follows:
- Task: A Task is a specific object, item, fixture, or piece of work that might need to be
examined in a Group. It is essentially an "examination item" or "check item" that the
Construction Manager will need to look at during the Walkthrough. For example, in the
"Kitchen" Group, the Construction Manager will need to examine the "Refrigerator",
"Stove", "Sink", etc. In the "Utilities" Group, the Task might be "Fusebox" or
"Water Heater". It's important to remember that a Template is a general-purpose form
that is designed to capture all possible repair and renovation work on all possible
properties, so the Tasks are not specific to any one property. As such, a Group will
have many Tasks, some of which may be mutually exclusive. For example, the "Bathroom"
Group might have Tasks for both a "Bathtub" and a "Shower Stall", because some properties
have bathtubs and some have shower stalls. The Tasks are the second-level organizational
unit in the Template. They are stored under Group objects an array called "taskTemplates".
- Resolution Option (or just "Option"): An Option is a specific action that the Construction
Manager can take on a Task. It is essentially a "choice" that the Construction Manager
can make during the Walkthrough. For example, in the "Refrigerator" Task, the
Construction Manager might have the Option to "Replace", "Repair", or "Keep". In the
"Pool" Group, under the "Pool Deck" Task, the Construction Manager might have the
Option to "Repair Concrete", "Repair Tile", "Repaint", "Resurface", and so on. As with
Tasks, Options are not specific to any one property, but rather need to represent all
possible actions that could be taken on a Task, including all possible variants that
the Construction Manager might encounter in the field. Options are the third-level
organizational unit in the Template. They are stored under Task objects in an array
called "optionTemplates".
- Line Item: A Line Item is a specific item or service that the Construction Manager can order or
purchase as part of a Resolution Option. This corresponds to a specific SKU from a vendor or
supplier such as Home Depot, or a specific type of labor from a subcontractor.
For example, in the "Replace" Option for the "Refrigerator" Task, the Construction Manager
would have a Line Item for every specific model of refrigerator from Home Depot that would
be suitable as a replacement, such as "Samsung 28 cu. ft. Side-by-Side Refrigerator".
In the "Repaint Walls" Option for the "Living Room" Task, the Construction Manager would have a
Line Item for every specific paint color and brand that the company uses, such as "Behr
Ultra Pure White", "Sherwin-Williams Agreeable Gray", etc., as well as a Line Item for the
labor of painting the walls, such as "Labor from Local Painter XYZ". They are the
fourth-level organizational unit in the Template. They are stored under Option objects in an
array called "lineItems".
Every object at every level of the Template hierarchy has the following properties:
- name: string
- description: string
- id: string (a unique identifier for the object)
- category?: string
- values?: dict[string, ValueObject] (a string-keyed map of descriptors of units of price or measurement)
The ValueObject is a simple object with the following properties:
- type: string (the units of the value, such as "sqft" or "usd", or sometimes just "choice")
- description?: string (a brief description of the value)
- value?: This field can be one of:
- an empty string
- a numerical value
- a string describing a simple arithmetic formula describing how to compute the value of this field,
expressed as an operation on the other keys of the "values" map. For example, for a value called
"roomSqFt", this could be "roomLength * roomWidth", with "roomLength" and "roomWidth" also being
defined in the values list.
Here's an example of an Option object taken from a real Template, to show you how the "values"
map can be used to compute the value of a field:
\`\`\`json
...
{
"id": "option_drywall_re_6c3e643f-274a-454f-a3bb-b0def6a9db77",
"name": "Drywall Repair",
"values": {
"repairArea": {
"type": "sqft",
"value": "repairLength * repairWidth",
"description": "Total repair area for Drywall Repair"
},
"repairWidth": {
"type": "ft",
"description": "Width of repair area for Drywall Repair"
},
"repairLength": {
"type": "ft",
"description": "Length of repair area for Drywall Repair"
}
},
"category": "walls",
"description": "Repair drywall in one part of the room represented by this option",
"optionTemplates": [...]
}
...
\`\`\`
The full Template data structure is far too big to fit into this conversation -- you will not be
able to see the entire Template at once. You will only see the Task level and children.
As a data assistant, the most important thing for you to remember about the Template is that it's
*sloppy*. It was originally envisioned as a rigorous data structure, but over time it has been used
by humans who have added all sorts of ad-hoc data to it, including data that doesn't fit the
original structure, as well as data that is inconsistent, incomplete, or inserted into the wrong
place or at the wrong level of the hierarchy. As a result, you may encounter unexpected
data formats or structures when working with the Template.
`
},
{
role: "developer",
content: `
The user will show you a partial Task in an old, obsolete format that they're
trying to move away from. There isn't necessarily a one-to-one mapping between the old format
and the new Task format, so you will need to use your best judgment to convert the old
format into the new one.
TIPS FOR CONVERTING THE OLD FORMAT TO THE NEW TEMPLATE FORMAT:
- If the old format has an ID value, then, in general, preserve that ID value as-is.
`,
},
{
role: "user",
content: `
\`\`\`json
${JSON.stringify(taskObjBefore, null, 2)}
\`\`\`
`
}
] as OpenAI.Responses.ResponseInput;
messages.push({
role: "developer",
content: `
Describe the information that the user has provided, and strategize how to convert/map it into
the Task-level format. We are not trying to convert anything at the Group level just the Task
level and down.
`
});
let llmResponse = await openaiClient.responses.create({
model: "gpt-4.1",
input: messages,
});
let llmReply = llmResponse.output_text;
messages.push({
role: "assistant",
content: llmReply
});
const templateStructSchemaCommonProperties = {
id: {
type: "string",
description: "A unique identifier for the object, such as a UUID or a specific ID from the source data."
},
name: {
type: "string",
description: "The name of the object, such as a Task name."
},
description: {
type: "string",
description: "A brief description of the object."
},
};
const taskStructSchema = {
type: "object",
description: "Each PWItem in the array PWItems in the source data becomes a Task object",
properties: {
contractorType: {
type: "string",
description: "The type of contractor that is relevant to this Task, such as 'drywall', 'plumbing', etc. NOTE: For moulding, always use the value 'carpenter'.",
enum: [
"notApplicable",
"carpenter",
"concrete",
"electrician",
"excavation",
"flooring",
"general",
"glazier",
"hvac",
"insulation",
"landscaper",
"mason",
"painter",
"pestControl",
"plumber",
"roofer",
"septic",
"steel",
"tiler",
"treeService",
"waterproofing",
]
},
optionTemplates: {
type: "array",
items: {
type: "object",
description: "An array of Option objects that are available for this Task. Each Option represents a specific choice or variation for the Task. Each ItemComment in the source data becomes an Option object",
properties: {
...templateStructSchemaCommonProperties,
lineItemTemplates: {
type: "array",
description: "An array of Line Item objects that are available for this Option. Each Line Item represents a specific item or service that can be ordered or purchased as part of the Option.",
items: {
minItems: 1,
type: "object",
description: "A Line Item object that represents a specific item or service that can be ordered or purchased as part of the Option. Every Option must have at least one Line Item.",
properties: {
...templateStructSchemaCommonProperties,
valueDiscussion: {
type: "string",
description: "A discussion of the factors that will ultimately drive what exactly a contractor will order for this Line Item, such as the specific materials, labor, or services that will be needed. This can include considerations such as materialPrice, laborPrice, etc.",
},
values: {
type: "array",
items: {
type: "object",
description: `
A numerical value for material or labor cost/price that allows a contractor to multiply the parent Option.values count/amount by.
Basically, this should answer the questions:
- How do we compute the price of the Line Item based on the parent Option.values count/amount?
NOTE: Don't call any of the values "priceTotal". This will be a special value that is computed based on the values in this collection.
`,
properties: {
key: {
type: "string",
description: "The key for the metadata value, such as 'repairArea' or 'repairLength'."
},
type: {
type: "string",
description: "The units of the value, such as 'sqft' or 'usd', or sometimes just 'count' or 'choice'."
},
description: {
type: "string",
description: "A brief description of what the value represents."
},
value: {
type: "string",
description: "A default value for the metadata value. Can be one of: 1. Empty string; 2. A numerical value; 3. A simple arithmetic formula describing how to compute the value of this field, expressed as an operation on the other keys of the 'values' map (eg: for a value called `roomSqFt`, this could be `roomLength * roomWidth`, with `roomLength` and `roomWidth` also being defined in the values list)."
}
},
required: ["key", "type", "description", "value"],
additionalProperties: false
}
},
priceTotal: {
type: "string",
description: "A special value that acts as an element of the 'values' collection. We're assuming that its type is usd. It is a string that represents the price of the Option, which can be computed based on the values in the 'values' collection. This is typically a formula."
},
},
required: ["id", "name", "description", "values", "valueDiscussion", "priceTotal"],
additionalProperties: false
},
required: ["id", "name", "description"],
additionalProperties: false
},
valueDiscussion: {
type: "string",
description: "A discussion of the factors that will ultimately drive what exactly a contractor will order for this Option, such as the specific materials, labor, or services that will be needed. This can include considerations such as materials, labor, unit count, etc.",
},
values: {
type: "array",
items: {
type: "object",
description: `
A numerical value that allows a contractor to compute the amount/number of some product or service to order such as sqft, ft, in, unitCount.
NOTE: this value will end up as a multiplier in the lineItemTemplates.values.priceTotal field
Basically, this should answer the questions:
- What units is the Option measured in?
`,
properties: {
key: {
type: "string",
description: "The key for the metadata value, such as 'repairArea' or 'repairLength'."
},
type: {
type: "string",
description: "The units of the value, such as 'sqft' or 'usd', or sometimes just 'count' or 'choice'."
},
description: {
type: "string",
description: "A brief description of what the value represents."
},
value: {
type: "string",
description: "A default value for the metadata value. Can be one of: 1. Empty string; 2. A numerical value; 3. A simple arithmetic formula describing how to compute the value of this field, expressed as an operation on the other keys of the 'values' map (eg: for a value called `roomSqFt`, this could be `roomLength * roomWidth`, with `roomLength` and `roomWidth` also being defined in the values list)."
}
},
required: ["key", "type", "description", "value"],
additionalProperties: false
}
},
},
required: ["id", "name", "description", "lineItemTemplates", "valueDiscussion", "values"],
additionalProperties: false
}
},
},
required: [
"contractorType",
"optionTemplates",
],
additionalProperties: false,
};
messages.push({
role: "developer",
content: `Convert the user's data into a Task object.`
});
llmResponse = await openaiClient.responses.create({
model: "gpt-4.1",
input: messages,
text: {
format: {
type: "json_schema",
name: "task_struct_schema",
description: "A JSON schema for the Task structure.",
schema: taskStructSchema,
strict: true // Ensure the response strictly adheres to the schema
}
}
});
llmReply = llmResponse.output_text;
messages.push({
role: "assistant",
content: llmReply
});
let taskObjAfter = JSON.parse(llmResponse.output_text);
// Go through the Task object that's been returned by the LLM.
// Go through each of its optionTemplates. In each one, change
// its values array into a values map.
// @ts-ignore
// for (const option of llmReply.optionTemplates) {
// const valuesMap = {};
// for (const value of option.values) {
// const { key: valueKey, ...restOfValue } = value;
// valuesMap[valueKey] = restOfValue;
// }
// valuesMap["priceTotal"] = {
// type: "usd",
// description: "The total price of the option.",
// value: option.priceTotal
// };
// delete option.priceTotal;
// option.values = valuesMap;
// }
return taskObjAfter;
}
// Example usage of the new function:
// const result = await convertTaskWithAI(someTaskObject);
////////////////////////////////////////////////////////////////////////////////////////////
// Run the main function and wait for it to complete
////////////////////////////////////////////////////////////////////////////////////////////
await main();
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment