Skip to content

Instantly share code, notes, and snippets.

@Dev-Dipesh
Last active December 1, 2024 05:13
Show Gist options
  • Save Dev-Dipesh/65fe40f16acdd806eaa0355ac518dc7c to your computer and use it in GitHub Desktop.
Save Dev-Dipesh/65fe40f16acdd806eaa0355ac518dc7c to your computer and use it in GitHub Desktop.
Turning Zomato Order History into Insights: A Personal Project
// This is the main code that processes the raw data from the 'Raw Data' sheet and places it in 'Formatted Data' sheet.
function processRawData() {
var sheet = SpreadsheetApp.getActiveSpreadsheet();
var rawSheet = sheet.getSheetByName('Raw Data'); // Raw data arrives here from Zapier
var formattedSheet = sheet.getSheetByName('Formatted Data'); // Formatted data is posted here after processing
// Get the data from the Raw Data sheet
var rawData = rawSheet.getDataRange().getValues();
for (var i = 1; i < rawData.length; i++) { // Start from row 1 to skip headers
var row = rawData[i];
if (row[2] === "Processed") continue; // Skip already processed rows (assuming column C)
var processedRow = [];
// Extract fields
var rawDate = row[0]; // Assuming Date is in the first column
var emailBody = row[1]; // Assuming Email Body is in the second column
// Process Date
var formattedDate = ""; // Default empty
try {
var dateMatch = rawDate.match(/\b(\d{1,2}\s\w+\s\d{4})\b/);
if (dateMatch) {
var dateObj = new Date(dateMatch[0]); // Extract and parse the date
formattedDate = Utilities.formatDate(dateObj, Session.getScriptTimeZone(), "dd/MM/yy");
}
} catch (err) {
formattedDate = "Invalid Date"; // Fallback for invalid dates
}
// Extract Restaurant Name
var restaurantName = emailBody.match(/from (.+)/)?.[1]?.trim() || "Unknown";
// Extract Order Items
var orderStart = emailBody.indexOf("Delivered") + "Delivered".length;
var orderEnd = emailBody.indexOf("Total paid -");
var orderText = emailBody.substring(orderStart, orderEnd).trim();
var orderItems = orderText.split("\n").filter(item => item.includes("X") && /\d/.test(item)); // Filter valid items
// Extract Total Paid
var totalPaidMatch = emailBody.match(/Total paid -\s*₹([\d.,]+)/);
var totalPaid = totalPaidMatch ? parseFloat(totalPaidMatch[1].replace(/,/g, '')).toFixed(2) : "0.00";
// Skip rows with price 0
if (parseFloat(totalPaid) === 0) {
rawSheet.getRange(i + 1, 3).setValue("Skipped: Price 0"); // Mark as skipped
continue;
}
// Append processed data
processedRow.push(formattedDate, restaurantName, orderItems.join(", "), totalPaid);
// Append to Formatted Data sheet
formattedSheet.appendRow(processedRow);
// Mark as processed in Raw Data
rawSheet.getRange(i + 1, 3).setValue("Processed"); // Add "Processed" in column C
}
}
// This adds a custom drop-down menu in Google Sheets to run the scripts manually if required.
function onOpen() {
var ui = SpreadsheetApp.getUi();
ui.createMenu('Utilities') // Creates a new menu called "Utilities"
.addItem('Remove Duplicates', 'removeDuplicates') // Adds an option to call removeDuplicates
.addItem('Process Raw Data', 'processRawData') // Optional: Adds an option to run processRawData
.addToUi(); // Adds the menu to the UI
}
// This code removes the duplicate entries from the formatted data
function removeDuplicates() {
var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Formatted Data');
var data = sheet.getDataRange().getValues(); // Get all data
var uniqueData = [];
var seenKeys = new Set();
// Iterate through the rows
for (var i = 0; i < data.length; i++) {
var row = data[i];
var key = row.join("||"); // Create a unique key by combining all columns
// Add the row to uniqueData only if it's not already seen
if (!seenKeys.has(key)) {
uniqueData.push(row);
seenKeys.add(key);
}
}
// Clear the sheet and re-insert unique rows
sheet.clearContents();
sheet.getRange(1, 1, uniqueData.length, uniqueData[0].length).setValues(uniqueData);
}

These are some of the referrence questions we can ask ChatGPT based on our data in the sheet to gain insights.


1. Spending Patterns

  • What is my total spending in November 2024?
  • What is my average spend per order?
  • What is the highest amount I’ve spent on a single order, and where?
  • What is the lowest amount I’ve spent on a single order, and where?
  • Which day of the week do I spend the most on food?
  • How does my weekly spending vary?
  • How many orders did I place above ₹1,000?
  • Which restaurants contribute the most to my total spending?

2. Restaurant Preferences

  • Which are my top 3 most-visited restaurants by spending?
  • Which are my top 3 most-visited restaurants by order count?
  • Which restaurant do I spend the most on average per order?
  • Are there restaurants where I’ve only ordered once?
  • What’s my most frequent restaurant during weekdays vs. weekends?
  • Which restaurants have I ordered from multiple times on the same day?

3. Dish Insights

  • What are the most frequently ordered dishes?
  • Do I repeat the same dish multiple times at the same restaurant?
  • How much do I spend on average for different categories of dishes (e.g., desserts, salads)?

4. Health and Diet

  • How many of my orders can be categorized as health-conscious?
  • How much have I spent on salads this month?
  • What percentage of my orders include desserts or indulgent dishes?
  • Which restaurant offers the healthiest options based on my orders?

5. Time-Based Analysis

  • How has my spending varied over the course of November 2024?
  • Did I order more frequently at the start or end of the month?
  • Which days had the highest order frequency?
  • How many days did I order more than once?
  • Which time of the month do I tend to splurge the most?

6. Recommendations

  • What are some cost-effective alternatives to my most expensive restaurants?
  • Suggest new dishes based on the types of food I typically order.
  • Are there patterns that indicate I prefer certain cuisines?
  • Recommend dishes that align with my frequent choices but are less expensive.
  • Which restaurants should I avoid to reduce my spending?

7. Predictive Insights

  • Based on my current patterns, how much will I likely spend next month?
  • Predict my favorite restaurant for next month based on recent trends.
  • What type of dishes am I most likely to order next week?
  • If I continue this pattern, what will be my highest spending day in December?

8. Combination Insights

  • Which restaurants offer my favorite dishes at the lowest price?
  • Are there any restaurants where I consistently spend more than ₹1,500 per order?
  • Which dishes are commonly ordered from my top 3 restaurants?
  • How often do I try new restaurants vs. repeating old ones?

9. Cost Optimization

  • How can I reduce my average spending per order?
  • Which restaurants or dishes contribute the most to overspending?
  • If I stop ordering from my most expensive restaurant, how much can I save?
  • Are there dishes that I order frequently but could replace with cheaper alternatives?

10. Miscellaneous Fun Insights

  • If I had to rank all restaurants by total orders, how would they stack up?
  • How much have I spent on coffee or beverages this month?
  • What’s the weirdest combination of dishes I’ve ordered?
  • Which dish has the longest or most descriptive name?
  • How much have I spent on desserts overall?

Turning Zomato Order History into Insights: A Personal Project

Armed with a lovely weekend morning, a delicious cappuccino, and a sprinkle of curiosity, I embarked on a mission to uncover insights from my daily Zomato orders. Ordering food every day for my partner and me got me wondering: what are our favorite restaurants? How much are we spending? Are there patterns in our choices?

But here’s the catch—Zomato doesn’t let you export order history. Worse, their website no longer even has an order history page. This left me with two potential sources for data: the mobile app or email receipts. Naturally, I turned to emails—flexible, reliable, and a great starting point for any DIY data project.


Explored Solutions

1. Google Cloud Console + Gmail API + Jupyter Notebook

This solution seemed promising at first but came with significant complexity. To access the Gmail API, Google requires you to set up a Google Cloud Console project and create an OAuth consent screen. This involves:

  • Hosting a website with Terms of Service (TOS) and Privacy Policy pages.
  • Submitting these documents for Google’s review to enable the required API scope.

Essentially, you need to provide users (even for personal use) with a consent page that explains what data your app will access. While this setup makes sense for production-grade applications, it felt like a corporate-sized solution for a small personal project. Combined with the lower developer experience of managing the API and credentials, I decided to explore simpler options.


2. Gmail > Zapier + Code by Zapier > Google Sheets

This worked well. Zapier allowed me to:

  • Pull email data directly from Gmail.
  • Use Python (via Code by Zapier) to extract structured information like date, restaurant, order items, and total cost.
  • Send the processed data to Google Sheets.

The only downside? Zapier’s free tier supports only two steps, and adding a third step required upgrading to a paid plan. For a casual project, that wasn’t ideal.


3. Gmail > Zapier > Google Sheets + Google Apps Script

This final solution was the clear winner—a free, user-friendly, and scalable approach. Think of it like building with microservices: modular, single-responsibility components that work seamlessly together. Here’s how I implemented it step-by-step:

  1. Google Sheet Setup:

    • Created a "Raw Data" tab for storing unprocessed email data with two columns - Date & Email Body.
    • Added a "Formatted Data" tab for the cleaned, structured output with four columns - Date, Restaurant, Order, Total Paid.
  2. Zapier Integration:

    • Created a Zap to trigger whenever a new Zomato order email arrived in Gmail.
    • Configured the Zap to forward the raw email content (including date and body) to the "Raw Data" tab of the linked Google Sheet.
  3. Google Apps Script:

    • Wrote a script to process raw email content into structured fields like:
      • Date (converted to a clean format like DD/MM/YY).
      • Restaurant Name (extracted from the email text).
      • Order Items (filtered and cleaned for clarity).
      • Total Paid (formatted as a two-decimal currency).
    • Added logic to skip rows with a price of 0 (e.g., canceled orders).
    • Marked processed rows to avoid duplication.
    • Set a trigger to run on-change in raw data sheet
  4. On-Demand and Automated Deduplication:

    • Created a deduplication script that ensures no duplicate entries are added to the Formatted Data tab. This script:
      • Removes duplicates by comparing all columns in each row.
      • Can also be run manually through a custom menu in Google Sheets.
    • Runs automatically every 12 hours via a time-based trigger.
  5. Custom Menu:

    • Added a "Utilities" menu in Google Sheets for easy access to scripts:
      • Process Raw Data: Formats new entries from the Raw Data tab.
      • Remove Duplicates: Cleans up duplicate entries on demand.

What Happens When an Email Arrives

  1. Trigger:

    • Zapier detects the new email in Gmail and extracts its raw content (date, body, etc.).
  2. Store Raw Data:

    • Zapier sends the email content to the Raw Data tab in Google Sheets, where each email is logged in a new row.
  3. Data Processing:

    • The processRawData script runs on the raw data, extracting key fields like:
      • Date (formatted as DD/MM/YY)
      • Restaurant Name
      • Order Items
      • Total Paid
    • Processed rows are appended to the Formatted Data tab, and the original raw rows are marked as "Processed" in the Raw Data tab.
  4. Deduplication:

    • Every 12 hours, the deduplication script runs automatically, scanning the Formatted Data tab and removing duplicate entries.
    • If needed, the script can also be run manually via the custom menu under Utilities > Remove Duplicates.

Why This Solution Stands Out

  • Free: No extra costs, leveraging Zapier’s free tier and Google’s tools.
  • User-Friendly: Easy enough for anyone to replicate without advanced technical skills.
  • Scalable and Modular: Each component serves a single purpose (data extraction, processing, deduplication) and can be tweaked independently as needed.

Next Steps

  • Insights Dashboard: Use Google Looker Studio to create interactive dashboards and gain insights from the processed data.
  • Alerts and Notifications: Set up email, SMS, other alerts using a new Zap in Zapier. Simply connect the formatted data tab to your preferred notification channel to remain in free tier.
  • AI: Use ChatGPT or other AI tools to generate insights from the data. See the insights.md for referrence questions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment