@jargnar
Created August 3, 2025 06:02
A prompt for ChatGPT Agent to gather car information

Objective

Generate a CSV file cars_bangalore_2025.csv containing ≥50 unique passenger car variants officially sold in Bengaluru, Karnataka, India for Model Year 2025, with complete specifications matching the CarSpec Pydantic schema.

Requirements

Data Quality Standards

  • No missing data: Every required field must contain a valid value (no blanks, nulls, or "N/A"); only car_variant may be left empty
  • Unit compliance: All numeric values must use exact units specified in schema
  • Source verification: Each row must cite ≥2 independent, publicly accessible sources
  • Data validation: Every row must pass Pydantic schema validation without errors

Schema Definition

from __future__ import annotations
from decimal import Decimal
from enum import Enum
from typing import Optional

from pydantic import BaseModel, Field, PositiveInt, condecimal, confloat

class TransmissionType(str, Enum):
    MANUAL = "Manual"
    AUTOMATIC = "Automatic"
    CVT = "CVT"
    DCT = "DCT"

class Drivetrain(str, Enum):
    FWD = "FWD"
    RWD = "RWD"
    AWD = "AWD"        # full-time
    FOUR_WD = "4x4"    # part-time / selectable

class FuelType(str, Enum):
    PETROL = "Petrol"
    DIESEL = "Diesel"
    CNG = "CNG"
    ELECTRIC = "Electric"
    HYBRID = "Hybrid"

class BodyType(str, Enum):
    HATCHBACK = "Hatchback"
    SEDAN = "Sedan"
    SUV = "SUV"
    MUV = "MUV/MPV"
    COUPE = "Coupe"
    CONVERTIBLE = "Convertible"
    PICKUP = "Pickup"
    VAN = "Van"

class EngineSpec(BaseModel):
    """Primary power-plant specifications."""
    type: str  # e.g., "Inline-4", "V6", "Electric Motor"
    displacement_cc: PositiveInt
    num_cylinders: PositiveInt
    max_power_kw: confloat(gt=0)
    max_torque_nm: confloat(gt=0)

class SafetySpec(BaseModel):
    """Occupant protection features and ratings."""
    ncap_rating_stars: PositiveInt = Field(..., ge=1, le=5)
    airbags_count: PositiveInt

class CarSpec(BaseModel):
    """Complete specification for passenger vehicles sold in Bengaluru, India (MY-2025)."""
    
    # Identity & Pricing
    manufacturer_name: str
    car_model: str
    car_variant: Optional[str] = None
    price_on_road_bangalore_inr: condecimal(gt=0, max_digits=12, decimal_places=2)  # must be positive, per Field Specifications
    
    # Powertrain
    engine: EngineSpec
    transmission_type: TransmissionType
    drivetrain: Drivetrain
    fuel_type: FuelType
    
    # Performance & Efficiency
    fuel_efficiency_kmpl: confloat(gt=0)
    turning_radius_m: confloat(gt=0)
    
    # Weight Specifications (kg)
    kerb_weight_kg: PositiveInt
    gross_weight_kg: PositiveInt
    
    # Dimensions (mm)
    body_length_mm: PositiveInt
    body_width_mm: PositiveInt
    body_height_mm: PositiveInt
    wheelbase_mm: PositiveInt
    ground_clearance_mm: PositiveInt
    
    # Practicality
    boot_space_l: PositiveInt
    fuel_tank_capacity_l: PositiveInt
    seating_capacity: PositiveInt
    body_type: BodyType
    
    # Safety
    safety: SafetySpec

Acceptable Data Sources

Primary Sources (Preferred)

  1. Manufacturer Official: India websites, downloadable brochures, press kits, spec sheets
  2. Government/Testing: ARAI reports, BNCAP test results, Global NCAP ratings
  3. Auto Portals: CarWale, CarDekho, Autocar India (for on-road prices and missing specs)

Source Priority

  • Use manufacturer data when available
  • Cross-verify with at least one additional source
  • Document source URLs in separate columns (source_1, source_2)
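
The two-source rule can be checked mechanically before a row is accepted. The sketch below is one possible reading, assuming that "independent" means the two URLs come from different hosts; that interpretation, the helper names, and the example.com style URLs in the usage note are illustrative and not part of the original prompt.

```python
from urllib.parse import urlparse


def source_host(url: str) -> str:
    """Return the lower-cased host of a URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def has_two_independent_sources(source_1: str, source_2: str) -> bool:
    """True when both URLs have a host and the hosts differ.

    Assumption: two pages on the same host do not count as independent.
    """
    host_1, host_2 = source_host(source_1), source_host(source_2)
    return bool(host_1) and bool(host_2) and host_1 != host_2
```

For example, `has_two_independent_sources("https://www.example.com/a", "https://example.org/b")` is true, while two pages under the same host are rejected.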

Data Collection Workflow

Step 1: Build Master List

  1. Identify all manufacturers selling passenger cars in India (2025 MY)
  2. List all models and variants available in Bangalore
  3. Target variety: Include mix of body types, fuel types, and price segments
  4. Ensure ≥50 unique variants (same model, different engine/trim = unique)
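
A uniqueness check for the master list might look like the following sketch. The choice of key fields is an assumption: it treats rows as the same variant only when manufacturer, model, trim, engine displacement, fuel type, and transmission all match, which is one way to read "same model, different engine/trim = unique". Column names follow the CSV header defined later in this document.

```python
from collections import Counter

# Assumed identity key; adjust if another field should distinguish variants.
KEY_FIELDS = (
    "manufacturer_name",
    "car_model",
    "car_variant",
    "engine.displacement_cc",
    "fuel_type",
    "transmission_type",
)


def variant_key(row: dict) -> tuple:
    """Normalise the key fields so case and stray spaces do not hide duplicates."""
    return tuple(str(row.get(field) or "").strip().lower() for field in KEY_FIELDS)


def duplicate_variants(rows: list) -> list:
    """Return the keys that occur more than once."""
    counts = Counter(variant_key(row) for row in rows)
    return [key for key, n in counts.items() if n > 1]


def has_enough_unique_variants(rows: list, minimum: int = 50) -> bool:
    """True when the number of distinct variant keys reaches the target."""
    return len({variant_key(row) for row in rows}) >= minimum
```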

Step 2: Data Extraction & Normalization

For each variant, collect:

  1. Identity: Exact manufacturer name, model, variant designation
  2. Pricing: On-road Bangalore price (including RTO, insurance, all fees)
  3. Technical specs: All fields per schema, converted to required units
  4. Validation: Verify data consistency and completeness
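
Spec sheets often quote power in PS or bhp and torque in kgm, while the schema expects kW and Nm. The sketch below shows one way to normalise those values; the function names and the set of accepted unit strings are assumptions, and the conversion factors are the standard definitions of metric horsepower, mechanical horsepower, and kilogram-force metre.

```python
PS_TO_KW = 0.73549875      # 1 metric horsepower (PS) in kW
BHP_TO_KW = 0.7456998716   # 1 mechanical horsepower (bhp) in kW
KGM_TO_NM = 9.80665        # 1 kgf·m in N·m


def power_to_kw(value: float, unit: str) -> float:
    """Convert a power figure to kW. Accepted units: 'kw', 'ps', 'bhp'."""
    factors = {"kw": 1.0, "ps": PS_TO_KW, "bhp": BHP_TO_KW}
    try:
        return value * factors[unit.lower()]
    except KeyError:
        raise ValueError(f"unknown power unit: {unit!r}") from None


def torque_to_nm(value: float, unit: str) -> float:
    """Convert a torque figure to Nm. Accepted units: 'nm', 'kgm'."""
    factors = {"nm": 1.0, "kgm": KGM_TO_NM}
    try:
        return value * factors[unit.lower()]
    except KeyError:
        raise ValueError(f"unknown torque unit: {unit!r}") from None
```

Rounding is left to the caller so that the original precision is not lost before cross-checking against a second source.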

Step 3: Handle Edge Cases

Price Calculation

  • Use CarWale/CarDekho "On-Road Price Calculator" for Bangalore
  • Include: Ex-showroom + RTO + Insurance + Other charges
  • If unavailable, calculate: Ex-showroom × 1.15 (document assumption)
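
The price rule above can be sketched as a small helper. It assumes the full breakdown is used only when all three add-on charges are known and that the 1.15 fallback applies otherwise; the function name and the returned note text are illustrative, and the figures in the usage note are made up.

```python
from decimal import Decimal, ROUND_HALF_UP

FALLBACK_MULTIPLIER = Decimal("1.15")  # ex-showroom x 1.15, per the rule above


def on_road_price(ex_showroom, rto=None, insurance=None, other=None):
    """Return (price, note) with the price rounded to two decimal places.

    The note is empty when the full breakdown was used and documents the
    assumption when the fallback multiplier was applied.
    """
    ex = Decimal(str(ex_showroom))
    charges = (rto, insurance, other)
    if all(charge is not None for charge in charges):
        total = ex + sum(Decimal(str(charge)) for charge in charges)
        note = ""
    else:
        total = ex * FALLBACK_MULTIPLIER
        note = "on-road price estimated as ex-showroom x 1.15"
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP), note
```

With a hypothetical ex-showroom price of 800000, `on_road_price("800000")` yields `Decimal("920000.00")` plus the note, which can be copied into the notes column.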

Fuel Efficiency

  • Priority: ARAI combined cycle > ARAI city/highway average > Manufacturer claim
  • For EVs: Divide the ARAI-certified range (km) by the battery capacity (kWh) to get a km/kWh figure
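
The priority order and the EV calculation can be expressed as follows. This is a minimal sketch; the function and parameter names are assumptions, and each returned label is intended for the notes column.

```python
def pick_fuel_efficiency(arai_combined=None, arai_city=None,
                         arai_highway=None, manufacturer_claim=None):
    """Return (kmpl, source_label) following the priority order above."""
    if arai_combined is not None:
        return arai_combined, "ARAI combined cycle"
    if arai_city is not None and arai_highway is not None:
        return (arai_city + arai_highway) / 2, "ARAI city/highway average"
    if manufacturer_claim is not None:
        return manufacturer_claim, "manufacturer claim"
    raise ValueError("no fuel efficiency figure available")


def ev_km_per_kwh(arai_range_km: float, battery_kwh: float) -> float:
    """Range divided by battery capacity, in km per kWh."""
    if battery_kwh <= 0:
        raise ValueError("battery capacity must be positive")
    return arai_range_km / battery_kwh
```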

NCAP Ratings

  • If both BNCAP & Global NCAP exist: Use higher rating
  • Add notes column (not in schema) documenting which test
  • If no NCAP: Check manufacturer's internal crash test ratings
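
A sketch of the rating selection, assuming star ratings arrive as integers or None when a test does not exist. The function name is hypothetical, and the choice to prefer BNCAP on a tie is an assumption the original does not specify. When neither rating exists, the function returns None so that the caller can follow up with the manufacturer check described above.

```python
def pick_ncap_rating(bncap=None, global_ncap=None):
    """Return (stars, note); the note records which test the rating came from."""
    candidates = [
        (stars, name)
        for stars, name in ((bncap, "BNCAP"), (global_ncap, "Global NCAP"))
        if stars is not None
    ]
    if not candidates:
        return None, "no NCAP rating found"
    # max() keeps the first maximal item, so BNCAP wins a tie (assumption).
    stars, name = max(candidates, key=lambda candidate: candidate[0])
    return stars, f"rating from {name}"
```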

Weight Specifications

  • Gross weight must be ≥ Kerb weight
  • If only GVW is available: Kerb = GVW − payload capacity; if the payload capacity is unknown, estimate it as 150 kg × seating capacity
  • Document calculation method in notes
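
The weight rules above translate into a short helper. The names are illustrative; the 150 kg per seat figure is taken directly from the rule above, and the returned note is meant for the notes column.

```python
KG_PER_SEAT = 150  # per-seat payload estimate stated in the rule above


def estimate_kerb_weight(gvw_kg: int, seating_capacity: int, payload_kg=None):
    """Return (kerb_kg, note) when only the gross vehicle weight is published."""
    if payload_kg is not None:
        kerb = gvw_kg - payload_kg
        note = "kerb = GVW - payload capacity"
    else:
        kerb = gvw_kg - KG_PER_SEAT * seating_capacity
        note = f"kerb = GVW - {KG_PER_SEAT} kg x {seating_capacity} seats"
    if kerb <= 0:
        raise ValueError("estimated kerb weight is not positive")
    return kerb, note


def weights_consistent(kerb_kg: int, gross_kg: int) -> bool:
    """Gross weight must be at least the kerb weight, and both positive."""
    return gross_kg >= kerb_kg > 0
```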

Electric Vehicles

  • Engine type: "Electric Motor"
  • Displacement (displacement_cc): Battery capacity in Wh (e.g., 40000 for 40 kWh)
  • Cylinders (num_cylinders): Number of motors
  • For efficiency (fuel_efficiency_kmpl): Convert the km/kWh figure to a kmpl equivalent and document the conversion in notes
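
The EV field mapping can be sketched as below. The function name is hypothetical, the dotted keys follow the CSV column names used in this document, and the efficiency conversion is left out because the prompt does not state a conversion factor.

```python
def ev_engine_fields(battery_kwh: float, num_motors: int) -> dict:
    """Map EV data onto the engine columns as described above."""
    if battery_kwh <= 0 or num_motors <= 0:
        raise ValueError("battery capacity and motor count must be positive")
    return {
        "engine.type": "Electric Motor",
        # Battery capacity in Wh stands in for displacement.
        "engine.displacement_cc": round(battery_kwh * 1000),
        # Number of motors stands in for cylinder count.
        "engine.num_cylinders": num_motors,
    }
```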

Output Format

CSV Structure

manufacturer_name,car_model,car_variant,price_on_road_bangalore_inr,engine.type,engine.displacement_cc,engine.num_cylinders,engine.max_power_kw,engine.max_torque_nm,transmission_type,drivetrain,fuel_type,fuel_efficiency_kmpl,turning_radius_m,kerb_weight_kg,gross_weight_kg,body_length_mm,body_width_mm,body_height_mm,wheelbase_mm,ground_clearance_mm,boot_space_l,fuel_tank_capacity_l,seating_capacity,body_type,safety.ncap_rating_stars,safety.airbags_count,source_1,source_2,notes
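
Because the schema nests engine and safety, each record has to be flattened to the dotted column names before it is written. The sketch below uses only the standard library and assumes records are plain nested dicts (for example, the dict form of a validated CarSpec plus the source and notes fields); the helper names are illustrative.

```python
import csv
import io

COLUMNS = (
    "manufacturer_name,car_model,car_variant,price_on_road_bangalore_inr,"
    "engine.type,engine.displacement_cc,engine.num_cylinders,"
    "engine.max_power_kw,engine.max_torque_nm,transmission_type,drivetrain,"
    "fuel_type,fuel_efficiency_kmpl,turning_radius_m,kerb_weight_kg,"
    "gross_weight_kg,body_length_mm,body_width_mm,body_height_mm,"
    "wheelbase_mm,ground_clearance_mm,boot_space_l,fuel_tank_capacity_l,"
    "seating_capacity,body_type,safety.ncap_rating_stars,"
    "safety.airbags_count,source_1,source_2,notes"
).split(",")


def flatten(record: dict, prefix: str = "") -> dict:
    """Turn {'engine': {'type': ...}} into {'engine.type': ...}."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, f"{name}."))
        else:
            flat[name] = value
    return flat


def rows_to_csv(records: list) -> str:
    """Serialise records in the fixed column order; unknown keys raise ValueError."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, extrasaction="raise")
    writer.writeheader()
    for record in records:
        writer.writerow(flatten(record))
    return buffer.getvalue()
```

Missing keys are written as empty cells, so a completeness check should still run on the output.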

Field Specifications

| Field | Type | Constraints | Example (updated) |
|---|---|---|---|
| manufacturer_name | str | Non-empty | "Maruti Suzuki" |
| car_model | str | Non-empty | "Swift" |
| car_variant | str/null | Optional | "VXi AMT" |
| price_on_road_bangalore_inr | Decimal(12,2) | > 0 | 944707.00 |
| engine.type | str | Non-empty | "Inline-3" |
| engine.displacement_cc | int | > 0 | 1197 |
| engine.num_cylinders | int | > 0 | 3 |
| engine.max_power_kw | float | > 0 | 60.0 |
| engine.max_torque_nm | float | > 0 | 111.7 |
| transmission_type | enum | See schema | "Automatic" |
| drivetrain | enum | See schema | "FWD" |
| fuel_type | enum | See schema | "Petrol" |
| fuel_efficiency_kmpl | float | > 0 | 25.75 |
| turning_radius_m | float | > 0 | 4.8 |
| kerb_weight_kg | int | > 0 | 925 |
| gross_weight_kg | int | ≥ kerb_weight | 1355 |
| body_length_mm | int | > 0 | 3860 |
| body_width_mm | int | > 0 | 1735 |
| body_height_mm | int | > 0 | 1520 |
| wheelbase_mm | int | > 0 | 2450 |
| ground_clearance_mm | int | > 0 | 163 |
| boot_space_l | int | > 0 | 265 |
| fuel_tank_capacity_l | int | > 0 | 37 |
| seating_capacity | int | > 0 | 5 |
| body_type | enum | See schema | "Hatchback" |
| safety.ncap_rating_stars | int | 1-5 | 1 |
| safety.airbags_count | int | > 0 | 6 |

Quality Checklist

  • ≥50 unique variants included
  • All fields populated (no blanks/nulls except optional variant)
  • Each row has ≥2 source citations
  • Numeric values use correct units
  • Prices reflect Bangalore on-road costs for 2025
  • Data validates against Pydantic schema
  • CSV loads without parsing errors
  • Variety in manufacturers, segments, fuel types represented
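
A few of the checklist items can be checked with the standard library before running schema validation. The sketch below covers only the row count, blank cells, and malformed rows; it does not replace Pydantic validation. Treating notes as optional alongside car_variant is an assumption, as are the function name and the set of blank markers.

```python
import csv
import io

OPTIONAL_COLUMNS = {"car_variant", "notes"}  # notes being optional is assumed
BLANK_MARKERS = {"", "n/a", "null"}


def checklist_problems(csv_text: str, minimum_rows: int = 50) -> list:
    """Return human-readable problems; an empty list means these checks passed."""
    problems = []
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if len(rows) < minimum_rows:
        problems.append(f"only {len(rows)} rows, need at least {minimum_rows}")
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        for column, value in row.items():
            if column is None:
                problems.append(f"line {line_no}: more cells than header columns")
            elif column in OPTIONAL_COLUMNS:
                continue
            elif value is None or value.strip().lower() in BLANK_MARKERS:
                problems.append(f"line {line_no}: {column} is blank")
    return problems
```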