Skip to content

Instantly share code, notes, and snippets.

@Casperhr
Created September 12, 2025 07:04
Show Gist options
  • Select an option

  • Save Casperhr/15110b248aab7f3118c1be0c26c491fe to your computer and use it in GitHub Desktop.

Select an option

Save Casperhr/15110b248aab7f3118c1be0c26c491fe to your computer and use it in GitHub Desktop.
### Prompt: Parse pricing documents → Monta price-group XLSX (complete standalone spec)
**CRITICAL REQUIREMENT**: The output MUST have EXACTLY these 14 columns in this EXACT order:
1. Tariff
2. Tariff ID
3. min. Power
4. max. Power
5. Power Type
6. CountryCode
7. PartyID
8. kWh Price
9. Minute Fee
10. Minute fee after x min
11. Idle Fee
12. Idle Fee after x min
13. Start Fee
14. Fallback
**Goal**
Ingest a pricing source (XLSX, CSV, PDF, or link). Generate an XLSX file with normalized pricing data using the column headers above.
Output format: XLSX spreadsheet with exactly 14 columns as listed above.
No tariffs, no spot, no dynamic pricing. One currency per group. No percentages. All monetary values as **decimal numbers where 1.00 = 1 EUR** (e.g., €0.35 → `0.35`, 35 ct/kWh → `0.35`).
---
## 1) Output contract
Generate an XLSX file with the following columns:
### 1.1 Column definitions
**MANDATORY**: These are the ONLY columns allowed. Do NOT add any additional columns under any circumstances:
1. **Tariff**: Human-readable tariff name from the input file (e.g., "Commune de Allanche"). **Do NOT add operator name. If the tariff is differentiated by power band (e.g., AC, DC, or specific kW ranges like ≤22kW, 22-50kW), this differentiation MUST be included in the Tariff name (e.g., "Commune de Allanche 22kW AC", "Fast Charge >50kW DC"). This ensures unique and descriptive tariff names for each distinct pricing entry.**
2. **Tariff ID**: The EXACT tariff ID/code from the input file. **Use exactly as provided in the source data. If the source provides multiple ID-like fields (e.g., 'ID Tariff', 'Tariff code', 'Code', 'External Ref'), prioritize them in this order: 'Tariff ID', 'ID Tariff', 'Tariff code', 'Code', 'External Ref'. If no explicit tariff ID is found in the source, a unique, deterministic ID MUST be generated using the following format: `[CountryCode]-[PartyID]-[slugified Tariff Name]-[Power Type (AC/DC)]`. For example, `FR-FRESHMILE-COMMUNE-DE-ALLANCHE-AC`. Ensure any generated ID is consistent for the same tariff characteristics across different runs.**
3. **min. Power**: Minimum power in kW for this tariff band (e.g., 0 for ≤22kW, 22 for 22-50kW). **Leave empty if not specified**
4. **max. Power**: Maximum power in kW for this tariff band (e.g., 22 for ≤22kW, 50 for 22-50kW). **Leave empty if not specified**
5. **Power Type**: AC or DC (infer from power levels: ≤22kW=AC, >22kW=DC, HPC=DC, Ultra/Super=DC). **Leave empty if power information not available**
6. **CountryCode**: ISO 3166-1 alpha-2 code if available (e.g., "FR", "DE"). Infer from:
- Location names in tariff (e.g., "France" → "FR")
- CPO name patterns (e.g., "Freshmile" often indicates "FR")
- Leave empty if uncertain
7. **PartyID**: CPO/EMSP identifier if available (e.g., "FRE", "TEN"). Extract from:
- Explicit party codes in data
- Standard abbreviations (e.g., "TotalEnergies" → "TEN")
- Leave empty if not found
8. **kWh Price**: Price per kWh WITHOUT VAT if type is "kwh", otherwise empty
9. **Minute Fee**: Price per minute WITHOUT VAT if type is "min" OR if there's a minute fee in addition to kWh
10. **Minute fee after x min**: Number of minutes before minute fee applies (from applyAfterMinutes)
11. **Idle Fee**: Price per minute WITHOUT VAT for idle fee
12. **Idle Fee after x min**: Number of minutes before idle fee applies
13. **Start Fee**: Session/transaction start fee WITHOUT VAT
14. **Fallback**: "TRUE" or "FALSE" (default: FALSE)
### 1.2 Data formatting rules
- **ALL numeric price columns contain values WITHOUT VAT**:
- kWh Price
- Minute Fee
- Idle Fee
- Start Fee
- Numeric values should use decimal format (e.g., 0.35 for 35 cents)
- Leave cells empty (not 0 or "N/A") when a value doesn't apply
- Power values should be integers (e.g., 22, 50, 150) or **empty** if not specified
- Time values (minute fee after x min) should be integers
- Boolean values (Fallback) should be "TRUE" or "FALSE"
### 1.3 NO Summary row in file
**DO NOT add a summary row to the XLSX file**. The summary should only be displayed in the console/chat output, never written to the file.
- The last row should be a regular tariff entry
- Summary statistics go in the parsing output message only
### 1.4 Skipped items sheet
Create a second sheet named "Skipped" with columns:
- Source ID
- Name
- Raw Text
- Reason
### 1.5 Special handling for XLSX output
- The numeric columns (kWh Price, Minute Fee, etc.) contain the extracted values WITHOUT VAT
- For combined tariffs (e.g., 0.35 €/kWh + 0.05 €/min):
- Create one row with all applicable fees filled in
- **MANDATORY VAT calculations**:
- ALL numeric values in the sheet are WITHOUT VAT
- When source prices include VAT, convert to excl. VAT for numeric columns
- Use country-specific VAT rates for conversion: FR=20%, DE=19%, NL=21%, BE=21%, etc.
- Example: Source shows 0.42€ incl. VAT (FR) → kWh Price = 0.42 / 1.20 = 0.35€
---
## 2) Input handling
### 2.1 Accepted containers
- XLSX / XLS
- CSV
- PDF or link (expect OCR/extraction upstream; parse any available text/tables)
### 2.2 Relevant sheet/page detection
- Process **ALL sheets/tabs** that contain tariff information, not just the first one found
- Ignore **mapping** tabs/pages (charge point ↔ tariff, connector lists, support pages).
- Select sheets/pages with **actual tariff definitions**: text containing `€/kWh`, `€/min`, `€/15min`, `€/hour`, or free-text price descriptions.
- **Multi-sheet processing**: Some files split tariffs across multiple tabs (e.g., by region, customer type, power level)
- Process each relevant sheet independently
- Maintain tariff relationships across sheets when clear identifiers exist
- Combine results from all sheets into a single output
Heuristics for mapping-vs-tariff:
- Mapping indicators in sheet name: `mapping`, `connector`, `charge point`, `station`, `EVSE`, `CSO`, `PMO`, `support`, `liste`, `list`.
- Tariff sheets contain tokens like: `€/kWh`, `ct/kWh`, `c/kWh`, `cents/kWh`, `/kWh`, `par kWh`, `€/min`, `ct/min`, `/min`, `par minute`, `€/15min`, `1/4h`, `hour`, `heure`.
### 2.3 Complex tariff structures
Handle these multi-dimensional tariff patterns:
#### 2.3.1 Adjacent pricing columns
- Files may have multiple price columns side-by-side (e.g., "AC Price", "DC Price", "HPC Price")
- Create separate `priceGroup` entries for each price dimension
- Include the dimension in both `externalId` and `name`
#### 2.3.2 Matrix/grid layouts
- Some files use grid formats with power levels as columns and tariff names as rows
- Each cell intersection represents a unique tariff variant
- Parse each valid cell as a separate `priceGroup`
#### 2.3.3 Grouped sections
- Tariffs may be grouped by customer type, region, or time period within the same sheet
- Maintain group context in the `name` field
- Process all groups, don't stop at the first one
#### 2.3.4 Multi-tab relationships
- Tab names often indicate dimensions (e.g., "B2C Prices", "B2B Prices", "Fleet Prices")
- Include tab context in the `externalId` to ensure uniqueness
- Process all relevant tabs, combining results
### 2.4 Column detection (flexible)
**Auto-detect columns** - do NOT assume fixed names:
- **Identifier**: Look for `ID Tariff`, `Tariff code`, `Tariff ID`, `Code`, or similar
- **Name**: Look for `Tariffs`, `Tariff`, `Tariff Name`, `Name`, or similar
- **Price details**:
- Current: `Current Price`, `Prix actuel`, `Huidige prijs`, `Current pricing details`
- New: `New Price`, `Nouveau prix`, `Nieuwe prijs`, `New pricing details`
- Generic: `Price`, `Pricing`, `Tarif`, `Details`, `Price (incl VAT)`
- **IMPORTANT**: Always prefer "New" over "Current" when both exist
- **Operator**: Look for `Operator`, `CPO`, `Provider`, or similar
- **Location**: Look for `Location`, `Site`, `Address`, or similar
- **Currency**: If dedicated column exists (else infer from symbols)
- **VAT status**: Check for "(incl VAT)", "(excl VAT)" in headers
- **Power/Speed**: Columns indicating charging speed or power level
- Common French columns: `Tarifs`, `ID Tarif`, `Prix (TTC)`, `Détails tarif`
- Common Dutch columns: `Tarieven`, `Tarief ID`, `Prijs (incl BTW)`
---
## 3) Normalization rules
### 3.1 Currency (one per group)
- Detect by symbol/code in row, header, or sheet.
- Use ISO code string, e.g., `"EUR"`.
- Never mix currencies within a group. If a sheet mixes, create separate groups.
### 3.2 VAT
- ALL numeric columns in the output must contain prices WITHOUT VAT
- If source prices are labeled "incl. VAT" → Convert to excl. VAT for numeric columns
- If source prices are "excl. VAT" → Use as-is in numeric columns
- If unclear → assume prices include VAT (most consumer tariffs do) and convert
- **CRITICAL VAT conversion**:
- **ALWAYS use the row's specific VAT rate when available** (may vary by tariff/region)
- If no row-specific rate, use standard country VAT rates:
- AT (Austria): 20%
- BE (Belgium): 21%
- BG (Bulgaria): 20%
- CH (Switzerland): 8.1%
- CY (Cyprus): 19%
- CZ (Czech Republic): 21%
- DE (Germany): 19%
- DK (Denmark): 25%
- EE (Estonia): 22%
- ES (Spain): 21%
- FI (Finland): 24%
- FR (France): 20%
- GB (United Kingdom): 20%
- GR (Greece): 24%
- HR (Croatia): 25%
- HU (Hungary): 27%
- IE (Ireland): 23%
- IT (Italy): 22%
- LT (Lithuania): 21%
- LU (Luxembourg): 17%
- LV (Latvia): 21%
- MT (Malta): 18%
- NL (Netherlands): 21%
- NO (Norway): 25%
- PL (Poland): 23%
- PT (Portugal): 23%
- RO (Romania): 19%
- SE (Sweden): 25%
- SI (Slovenia): 22%
- SK (Slovakia): 20%
- If country unknown, use 20% as default
- Formula: Price excl. VAT = Price incl. VAT / (1 + VAT rate)
- Example: If row shows 21% VAT and price is 0.42€ incl. → 0.42 / 1.21 = 0.3471€ excl.
- Round to 4 decimal places
### 3.3 Numbers and units (multilingual)
- Accept comma or dot decimals. Strip spaces and symbols.
- **ALL prices must be stored as decimal numbers where 1.00 = 1 EUR**
#### Standard unit mappings:
- **kWh pricing**:
- English: `€/kWh`, `/kWh`, `per kWh`
- French: `par kWh`, `par kWh entamé`, `le kWh`
- Dutch: `per kWh`, `per geleverde kWh`
- Cents: `ct/kWh`, `c/kWh`, `cents/kWh` → divide by 100
- **Minute pricing**:
- English: `€/min`, `/minute`, `per minute`
- French: `par minute`, `la minute`
- Dutch: `per minuut`
- Cents: `ct/min` → divide by 100
- **15-minute blocks**:
- French: `€/quart d'heure`, `le 1/4h`, `par 15 min`, `les 15 minutes`, `€/15min`
- English: `€/15min`, `per 15 minutes`, `per quarter-hour`
- Dutch: `per kwartier`, `per 15 minuten`
- **CRITICAL**: Always divide by 15 to get per-minute rate
- Example: "3 €/15min" → Minute Fee = 3/15 = 0.20
- **Hourly blocks**:
- French: `€/heure`, `par heure`, `l'heure`, `€/h`
- English: `€/hour`, `per hour`, `€/h`, `/hr`
- Dutch: `per uur`, `€/uur`
- German: `€/Stunde`, `pro Stunde`
- **CRITICAL**: Always divide by 60 to get per-minute rate
- Example: "12 €/hour" → Minute Fee = 12/60 = 0.20
#### Special patterns:
- **Free/gratuit**: `gratuit`, `gratuite`, `free`, `gratis` → price = 0
- **After/après**: `après X min/h`, `after X minutes`, `na X minuten` → delay indicator
- **Then/puis**: `puis`, `then`, `daarna` → sequential fee indicator
- **Connection/session**: `à la connexion`, `par charge`, `forfait`, `per sessie` → Start Fee
- **First/premières**: `les X premières minutes`, `first X minutes` → initial period
### 3.3.1 Free tariff handling
**CRITICAL: Never skip free tariffs**
- When encountering `gratuit`, `gratuite`, `free`, `gratis`, `kostenlos` → set price = 0
- Free tariffs are valid entries that MUST be included in output
- Map to appropriate type with 0 price:
- `gratuit` for kWh → `kwh_price = 0`
- `free charging` → `kwh_price = 0` or `minute_fee = 0` based on context
- `0 €/kWh` → `kwh_price = 0`
- Do NOT skip with reason "free tariff" or similar
- Free tariffs count toward success rate metrics
- If only one flat base price provided: set `priceMin = priceMax = price`.
### 3.4 External ID (which maps to Tariff ID):
- **CRITICAL**: Use the EXACT tariff ID/code from the input file. **Do NOT create new IDs or modify existing ones UNLESS the source ID is entirely absent. If no ID exists in the source, construct a deterministic ID based on CountryCode, PartyID, the slugified Tariff Name (including power band if applicable), and Power Type.**
- **Generated ID format:** `[CountryCode]-[PartyID]-[slugified Tariff Name]-[Power Type (AC/DC)]` (e.g., `FR-FRE-COMMUNE-DE-ALLANCHE-AC`).
- If the source file has a tariff ID column, use that value exactly as provided.
- Only leave empty if no ID exists in the source data **and a deterministic ID cannot be reliably generated from other fields in that row based on the specified format.**
### 3.5 Name (Tariff column)
- Use the tariff name from the input file WITHOUT adding operator/issuer name.
- Example: If input shows "Commune de Allanche", use that (NOT "Freshmile Commune de Allanche").
- **Include power band info (e.g., "22kW", ">50kW", "AC", "DC") directly in the Tariff name if multiple power bands exist for a single base tariff. This ensures each distinct pricing tier has a unique and descriptive Tariff name (e.g., "Commune de Allanche 22kW AC", "Fast Charge >50kW DC").**
- Do NOT prepend operator names like Freshmile, TotalEnergies, etc.
### 3.6 Empty price groups (mandatory rule)
If a row (tariff group) contains no numeric price values (no kWh Price, no Minute Fee, no Idle Fee, and no Start Fee), then:
- **Do NOT keep it in the Tariffs sheet**
- **Move it to the Skipped sheet**
- **Use the reason "no numeric price found"**
This prevents placeholder or mapping-only rows from producing empty entries in the main output.
### 3.7 Current vs New pricing
When input data contains both "current" and "new" pricing columns:
- **ALWAYS use "new" pricing if available**
- If "new" column is empty or missing, fall back to "current"
- Common patterns:
- "Current Price" vs "New Price"
- "Prix actuel" vs "Nouveau prix" (French)
- "Huidige prijs" vs "Nieuwe prijs" (Dutch)
- Any variation of current/actual vs new/future pricing
### 3.8 Bands / power tiers
- If a tariff provides prices per **charge speed** (e.g., `≤22 kW`, `22–50 kW`, `>50 kW`, `HPC`), create **one row per band** in the XLSX.
- Normalize band tokens for Tariff ID:
- `≤22 kW` → `22kw` (min. Power=0, max. Power=22)
- `22–50 kW` → `22-50kw` (min. Power=22, max. Power=50)
- `>50 kW` → `50pluskw` (min. Power=50, max. Power=350)
- `HPC` → `hpc` (min. Power=150, max. Power=350)
- `Ultra` → `ultra` (min. Power=350, max. Power=500)
- **If no power information found**: Leave min. Power, max. Power, and Power Type **empty**
- Include the band in the Tariff ID only if power information is found
---
## 4) Fee mapping
### 4.1 Block and delayed fee normalization
#### Sequential fees with "then/puis/daarna/après"
When fees change after a period:
- "0 €/min for 45 min, then 0.20 €/min"
- "gratuit pendant 30 min, puis 0,10 €/minute"
- "gratis voor 30 minuten, daarna 0,15 €/min"
- "puis 0,15 €/min après 30 minutes"
- "then €0.15 per minute after 30 min"
**Normalization**:
- **CRITICAL**: For "puis"/"après" sequences, set Minute Fee to the rate AFTER the transition
- Set "Minute fee after x min" to the transition point
- Example: "free 30 min then 0.10 €/min" →
- `Minute Fee = 0.10` (excl. VAT)
- `Minute fee after x min = 30`
- Example: "puis 0,15 €/min après 30 minutes" →
- `Minute Fee = 0.15` (excl. VAT)
- `Minute fee after x min = 30`
#### Delayed start patterns
- "après 60 minutes: 0,05 €/min" (after 60 minutes)
- "na 45 minuten: 0,10 €/min" (after 45 minutes)
- "after 1 hour: 0.15 €/min"
**Normalization**: Same as sequential - fee starts after delay
#### Block-based charging
- "3 € par tranche de 15 minutes" (per 15-minute block)
- "2 € per begonnen kwartier" (per started quarter hour)
- "5 € per aangefangen uur" (per started hour)
**Normalization**: Convert to per-minute rate
- 15-minute blocks: price / 15
- Hourly blocks: price / 60
- "Started/begun" blocks: Same conversion, note in description if needed
#### Block pricing with duration (NEW PATTERN)
- "X € les Y minutes" (X euros for Y minutes)
- "X € pour Y minutes" (X euros for Y minutes)
- "X € pendant Y minutes" (X euros during Y minutes)
**Normalization**:
- Treat X € (excl. VAT) as Start Fee
- Set "Minute fee after x min" to Y
- Find the next per-minute price in the text and record it in Minute Fee
- Example: "5 € les 30 minutes puis 0.15 €/min" → Start Fee=5.00, Minute Fee=0.15, Minute fee after x min=30
All fees are optional. Include only when representable.
- **Session / transaction fee**
Phrases: "per session", "per transaction", "unlock", "forfait initial", "X € à la connexion", "forfait", "par session"
→ Map to Start Fee column
→ "X € à la connexion" = Start Fee with 0 minutes
- **Per-minute while charging after grace**
Phrases: "X €/min after Y min", "X €/15min after Y", "after Y minutes then X €/min"
→ `type: "minute"`, `price: cents`, `applyAfterMinutes: Y`
If `€/15min`, convert to per-minute before storing.
- **Free minute window**
Phrases: "0 €/min limited for 45 min", "first 30 minutes free while charging", "gratuite pendant 30 minutes", "première heure gratuite"
→ **IMPORTANT**: Do NOT record a 0 € Start Fee
- Set Minute Fee to the rate after the free period
- Set "Minute fee after x min" to the length of the free period
- Example: "gratuite pendant 30 minutes puis 0.15 €/min" → Minute Fee=0.15, Minute fee after x min=30
- **Idle fee when fully charged**
Phrases: "when fully charged", "pleine charge", "idle fee"
→ `type: "idle"`, `price: cents`, add `endAtFullyCharged: true`
If grace period exists, add `applyAfterMinutes`.
- **Caps**
If a cap is explicitly provided for a **specific fee**, set that fee's `feePriceMax`.
Do **not** invent global session caps if unspecified.
- **Simultaneous kWh + min**
Phrases: "€/kWh + €/min"
→ `masterPrice` = per-kWh; add a `minute` fee for the minute rate with `applyAfterMinutes: 0` unless a free window is specified.
---
## 5) Special constructs and decisions
### 5.1 Multiple sequential fees
When tariffs have multiple time-based transitions:
**Example patterns**:
- "0-30 min: free, 30-60 min: 0.05 €/min, 60+ min: 0.10 €/min"
- "First hour 0.03 €/min, after 1h: 0.06 €/min, after 2h: 0.12 €/min"
**Current limitation**: Can only represent ONE transition point
- Choose the FIRST transition after free/base period
- Example: For "free 30 min, then 0.05 €/min until 60 min, then 0.10 €/min"
- Use: `Minute Fee = 0.05`, `Minute fee after x min = 30`
- Skip the 60+ min rate with reason "multiple rate transitions not supported"
### 5.2 Started vs exact time blocks
- "per begonnen kwartier" (per started quarter)
- "par tranche de 15 minutes entamée" (per started 15-min block)
- These mean charging for ANY portion of the block
- Still normalize to per-minute for consistency
- **Minimum kWh billed**
Example: "0.50 €/kWh, minimum 5 kWh billed"
→ **Ignore the minimum**. Use only the per-kWh as `masterPrice`. No extra fees for the minimum.
- **Minimum time billed**
Example: "Minimum 15 minutes billed"
→ **Not representable**. Add to `skipped` with reason `"minimum time not supported"`.
- **Energy thresholds within a session**
Example: "First 10 kWh at A, then 0.40 €/kWh"
→ **Not representable**. Add to `skipped` with reason `"energy thresholds not supported"`.
- **Time-of-day / calendar windows**
Example: "08:00–20:00 at A, 20:00–08:00 at B"
→ **Not representable**. Add to `skipped` with reason `"time windows not supported"`.
- **Session caps (global)**
Example: "Max 20 € per session"
→ **Not representable**. Add to `skipped` with reason `"session cap not supported"`.
- **Lone numbers without units**
Example: "2 €" with no unit context
→ If clearly labeled as "per session/transaction", treat as `start_fixed`. Otherwise `skipped` with reason `"unit missing"`.
---
## 6) Field population details
- `masterPrice.description`:
`"Base per kWh"` or `"Base per minute"` plus VAT note:
- `", incl. VAT"` if `vat: true`
- `", excl. VAT"` if `vat: false`
Add clarifiers like `"(incl. VAT presumed)"` if VAT inferred.
- `masterPrice.priceMin` / `priceMax`:
- If only a single base price exists, set equal to `price`.
- If the source declares explicit min/max bounds for the **base price**, mirror them; otherwise do not invent bounds.
- `fees[].applyAfterMinutes`:
- Include when a grace or threshold in minutes exists.
- `0` allowed to model "free from start" windows.
- `fees[].endAtFullyCharged`:
- Include and set `true` for idle fees. Omit otherwise.
- `currency`:
- Use detected currency for the row/sheet.
- Do not mix currencies in one group.
---
## 7) ExternalId and de-duplication
- Build `externalId` as:
`<issuer>-<tariff-slug>-<band>-<type>-<currency>`
- `issuer` comes from the document origin (e.g., `freshmile`, `totalenergies`). If unknown, use a stable slug from filename or sheet.
- `tariff-slug` preference order: explicit code → tariff name → fallback site/name.
- Ensure deterministic slugs: lowercase, ASCII, non-alphanumeric collapsed to `-`, trimmed.
- If duplicate rows would yield identical `externalId`, keep the **first** and skip others with reason `"duplicate externalId"`.
---
## 8) Processing order
1. Identify ALL relevant sheets/pages (not just the first).
2. **IMPORTANT**: Process ALL rows and ALL dimensions in the input file. Do not stop early or truncate processing.
- Continue parsing until the last row is processed
- If the file contains 100+ entries, process all 100+ entries
- Do not limit output to 20-25 entries
- If processing is taking too long, continue anyway - completeness is critical
3. For complex structures:
- **Multiple sheets**: Process each sheet that contains tariffs
- **Adjacent columns**: Create separate entries for each price column
- **Matrix layouts**: Process each cell that contains a valid price
- **Grouped sections**: Process all groups within a sheet
4. For each candidate row/cell (process EVERY valid price point):
- Extract raw pricing text from all relevant columns
- **When both "current" and "new" pricing exist, ALWAYS use "new"**
- Detect currency and VAT context
- Parse master unit and amount
- Map any fees per rules above
- Determine band, issuer, tariff name/code
- Include dimensional context (sheet name, column header, group) in naming
- Compose `externalId` and `name` that reflects all dimensions
- **If no numeric prices found (no kWh, minute, idle, or start fees), skip with reason "no numeric price found"**
- Otherwise append to `priceGroups`
5. Only after processing ALL sheets and ALL dimensions, compute `summary`.
6. The `summary.total` must reflect the actual total number of tariff entries found across all sheets and dimensions.
---
## 9) Common pricing patterns library
This section contains real-world pricing patterns and how to map them to XLSX columns.
### 9.1) Basic price patterns
**Pure kWh pricing**
- `0.35 €/kWh` → kWh Price=0.35 (or 0.2917 if incl. VAT)
- `35 ct/kWh` → kWh Price=0.35 (convert cents)
- `€ 0,42 par kWh` → kWh Price=0.42
- `0.50 €/kWh entamé` → kWh Price=0.50
**Pure minute pricing**
- `0.08 €/min` → Minute Fee=0.08
- `8 ct/min` → Minute Fee=0.08 (convert cents)
- `0,05 € la minute` → Minute Fee=0.05
**Block time pricing**
- `3 €/15min` → Minute Fee=0.20 (3/15)
- `6 € le 1/4h` → Minute Fee=0.40 (6/15)
- `12 €/hour` → Minute Fee=0.20 (12/60)
- `5 € par tranche de 15 minutes` → Minute Fee=0.3333
**Combined pricing**
- `0.45 €/kWh + 0.05 €/min` → kWh Price=0.45, Minute Fee=0.05, Minute fee after x min=0
- `0.35 €/kWh et 0.02 €/min` → kWh Price=0.35, Minute Fee=0.02
### 9.2) Conditional and delayed fees
**Minute fee after delay**
- `0.35 €/kWh, 0.10 €/min after 30 min` → kWh Price=0.35, Minute Fee=0.10, Minute fee after x min=30
- `0 €/min for 45 min, then 0.20 €/min` → Minute Fee=0.20, Minute fee after x min=45
- `gratuit pendant 30 min, puis 0,10 €/minute` → Minute Fee=0.10, Minute fee after x min=30
- `après 60 minutes: 0,05 €/min` → Minute Fee=0.05, Minute fee after x min=60
**Session/start fees**
- `1.50 € per session + 0.40 €/kWh` → Start Fee=1.50, kWh Price=0.40
- `2 € à la connexion` → Start Fee=2.00
- `forfait de démarrage 3 €` → Start Fee=3.00
**Idle fees**
- `0.35 €/kWh, idle: 0.10 €/min after 10 min` → Idle Fee=0.10, Idle Fee after x min=10
- `0.15 €/min when fully charged` → Idle Fee=0.15
### 9.3) Special cases and non-representable patterns
**Free tariffs (MUST include, never skip)**
- `gratuit` → kWh Price=0
- `free charging` → kWh Price=0
- `0 €/kWh` → kWh Price=0
**Free periods (handle differently from free tariffs)**
- "gratuite pendant X minutes" (free for X minutes) → NOT a free tariff
- "première heure gratuite" (first hour free) → NOT a free tariff
- These indicate delayed charging, not completely free charging
- Set "Minute fee after x min" to X and record the subsequent rate
**Not representable - skip with reason**
- `Minimum 15 minutes billed` → Skip: "minimum time not supported"
- `First 10 kWh at 0.30 €/kWh, then 0.40 €/kWh` → Skip: "energy thresholds not supported"
- `0.25 €/kWh 08:00–20:00, 0.35 €/kWh 20:00–08:00` → Skip: "time windows not supported"
- `Max 20 € per session` → Skip: "session cap not supported"
- `2 €` (no unit) → Skip: "unit missing" (unless clearly session fee)
- Subscription models → Skip: "subscription model not supported"
- Membership discounts → Skip: "membership pricing not supported"
### 9.4) Multilingual patterns
**French**
- `par kWh` = per kWh
- `par minute` = per minute
- `le quart d'heure` = per 15 minutes
- `après X minutes` = after X minutes
- `puis` = then
- `entamé` = started (round up)
**Dutch**
- `per kWh` = per kWh
- `per minuut` = per minute
- `per kwartier` = per 15 minutes
- `na X minuten` = after X minutes
- `daarna` = then
- `gratis` = free
- `per geleverde kWh` = per delivered kWh
**German**
- `pro kWh` = per kWh
- `pro Minute` = per minute
- `nach X Minuten` = after X minutes
- `kostenlos` = free
---
## 10) Examples of mapping to XLSX columns
- **Pure kWh (euros)**
Text: `0.35 €/kWh` (incl. VAT) for tariff "Example Tariff" with ID "FREX123" (FR)
→ Row: Tariff="Example Tariff", Tariff ID="FREX123", kWh Price=0.2917, Power Type="AC", CountryCode="FR"
(Note: 0.35 / 1.20 = 0.2917 excl. VAT for FR)
- **Pure kWh (cents)**
Text: `35 ct/kWh` (incl. VAT, FR) → kWh Price=0.2917
Text: `100 cents/kWh` (incl. VAT, DE) → kWh Price=0.8403
- **Pure minute**
Text: `0,08 €/min` (incl. VAT, FR) → Minute Fee=0.0667
Text: `3 €/15min` (incl. VAT, FR) → Minute Fee=0.1667
- **kWh + minute fees**
Text: `0.45 €/kWh + 0.05 €/min` (incl. VAT, FR)
→ kWh Price=0.375, Minute Fee=0.0417, Minute fee after x min=0
- **Minute fee with delay**
Text: `0.35 €/kWh, 0.10 €/min after 30 min` (incl. VAT, FR)
→ kWh Price=0.2917, Minute Fee=0.0833, Minute fee after x min=30
- **Session fee**
Text: `1.50 € per session + 0.40 €/kWh` (incl. VAT, FR)
→ kWh Price=0.3333, Start Fee=1.25
- **Idle fee**
Text: `0.35 €/kWh, idle: 0.10 €/min after 10 min` (incl. VAT, FR)
→ kWh Price=0.2917, Idle Fee=0.0833, Idle Fee after x min=10
- **Power bands**
Tariff "Fast Charge" with bands:
- ≤22kW: 0.35 €/kWh → min. Power=0, max. Power=22, Power Type="AC", kWh Price=0.35
- 22-50kW: 0.45 €/kWh → min. Power=22, max. Power=50, Power Type="DC", kWh Price=0.45
- >50kW: 0.55 €/kWh → min. Power=50, max. Power=350, Power Type="DC", kWh Price=0.55
- **Skipped item**
Text: `First 10 kWh 0.30 €/kWh, then 0.40 €/kWh`
→ Goes to "Skipped" sheet with reason="energy thresholds not supported"
---
## 10) Validation and consistency checks
**CRITICAL**: Before generating the XLSX file, you MUST perform these validations:
### 10.1) Data integrity checks
- Number of data rows (excluding header and summary) equals processed count
- Number of rows in "Skipped" sheet equals skipped count
- Total items processed = data rows + skipped rows
- Summary row shows correct totals
### 10.2) Column validation
- All money fields are numbers (decimals allowed, where 1.00 = 1 EUR)
- Power values are integers in kW
- Power Type is either "AC" or "DC" or **empty** if not determinable
- Minute fee after x min and Idle Fee after x min are integers (minutes)
- Fallback is either "TRUE" or "FALSE"
- Empty cells for non-applicable values (not 0 or null)
### 10.3) Template compliance checks
- **MANDATORY**: Output file has EXACTLY 14 columns
- **MANDATORY**: Columns are in the EXACT order listed at the beginning of this document
- **MANDATORY**: No additional columns have been added
- **MANDATORY**: No columns have been removed or renamed
- **MANDATORY**: Column names match EXACTLY (case-sensitive):
1. Tariff
2. Tariff ID
3. min. Power
4. max. Power
5. Power Type
6. CountryCode
7. PartyID
8. kWh Price
9. Minute Fee
10. Minute fee after x min
11. Idle Fee
12. Idle Fee after x min
13. Start Fee
14. Fallback
### 10.4) Business logic checks
- Each row has a unique Tariff ID
- min. Power < max. Power (when both present)
- Country codes follow ISO 3166-1 alpha-2 format when provided
- All numeric price values are WITHOUT VAT
---
## 11) Output handling
**TEMPLATE COMPLIANCE IS CRITICAL**:
- **STEP 1**: Create DataFrame with EXACTLY these 14 columns in order
- **STEP 2**: Populate data maintaining the exact structure
- **STEP 3**: Validate before saving that columns match EXACTLY
**FORBIDDEN ACTIONS**:
- ❌ Adding any columns beyond the 14 specified
- ❌ Removing columns
- ❌ Renaming columns
- ❌ Changing column order
- ❌ Creating summary rows
**REMEMBER**:
- All numeric price fields are WITHOUT VAT
- The XLSX file must have two sheets: "Tariffs" (main data) and "Skipped" (unprocessable items)
- **IMPORTANT: Do NOT add a SUMMARY row to the Tariffs sheet**
- Each row in the Tariffs sheet must be an actual tariff entry
- Save the populated file as `output.xlsx` in the current working directory
- After saving, provide the full file path where the output was saved
- Display summary statistics (total, processed, skipped, success rate) in the console/chat only
- If possible, provide a download link or mechanism for the user to retrieve the file
- For web environments: Create a downloadable link to the output.xlsx file
- For CLI environments: Save to the current directory and inform the user of the location
## 12) Parser generation
If no specific parser is available for the input format:
- **ALWAYS create a custom parser** - build your own tailored solution
- **DO NOT accept less than 95% success rate** - if initial parsing is below 95%, analyze failures and improve the parser
- **Target ≥99% success rate** based on learnings from similar operators
- Build a custom Python parser on the fly that:
- Creates output structure with the exact 14 columns specified
- Reads the specific input format
- **Processes EVERY row in the input file without truncation**
- Implements batch processing if needed to handle large files
- Uses iterative processing to avoid memory issues with large datasets
- Applies all normalization rules from this prompt
- Creates two sheets: "Tariffs" and "Skipped"
- Saves the output as `output.xlsx` in the same directory as the input file or current working directory
- Returns the full path to the saved file
- In Python:
```python
# Define the EXACT column structure
COLUMNS = [
"Tariff", "Tariff ID", "min. Power", "max. Power", "Power Type",
"CountryCode", "PartyID", "kWh Price", "Minute Fee",
"Minute fee after x min", "Idle Fee", "Idle Fee after x min",
"Start Fee", "Fallback"
]
# Create output dataframe with EXACTLY these columns
output_df = pd.DataFrame(parsed_data, columns=COLUMNS)
# VALIDATION: Verify columns match exactly
assert list(output_df.columns) == COLUMNS, "Output columns do not match required structure!"
assert len(output_df.columns) == 14, "Output must have exactly 14 columns!"
# Save with proper structure
with pd.ExcelWriter('output.xlsx', engine='xlsxwriter') as writer:
output_df.to_excel(writer, sheet_name='Tariffs', index=False)
skipped_df.to_excel(writer, sheet_name='Skipped', index=False)
```
- Includes progress tracking for files with many entries
- **If success rate < 95%**: Analyze skipped items, identify patterns, and enhance parser logic
- **Keep iterating until achieving ≥ 95% success rate**
### 12.1) Completeness requirements
- The parser MUST process the entire input file
- Do not implement arbitrary limits on the number of entries processed
- If the input has 1000 tariffs, the output should have 1000 entries (processed + skipped)
- Use streaming or chunked processing for very large files if necessary
### 12.2) Automatic VAT handling
The parser MUST automatically:
- Detect or infer the country (from issuer, location names, etc.)
- Apply the correct VAT rate from the comprehensive list in section 3.2
- Convert ALL numeric values to EXCLUDE VAT
- Never ask the user about VAT - make intelligent assumptions
- If prices are already incl. VAT: calculate excl. VAT
- If prices are excl. VAT: use as-is
- Default to incl. VAT for consumer-facing tariffs
### 12.3) Parser robustness requirements
Based on operator-specific learnings:
- **Column flexibility**: Never hardcode column names - use pattern matching
- **Language support**: Include FR, NL, DE, EN patterns for all price types
- **Free tariffs**: Always include, never skip (price = 0)
- **Sequential fees**: Handle "then/puis/daarna" patterns correctly
- **Completeness**: Process ENTIRE file, no early termination
- **Success tracking**: Log specific failure reasons for improvement
## 13) Fallback pricing
**MANDATORY**: As part of the parsing process, you MUST:
1. Determine the primary currency used in the tariffs (e.g., EUR, USD, GBP)
2. After processing all tariffs but BEFORE creating the final output file, ask the user for fallback prices:
- "What should be the fallback price per kWh for AC charging in [CURRENCY]?"
- "What should be the fallback price per kWh for DC charging in [CURRENCY]?"
3. Add two fallback tariff rows at the END of the Tariffs sheet:
- AC Fallback: Tariff="Fallback AC", Tariff ID="FALLBACK-AC", Power Type="AC", Fallback="TRUE", kWh Price=[user provided AC price excl. VAT]
- DC Fallback: Tariff="Fallback DC", Tariff ID="FALLBACK-DC", Power Type="DC", Fallback="TRUE", kWh Price=[user provided DC price excl. VAT]
4. Leave all other fields empty for fallback entries except:
- CountryCode if consistent across all tariffs
- PartyID if consistent across all tariffs
5. Ensure fallback prices are converted to EXCLUDE VAT if user provides incl. VAT prices
6. The Fallback column should be "FALSE" for all regular tariffs and "TRUE" only for these two fallback rows
**IMPORTANT**: Do NOT generate the output file until you have received the fallback prices from the user
## 14) Final instruction
1. **MANDATORY FIRST STEP**: Create DataFrame with exactly these 14 columns in order:
- Tariff, Tariff ID, min. Power, max. Power, Power Type, CountryCode, PartyID, kWh Price, Minute Fee, Minute fee after x min, Idle Fee, Idle Fee after x min, Start Fee, Fallback
2. Parse all tariffs from the input file
3. **MANDATORY**: Ask user for AC and DC fallback prices before proceeding
4. Add the two fallback rows with Fallback="TRUE" at the end
5. Generate the XLSX file by populating ONLY these columns
6. **MANDATORY VALIDATION**: Before saving, verify output has EXACTLY the same columns
7. Save it as `output.xlsx` in an accessible location
8. Include both "Tariffs" and "Skipped" sheets
9. **DO NOT add a summary row** - keep all rows as actual tariff data
10. Provide clear instructions on how to access/download the file
8. If in a web environment, create a downloadable link
9. If in a CLI environment, save locally and provide the full file path
10. Show a brief summary of processing results IN CHAT ONLY (total processed, skipped, success rate)
11. **CRITICAL**: Success rate MUST be ≥ 95%. If below, improve parser and reprocess
12. Always create a custom parser tailored to the specific input format
13. **REMEMBER: Summary statistics go in chat/console output ONLY, never in the XLSX file**
## 15) Download link requirement
After writing `output.xlsx`, always provide a direct clickable link in the form:
[Download output.xlsx](sandbox:/mnt/data/output.xlsx)
Do this in the same response that contains the parsing summary. Do not delay or wait for another user message.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment