Project Context: Data Analytics Dashboards

This project builds data analytics dashboards using Python and Plotly Dash.

1. Architecture Principles

Core Design Philosophy

Loose Coupling: Components interact through well-defined interfaces
High Cohesion: Related functionality stays together; each function has a single clear purpose
Composability: Basic visualizations combine to create complex dashboards
ACAP-ADAN: As Common As Possible, As Differentiated As Necessary

Design Evolution

Start simple, let solutions evolve
Add complexity only when real needs emerge
Postpone decisions to the last responsible moment

2. Component Hierarchy

Organized from first principles up the ladder of abstraction:

Level 1: Basic Visualizations

Individual chart functions representing atomic visualization types:

create_bar_chart()
create_line_chart()
create_scatter_chart()
etc.

Each basic visualization is its own function with a single clear purpose.

Level 2: Composite Visualizations

Combinations of basic visualizations:

KPI cards with sparklines
Dashboard panels combining multiple chart types
Complex analytical displays

Composite visualizations accept basic visualizations as parameters or combine their outputs.

Level 3: Layout and Navigation

Higher-level abstractions (future development):

Page structure and routing
Callbacks for interactivity
Application-level orchestration

3. Code Style

Standards

Follow PEP 8 strictly
Use type hints for all function parameters and returns
Maximum line length: 88 characters (Black formatter default)

Function Organization

Each visualization type is its own function
Keep functions focused on a single responsibility
Group related utilities in cohesive modules

4. Data and Interface Specification

Data Format Requirements

Required: All data must be in Tidy (long-form) format as Pandas DataFrames

One observation per row
Variables in columns
Each value in its own cell

If wide-form data is passed, raise a clear error with a link to the documentation.
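
To make the distinction concrete, the sketch below reshapes a small wide-form table into tidy form with pandas' melt. The column names (month, region, sales) are invented for illustration:

```python
import pandas as pd

# Wide form: one column per region, so the "region" variable hides in the headers.
wide = pd.DataFrame(
    {"month": ["Jan", "Feb"], "north": [10, 12], "south": [7, 9]}
)

# Tidy form: one row per (month, region) observation, one column per variable.
tidy = wide.melt(id_vars="month", var_name="region", value_name="sales")
```

Callers holding wide-form data can apply a reshape like this before passing the DataFrame to a visualization function.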

Visualization Function Interface

Standard Function Signature Pattern

def create_bar_chart(
    data: pd.DataFrame,
    x: str,
    y: Union[str, List[str]],
    return_type: str = 'figure',
    color_scheme: Optional[str] = None,
    height: Optional[Union[int, str]] = None,
    width: Optional[Union[int, str]] = None,
    **kwargs
) -> Union[go.Figure, dcc.Graph]:
    """Create a bar chart visualization.

    Args:
        data: Tidy format dataframe with one observation per row.
        x: Column name for x-axis.
        y: Column name(s) for series to plot.
        return_type: Return 'figure' (Plotly object) or 'component'
            (Dash dcc.Graph). Defaults to 'figure'.
        color_scheme: Named color scheme from config.py.
        height: Chart height in pixels or percentage.
        width: Chart width in pixels or percentage.

    Returns:
        go.Figure or dcc.Graph, depending on return_type.
    """

Required Parameters

Every visualization function must accept:

data: Pandas DataFrame in tidy format
x: Single column name for x-axis (or equivalent primary dimension)
y: Single column name or list of column names for series

Optional Parameters (with sensible defaults)

return_type: 'figure' (Plotly figure object) or 'component' (Dash dcc.Graph)
- Default: 'figure'
- Allows use in notebooks (figure) or production dashboards (component)
color_scheme: Named color scheme from config.py
- Default: Auto-selected based on data type (sequential/divergent/qualitative)
height: Chart height in pixels or percentage
width: Chart width in pixels or percentage
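
One way the auto-selection of a color scheme type could work is sketched below. The rule used here (non-numeric data is qualitative, numeric data spanning both signs is divergent, other numeric data is sequential) is an assumption, as is the function name infer_scheme_type:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def infer_scheme_type(values: pd.Series) -> str:
    """Guess a scheme type ('seq', 'div' or 'qual') for one column.

    Hypothetical heuristic, not the project's actual selection logic.
    """
    if not is_numeric_dtype(values):
        return "qual"  # categories with no inherent order
    if (values < 0).any() and (values > 0).any():
        return "div"  # values on both sides of zero suggest a meaningful center
    return "seq"  # ordered numeric values, low to high
```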

Composability Implementation

Small multiples: Implemented as parameters within individual graph functions
- small_multiples: bool = False
- columns: int = 2 (for layout)
- shared_axes: bool = True
Return values: Basic visualizations return objects (figures or components) that composite functions can accept and combine
Parameter passing: Composite visualizations accept basic visualizations as parameters or combine their outputs programmatically

5. Example Code

import pandas as pd
import plotly.graph_objects as go
from dash import dcc
from typing import List, Optional, Union


def create_bar_chart(
    data: pd.DataFrame,
    x: str,
    y: Union[str, List[str]],
    return_type: str = 'figure',
    color_scheme: Optional[str] = None,
    height: Optional[Union[int, str]] = None,
    width: Optional[Union[int, str]] = None,
) -> Union[go.Figure, dcc.Graph]:
    """Create a bar chart from tidy data."""
    
    # Validate data format
    if not _is_tidy_format(data):
        raise ValueError(
            "Data must be in tidy format. "
            "See: docs/data_format_guide.md"
        )
    
    # Validate required columns exist (x and every y column)
    y_cols = [y] if isinstance(y, str) else list(y)
    missing = [col for col in [x, *y_cols] if col not in data.columns]
    if missing:
        raise ValueError(
            f"Column(s) {missing} not found in data. "
            f"Available columns: {list(data.columns)}"
        )
    
    # Auto-select color scheme if not provided
    if color_scheme is None:
        color_scheme = _infer_color_scheme(data, y)
    
    # Create the figure
    fig = go.Figure()
    
    # Add traces (simplified example)
    for col in y_cols:
        fig.add_trace(go.Bar(x=data[x], y=data[col], name=col))
    
    # Apply styling
    fig.update_layout(
        height=height,
        width=width,
        # Additional layout configuration
    )
    
    # Return appropriate type
    if return_type == 'figure':
        return fig
    elif return_type == 'component':
        return dcc.Graph(figure=fig)
    else:
        raise ValueError(
            f"return_type must be 'figure' or 'component', got '{return_type}'"
        )

def _is_tidy_format(data: pd.DataFrame) -> bool:
    """Helper to validate tidy format (implementation details)."""
    # Placeholder so the example runs; replace with the real check.
    return True


def _infer_color_scheme(data: pd.DataFrame, y: Union[str, List[str]]) -> str:
    """Helper to infer appropriate color scheme from data."""
    # Placeholder so the example runs; replace with the real inference.
    return 'color_scheme_qual_category_8'

6. Configuration Management

config.py Structure

Create config.py for shared settings that multiple visualization functions need.

Currently includes:

Color scheme definitions

Evolution approach: Keep configuration minimal initially. Add settings only when:

Multiple functions need the same value
The value should be consistent across the project
Changing it in one place should affect all uses

Color Scheme Naming Convention

Format: color_scheme_{type}_{name}_{max_colors}

Type abbreviations:

seq: Sequential (for ordered data, e.g., low to high)
cont: Continuous (for continuous numerical scales)
div: Divergent (for data with meaningful center, e.g., profit/loss)
qual: Qualitative (for categorical data with no inherent order)

Examples:

# In config.py
color_scheme_seq_blues_9 = ['#f7fbff', '#deebf7', '#c6dbef', ...]
color_scheme_div_redblue_11 = ['#67001f', '#b2182b', ..., '#053061']
color_scheme_qual_category_8 = ['#1f77b4', '#ff7f0e', '#2ca02c', ...]
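
Because the naming convention is regular, a scheme's type and capacity can be recovered from its name alone. The parser below is a sketch; parse_scheme_name and SchemeName are hypothetical names, not part of config.py:

```python
import re
from typing import NamedTuple


class SchemeName(NamedTuple):
    scheme_type: str
    name: str
    max_colors: int


# color_scheme_{type}_{name}_{max_colors}
_SCHEME_PATTERN = re.compile(r"^color_scheme_(seq|cont|div|qual)_(\w+)_(\d+)$")


def parse_scheme_name(identifier: str) -> SchemeName:
    """Split a color scheme identifier into type, name and max color count."""
    match = _SCHEME_PATTERN.match(identifier)
    if match is None:
        raise ValueError(
            f"'{identifier}' does not follow "
            "color_scheme_{type}_{name}_{max_colors}. "
            "See: docs/color_schemes.md"
        )
    scheme_type, name, max_colors = match.groups()
    return SchemeName(scheme_type, name, int(max_colors))
```

A capacity check can then compare the parsed max_colors with the number of categories in the data.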

Defaults Location

Place defaults as close to the code using them as possible:

Shared across functions: In config.py (e.g., color schemes)
Function-specific: In function signature defaults (e.g., return_type='figure')

This maintains high cohesion while allowing shared configuration where needed.

7. Error Handling

Error Handling Principles

All exceptions must follow this three-part pattern:

Raise an appropriate error type (ValueError, TypeError, etc.)
Provide a clear, short description of what went wrong
Include a URL/path to detailed documentation for resolution

Examples

Invalid data format:

if not _is_tidy_format(data):
    raise ValueError(
        "Data must be in tidy format. "
        "See: docs/data_format_guide.md"
    )

Color scheme capacity exceeded:

if n_categories > max_colors:
    raise ValueError(
        f"Color scheme '{color_scheme}' supports max {max_colors} colors, "
        f"but data has {n_categories} categories. "
        f"See: docs/color_schemes.md for alternatives"
    )

Missing required column:

if x not in data.columns:
    raise ValueError(
        f"Column '{x}' not found in data. "
        f"Available columns: {list(data.columns)}"
    )

Invalid parameter value:

if return_type not in ['figure', 'component']:
    raise ValueError(
        f"return_type must be 'figure' or 'component', got '{return_type}'"
    )
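
The column and parameter checks above recur in every visualization function, so they could be gathered into one shared helper. The sketch below assumes a hypothetical validate_chart_inputs function; its name and return value are illustrative:

```python
from typing import List, Union

import pandas as pd

VALID_RETURN_TYPES = ("figure", "component")


def validate_chart_inputs(
    data: pd.DataFrame,
    x: str,
    y: Union[str, List[str]],
    return_type: str,
) -> List[str]:
    """Run the shared checks and return y normalized to a list of columns."""
    y_cols = [y] if isinstance(y, str) else list(y)
    missing = [col for col in [x, *y_cols] if col not in data.columns]
    if missing:
        raise ValueError(
            f"Column(s) {missing} not found in data. "
            f"Available columns: {list(data.columns)}"
        )
    if return_type not in VALID_RETURN_TYPES:
        raise ValueError(
            f"return_type must be 'figure' or 'component', got '{return_type}'"
        )
    return y_cols
```

Each chart function would call this once at the top, which keeps the error wording identical across the project.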

8. Documentation Standards

What to Document

Document only what cannot be intuited from reading the code:

Why a design decision was made (not what the code does)
Non-obvious parameter constraints or relationships
Expected data structures when not clear from type hints
Links to external resources for complex concepts
Business logic reasoning that isn't self-evident

What NOT to Document

Avoid documenting:

What is already clear from function/variable names
What type hints already express
Simple implementations that speak for themselves
Obvious parameter descriptions

Docstring Format

Use Google style docstrings consistently across the project.

Example (Google style):

def create_bar_chart(
    data: pd.DataFrame,
    x: str,
    y: Union[str, List[str]],
    return_type: str = 'figure',
) -> Union[go.Figure, dcc.Graph]:
    """Create a bar chart visualization from tidy data.
    
    Uses ACAP-ADAN principle: common bar chart logic with differentiation
    through parameters. Automatically selects appropriate color scheme
    based on data type if not specified.
    
    Args:
        data: Tidy format dataframe with one observation per row.
        x: Column name for categorical x-axis.
        y: Column name(s) for numeric values to plot.
        return_type: Whether to return 'figure' (Plotly object) or 
            'component' (Dash dcc.Graph). Defaults to 'figure'.
    
    Returns:
        Plotly figure object or Dash component based on return_type.
        
    Raises:
        ValueError: If data is not in tidy format or required columns missing.
    
    See Also:
        docs/data_format_guide.md for explanation of tidy data format.
    """

9. Development Workflow

When creating new visualizations:

Start with the simplest working version
Add parameters only when needed
Ensure error messages are helpful with documentation links
Test with edge cases (empty data, single row, max colors exceeded)
Update this Claude.md if new patterns emerge
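
The edge cases named above can be written as plain test functions that pytest would discover by their test_ prefix. The sketch below exercises a hypothetical check_color_capacity helper rather than a real project function:

```python
import pandas as pd


def check_color_capacity(data: pd.DataFrame, column: str, colors: list) -> int:
    """Return the category count, or raise if the scheme has too few colors.

    Hypothetical helper used only to illustrate the edge-case tests.
    """
    n_categories = data[column].nunique()
    if n_categories > len(colors):
        raise ValueError(
            f"Color scheme supports max {len(colors)} colors, "
            f"but data has {n_categories} categories. "
            "See: docs/color_schemes.md for alternatives"
        )
    return n_categories


def test_empty_data() -> None:
    empty = pd.DataFrame({"region": pd.Series(dtype="object")})
    assert check_color_capacity(empty, "region", ["#1f77b4"]) == 0


def test_single_row() -> None:
    single = pd.DataFrame({"region": ["north"]})
    assert check_color_capacity(single, "region", ["#1f77b4"]) == 1


def test_max_colors_exceeded() -> None:
    data = pd.DataFrame({"region": ["north", "south", "east"]})
    try:
        check_color_capacity(data, "region", ["#1f77b4", "#ff7f0e"])
    except ValueError as error:
        # The message should point the user at the documentation.
        assert "docs/color_schemes.md" in str(error)
    else:
        raise AssertionError("expected ValueError when colors run out")
```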

10. Future Considerations

Areas to develop as needs become clear:

Testing standards and patterns
Callback patterns for interactivity
File and folder structure conventions
Performance optimization guidelines

This document evolves with the project. Update it as patterns emerge and solidify.
