Monte Carlo simulations are typically used to estimate results through random sampling when the actual problem is too vast to be solved in its entirety (or in a reasonable amount of time).
There are 3 examples, each closer to real-world usage than the last. Example 1 is the best 😄
Let's take an example: Palash wants to get an estimate of the total number of potholes in London. Now, a seemingly obvious way to do this would be to systematically go through each and every street in London and count the number of potholes in each street.
However, let's say that there are a million streets in London. You can't expect Palash to go through every single street and count its potholes; he'd be dead before he finished even a fraction of them.
So what does Palash do? Being the wise man he is, he decides to take a sample of the streets in London. He randomly selects 100 streets and counts the number of potholes in each of them. He then takes the average of the number of potholes in these 100 streets and multiplies it by the total number of streets in London to get an estimate of the total number of potholes.
This is a Monte Carlo simulation.
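To make Palash's approach concrete, here is a minimal Python sketch. The city data is entirely made up: a hypothetical list of pothole counts for a million streets, generated at random, so that the estimate can be compared with the true total. In real life that full list is exactly what Palash doesn't have. The function name `estimate_total_potholes` is hypothetical.

```python
import random

def estimate_total_potholes(counts, sample_size, rng):
    """Pick `sample_size` streets uniformly at random (without replacement),
    average their pothole counts, and scale up to the number of streets."""
    sample = rng.sample(counts, sample_size)
    sample_mean = sum(sample) / sample_size
    return sample_mean * len(counts)

# Hypothetical "ground truth": pothole counts for 1,000,000 streets.
# This full list exists only so we can check the estimate against it.
city_rng = random.Random(42)
potholes_per_street = [city_rng.randint(0, 10) for _ in range(1_000_000)]

estimate = estimate_total_potholes(potholes_per_street, 100, random.Random(7))
actual = sum(potholes_per_street)
print(f"estimate: {estimate:,.0f}  actual: {actual:,}")
```

Rerunning with a different seed gives a different estimate. That spread is the sampling error, and it shrinks as the sample grows.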
You can have additional factors in your simulation, such as:
- the length of the street
- the width of the street
- the total number of cars you saw on the street
- the time of the day you saw the street
to enhance the accuracy of your simulation.
But the whole idea is to take a sample of the population and use it to estimate the characteristics of the population.
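One way to use the street length from the list above is a ratio estimator, a standard survey-sampling technique (my choice here, not something the method requires): estimate potholes per kilometer from the sample, then multiply by the total length of all streets. This sketch assumes the city knows its total street length; the street data is randomly generated and hypothetical.

```python
import random

def ratio_estimate(sample, total_length_km):
    """Ratio estimator: potholes per km in the sampled streets, scaled by
    the (assumed known) total length of every street in the city."""
    sampled_potholes = sum(potholes for potholes, _ in sample)
    sampled_length = sum(length for _, length in sample)
    return sampled_potholes / sampled_length * total_length_km

# Hypothetical city of 100,000 streets, each (potholes, length_km).
# Longer streets get more potholes: about 2 per km plus random noise.
rng = random.Random(1)
streets = []
for _ in range(100_000):
    length = rng.uniform(0.1, 3.0)
    potholes = max(0, round(2 * length + rng.gauss(0, 1)))
    streets.append((potholes, length))

total_length = sum(length for _, length in streets)
sample = rng.sample(streets, 100)
estimate = ratio_estimate(sample, total_length)
actual = sum(potholes for potholes, _ in streets)
print(f"estimate: {estimate:,.0f}  actual: {actual:,}")
```

Because the estimate borrows information from street length, it is usually tighter than the plain average when potholes really do scale with length; if they don't, it can do worse.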
Estimating the average commute time for everyone in New York City is a massive task. It's impractical to ask every single person about their commute time. The city is huge, and there are millions of people with varying distances and modes of transport.
How would we sample the population?
Instead of surveying everyone, we decide to randomly select a sample of, say, 5000 people from different parts of the city.
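Sampling from "different parts of the city" can be done with stratified sampling: split the population into areas and draw a random sample inside each area, sized in proportion to that area's population. The sketch assumes hypothetical area names and population counts (not real NYC figures) and represents each person by an integer ID.

```python
import random

def proportional_allocation(strata_sizes, total_sample):
    """Split `total_sample` across strata in proportion to each stratum's
    population, using largest-remainder rounding so the parts sum exactly."""
    population = sum(strata_sizes.values())
    raw = {name: total_sample * size / population for name, size in strata_sizes.items()}
    alloc = {name: int(share) for name, share in raw.items()}
    leftover = total_sample - sum(alloc.values())
    # Give the remaining slots to the strata with the largest fractional parts.
    for name in sorted(raw, key=lambda n: raw[n] - alloc[n], reverse=True)[:leftover]:
        alloc[name] += 1
    return alloc

def stratified_sample(people_by_area, total_sample, rng):
    """Draw a simple random sample inside each area, sized proportionally."""
    sizes = {area: len(people) for area, people in people_by_area.items()}
    alloc = proportional_allocation(sizes, total_sample)
    return {area: rng.sample(people_by_area[area], k) for area, k in alloc.items()}

# Hypothetical areas and population counts; each person is an integer ID.
people_by_area = {"area_a": range(1_200_000), "area_b": range(800_000), "area_c": range(500_000)}
sample = stratified_sample(people_by_area, 5000, random.Random(0))
print({area: len(ids) for area, ids in sample.items()})
# {'area_a': 2400, 'area_b': 1600, 'area_c': 1000}
```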
What data are we collecting?
For each person, we collect the following data:
- The distance they travel to work
- The mode of transport they use
- The suburbs/neighborhoods they commute to and from
- The times at which they make their commute
What are some seasonal or time-based factors that we should consider?
- The time of the year (summer, winter, spring, fall)
- The time of the day (rush hour, mid-day, late night)
- Weather conditions (rain, snow, sunny)
- Holidays (Christmas, Thanksgiving, New Year's Eve, Black Friday)
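The seasonal and time-based factors above are where the simulation itself comes in: instead of one fixed commute time, you draw many random trips and average them. This is a minimal sketch; the multipliers and probabilities are invented placeholders that you would estimate from the survey data, and it models only time of day, season, and weather (holidays are left out to keep it short).

```python
import random
import statistics

# Placeholder multipliers and probabilities (hypothetical, not measured).
TIME_OF_DAY = {"rush_hour": 1.4, "mid_day": 1.0, "late_night": 0.8}
TIME_WEIGHTS = [0.6, 0.3, 0.1]
WEATHER = {"sunny": 1.0, "rain": 1.15, "snow": 1.4}
WEATHER_WEIGHTS = {"summer": [0.8, 0.2, 0.0], "winter": [0.5, 0.25, 0.25]}

def simulate_commute(base_minutes, season, rng):
    """One random trip: draw a time of day and a weather condition,
    then scale the base travel time by their multipliers and some noise."""
    slot = rng.choices(list(TIME_OF_DAY), weights=TIME_WEIGHTS)[0]
    weather = rng.choices(list(WEATHER), weights=WEATHER_WEIGHTS[season])[0]
    noise = rng.uniform(0.9, 1.1)
    return base_minutes * TIME_OF_DAY[slot] * WEATHER[weather] * noise

def expected_commute(base_minutes, season, n_trials=10_000, seed=0):
    """Average over many simulated trips for one person and season."""
    rng = random.Random(seed)
    return statistics.mean(simulate_commute(base_minutes, season, rng) for _ in range(n_trials))

summer = expected_commute(30, "summer")
winter = expected_commute(30, "winter")
print(f"summer: {summer:.1f} min, winter: {winter:.1f} min")
```

For a 30-minute base trip, these made-up numbers should land near 37.7 minutes in summer and 41.6 in winter, since the expected values are 30 × 1.22 × 1.03 and 30 × 1.22 × 1.1375.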
Calculating the average commute time
This isn't trivial, simply because of the nature of the data we're dealing with (different distances, modes of transport, and times of travel), but it's quite doable with some Excel and statistics.
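If you'd rather use code than Excel, here is one hedged way to compute the average together with an uncertainty range: the percentile bootstrap, which I'm swapping in as an example technique. The commute times are hypothetical, drawn from a skewed (log-normal) distribution to mimic a few very long commutes.

```python
import random

def bootstrap_mean_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Sample mean plus a percentile-bootstrap confidence interval:
    resample the data with replacement many times, then read off the
    alpha/2 and 1 - alpha/2 quantiles of the resampled means."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_resamples))
    lower = means[int(alpha / 2 * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(values) / n, (lower, upper)

# Hypothetical survey responses: 500 skewed commute times in minutes.
data_rng = random.Random(1)
commutes = [data_rng.lognormvariate(3.5, 0.4) for _ in range(500)]
mean_minutes, (low, high) = bootstrap_mean_ci(commutes)
print(f"mean {mean_minutes:.1f} min, 95% CI [{low:.1f}, {high:.1f}]")
```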
Extrapolating the results to the entire population
This is the most important part of the simulation, where we "apply" our results to the entire population.
This step typically includes some validation, such as A/B testing, where we compare our results against a control group.
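A sketch of the extrapolation step, assuming the sample was split by area: post-stratification weights each area's sample mean by that area's share of the whole population, and a crude check compares the result with a mean measured independently on a control group. All numbers, area names, and the 5% tolerance are hypothetical.

```python
def poststratified_mean(sample_means, population_sizes):
    """Weight each area's sample mean by that area's share of the total
    population, so over- or under-sampled areas don't skew the result."""
    total = sum(population_sizes.values())
    return sum(sample_means[area] * size / total for area, size in population_sizes.items())

def within_tolerance(estimate, control_mean, tolerance=0.05):
    """Crude check: is the estimate within `tolerance` (a fraction) of the
    mean measured independently on a control group?"""
    return abs(estimate - control_mean) / control_mean <= tolerance

# Hypothetical per-area sample means (minutes) and population counts.
sample_means = {"area_a": 42.0, "area_b": 35.0, "area_c": 28.0}
population = {"area_a": 1_200_000, "area_b": 800_000, "area_c": 500_000}
city_estimate = poststratified_mean(sample_means, population)
print(round(city_estimate, 2))  # 36.96
```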
This is a very real-world example of how Monte Carlo simulations are used.
Problem:
In a large urban area, structural engineers want to assess the long-term effects of environmental stress (like wind, earthquakes, heavy traffic vibrations, etc.) on buildings. It's impractical to instrument every building with sensors due to cost and logistical constraints.
Sampling:
The engineers select a sample of buildings, say 100, of various ages, designs, and in different locations throughout the city.
Photographic Data Collection:
They set up time-lapse cameras or periodically take high-resolution photographs of these buildings over an extended period. These photographs capture subtle changes in the buildings' structures that might indicate stress, like cracks, tilting, or bending.
Additional Data Gathering:
Alongside photographs, they might also collect data on:
- Environmental conditions (temperature, humidity, wind speed).
- Seismic activity in the area.
- Traffic density around the buildings (for vibration analysis).
- Historical data on the buildings’ maintenance and renovations.
Image Analysis for Stress Indicators:
Using image processing techniques, the engineers analyze the photographs for signs of structural stress over time. They look for patterns or changes that might indicate weakening or potential failure points in the buildings.
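Real crack detection is far more involved (image alignment, lighting correction, edge detection), but the core idea of spotting change over time can be sketched with simple frame differencing in NumPy. This assumes two already-aligned grayscale photographs of the same facade; the images here are synthetic, with an artificial dark line standing in for a crack, and the threshold of 30 intensity levels is an arbitrary choice.

```python
import numpy as np

def changed_fraction(before, after, threshold=30):
    """Fraction of pixels whose grayscale intensity changed by more than
    `threshold` between two aligned photographs of the same facade."""
    diff = np.abs(after.astype(np.int16) - before.astype(np.int16))
    return float((diff > threshold).mean())

rng = np.random.default_rng(0)
before = rng.integers(100, 156, size=(200, 200), dtype=np.uint8)  # synthetic facade
after = before.copy()
after[50:150, 100] = 20  # synthetic dark vertical "crack", 100 pixels long
print(changed_fraction(before, after))  # 100 / 40_000 = 0.0025
```

Tracking this fraction (or the location of the changed pixels) across a series of photographs gives a simple per-building stress indicator over time.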
The actual Monte Carlo Simulation:
They use this data to run simulations for each building in the sample. The simulations incorporate the observed data and additional environmental factors to estimate each building's stress level and potential lifespan.
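Here is a minimal sketch of what a per-building simulation might look like. Every year, random wind, traffic, and (rarely) seismic loads add damage, and the building's remaining life ends when cumulative damage reaches 1.0. Repeating this thousands of times gives a distribution of lifespans rather than a single number. The damage model, the parameter names, and all values are hypothetical; `initial_damage` stands in for whatever the photo analysis measured.

```python
import random
import statistics

def simulate_lifespan(params, rng, max_years=300):
    """One Monte Carlo trial: add a year of random damage at a time
    until cumulative damage reaches 1.0; return the number of years."""
    damage = params["initial_damage"]
    for year in range(1, max_years + 1):
        damage += rng.expovariate(1 / params["mean_wind_damage"])   # wind
        damage += params["traffic_damage"] * rng.uniform(0.8, 1.2)  # traffic vibration
        if rng.random() < params["quake_prob"]:                     # rare seismic event
            damage += rng.uniform(0.05, 0.3)
        if damage >= 1.0:
            return year
    return max_years  # capped: the building outlived the simulation horizon

def lifespan_distribution(params, n_trials=2000, seed=0):
    """Mean, 5th and 95th percentile of simulated remaining lifespan."""
    rng = random.Random(seed)
    years = [simulate_lifespan(params, rng) for _ in range(n_trials)]
    cuts = statistics.quantiles(years, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return statistics.mean(years), cuts[0], cuts[-1]

# Hypothetical parameters for one sampled building.
building = {"initial_damage": 0.2, "mean_wind_damage": 0.004,
            "traffic_damage": 0.002, "quake_prob": 0.01}
mean_years, p5, p95 = lifespan_distribution(building)
print(f"mean {mean_years:.0f} years, 90% range {p5:.0f} to {p95:.0f}")
```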
Extrapolation and Predictive Analysis:
From these simulations, the engineers develop a model to estimate the structural integrity and lifespan of similar buildings across the city. For example, if the data shows that buildings of a certain age or design are more prone to stress under specific conditions, they can predict which buildings in the city might need closer monitoring or reinforcement.
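The extrapolation step might look like this: average the simulated lifespans per building category (here, a hypothetical era/design pair), then flag every building in the city inventory whose category falls below a threshold. Building IDs, categories, and the 60-year threshold are made up.

```python
from collections import defaultdict
from statistics import mean

def lifespan_by_category(sampled):
    """Average simulated lifespan per category of sampled buildings."""
    groups = defaultdict(list)
    for category, years in sampled:
        groups[category].append(years)
    return {category: mean(years) for category, years in groups.items()}

def flag_for_monitoring(city_buildings, category_lifespan, threshold_years):
    """IDs of city buildings whose category's estimated lifespan is below
    the threshold. Unsampled categories default to 0, so they get flagged."""
    return [building_id for building_id, category in city_buildings
            if category_lifespan.get(category, 0) < threshold_years]

# Hypothetical simulation results: (category, simulated lifespan in years).
sampled = [(("pre-1950", "masonry"), 40), (("pre-1950", "masonry"), 50),
           (("post-1990", "steel"), 120), (("post-1990", "steel"), 110)]
estimates = lifespan_by_category(sampled)
city = [("B1", ("pre-1950", "masonry")), ("B2", ("post-1990", "steel")),
        ("B3", ("1960s", "concrete"))]
print(flag_for_monitoring(city, estimates, threshold_years=60))  # ['B1', 'B3']
```

One design choice worth noting: categories that were never sampled are flagged too, since the model has no evidence about them either way.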
If you'd like to watch a video on this topic, you can watch it here:
If you simply want to read through the lecture notes, you can find them here: