Given the following database schema:
| page_metrics |
+------------------------------------------------------+
| id | page_id | date | fans | comments | clicks |
+----+---------+------------+------+----------+--------+
| 11 | 123 | 2012-08-01 | 1750 | 23 | 103 |
| 12 | 456 | 2012-08-01 | 58 | 8 | 15 |
| 13 | 123 | 2012-08-02 | 1755 | 48 | 421 |
| 14 | 456 | 2012-08-02 | 61 | 6 | 25 |
| ... |
+----+---------+------------+------+----------+--------+
Write a function that takes the following parameters:
page_id
, metric
, start_time
, end_time
and returns data in the following format:
[
{ 'date': '2012-08-01', 'fans': 1750 },
{ 'date': '2012-08-01', 'fans': 1755 }
]
You may assume that you have a simple DB library that provides some basic functionality. For example: DB[:table_name].select('SELECT ...your SQL statement')
could run a SELECT statement and return an array of hashes containing the matching data.
What if we wanted the metric
parameter in the previous function to accept an array of metrics?
What if we wanted the page_id
parameter to also accept an array? In this case, we would still only return a single date series, but the values would be the sum of all the pages indicated.
[
{ 'date': '2012-08-01', 'fans': 1808 },
{ 'date': '2012-08-01', 'fans': 1816 }
]
Now suppose we started tracking metrics that had multiple values per day. For example, on row 11 above, we would know that 1500 of those fans were from the United States, 200 were from Canada, and 50 were from Mexico.
We would want to be able to maintain performant queries to get the total number of likes. We would also want to be able to query for fans_by_country
and receive a broken-down list. The response might look as follows:
[
{
'date': '2012-08-01',
'fans': 1750,
'comments': 23,
'fans_by_country': {
'United States': 1500,
'Canada': 200,
'Mexico': 50
}
},
{
'date': '2012-08-01',
'fans': 1755,
'comments': 48,
'fans_by_country': {
'United States': 1502,
'Canada': 201,
'Mexico': 52
}
}
]
How would you design a schema to support that breakdown?
How would you add to your previous function so that it could return data as specified?
Imagine this database at full scale:
- 10,000 pages
- A year worth of data
- Dozens of metrics
- All broken down by country
What sort of changes might be warranted in your database schema design? What optimizations could be made to your functions?