Skip to content

Instantly share code, notes, and snippets.

@vingkan
Last active February 24, 2021 03:03
Show Gist options
  • Save vingkan/b113c0f53b6f82a55f0b39fea7f11188 to your computer and use it in GitHub Desktop.
Save vingkan/b113c0f53b6f82a55f0b39fea7f11188 to your computer and use it in GitHub Desktop.
Solution queries for two truths and a lie with the rideshare dataset.
-- Most rides start and end in the Near North Side community area.
SELECT
pickup_community_area,
COUNT(*) as trips
GROUP BY pickup_community_area
ORDER BY trips DESC
-- On average, riders picked up in the Lincoln Square community area tip better than riders picked up in the Loop.
SELECT
pickup_community_area,
AVG(tip) as avg_tip
GROUP BY pickup_community_area
ORDER BY avg_tip DESC
-- If you order a shared ride, you have a 30% or better chance of having the ride all to yourself.
SELECT
1 - (SUM(
CASE(trips_pooled > 1, 1, TRUE, 0)
) / COUNT(*)) AS solo_trip_rate
WHERE shared_trip_authorized
-- Bonus: probability of having a shared ride to yourself, given dropoff, pickup, and day of week
SELECT
1 - SUM(CASE(trips_pooled > 1, 1, TRUE, 0))
/ Count(*) AS prob,
pickup_community_area AS a_pickup,
dropoff_community_area AS b_dropoff,
Count(*) AS total,
date_extract_dow(trip_start_timestamp) AS dow
WHERE shared_trip_authorized = true
GROUP BY
pickup_community_area,
dropoff_community_area,
dow
ORDER BY total DESC, prob DESC
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment