@jykim16
Created January 17, 2020 10:34
import pandas as pd
df = pd.read_csv("sample_data_3.csv")
df['ts'] = pd.to_datetime(df['ts'])
# Consider only the rows with country_id = "BDV" (there are 844 such rows).
# For each site_id, compute the number of unique user_id values found in these 844 rows.
# Which site_id has the largest number of unique users? And what's the number?
is_BDV = df['country_id']=='BDV'
df_is_BDV = df[is_BDV]
df_is_BDV_by_site = df_is_BDV.groupby('site_id')
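# The original line ended in a dangling "['" (truncated); it is dropped here so the statement parses.
# The result is a one-row Series: index = site_id, value = number of unique users.
answer_1 = df_is_BDV_by_site['user_id'].nunique().sort_values().tail(1)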
# Between 2019-02-03 00:00:00 and 2019-02-04 23:59:59
# there are four users who visited a certain site more than 10 times.
# Find these four users & which sites they (each) visited more than 10 times.
# (Simply provide four triples in the form (user_id, site_id, number of visits) in the box below.)
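# Inclusive bounds match the stated window; .copy() avoids SettingWithCopyWarning when 'views' is added.
df_time = df[(df['ts'] >= '2019-02-03 00:00:00') & (df['ts'] <= '2019-02-04 23:59:59')].copy()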
df_time_by_user_site = df_time.groupby(['user_id','site_id'])
df_time['views'] = df_time_by_user_site['site_id'].transform('count')
users_views_site_over_10 = df_time[df_time['views'] > 10]
del users_views_site_over_10['ts']
del users_views_site_over_10['country_id']
answer_2 = users_views_site_over_10.drop_duplicates()
# For each site, compute the unique number of users whose last visit was to that site.
# For instance, user "LC3561"'s last visit is to "N0OTG" based on timestamp data.
# Based on this measure, what are top three sites?
# (Hint: site "3POLC" is ranked 5th, with 28 users whose last visit in the data set was to 3POLC.)
# Simply provide three pairs in the form (site_id, number of users).
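# Sort by timestamp first so last() is the latest visit, not just the last row in the file.
# A stable sort (mergesort) keeps file order for rows with identical timestamps.
users_last_visit = df.sort_values('ts', kind='mergesort').groupby('user_id').last()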
answer_3 = users_last_visit['site_id'].value_counts()
# For each user, determine the first site he/she visited and the last site he/she visited.
# Compute the number of users whose first/last visits are to the same website. What is the number?
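# Sort by timestamp so first()/last() mean earliest/latest visit; stable sort keeps file order on ties.
df_by_ts = df.sort_values('ts', kind='mergesort')
users_first_visit = df_by_ts.groupby('user_id').first()
users_last_visit = df_by_ts.groupby('user_id').last()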
combined_first_last_visit = pd.concat((users_first_visit['site_id'], users_last_visit['site_id']))
reset_combined_first_last_visit = combined_first_last_visit.reset_index()
users_with_same_first_last = reset_combined_first_last_visit[reset_combined_first_last_visit.duplicated()]
# .count() would return one count per column; len() gives the single number the question asks for.
answer_4 = len(users_with_same_first_last)
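
The last two questions both depend on what "first" and "last" visit mean. A minimal sketch on invented data (the `toy` frame and its values are hypothetical; only the column names come from the script above) suggests one way to get both answers from a single chronologically sorted groupby, rather than the concat/duplicated approach used above:

```python
import pandas as pd

# Hypothetical toy data: rows are deliberately NOT in timestamp order.
toy = pd.DataFrame({
    "ts": pd.to_datetime([
        "2019-02-03 09:00:00", "2019-02-01 08:00:00", "2019-02-02 10:00:00",
        "2019-02-01 12:00:00", "2019-02-04 12:00:00", "2019-02-02 07:00:00",
    ]),
    "user_id": ["u1", "u1", "u1", "u2", "u2", "u3"],
    "site_id": ["A", "B", "C", "A", "A", "B"],
})

# Sort chronologically so "first"/"last" mean earliest/latest visit,
# then collect both per user in one aggregation.
visits = (
    toy.sort_values("ts", kind="mergesort")
       .groupby("user_id")["site_id"]
       .agg(["first", "last"])
)

# Number of users whose last visit was to each site (descending).
last_site_counts = visits["last"].value_counts()

# Number of users whose first and last visits were to the same site.
same_first_last = int((visits["first"] == visits["last"]).sum())

# For comparison: without sorting, last() just takes the last row in file order.
unsorted_last_u1 = toy.groupby("user_id")["site_id"].last()["u1"]
```

In this toy frame the unsorted result for `u1` differs from the sorted one, which is why sorting by `ts` matters whenever the CSV is not already in time order. Users with a single visit count as having the same first and last site under both approaches.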