Create column based on percentage of recurring customers
--------------------------------------------------
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Dream Voyager Looping
--
Chapters
00:00 Create Column Based On Percentage Of Recurring Customers
02:52 Accepted Answer Score 1
04:22 Thank you
--
Full question
https://stackoverflow.com/questions/7021...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #dataframe
#avk47
ACCEPTED ANSWER
Score 1
So, I'm fairly new to Python, but I've managed to answer my own question. I can't say this is the best, easiest or fastest way, but it certainly helped.
First, I made a new dataframe containing only the rows of the original dataframe where the column 'recurring_customer' is True. I did that with the following code:
df_recurring_customers = df.loc[df['recurring_customer'] == True]
It gave me the following dataframe:
df_recurring_customers.head()
| date_created | customer_id | total    | recurring_customer |
| ------------ | ----------- | -------- | ------------------ |
| 2019-11-25   | 577         | 33891.12 | True               |
| 2019-11-28   | 6457        | 81.98    | True               |
| 2019-12-02   | 577         | 9937.68  | True               |
| 2019-12-09   | 6647        | 1166.28  | True               |
| 2019-12-11   | 840         | 2969.60  | True               |
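The filtering step can be sketched as a small, self-contained example. It assumes 'date_created' is the index and that 'recurring_customer' is a real boolean column. The rows for customers 577 and 6457 come from the head() output above; the two non-recurring rows (customer IDs 1001 and 1002, with their totals) are hypothetical, added only so the filter has something to drop.

```python
import pandas as pd

# Toy frame: rows for customers 577 and 6457 are copied from the head()
# output above; customers 1001 and 1002 (recurring_customer=False) are
# hypothetical, included only so the filter removes something.
df = pd.DataFrame(
    {
        "customer_id": ["577", "1001", "6457", "577", "1002"],
        "total": [33891.12, 150.00, 81.98, 9937.68, 420.50],
        "recurring_customer": [True, False, True, True, False],
    },
    index=pd.DatetimeIndex(
        ["2019-11-25", "2019-11-26", "2019-11-28", "2019-12-02", "2019-12-05"],
        name="date_created",
    ),
)

# A boolean column can serve as the row mask directly; this selects the same
# rows as df.loc[df['recurring_customer'] == True].
df_recurring_customers = df.loc[df["recurring_customer"]]
```

If the flag were stored as the strings "True"/"False" rather than booleans, comparing with `== True` would match nothing; in that case `df['recurring_customer'].eq('True')` would be the mask to use.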
Then I resampled the values into monthly sums using:
df_recurring_customers_monthly_sum = df_recurring_customers.resample('1M').sum()
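A sketch of the monthly resample, using only the five rows shown by head(). It assumes 'date_created' is the DatetimeIndex (resample() needs a datetime-like index unless `on=` names a column) and sums just the 'total' column, which has the same effect as dropping the other columns first. The month-end alias changed from "M" to "ME" in pandas 2.2, so the sketch picks the alias from the installed version.

```python
import pandas as pd

# The five rows shown by df_recurring_customers.head() above.
# Assumption: 'date_created' is the DatetimeIndex, as resample() requires
# when no on= column is given.
df_recurring_customers = pd.DataFrame(
    {
        "customer_id": ["577", "6457", "577", "6647", "840"],
        "total": [33891.12, 81.98, 9937.68, 1166.28, 2969.60],
        "recurring_customer": [True, True, True, True, True],
    },
    index=pd.DatetimeIndex(
        ["2019-11-25", "2019-11-28", "2019-12-02", "2019-12-09", "2019-12-11"],
        name="date_created",
    ),
)

# pandas 2.2 renamed the month-end alias from "M" to "ME"; choose the one
# the installed version expects.
major, minor = (int(part) for part in pd.__version__.split(".")[:2])
MONTH_END = "ME" if (major, minor) >= (2, 2) else "M"

# Sum only 'total': summing string IDs is meaningless, and summing the
# boolean flag would only count rows.
df_recurring_customers_monthly_sum = (
    df_recurring_customers[["total"]].resample(MONTH_END).sum()
)
```

With these five rows, November sums to 33973.10, the same recurring-customer figure the joined table shows for 2019-11-30. December comes out lower than that table's 15775.29 because head() shows only the first few rows.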
I then dropped the 'number' and 'customer_id' columns, which were of no use here. The next step was to join the two dataframes, 'df_monthly' (the monthly totals for all customers) and 'df_recurring_customers_monthly_sum', using:
df_total = df_recurring_customers_monthly_sum.join(df_monthly)
This gave me:
| date_created | total | recurring_customer_total |
| ------------ | ---------- | ------------------------ |
| 2019-11-30 | 644272.02 | 33973.10 |
| 2019-12-31 | 612205.99 | 15775.29 |
| 2020-01-31 | 887761.60 | 61612.27 |
| 2020-02-29 | 910724.75 | 125315.31 |
| 2020-03-31 | 1174662.59 | 125315.31 |
| 2020-04-30 | 1399332.26 | 248277.97 |
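A sketch of the join, built from the first four months of the table above to keep it short. Both monthly frames would naturally carry a column called 'total', and join() refuses overlapping column names unless `lsuffix`/`rsuffix` are given, so this sketch assumes the recurring sum was renamed to 'recurring_customer_total' beforehand (that step is not shown in the answer). It also assumes df_monthly holds the all-customer sums in 'total'.

```python
import pandas as pd

# Month-end labels for the first four rows of the joined table above.
months = pd.DatetimeIndex(
    ["2019-11-30", "2019-12-31", "2020-01-31", "2020-02-29"], name="date_created"
)

# Assumption: df_monthly holds the all-customer monthly sums in 'total'.
df_monthly = pd.DataFrame(
    {"total": [644272.02, 612205.99, 887761.60, 910724.75]}, index=months
)

# The recurring sums also come out of resample() as 'total'; renaming
# avoids the "columns overlap" error join() raises without lsuffix/rsuffix.
df_recurring_customers_monthly_sum = pd.DataFrame(
    {"total": [33973.10, 15775.29, 61612.27, 125315.31]}, index=months
).rename(columns={"total": "recurring_customer_total"})

# join() aligns on the index and is a left join by default, so every month
# in df_monthly is kept (NaN where a month had no recurring sales).
df_total = df_monthly.join(df_recurring_customers_monthly_sum)
```

Calling join() on df_monthly rather than on the recurring frame keeps every month of the overall totals, and it yields the 'total', 'recurring_customer_total' column order shown in the table.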
Then, to get the percentage of each month's total that came from recurring customers, I used:
df_total['total_recurring_customer_percentage'] = (df_total['recurring_customer_total'] / df_total['total']) * 100
This gave me:
| date_created | total      | recurring_customer_total | total_recurring_customer_percentage |
| ------------ | ---------- | ------------------------ | ----------------------------------- |
| 2019-11-30   | 644272.02  | 33973.10                 | 5.273099                            |
| 2019-12-31   | 612205.99  | 15775.29                 | 2.576794                            |
| 2020-01-31   | 887761.60  | 61612.27                 | 6.940182                            |
| 2020-02-29   | 910724.75  | 125315.31                | 13.759954                           |
| 2020-03-31   | 1174662.59 | 125315.31                | 13.967221                           |
| 2020-04-30   | 1399332.26 | 248277.97                | 17.742603                           |
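One way to sanity-check the percentage formula is to recompute it from the first four months of the joined table. `Series.div()` and `.mul()` are the method forms of `/` and `*`, so this is the same calculation as the one-liner above, just written so it can be run on its own.

```python
import pandas as pd

# First four months of the joined table above.
df_total = pd.DataFrame(
    {
        "total": [644272.02, 612205.99, 887761.60, 910724.75],
        "recurring_customer_total": [33973.10, 15775.29, 61612.27, 125315.31],
    },
    index=pd.DatetimeIndex(
        ["2019-11-30", "2019-12-31", "2020-01-31", "2020-02-29"], name="date_created"
    ),
)

# Recurring customers' share of each month's total, in percent.
# .div()/.mul() are the method forms of / and *, aligned on the index.
df_total["total_recurring_customer_percentage"] = (
    df_total["recurring_customer_total"].div(df_total["total"]).mul(100)
)
```

Rounded to six decimals, these reproduce 5.273099, 2.576794, 6.940182 and 13.759954 from the table. A month whose total is 0 would give inf or NaN, so such rows may need filtering or filling first.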