The Python Oracle

Create column based on percentage of recurring customers

--------------------------------------------------
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------

Music by Eric Matyas
https://www.soundimage.org
Track title: Dream Voyager Looping

--

Chapters
00:00 Create Column Based On Percentage Of Recurring Customers
02:52 Accepted Answer Score 1
04:22 Thank you

--

Full question
https://stackoverflow.com/questions/7021...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas #dataframe

#avk47



ACCEPTED ANSWER

Score 1


So I'm fairly new to Python but I've managed to answer my own question. Can't say this is the best, easiest, fastest way but it surely helped.

First of all I made a new dataframe which is an exact copy of the original dataframe, but only with 'True' values of the column 'recurring_customer'. I did that by using the following code:

df_recurring_customers = df.loc[df['recurring_customer'] == True]

It gave me the following dataframe:

df_recurring_customers.head()
    {
        "date_created" ["2019-11-25", "2019-11-28", "2019-12-02", "2019-12-09", "2019-12-11"]
        "customer_id": ["577", "6457", "577", "6647", "840"],
        "total": ["33891.12", "81.98", "9937.68", "1166.28", "2969.60"],
        "recurring_customer": ["True", "True", "True", "True", "True"],
    }
)

Then I resampled the values using:

df_recurring_customers_monthly_sum = df_recurring_customers.resample('1M').sum()

I then dropped the 'number' and 'customer_id' column, which had no value. The next step was to join the two dataframes 'df_monthly' and 'df_recurring_customers_monthly_sum' using:

df_total = df_recurring_customers_monthly_sum.join(df_monthly)

This gave me:

| date_created | total      | recurring_customer_total |
| ------------ | ---------- | ------------------------ |
|  2019-11-30  | 644272.02  |         33973.10         |
|  2019-12-31  | 612205.99  |         15775.29         |
|  2020-01-31  | 887761.60  |         61612.27         |
|  2020-02-29  | 910724.75  |         125315.31        |
|  2020-03-31  | 1174662.59 |         125315.31        |
|  2020-04-30  | 1399332.26 |         248277.97        |

Then I wanted to know the percentage so

df_total['total_recurring_customer_percentage'] = (df_total['recurring_customer_total'] / df_total['total']) * 100

Which gave me:

| date_created | total      | recurring_customer_total | recurring_customer_total_percentage | 
| ------------ | ---------- | ------------------------ | ----------------------------------- |
|  2019-11-30  | 644272.02  |         33973.10         |        5.273099
|  2019-12-31  | 612205.99  |         15775.29         |        2.576794
|  2020-01-31  | 887761.60  |         61612.27         |        6.940182
|  2020-02-29  | 910724.75  |         125315.31        |        13.759954
|  2020-03-31  | 1174662.59 |         125315.31        |        13.967221
|  2020-04-30  | 1399332.26 |         248277.97        |        17.742603