The Python Oracle

Create column based on percentage of recurring customers

Become part of the top 3% of the developers by applying to Toptal https://topt.al/25cXVn

--

Music by Eric Matyas
https://www.soundimage.org
Track title: Digital Sunset Looping

--

Chapters
00:00 Question
03:34 Accepted answer (Score 1)
05:26 Thank you

--

Full question
https://stackoverflow.com/questions/7021...

Question links:
[get values from first column when other columns are true using a lookup list]: https://stackoverflow.com/questions/5804...
[Getting indices of True values in a boolean list]: https://stackoverflow.com/questions/2144...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas #dataframe

#avk47



ACCEPTED ANSWER

Score 1


So I'm fairly new to Python but I've managed to answer my own question. Can't say this is the best, easiest, fastest way but it surely helped.

First of all I made a new dataframe which is an exact copy of the original dataframe, but only with 'True' values of the column 'recurring_customer'. I did that by using the following code:

df_recurring_customers = df.loc[df['recurring_customer'] == True]

It gave me the following dataframe:

df_recurring_customers.head()
    {
        "date_created" ["2019-11-25", "2019-11-28", "2019-12-02", "2019-12-09", "2019-12-11"]
        "customer_id": ["577", "6457", "577", "6647", "840"],
        "total": ["33891.12", "81.98", "9937.68", "1166.28", "2969.60"],
        "recurring_customer": ["True", "True", "True", "True", "True"],
    }
)

Then I resampled the values using:

df_recurring_customers_monthly_sum = df_recurring_customers.resample('1M').sum()

I then dropped the 'number' and 'customer_id' column, which had no value. The next step was to join the two dataframes 'df_monthly' and 'df_recurring_customers_monthly_sum' using:

df_total = df_recurring_customers_monthly_sum.join(df_monthly)

This gave me:

| date_created | total      | recurring_customer_total |
| ------------ | ---------- | ------------------------ |
|  2019-11-30  | 644272.02  |         33973.10         |
|  2019-12-31  | 612205.99  |         15775.29         |
|  2020-01-31  | 887761.60  |         61612.27         |
|  2020-02-29  | 910724.75  |         125315.31        |
|  2020-03-31  | 1174662.59 |         125315.31        |
|  2020-04-30  | 1399332.26 |         248277.97        |

Then I wanted to know the percentage so

df_total['total_recurring_customer_percentage'] = (df_total['recurring_customer_total'] / df_total['total']) * 100

Which gave me:

| date_created | total      | recurring_customer_total | recurring_customer_total_percentage | 
| ------------ | ---------- | ------------------------ | ----------------------------------- |
|  2019-11-30  | 644272.02  |         33973.10         |        5.273099
|  2019-12-31  | 612205.99  |         15775.29         |        2.576794
|  2020-01-31  | 887761.60  |         61612.27         |        6.940182
|  2020-02-29  | 910724.75  |         125315.31        |        13.759954
|  2020-03-31  | 1174662.59 |         125315.31        |        13.967221
|  2020-04-30  | 1399332.26 |         248277.97        |        17.742603