How to find the last true position of the group starting from the first position to be true faster?

Full question
https://stackoverflow.com/questions/7030...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas #dataframe

ACCEPTED ANSWER

Score 4

Use numba to scan the values only up to the end of the first block of Trues (inspired by another solution):

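from numba import njit

@njit
def sort_order3(a, b):
    # a: boolean array (df['data']), b: matching values (df['order'])
    if not a[0]:
        # no block of Trues at the start
        return 0
    else:
        for i in range(1, len(a)):
            if not a[i]:
                # first False found: the previous position is the last True
                return b[i - 1]
        # no False at all: the block runs to the end
        return b[-1]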


  
df = generate_data()  # generate_data() is not defined in this answer
print(sort_order3(df['data'].to_numpy(), df['order'].to_numpy()))
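
Since generate_data() is not shown here, one way to see what the scan returns is to run the same loop as plain Python on small hand-made inputs. This is only a sketch: the name last_true_order and the sample values are made up for illustration, and the @njit decorator is left out so that it runs without numba.

```python
def last_true_order(a, b):
    """Plain-Python version of the scan above (hypothetical name, no numba)."""
    if not a[0]:
        return 0
    for i in range(1, len(a)):
        if not a[i]:
            return b[i - 1]
    return b[-1]

# Made-up sample data: 'order' values paired with boolean 'data' values.
order = [10, 20, 30, 40, 50]

print(last_true_order([True, True, True, False, True], order))  # 30
print(last_true_order([False, True, True, True, True], order))  # 0
print(last_true_order([True, True, True, True, True], order))   # 50
```

The loop stops at the first False, so the later True in the first call is never visited.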

ANSWER 2

Score 1

Maybe I am missing something, but why don't you just get the index of the first False in df.data and then use that index to look up the value in the df.order column?

For example:

def sort_order3(df):
    try:
        idx = df.data.to_list().index(False)
    except ValueError: # meaning there is no False in df.data
        idx = df.data.size - 1
    return df.order[idx]
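
The try/except relies on list.index raising ValueError when the value is absent. A small illustration on plain lists (the helper name first_false_index and the sample values are hypothetical, not part of the answer):

```python
def first_false_index(flags):
    """Index of the first False, or the last index if there is none."""
    try:
        return flags.index(False)
    except ValueError:  # list.index raises ValueError when False is absent
        return len(flags) - 1

print(first_false_index([True, True, False, True]))  # 2
print(first_false_index([True, True, True]))         # 2
print(first_false_index([False, True]))              # 0
```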

Or, for really large data, numpy might be faster:

import numpy as np

def sort_order4(df):
    try:
        # position of the first False in df.data
        idx = np.argwhere(~df.data.values)[0, 0]
    except IndexError: # meaning there is no False in df.data
        idx = df.data.size - 1
    return df.order[idx]
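
For reference, np.argwhere returns one row per matching element, so [0, 0] picks the position of the first match and raises IndexError when there are none. A minimal sketch on made-up arrays (first_false_index_np is a hypothetical name, not part of the answer):

```python
import numpy as np

def first_false_index_np(flags):
    """Index of the first False in a boolean array, or the last index if none."""
    try:
        return int(np.argwhere(~flags)[0, 0])
    except IndexError:  # argwhere returned an empty (0, 1) array
        return flags.size - 1

print(np.argwhere(~np.array([True, False, True, False])).tolist())  # [[1], [3]]
print(first_false_index_np(np.array([True, True, False, True])))    # 2
print(first_false_index_np(np.array([True, True, True])))           # 2
```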

The timing on my device:

%timeit sort_order(df.copy())
565 µs ± 6.29 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit sort_order2(df.copy())
443 µs ± 10.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit sort_order3(df.copy())
96.5 µs ± 2.16 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit sort_order4(df.copy())
112 µs ± 5.06 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)