The Python Oracle

Calculating formula based on multiple columns in Pandas Dataframe - but without creating many intermediate columns

Full question
https://stackoverflow.com/questions/5059...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas #numpy



ACCEPTED ANSWER

Score 3


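Use pd.concat with max(axis=1) to take the row-wise maximum of the three expressions without storing them as intermediate columns: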

df['TR'] = pd.concat([(df['high'] - df['low']), 
                      (df['high'] - df['adjclose'].shift(1)).abs(),
                      (df['low']  - df['adjclose'].shift(1))], axis=1).max(axis=1)

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'high':[4,5,4,5,5,4],
                   'low':[7,8,9,4,2,3],
                   'adjclose':[1,3,5,7,1,0]})

print (df)
   adjclose  high  low
0         1     4    7
1         3     5    8
2         5     4    9
3         7     5    4
4         1     5    2
5         0     4    3

df['TR'] = pd.concat([(df['high']-df['low']), 
                      (df['high'] - df['adjclose'].shift(1)).abs(),
                      (df['low'] - df['adjclose'].shift(1))], axis=1).max(axis=1)

print (df)
   adjclose  high  low   TR
0         1     4    7 -3.0
1         3     5    8  7.0
2         5     4    9  6.0
3         7     5    4  1.0
4         1     5    2  3.0
5         0     4    3  3.0

Detail:

print (pd.concat([(df['high']-df['low']),
                  (df['high'] - df['adjclose'].shift(1)).abs(),
                  (df['low'] - df['adjclose'].shift(1))], axis=1))
   0    1    2
0 -3  NaN  NaN
1 -3  4.0  7.0
2 -5  1.0  6.0
3  1  0.0 -1.0
4  3  2.0 -5.0
5  1  3.0  2.0
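
The first row of the detail output explains why TR is -3.0 there rather than NaN: DataFrame.max skips NaN by default (skipna=True). A minimal sketch of that behaviour, rebuilding the three columns by hand; the names parts, skipping and propagating are illustrative only and not part of the original answer:

```python
import numpy as np
import pandas as pd

# The same three candidate columns as in the detail output above.
parts = pd.DataFrame({
    0: [-3, -3, -5, 1, 3, 1],
    1: [np.nan, 4.0, 1.0, 0.0, 2.0, 3.0],
    2: [np.nan, 7.0, 6.0, -1.0, -5.0, 2.0],
})

# Default skipna=True: NaN entries are ignored within each row.
skipping = parts.max(axis=1)

# skipna=False: any NaN in a row makes that row's result NaN.
propagating = parts.max(axis=1, skipna=False)

print(skipping.tolist())
print(propagating.tolist())
```

With skipna=False the first row becomes NaN, which is the same result the plain NumPy version produces.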

The NumPy solution gives a different result for the first row, because np.max propagates NaN: if any value in a row is NaN, the maximum of that row is NaN as well:

df['TR1'] = np.max(np.c_[(df['high']-df['low']), 
                        (df['high'] - df['adjclose'].shift(1)).abs(),
                        (df['low'] - df['adjclose'].shift(1))], axis=1)

print (df)
   adjclose  high  low  TR1
0         1     4    7  NaN
1         3     5    8  7.0
2         5     4    9  6.0
3         7     5    4  1.0
4         1     5    2  3.0
5         0     4    3  3.0

print (np.c_[(df['high']-df['low']),
             (df['high'] - df['adjclose'].shift(1)).abs(),
             (df['low'] - df['adjclose'].shift(1))])

[[-3. nan nan]
 [-3.  4.  7.]
 [-5.  1.  6.]
 [ 1.  0. -1.]
 [ 3.  2. -5.]
 [ 1.  3.  2.]] 
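
If the NumPy version should match the pandas result for the first row, one option is np.nanmax, which ignores NaN entries. This is a sketch rather than part of the original answer; the column name TR2 and the helper variables prev_close and stacked are assumptions made for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'high': [4, 5, 4, 5, 5, 4],
                   'low': [7, 8, 9, 4, 2, 3],
                   'adjclose': [1, 3, 5, 7, 1, 0]})

# Previous row's adjclose; NaN for the first row.
prev_close = df['adjclose'].shift(1)

# Stack the three expressions as columns of one 2-D array.
stacked = np.c_[df['high'] - df['low'],
                (df['high'] - prev_close).abs(),
                df['low'] - prev_close]

# np.nanmax ignores NaN, so row 0 falls back to high - low.
df['TR2'] = np.nanmax(stacked, axis=1)

print(df['TR2'].tolist())
```

Note that np.nanmax emits a RuntimeWarning and returns NaN for a row that consists only of NaN. That cannot happen here, because high - low is always defined.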



ANSWER 2

Score 1


It can also be done with the built-in max combined with map and zip:

df['TR'] = list(map(max, zip(df['high'] - df['low'],
                             (df['high'] - df['adjclose'].shift(1)).abs(),
                             df['low'] - df['adjclose'].shift(1))))
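
One caveat with the built-in max is that its handling of NaN depends on argument order, since every comparison against NaN is False. The sketch below illustrates this with plain floats; the variable names are illustrative only. In the expression above, high - low comes first and is never NaN, so the first row appears to come out as -3, the same as in the pandas version.

```python
import math

nan = float('nan')

# Comparisons with NaN are always False, so max keeps the value it saw first.
first_is_number = max(-3.0, nan, nan)   # the number is kept
first_is_nan = max(nan, -3.0, 5.0)      # the NaN is kept

print(first_is_number)
print(math.isnan(first_is_nan))
```

If the order of the three expressions were changed so that a NaN-producing term came first, the first row would become NaN instead.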