Easy Way to See if Two Columns are One-to-One in Pandas
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Dream Voyager Looping
--
Chapters
00:00 Easy Way To See If Two Columns Are One-To-One In Pandas
00:37 Accepted Answer Score 11
01:15 Answer 2 Score 9
01:55 Answer 3 Score 5
02:16 Answer 4 Score 2
03:22 Answer 5 Score 0
03:50 Thank you
--
Full question
https://stackoverflow.com/questions/5064...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #onetoone
#avk47
ACCEPTED ANSWER
Score 11
Well, you can create your own function to check it:
def isOneToOne(df, col1, col2):
first = df.groupby(col1)[col2].count().max()
second = df.groupby(col2)[col1].count().max()
return first + second == 2
isOneToOne(df, 'A', 'B')
#True
isOneToOne(df, 'A', 'C')
#False
isOneToOne(df, 'B', 'C')
#False
In case you data is more like this:
df = pd.DataFrame({'A': [0, 1, 2, 0],
'C': ["'apple'", "'banana'", "'apple'", "'apple'"],
'B': ["'a'", "'b'", "'c'", "'a'"]})
df
# A B C
#0 0 'a' 'apple'
#1 1 'b' 'banana'
#2 2 'c' 'apple'
#3 0 'a' 'apple'
Then you can use:
def isOneToOne(df, col1, col2):
first = df.drop_duplicates([col1, col2]).groupby(col1)[col2].count().max()
second = df.drop_duplicates([col1, col2]).groupby(col2)[col1].count().max()
return first + second == 2
ANSWER 2
Score 9
df.groupby(col1)[col2]\
.apply(lambda x: x.nunique() == 1)\
.all()
should work fine if you want a true or false answer.
A nice way to visualize the relationship between two columns with discrete / categorical values (in case you are using Jupyter notebook) is :
df.groupby([col1, col2])\
.apply(lambda x : x.count())\
.iloc[:,0]\
.unstack()\
.fillna(0)
This matrix will tell you the correspondence between the column values in the two columns.
In case of a one-to-one relationship there will be only one non-zero value per row in the matrix.
ANSWER 3
Score 2
Here is my solution (only two or three lines of codes) to check for any number of columns to see whether they are one to one match (duplicated matches are allowed, see the example bellow).
cols = ['A', 'B'] # or any number of columns ['A', 'B', 'C']
res = df.groupby(cols).count()
uniqueness = [res.index.get_level_values(i).is_unique
for i in range(res.index.nlevels)]
all(uniqueness)
Let's make it a function and add some docs:
def is_one_to_one(df, cols):
"""Check whether any number of columns are one-to-one match.
df: a pandas.DataFrame
cols: must be a list of columns names
Duplicated matches are allowed:
a - 1
b - 2
b - 2
c - 3
(This two cols will return True)
"""
if len(cols) == 1:
return True
# You can define you own rules for 1 column check, Or forbid it
# MAIN THINGs: for 2 or more columns check!
res = df.groupby(cols).count()
# The count number info is actually bootless.
# What maters here is the grouped *MultiIndex*
# and its uniqueness in each level
uniqueness = [res.index.get_level_values(i).is_unique
for i in range(res.index.nlevels)]
return all(uniqueness)
By using this function, you can do the one-to-one match check:
df = pd.DataFrame({'A': [0, 1, 2, 0],
'B': ["'a'", "'b'", "'c'", "'a'"],
'C': ["'apple'", "'banana'", "'apple'", "'apple'"],})
is_one_to_one(df, ['A', 'B'])
is_one_to_one(df, ['A', 'C'])
is_one_to_one(df, ['A', 'B', 'C'])
# Outputs:
# True
# False
# False
ANSWER 4
Score 0
one way to solve this ,
df['A to B']=df.groupby('B')['A'].transform(lambda x:x.nunique()==1)
df['A to C']=df.groupby('C')['A'].transform(lambda x:x.nunique()==1)
df['B to C']=df.groupby('C')['B'].transform(lambda x:x.nunique()==1)
Output:
A B C A to B A to C B to C
0 0 a apple True False False
1 1 b banana True True True
2 2 c apple True False False
To check column by column:
print (df['A to B']==True).all()
print (df['A to C']==True).all()
print (df['B to C']==True).all()
True
False
False