Easy Way to See if Two Columns are One-to-One in Pandas

--------------------------------------------------
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------

Music by Eric Matyas
https://www.soundimage.org
Track title: Isolated

--

Chapters
00:00 Easy Way To See If Two Columns Are One-To-One In Pandas
00:28 Accepted Answer Score 11
01:01 Answer 2 Score 0
01:23 Answer 3 Score 9
01:57 Answer 4 Score 2
02:50 Thank you

--

Full question
https://stackoverflow.com/questions/5064...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas #onetoone

#avk47

ACCEPTED ANSWER

Score 11

Well, you can create your own function to check it:

def isOneToOne(df, col1, col2):
    first = df.groupby(col1)[col2].count().max()
    second = df.groupby(col2)[col1].count().max()
    return first + second == 2

isOneToOne(df, 'A', 'B')
#True
isOneToOne(df, 'A', 'C')
#False
isOneToOne(df, 'B', 'C')
#False

In case you data is more like this:

df = pd.DataFrame({'A': [0, 1, 2, 0],
                   'C': ["'apple'", "'banana'", "'apple'", "'apple'"],
                   'B': ["'a'", "'b'", "'c'", "'a'"]})
df
#   A    B         C
#0  0  'a'   'apple'
#1  1  'b'  'banana'
#2  2  'c'   'apple'
#3  0  'a'   'apple'

Then you can use:

def isOneToOne(df, col1, col2):
    first = df.drop_duplicates([col1, col2]).groupby(col1)[col2].count().max()
    second = df.drop_duplicates([col1, col2]).groupby(col2)[col1].count().max()
    return first + second == 2

ANSWER 2

Score 9

df.groupby(col1)[col2]\
  .apply(lambda x: x.nunique() == 1)\
  .all()

should work fine if you want a true or false answer.

A nice way to visualize the relationship between two columns with discrete / categorical values (in case you are using Jupyter notebook) is :

df.groupby([col1, col2])\
  .apply(lambda x : x.count())\
  .iloc[:,0]\
  .unstack()\
  .fillna(0)

This matrix will tell you the correspondence between the column values in the two columns.

In case of a one-to-one relationship there will be only one non-zero value per row in the matrix.

ANSWER 3

Score 2

Here is my solution (only two or three lines of codes) to check for any number of columns to see whether they are one to one match (duplicated matches are allowed, see the example bellow).

cols = ['A', 'B'] # or any number of columns ['A', 'B', 'C']
res = df.groupby(cols).count()
uniqueness = [res.index.get_level_values(i).is_unique 
              for i in range(res.index.nlevels)]
all(uniqueness)

Let's make it a function and add some docs:

def is_one_to_one(df, cols):
    """Check whether any number of columns are one-to-one match.

    df: a pandas.DataFrame
    cols: must be a list of columns names

    Duplicated matches are allowed:
        a - 1
        b - 2
        b - 2
        c - 3
    (This two cols will return True)
    """
    if len(cols) == 1:
        return True
        # You can define you own rules for 1 column check, Or forbid it

    # MAIN THINGs: for 2 or more columns check!
    res = df.groupby(cols).count()
    # The count number info is actually bootless.
    # What maters here is the grouped *MultiIndex*
    # and its uniqueness in each level
    uniqueness = [res.index.get_level_values(i).is_unique
                  for i in range(res.index.nlevels)]
    return all(uniqueness)

By using this function, you can do the one-to-one match check:

df = pd.DataFrame({'A': [0, 1, 2, 0],
                   'B': ["'a'", "'b'", "'c'", "'a'"],
                   'C': ["'apple'", "'banana'", "'apple'", "'apple'"],})

is_one_to_one(df, ['A', 'B'])
is_one_to_one(df, ['A', 'C'])
is_one_to_one(df, ['A', 'B', 'C'])
# Outputs:
# True
# False
# False

ANSWER 4

Score 0

one way to solve this ,

df['A to B']=df.groupby('B')['A'].transform(lambda x:x.nunique()==1)
df['A to C']=df.groupby('C')['A'].transform(lambda x:x.nunique()==1)
df['B to C']=df.groupby('C')['B'].transform(lambda x:x.nunique()==1)

Output:

   A  B       C  A to B  A to C  B to C
0  0  a   apple    True   False   False
1  1  b  banana    True    True    True
2  2  c   apple    True   False   False

To check column by column:

print (df['A to B']==True).all()
print (df['A to C']==True).all()
print (df['B to C']==True).all()

True
False
False