The Python Oracle

How do I make custom comparisons in pytest?

Music by Eric Matyas
https://www.soundimage.org
Track title: Puddle Jumping Looping

--

Chapters
00:00 How Do I Make Custom Comparisons In Pytest?
00:49 Answer 1 Score 1
01:28 Accepted Answer Score 5
02:42 Answer 3 Score 1
03:09 Thank you

--

Full question
https://stackoverflow.com/questions/5460...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pytest




ACCEPTED ANSWER

Score 5


My current solution is to use a patch to override the DataFrame's __eq__ method. Here's an example with pandas, since it's faster to test with; the idea should apply to any object.

import pandas as pd
# Python 3: mock ships in the standard library
from unittest.mock import patch
# On Python 2, use the third-party backport instead: from mock import patch


def custom_df_compare(self, other):
    # Put logic for comparing df's here
    # Returning True for demonstration
    return True


@patch("pandas.DataFrame.__eq__", custom_df_compare)
def test_df_equal():
    df1 = pd.DataFrame(
        {"id": [1, 2, 3], "name": ["a", "b", "c"]}, columns=["id", "name"]
    )
    df2 = pd.DataFrame(
        {"id": [2, 3, 4], "name": ["b", "c", "d"]}, columns=["id", "name"]
    )

    assert df1 == df2
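The custom_df_compare above is a placeholder that always returns True. Here is one possible body. It assumes "equal" should mean the same column names and the same rows in any order, with index labels ignored. The sort-then-compare strategy is an illustrative choice, not something the answer specifies:

```python
import pandas as pd


def custom_df_compare(self, other):
    """Order-insensitive equality: same columns, same rows, any row order."""
    if not isinstance(other, pd.DataFrame):
        # Unsupported type: let Python try the other operand's __eq__
        return NotImplemented
    if sorted(self.columns) != sorted(other.columns):
        return False
    # Assumes column labels are mutually sortable (e.g. all strings)
    cols = sorted(self.columns)
    # Same column order, rows sorted by every column, index replaced by 0..n-1
    left = self[cols].sort_values(by=cols).reset_index(drop=True)
    right = other[cols].sort_values(by=cols).reset_index(drop=True)
    # DataFrame.equals also checks dtypes and treats NaNs in the same spot as equal
    return left.equals(right)
```

When passed to patch("pandas.DataFrame.__eq__", custom_df_compare) as in the test above, df1 == df2 would then be True only when both frames hold the same rows.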

I haven't tried it yet, but I plan to turn this into a fixture with autouse=True so that it applies to all tests automatically.

To handle the "order matters" indicator elegantly, I'm experimenting with an approach similar to pytest.approx, which returns an object of a new class with its own __eq__. For example:

class SortedDF(object):
    "Indicates that the order of data matters when comparing to another df"

    def __init__(self, df):
        self.df = df

    def __eq__(self, other):
        # Put logic for comparing df's including order of data here
        # Returning True for demonstration purposes
        return True


def test_sorted_df():
    df1 = pd.DataFrame(
        {"id": [1, 2, 3], "name": ["a", "b", "c"]}, columns=["id", "name"]
    )
    df2 = pd.DataFrame(
        {"id": [2, 3, 4], "name": ["b", "c", "d"]}, columns=["id", "name"]
    )

    # Passes because SortedDF.__eq__ is used
    assert SortedDF(df1) == df2
    # Fails because df2's __eq__ method is used
    assert df2 == SortedDF(df2)
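One way the SortedDF placeholder could be filled in, assuming "order matters" means the same rows and columns in the same positions while index labels are ignored. Because it relies on DataFrame.equals, dtypes must also match. Returning NotImplemented for unsupported types and defining __repr__ (so pytest's failure output is readable) are design choices of this sketch, not part of the original answer:

```python
import pandas as pd


class SortedDF:
    """Wraps a DataFrame so that == also requires the same row order."""

    def __init__(self, df):
        self.df = df

    def __eq__(self, other):
        if isinstance(other, SortedDF):
            other = other.df
        if not isinstance(other, pd.DataFrame):
            # Unsupported type: let Python try the other operand's __eq__
            return NotImplemented
        # Compare positionally: drop index labels, keep row and column order
        return self.df.reset_index(drop=True).equals(other.reset_index(drop=True))

    def __repr__(self):
        # Shown by pytest when an assert involving SortedDF fails
        return f"SortedDF(\n{self.df!r}\n)"
```

Like the original, this only decides the comparison when SortedDF is the left operand.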

The minor issue I haven't been able to resolve is the failure of the second assert, assert df2 == SortedDF(df2). That operand order works with pytest.approx, but not here. I've read up on the == operator but haven't figured out how to fix the second case.
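Python's rule for a == b explains the asymmetry. It calls type(a).__eq__(a, b) first and tries the reflected b.__eq__(a) only if that returns NotImplemented, unless b's type is a subclass of a's type, which then gets priority. By default a pandas DataFrame does not return NotImplemented for an object it doesn't recognise. Instead it attempts an element-wise comparison and returns a DataFrame, so SortedDF.__eq__ never decides the whole comparison, and assert then cannot reduce that DataFrame to a single bool. The toy classes below are hypothetical stand-ins that show the rule without pandas:

```python
class FrameLike:
    """Stands in for DataFrame: always answers == itself, never NotImplemented."""

    def __eq__(self, other):
        return "element-wise result"  # like a DataFrame of booleans, not a bool


class Deferring:
    """A left operand that returns NotImplemented for types it does not know."""

    def __eq__(self, other):
        if not isinstance(other, Deferring):
            return NotImplemented
        return True


class Wrapper:
    """Stands in for SortedDF: its __eq__ always claims equality."""

    def __eq__(self, other):
        return True


# Wrapper on the left: Wrapper.__eq__ runs first and decides
wrapper_left = Wrapper() == FrameLike()
# FrameLike on the left: its __eq__ answers, so Wrapper.__eq__ is never tried
frame_left = FrameLike() == Wrapper()
# Deferring on the left returns NotImplemented, so Python calls Wrapper.__eq__
deferring_left = Deferring() == Wrapper()

print(wrapper_left, frame_left, deferring_left)  # True element-wise result True
```

So the fix has to come from the DataFrame side. pandas 2.1 added a __pandas_priority__ attribute for third-party types. According to the "Extending pandas" docs, pandas operators return NotImplemented when the other operand's __pandas_priority__ is higher than their own, which should let Python fall back to SortedDF.__eq__. This is version-dependent, so it is worth verifying against the docs for the pandas version you use.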




ANSWER 2

Score 1


To compare the raw values of two Spark DataFrames (rows must be in exactly the same order), convert them to pandas and use pandas' testing helper:

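import pandas as pd
from pyspark.sql import Row, SparkSession

# Local session for the example; a test suite might provide it via a fixture
spark = SparkSession.builder.getOrCreate()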

df1 = spark.createDataFrame([Row(a=1, b=2, c=3), Row(a=1, b=3, c=3)])
df2 = spark.createDataFrame([Row(a=1, b=2, c=3), Row(a=1, b=3, c=3)])

pd.testing.assert_frame_equal(df1.toPandas(), df2.toPandas())
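Without a Spark session, the order-sensitivity of pd.testing.assert_frame_equal can be checked on plain pandas frames built from the same rows. A minimal sketch; the frame names are illustrative:

```python
import pandas as pd

# Same rows as the Spark example above, built directly in pandas
expected = pd.DataFrame({"a": [1, 1], "b": [2, 3], "c": [3, 3]})
same_order = pd.DataFrame({"a": [1, 1], "b": [2, 3], "c": [3, 3]})
swapped_rows = pd.DataFrame({"a": [1, 1], "b": [3, 2], "c": [3, 3]})

# Passes silently: same values, same order, same dtypes
pd.testing.assert_frame_equal(expected, same_order)

# Raises AssertionError: row order is part of what gets compared
try:
    pd.testing.assert_frame_equal(expected, swapped_rows)
    swapped_rows_pass = True
except AssertionError:
    swapped_rows_pass = False
```

If only column order should be ignored, assert_frame_equal's check_like=True ignores the order of index and column labels. Rows are still matched by index label, not sorted by value.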

If row order should not matter, you can first put the columns in a fixed order and then sort the rows of both pandas DataFrames by one or more key columns, using a helper like this:

def assert_frame_equal_with_sort(results, expected, keycolumns):
    # Put both frames' columns in the same (alphabetical) order
    results = results.reindex(sorted(results.columns), axis=1)
    expected = expected.reindex(sorted(expected.columns), axis=1)

    # Sort rows by the key columns and reset the index so rows line up by key
    results_sorted = results.sort_values(by=keycolumns).reset_index(drop=True)
    expected_sorted = expected.sort_values(by=keycolumns).reset_index(drop=True)

    pd.testing.assert_frame_equal(results_sorted, expected_sorted)


df1 = spark.createDataFrame([Row(a=1, b=2, c=3), Row(a=1, b=3, c=3)])
df2 = spark.createDataFrame([Row(a=1, b=3, c=3), Row(a=1, b=2, c=3)])

assert_frame_equal_with_sort(df1.toPandas(), df2.toPandas(), ['b'])
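The helper above needs Spark only to produce the input frames; the sorting itself is plain pandas. The variant below is a sketch (the name assert_frame_equal_unordered is hypothetical). It also sets pytest's __tracebackhide__, so a failure traceback skips the helper's own frame and points at the calling test:

```python
import pandas as pd


def assert_frame_equal_unordered(results, expected, key_columns):
    """Compare two DataFrames, ignoring column order and row order."""
    __tracebackhide__ = True  # pytest omits this frame from failure tracebacks

    def normalize(df):
        # Same column order, rows sorted by the key columns, fresh 0..n-1 index
        ordered = df.reindex(sorted(df.columns), axis=1)
        return ordered.sort_values(by=key_columns).reset_index(drop=True)

    pd.testing.assert_frame_equal(normalize(results), normalize(expected))
```

Rows that share the same key values can still end up in different relative order, so the key columns should identify rows uniquely.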



ANSWER 3

Score 1


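Just use the pandas.DataFrame.equals method: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.equals.html (a plain df1 == df2 compares element-wise and returns a DataFrame, which assert cannot reduce to a single True or False).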

For example

assert df1.equals(df2)
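DataFrame.equals is strict in ways that matter for tests. A small sketch of what it does and does not treat as equal, based on the behaviour described in the pandas docs linked above:

```python
import numpy as np
import pandas as pd

with_nan = pd.DataFrame({"x": [1.0, np.nan], "y": ["a", "b"]})
with_nan_copy = pd.DataFrame({"x": [1.0, np.nan], "y": ["a", "b"]})
ints = pd.DataFrame({"x": [1, 2]})
floats = pd.DataFrame({"x": [1.0, 2.0]})

# NaNs in the same positions count as equal
nan_equal = with_nan.equals(with_nan_copy)
# Same numbers but different dtypes (int64 vs float64) are not equal
dtype_equal = ints.equals(floats)
# Reversing the rows also reverses the index labels, so the frames differ
order_equal = with_nan.equals(with_nan.iloc[::-1])
```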

assert works with any expression that evaluates to a boolean, so yes, you can write any custom function to compare two objects, as long as it returns True or False. In this case, though, there is no need for a custom function, because pandas already provides one.
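Returning a plain bool keeps the assert simple, but on failure pytest can only report that the call returned False. If a more readable message matters, one option is pytest's pytest_assertrepr_compare hook, placed in conftest.py. pytest calls it when a comparison inside an assert fails, so it fires for == and not for df1.equals(df2). The sketch below therefore routes the check through a small hypothetical wrapper, FrameMatch, whose __eq__ calls DataFrame.equals. In practice FrameMatch would live in a helper module that tests import, with only the hook in conftest.py:

```python
# conftest.py (sketch; FrameMatch is shown inline for brevity)
import pandas as pd


class FrameMatch:
    """Hypothetical wrapper: `FrameMatch(expected) == actual` uses DataFrame.equals."""

    def __init__(self, df):
        self.df = df

    def __eq__(self, other):
        if isinstance(other, pd.DataFrame):
            return self.df.equals(other)
        return NotImplemented

    def __repr__(self):
        return f"FrameMatch({self.df.shape[0]} rows x {self.df.shape[1]} cols)"


def pytest_assertrepr_compare(config, op, left, right):
    """Custom failure text for `assert FrameMatch(expected) == actual`."""
    if op == "==" and isinstance(left, FrameMatch) and isinstance(right, pd.DataFrame):
        return [
            "DataFrames differ (compared with DataFrame.equals)",
            "expected:",
            *left.df.to_string().splitlines(),
            "actual:",
            *right.to_string().splitlines(),
        ]
    return None  # fall back to pytest's default explanation
```

A test would then write assert FrameMatch(expected) == actual, keeping the wrapper as the left operand so its __eq__ runs first.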