Finding difference between two list of dictionary in Python

--------------------------------------------------
Hire the world's top talent on demand or became one of them at Toptal: https://topt.al/25cXVn
and get $2,000 discount on your first invoice
--------------------------------------------------

Music by Eric Matyas
https://www.soundimage.org
Track title: Luau

--

Chapters
00:00 Finding Difference Between Two List Of Dictionary In Python
01:02 Accepted Answer Score 6
01:50 Answer 2 Score 0
02:27 Thank you

--

Full question
https://stackoverflow.com/questions/3679...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #python3x #list #dictionary

#avk47

ACCEPTED ANSWER

Score 6

A set is the perfect solution for this problem. Unfortunately, python will not let you add dictionaries to a set, because they are mutable and their hashcode could change between insert and lookup.

If you "freeze" the items to make them immutable, you can then add them to set objects instead of a list; and then take a set difference using the minus operator:

In [20]: i_set = { frozenset(row.items()) for row in incoming_rows }

In [21]: a_set = { frozenset(row.items())  for row in available_row }

In [22]: (i_set - a_set)
Out[22]: 
{frozenset({('column_name', 'CONFIG_ID'),
            ('data_type', 'numeric(10,0)'),
            ('table_name', 'CONFIG')}),
 frozenset({('column_name', 'CREATE_DATE'),
            ('data_type', 'VARCHAR(20)'),
            ('table_name', 'CONFIG')}),
 frozenset({('column_name', 'CONFIG_TYPE'),
            ('data_type', 'varchar(1)'),
            ('table_name', 'CONFIG')})}

Edit: To unfreeze:

In [25]: [dict(i) for i in i_set - a_set]
Out[25]: 
[{'column_name': 'CONFIG_ID',
  'data_type': 'numeric(10,0)',
  'table_name': 'CONFIG'},
 {'column_name': 'CREATE_DATE',
  'data_type': 'VARCHAR(20)',
  'table_name': 'CONFIG'},
 {'column_name': 'CONFIG_TYPE',
  'data_type': 'varchar(1)',
  'table_name': 'CONFIG'}]

ANSWER 2

Score 0

For large datasets, and especially when you are working with numeric data, you may find better performance with 3rd party libraries. For example, Pandas accepts lists of directories directly:

import pandas as pd

# convert lists of dictionaries to dataframes
df_incoming, df_available = map(pd.DataFrame, (incoming_rows, available_row))

# merge data, adding indicator, and filter
res = df_available.merge(df_incoming, indicator=True, how='outer')
res = res[res['_merge'] == 'right_only']

print(res)

   column_name      data_type table_name      _merge
3  CREATE_DATE    VARCHAR(20)     CONFIG  right_only
4  CONFIG_TYPE     varchar(1)     CONFIG  right_only
5    CONFIG_ID  numeric(10,0)     CONFIG  right_only

If you require a list of dictionaries as output:

print(res.drop('_merge', 1).to_dict('records'))

[{'column_name': 'CREATE_DATE', 'data_type': 'VARCHAR(20)', 'table_name': 'CONFIG'},
 {'column_name': 'CONFIG_TYPE', 'data_type': 'varchar(1)', 'table_name': 'CONFIG'},
 {'column_name': 'CONFIG_ID', 'data_type': 'numeric(10,0)', 'table_name': 'CONFIG'}]