Finding the difference between two lists of dictionaries in Python
--
Music by Eric Matyas
https://www.soundimage.org
Track title: Romantic Lands Beckon
--
Chapters
00:00 Question
01:11 Accepted answer (Score 6)
02:09 Answer 2 (Score 0)
02:59 Thank you
--
Full question
https://stackoverflow.com/questions/3679...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #python3x #list #dictionary
#avk47
ACCEPTED ANSWER
Score 6
A set is a natural fit for this problem. Unfortunately, Python will not let you add dictionaries to a set: dictionaries are mutable, so they are unhashable (a hash computed from their contents could change between insertion and lookup).
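The restriction is easy to reproduce. A minimal sketch, using a hypothetical sample row that is not taken from the original question:

```python
# Hypothetical sample row, used only to illustrate the error.
row = {"table_name": "CONFIG", "column_name": "CONFIG_ID"}

try:
    rows = {row}  # building a set hashes every element
except TypeError as exc:
    error_message = str(exc)
else:
    error_message = None

print(error_message)  # the message reports that dict is unhashable
```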
If you "freeze" each dictionary's items into an immutable frozenset (which requires every value to be hashable), you can store the rows in sets instead of lists and then take the set difference with the minus operator:
In [20]: i_set = { frozenset(row.items()) for row in incoming_rows }
In [21]: a_set = { frozenset(row.items()) for row in available_row }
In [22]: (i_set - a_set)
Out[22]:
{frozenset({('column_name', 'CONFIG_ID'),
            ('data_type', 'numeric(10,0)'),
            ('table_name', 'CONFIG')}),
 frozenset({('column_name', 'CREATE_DATE'),
            ('data_type', 'VARCHAR(20)'),
            ('table_name', 'CONFIG')}),
 frozenset({('column_name', 'CONFIG_TYPE'),
            ('data_type', 'varchar(1)'),
            ('table_name', 'CONFIG')})}
Edit: To unfreeze the result back into a list of dictionaries:
In [25]: [dict(i) for i in i_set - a_set]
Out[25]:
[{'column_name': 'CONFIG_ID',
  'data_type': 'numeric(10,0)',
  'table_name': 'CONFIG'},
 {'column_name': 'CREATE_DATE',
  'data_type': 'VARCHAR(20)',
  'table_name': 'CONFIG'},
 {'column_name': 'CONFIG_TYPE',
  'data_type': 'varchar(1)',
  'table_name': 'CONFIG'}]
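A set difference loses the original row order and collapses duplicate rows. If that matters, one possible variation is to freeze only the rows being compared against and filter the incoming list. The sketch below is an illustration under assumptions, not part of the original answer: the helper names `freeze` and `rows_missing_from` and the sample data are hypothetical, and every dictionary value is assumed to be hashable.

```python
def freeze(row):
    """Return a hashable snapshot of a flat dictionary (values must be hashable)."""
    return frozenset(row.items())


def rows_missing_from(incoming, available):
    """Return the rows of `incoming` that have no equal row in `available`.

    Unlike a plain set difference, this keeps the original order of
    `incoming` and does not collapse duplicate rows.
    """
    available_frozen = {freeze(row) for row in available}
    return [row for row in incoming if freeze(row) not in available_frozen]


# Hypothetical sample data, not taken from the original question.
incoming_rows = [
    {"table_name": "CONFIG", "column_name": "CONFIG_ID", "data_type": "numeric(10,0)"},
    {"table_name": "CONFIG", "column_name": "CONFIG_NAME", "data_type": "varchar(50)"},
    {"table_name": "CONFIG", "column_name": "CREATE_DATE", "data_type": "VARCHAR(20)"},
]
available_row = [
    # Same content as the second incoming row, with keys in a different order.
    {"data_type": "varchar(50)", "column_name": "CONFIG_NAME", "table_name": "CONFIG"},
]

missing = rows_missing_from(incoming_rows, available_row)
for row in missing:
    print(row)
```

Key order inside a dictionary does not affect the comparison, because a frozenset is unordered. A row whose values include a list or another dictionary would raise a TypeError in `freeze`, so nested data needs a different freezing strategy.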
ANSWER 2
Score 0
For large datasets, and especially when you are working with numeric data, you may find better performance with third-party libraries. For example, pandas accepts lists of dictionaries directly:
import pandas as pd
# convert lists of dictionaries to dataframes
df_incoming, df_available = map(pd.DataFrame, (incoming_rows, available_row))
# merge data, adding indicator, and filter
res = df_available.merge(df_incoming, indicator=True, how='outer')
res = res[res['_merge'] == 'right_only']
print(res)
   column_name      data_type  table_name      _merge
3  CREATE_DATE    VARCHAR(20)      CONFIG  right_only
4  CONFIG_TYPE     varchar(1)      CONFIG  right_only
5    CONFIG_ID  numeric(10,0)      CONFIG  right_only
If you require a list of dictionaries as output:
print(res.drop(columns='_merge').to_dict('records'))
[{'column_name': 'CREATE_DATE', 'data_type': 'VARCHAR(20)', 'table_name': 'CONFIG'},
 {'column_name': 'CONFIG_TYPE', 'data_type': 'varchar(1)', 'table_name': 'CONFIG'},
 {'column_name': 'CONFIG_ID', 'data_type': 'numeric(10,0)', 'table_name': 'CONFIG'}]
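For reference, the same steps can be collected into one self-contained script. This is only a sketch: the function name `rows_only_in_incoming` and the sample data are hypothetical, and it assumes both lists are non-empty and share the same keys, so that the merge joins on every column.

```python
import pandas as pd


def rows_only_in_incoming(incoming, available):
    """Return rows present in `incoming` but absent from `available`."""
    df_incoming = pd.DataFrame(incoming)
    df_available = pd.DataFrame(available)
    # With no `on` argument, merge joins on all columns the frames share.
    merged = df_available.merge(df_incoming, how="outer", indicator=True)
    # 'right_only' marks rows found only in the right-hand frame (incoming).
    only_incoming = merged[merged["_merge"] == "right_only"]
    return only_incoming.drop(columns="_merge").to_dict("records")


# Hypothetical sample data, not taken from the original question.
incoming_rows = [
    {"table_name": "CONFIG", "column_name": "CONFIG_ID", "data_type": "numeric(10,0)"},
    {"table_name": "CONFIG", "column_name": "CONFIG_NAME", "data_type": "varchar(50)"},
]
available_row = [
    {"table_name": "CONFIG", "column_name": "CONFIG_NAME", "data_type": "varchar(50)"},
]

result = rows_only_in_incoming(incoming_rows, available_row)
print(result)
```

Passing `columns='_merge'` by keyword avoids the positional axis argument to `drop`, which newer pandas releases no longer accept.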