Getting nested data from MongoDB into a Pandas data frame
--
Full question
https://stackoverflow.com/questions/3334...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #mongodb #twitter #pandas
ACCEPTED ANSWER
Score 11
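I use a function like this to flatten nested MongoDB documents into a DataFrame. It relies on pandas' json_normalize function (available as pd.json_normalize since pandas 1.0; the older pandas.io.json import is deprecated):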
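import json

import pandas as pd
from bson import json_util


def mongo_to_dataframe(mongo_data):
    # Round-trip through MongoDB Extended JSON so BSON types (ObjectId, datetime, ...)
    # become plain dicts and strings that pandas can handle
    sanitized = json.loads(json_util.dumps(mongo_data))
    # Flatten nested sub-documents into dotted column names, e.g. "user.screen_name"
    normalized = pd.json_normalize(sanitized)
    # json_normalize already returns a DataFrame, so this wrapper is redundant but harmless
    df = pd.DataFrame(normalized)
    return df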
Call the function with your MongoDB documents as the argument, for example list(collection.find()).
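sanitized = json.loads(json_util.dumps(mongo_data)) serializes the documents to MongoDB Extended JSON and parses them back, so BSON-only types such as ObjectId and datetime become plain dicts (for example {"$oid": "..."})
normalized = pd.json_normalize(sanitized) un-nests the data, turning nested sub-documents into columns with dotted names
df = pd.DataFrame(normalized) turns the result into a DataFrame (json_normalize already returns one, so this step is optional)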
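To show what the json_normalize step does to nested fields, here is a minimal pandas-only sketch. The documents are made-up, tweet-like records written in the shape json_util.dumps produces for ObjectId values ({"$oid": ...}); the field names (user, entities, hashtags, tag) are assumptions for illustration. It also shows one limitation: lists of sub-documents are not expanded by default, but record_path can turn them into rows.

```python
import pandas as pd

# Hypothetical tweet-like documents, shaped like the output of
# json.loads(json_util.dumps(...)): ObjectId values appear as {"$oid": "..."}.
docs = [
    {
        "_id": {"$oid": "000000000000000000000001"},
        "text": "hello",
        "user": {"screen_name": "alice", "followers_count": 10},
        "entities": {"hashtags": [{"tag": "python"}, {"tag": "pandas"}]},
    },
    {
        "_id": {"$oid": "000000000000000000000002"},
        "text": "hi",
        "user": {"screen_name": "bob", "followers_count": 3},
        "entities": {"hashtags": []},
    },
]

# Nested dicts become dotted column names; the hashtag list stays in one column.
flat = pd.json_normalize(docs)
print(sorted(flat.columns))
# ['_id.$oid', 'entities.hashtags', 'text', 'user.followers_count', 'user.screen_name']

# One row per hashtag: record_path points at the nested list,
# meta copies a top-level field onto each of those rows.
tags = pd.json_normalize(docs, record_path=["entities", "hashtags"], meta=["text"])
print(tags)
```

With record_path, a document whose list is empty (bob here) contributes no rows, so join back on a key if you need to keep it.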
ANSWER 2
Score 0
Use PyMongoArrow. It is a tool built by MongoDB for exactly this purpose: it lets you efficiently move data between MongoDB and other formats such as pandas DataFrames, NumPy arrays, and Apache Arrow tables.
It also supports nested data and lets you optionally define a schema, including the data type of each field, when converting from one format to another.