The Python Oracle

Getting nested data from MongoDB into a Pandas data frame

--

Full question
https://stackoverflow.com/questions/3334...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #mongodb #twitter #pandas


ACCEPTED ANSWER

Score 11


I use a function like this to get nested MongoDB documents into a dataframe. It uses the handy pandas json_normalize function:

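import json

import pandas as pd
from bson import json_util
from pandas import json_normalize  # top-level import needs pandas >= 1.0; pandas.io.json.json_normalize is deprecated

def mongo_to_dataframe(mongo_data):
    # Round-trip through MongoDB Extended JSON so BSON types (ObjectId, datetime, ...)
    # become plain dicts, lists and strings that pandas can handle
    sanitized = json.loads(json_util.dumps(mongo_data))
    # Flatten nested sub-documents into dotted column names, e.g. "user.screen_name"
    normalized = json_normalize(sanitized)
    # json_normalize already returns a DataFrame, so this wrap is optional
    df = pd.DataFrame(normalized)

    return df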

Just call the function with your Mongo data (for example, a list of documents or a PyMongo cursor) as the argument.

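sanitized = json.loads(json_util.dumps(mongo_data)) serializes BSON-specific types such as ObjectId and datetime to MongoDB Extended JSON, then parses the result back into plain Python dicts and lists.

normalized = json_normalize(sanitized) un-nests the data, flattening sub-documents into columns with dotted names such as user.screen_name.

df = pd.DataFrame(normalized) turns it into a dataframe. Since json_normalize already returns a DataFrame, this line is optional.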




ANSWER 2

Score 0


Use PyMongoArrow, a library built by MongoDB for exactly this purpose. It lets you efficiently move data between MongoDB and other formats such as pandas DataFrames, NumPy arrays, and Apache Arrow tables.

It also supports nested data and lets you optionally define a schema, including the data types to use, when converting from one format to another.