The Python Oracle

Getting nested data from MongoDB into a Pandas data frame


--

Full question
https://stackoverflow.com/questions/3334...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #mongodb #twitter #pandas




ACCEPTED ANSWER

Score 11


I use a function like this to get nested JSON lines into a dataframe. It uses the handy pandas json_normalize function:

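import json

import pandas as pd
from bson import json_util
# pandas >= 1.0 exposes json_normalize at the top level;
# importing it from pandas.io.json is deprecated.
from pandas import json_normalize

def mongo_to_dataframe(mongo_data):
    # Serialize BSON types (ObjectId, datetime, ...) to JSON text,
    # then parse that back into plain Python dicts and lists.
    sanitized = json.loads(json_util.dumps(mongo_data))
    # Flatten nested documents into columns with dotted names.
    normalized = json_normalize(sanitized)
    # json_normalize already returns a DataFrame; kept as in the original.
    df = pd.DataFrame(normalized)

    return df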

Call the function with your MongoDB data as the argument, for example a list of documents or the cursor returned by find().

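sanitized = json.loads(json_util.dumps(mongo_data)) serializes the MongoDB documents, including BSON types such as ObjectId, to JSON text and parses that back into plain Python dicts and lists

normalized = json_normalize(sanitized) un-nests the data, so a nested field becomes a column with a dotted name (a name field inside user becomes user.name)

df = pd.DataFrame(normalized) simply wraps the result in a dataframe; json_normalize already returns one, so this step is optional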
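
To see what the un-nesting step does without a running database, here is a minimal sketch. The documents are hypothetical tweet-like records, shaped the way they might look after the json_util round trip (the _id has become a {"$oid": ...} dict). The field names and values are assumptions made for illustration, not data from the question.

```python
import pandas as pd

# Hypothetical documents, shaped like already-sanitized MongoDB output.
docs = [
    {
        "_id": {"$oid": "5f1d7f3a2c8b4e0001a1b2c3"},
        "text": "hello",
        "user": {"name": "ann", "stats": {"followers": 10}},
    },
    {
        "_id": {"$oid": "5f1d7f3a2c8b4e0001a1b2c4"},
        "text": "world",
        "user": {"name": "bob", "stats": {"followers": 25}},
    },
]

# Each nested key path becomes one column, joined with "." by default.
df = pd.json_normalize(docs)
print(sorted(df.columns))

# The sep argument changes the joiner, e.g. "user_name" instead of "user.name".
df_underscore = pd.json_normalize(docs, sep="_")
print(sorted(df_underscore.columns))
```

Lists inside a document are left as list values in a single cell; json_normalize only expands nested dicts unless you also pass record_path.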




ANSWER 2

Score 0


Use PyMongoArrow. This is a tool built by MongoDB just for this purpose. It lets you efficiently move data between MongoDB and other data formats such as pandas DataFrames, NumPy arrays, and Apache Arrow tables.

It also supports nested data and lets you optionally define a schema for your data, including the data type of each field, when moving data from one format to another.