How to write DataFrame to postgres table
--
Music by Eric Matyas
https://www.soundimage.org
Track title: Switch On Looping
--
Chapters
00:00 Question
00:28 Accepted answer (Score 225)
01:38 Answer 2 (Score 134)
02:44 Answer 3 (Score 50)
03:37 Answer 4 (Score 43)
04:07 Thank you
--
Full question
https://stackoverflow.com/questions/2310...
Accepted answer links:
[docs]: http://pandas.pydata.org/pandas-docs/sta...
https://gist.github.com/jorisvandenbossc...
Answer 3 links:
https://pandas.pydata.org/pandas-docs/st...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #postgresql #pandas #sqlalchemy
#avk47
ACCEPTED ANSWER
Score 244
Starting from pandas 0.14 (released end of May 2014), postgresql is supported. The sql module now uses sqlalchemy to support different database flavors. You can pass a sqlalchemy engine for a postgresql database (see docs). E.g.:
from sqlalchemy import create_engine
engine = create_engine('postgresql://username:password@localhost:5432/mydatabase')
df.to_sql('table_name', engine)
You are correct that in pandas up to version 0.13.1 postgresql was not supported. If you need to use an older version of pandas, here is a patched version of pandas.io.sql: https://gist.github.com/jorisvandenbossche/10841234.
I wrote this a time ago, so cannot fully guarantee that it always works, buth the basis should be there). If you put that file in your working directory and import it, then you should be able to do (where con is a postgresql connection):
import sql # the patched version (file is named sql.py)
sql.write_frame(df, 'table_name', con, flavor='postgresql')
ANSWER 2
Score 148
Faster option:
The following code will copy your Pandas DF to postgres DB much faster than df.to_sql method and you won't need any intermediate csv file to store the df.
Create an engine based on your DB specifications.
Create a table in your postgres DB that has equal number of columns as the Dataframe (df).
Data in DF will get inserted in your postgres table.
from sqlalchemy import create_engine
import psycopg2
import io
If you want to replace the table, we can replace it with normal to_sql method using headers from our df and then load the entire big time consuming df into DB.
engine = create_engine(
'postgresql+psycopg2://username:password@host:port/database')
# Drop old table and create new empty table
df.head(0).to_sql('table_name', engine, if_exists='replace',index=False)
conn = engine.raw_connection()
cur = conn.cursor()
output = io.StringIO()
df.to_csv(output, sep='\t', header=False, index=False)
output.seek(0)
contents = output.getvalue()
cur.copy_from(output, 'table_name', null="") # null values become ''
conn.commit()
cur.close()
conn.close()
ANSWER 3
Score 52
Pandas 0.24.0+ solution
In Pandas 0.24.0 a new feature was introduced specifically designed for fast writes to Postgres. You can learn more about it here: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-sql-method
import csv
from io import StringIO
from sqlalchemy import create_engine
def psql_insert_copy(table, conn, keys, data_iter):
# gets a DBAPI connection that can provide a cursor
dbapi_conn = conn.connection
with dbapi_conn.cursor() as cur:
s_buf = StringIO()
writer = csv.writer(s_buf)
writer.writerows(data_iter)
s_buf.seek(0)
columns = ', '.join('"{}"'.format(k) for k in keys)
if table.schema:
table_name = '{}.{}'.format(table.schema, table.name)
else:
table_name = table.name
sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format(
table_name, columns)
cur.copy_expert(sql=sql, file=s_buf)
engine = create_engine('postgresql://myusername:mypassword@myhost:5432/mydatabase')
df.to_sql('table_name', engine, method=psql_insert_copy)
ANSWER 4
Score 44
This is how I did it.
It may be faster because it is using execute_batch:
# df is the dataframe
if len(df) > 0:
df_columns = list(df)
# create (col1,col2,...)
columns = ",".join(df_columns)
# create VALUES('%s', '%s",...) one '%s' per column
values = "VALUES({})".format(",".join(["%s" for _ in df_columns]))
#create INSERT INTO table (columns) VALUES('%s',...)
insert_stmt = "INSERT INTO {} ({}) {}".format(table,columns,values)
cur = conn.cursor()
psycopg2.extras.execute_batch(cur, insert_stmt, df.values)
conn.commit()
cur.close()