Pandas read_csv from url
Hire the world's top talent on demand or became one of them at Toptal: https://topt.al/25cXVn
and get $2,000 discount on your first invoice
--------------------------------------------------
Take control of your privacy with Proton's trusted, Swiss-based, secure services.
Choose what you need and safeguard your digital life:
Mail: https://go.getproton.me/SH1CU
VPN: https://go.getproton.me/SH1DI
Password Manager: https://go.getproton.me/SH1DJ
Drive: https://go.getproton.me/SH1CT
Music by Eric Matyas
https://www.soundimage.org
Track title: Breezy Bay
--
Chapters
00:00 Pandas Read_csv From Url
00:26 Accepted Answer Score 262
01:34 Answer 2 Score 21
02:50 Answer 3 Score 11
03:37 Answer 4 Score 360
03:50 Thank you
--
Full question
https://stackoverflow.com/questions/3240...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #csv #pandas #request
#avk47
ANSWER 1
Score 360
In the latest version of pandas (0.19.2) you can directly pass the url
import pandas as pd
url = "https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"
c = pd.read_csv(url)
ACCEPTED ANSWER
Score 262
Update: From pandas 0.19.2 you can now just pass read_csv() the url directly, although that will fail if it requires authentication.
For older pandas versions, or if you need authentication, or for any other HTTP-fault-tolerant reason:
Use pandas.read_csv with a file-like object as the first argument.
If you want to read the csv from a string, you can use
io.StringIO.For the URL
https://github.com/cs109/2014_data/blob/master/countries.csv, you gethtmlresponse, not raw csv; you should use the url given by theRawlink in the github page for getting raw csv response , which ishttps://raw.githubusercontent.com/cs109/2014_data/master/countries.csv
Example:
import pandas as pd
import io
import requests
url = "https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"
s = requests.get(url).content
c = pd.read_csv(io.StringIO(s.decode('utf-8')))
Note: in Python 2.x, the string-buffer object was StringIO.StringIO
ANSWER 3
Score 21
As I commented you need to use a StringIO object and decode i.e c=pd.read_csv(io.StringIO(s.decode("utf-8"))) if using requests, you need to decode as .content returns bytes if you used .text you would just need to pass s as is s = requests.get(url).text c = pd.read_csv(StringIO(s)).
A simpler approach is to pass the correct url of the raw data directly to read_csv, you don't have to pass a file like object, you can pass a url so you don't need requests at all:
c = pd.read_csv("https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv")
print(c)
Output:
Country Region
0 Algeria AFRICA
1 Angola AFRICA
2 Benin AFRICA
3 Botswana AFRICA
4 Burkina AFRICA
5 Burundi AFRICA
6 Cameroon AFRICA
..................................
From the docs:
filepath_or_buffer :
string or file handle / StringIO The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. For instance, a local file could be file ://localhost/path/to/table.csv
ANSWER 4
Score 11
The problem you're having is that the output you get into the variable 's' is not a csv, but a html file. In order to get the raw csv, you have to modify the url to:
'https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv'
Your second problem is that read_csv expects a file name, we can solve this by using StringIO from io module. Third problem is that request.get(url).content delivers a byte stream, we can solve this using the request.get(url).text instead.
End result is this code:
from io import StringIO
import pandas as pd
import requests
url='https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv'
s=requests.get(url).text
c=pd.read_csv(StringIO(s))
output:
>>> c.head()
Country Region
0 Algeria AFRICA
1 Angola AFRICA
2 Benin AFRICA
3 Botswana AFRICA
4 Burkina AFRICA