The Python Oracle

Download file from Blob URL with Python


Music by Eric Matyas
https://www.soundimage.org
Track title: Melt

--

Chapters
00:00 Download File From Blob Url With Python
00:46 Accepted Answer Score 5
01:26 Answer 2 Score 4
01:45 Answer 3 Score 0
02:03 Thank you

--

Full question
https://stackoverflow.com/questions/3951...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #download #blob #urllib

#avk47



ACCEPTED ANSWER

Score 5


That 289-byte response is most likely the HTML of a 403 Forbidden page. This happens because the server is smart and rejects requests that do not specify a User-Agent header.

Python 3

# python3
import urllib.request as request

url = 'http://www.xetra.com/blob/1193366/b2f210876702b8e08e40b8ecb769a02e/data/All-tradable-ETFs-ETCs-and-ETNs.xlsx'
# fake user agent of Safari
fake_useragent = 'Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25'
r = request.Request(url, headers={'User-Agent': fake_useragent})
f = request.urlopen(r)

# print or write
print(f.read())

Python 2

# python2
import urllib2

url = 'http://www.xetra.com/blob/1193366/b2f210876702b8e08e40b8ecb769a02e/data/All-tradable-ETFs-ETCs-and-ETNs.xlsx'
# fake user agent of Safari
fake_useragent = 'Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25'

r = urllib2.Request(url, headers={'User-Agent': fake_useragent})
f = urllib2.urlopen(r)

print(f.read())
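Since the target is an .xlsx file, you usually want to save the bytes to disk rather than print them. A minimal Python 3 sketch of that, reusing the same fake User-Agent and streaming the body with the standard library's shutil.copyfileobj (the save_url helper name is mine, not from the original answer):

```python
import shutil
import urllib.request as request

url = 'http://www.xetra.com/blob/1193366/b2f210876702b8e08e40b8ecb769a02e/data/All-tradable-ETFs-ETCs-and-ETNs.xlsx'
# fake user agent of Safari, same as above
fake_useragent = 'Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25'

def save_url(url, dest, user_agent=fake_useragent):
    """Download url with a browser-like User-Agent and write the bytes to dest."""
    req = request.Request(url, headers={'User-Agent': user_agent})
    with request.urlopen(req) as resp, open(dest, 'wb') as out:
        # copy the response stream to the file without loading it all into memory
        shutil.copyfileobj(resp, out)
    return dest
```

Call it as save_url(url, 'All-tradable-ETFs-ETCs-and-ETNs.xlsx').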



ANSWER 2

Score 4


from bs4 import BeautifulSoup
import requests
import re

# fetch the listing page and locate the link labelled 'Master data'
url = 'http://www.xetra.com/xetra-en/instruments/etf-exchange-traded-funds/list-of-tradable-etfs'
html = requests.get(url)
page = BeautifulSoup(html.content, 'html.parser')
reg = re.compile('Master data')
find = page.find('span', text=reg)  # find the span labelling the file link
file_url = 'http://www.xetra.com' + find.parent['href']

# download the file itself and write it to disk in binary mode
file = requests.get(file_url)
with open(r'C:\Users\user\Downloads\file.xlsx', 'wb') as ff:
    ff.write(file.content)

I recommend requests and BeautifulSoup; both are good libraries.
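The scraping step above fails with an opaque TypeError if the page layout changes and the span is not found. A more defensive sketch of the same step (assuming, as the answer does, that the link is an &lt;a&gt; wrapping a &lt;span&gt; whose text contains 'Master data'; the find_file_url helper name is mine):

```python
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup

def find_file_url(page_url, html_text, label='Master data'):
    """Return the absolute URL of the download link labelled `label`."""
    soup = BeautifulSoup(html_text, 'html.parser')
    span = soup.find('span', string=re.compile(label))
    if span is None or span.parent.get('href') is None:
        raise ValueError('no download link labelled %r found' % label)
    # resolve a relative href against the page URL instead of hard-coding the host
    return urljoin(page_url, span.parent['href'])
```

urljoin also handles the case where the site starts emitting absolute hrefs.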




ANSWER 3

Score 0


For me, the target download URL looked like: blob:https://jdc.xxxx.com/2c21275b-c1ef-487a-a378-1672cd2d0760

I tried writing the original response to a local .xlsx file and found that it worked.

import requests

# `data` is the JSON payload that the page itself sends with this request
r = requests.post('http://jdc.xxxx.com/rawreq/api/handler/desktop/export-rat-view?flag=jdc',
                  json=data,
                  headers={'content-type': 'application/json'})
file_name = 'file_name.xlsx'
with open(file_name, 'wb') as f:
    for chunk in r.iter_content(100000):
        f.write(chunk)
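A blob: URL names an object created by the page's JavaScript (via URL.createObjectURL) and exists only inside that browser session, so it cannot be fetched directly from Python; the trick above is to replay the underlying request that produced the data. The chunked-write loop can be sketched with the standard library alone (the save_stream helper name and the 100 000-byte chunk size are mine):

```python
import urllib.request

def save_stream(url, dest, chunk_size=100_000):
    """Stream a (possibly large) response to disk in fixed-size chunks."""
    with urllib.request.urlopen(url) as resp, open(dest, 'wb') as out:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:  # empty read means the stream is exhausted
                break
            out.write(chunk)
    return dest
```

Chunked writes keep memory use bounded regardless of how large the export is.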