The Python Oracle

How to validate a URL in Python? (Malformed or not)

Full question
https://stackoverflow.com/questions/7160...

Accepted answer links:
[source]: https://github.com/django/django/blob/st...

Answer 2 links:
[validators]: http://validators.readthedocs.org/en/lat.../
[from PyPI]: https://pypi.org/project/validators/

Answer 3 links:
[How can I check if a URL exists with Django’s validators?]: https://stackoverflow.com/questions/3170...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #url #malformedurlexception




ANSWER 1

Score 239


Use the validators package:

>>> import validators
>>> validators.url("http://google.com")
True
>>> validators.url("http://google")
ValidationFailure(func=url, args={'value': 'http://google', 'require_tld': True})
>>> if not validators.url("http://google"):
...     print("not valid")
...
not valid

Install it from PyPI with pip (pip install validators).




ANSWER 2

Score 155


A version that returns True or False, based on @DMfll's answer:

try:
    # Python 2
    from urlparse import urlparse
except ImportError:
    # Python 3 (ImportError also covers ModuleNotFoundError)
    from urllib.parse import urlparse

a = 'http://www.cwi.nl:80/%7Eguido/Python.html'
b = '/data/Python.html'
c = 532
d = u'dkakasdkjdjakdjadjfalskdjfalk'
e = 'https://stackoverflow.com'

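def uri_validator(x):
    try:
        result = urlparse(x)
        # A usable absolute URL needs both a scheme and a network location
        return all([result.scheme, result.netloc])
    except (AttributeError, ValueError):
        # AttributeError: non-string input such as 532
        # ValueError: some malformed inputs, e.g. unbalanced IPv6 brackets
        return False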

print(uri_validator(a))
print(uri_validator(b))
print(uri_validator(c))
print(uri_validator(d))
print(uri_validator(e))

Gives:

True
False
False
False
True
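Note that the check above accepts any scheme, so a string such as 'httpx://www.google.com' would pass. Below is a minimal sketch of a stricter variant, assuming only http and https URLs should be accepted. The names is_http_url and ALLOWED_SCHEMES are hypothetical and not part of the original answer.

```python
from urllib.parse import urlparse

# Hypothetical allow-list; adjust to the schemes your application accepts.
ALLOWED_SCHEMES = {"http", "https"}

def is_http_url(value):
    """Return True if value parses as an absolute http(s) URL with a host."""
    if not isinstance(value, str):
        return False
    try:
        parts = urlparse(value)
    except ValueError:
        # urlparse raises ValueError for some malformed inputs,
        # e.g. an opening IPv6 bracket without a closing one.
        return False
    # hostname is None when the network location has no host part
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.hostname)

print(is_http_url('https://stackoverflow.com'))
print(is_http_url('httpx://www.google.com'))
print(is_http_url('http://[invalid'))
```

This is still a syntactic check only: it does not confirm that the host resolves or that the URL is reachable.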



ANSWER 3

Score 154


I think the best way is to use Django's URLValidator:

from django.core.validators import URLValidator
from django.core.exceptions import ValidationError

val = URLValidator()
try:
    val('httpx://www.google.com')
except ValidationError as e:
    print(e)

Edit: this question is a duplicate of "How can I check if a URL exists with Django’s validators?"




ACCEPTED ANSWER

Score 139


Django URL validation regex (source):

import re
regex = re.compile(
    r'^(?:http|ftp)s?://'  # http://, https://, ftp:// or ftps://
    r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain...
    r'localhost|'  # ...or localhost...
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # ...or IPv4 address
    r'(?::\d+)?'  # optional port
    r'(?:/?|[/?]\S+)$', re.IGNORECASE)  # optional path or query string

print(re.match(regex, "http://www.example.com") is not None) # True
print(re.match(regex, "example.com") is not None)            # False
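
The pattern checks syntax only, so it helps to see what it accepts and rejects. Below is a minimal sketch that wraps the same pattern in a helper and runs it over a few inputs. The helper name looks_like_url and the sample inputs are hypothetical, chosen for illustration.

```python
import re

# Same pattern as in the answer, repeated so the sketch is self-contained.
URL_RE = re.compile(
    r'^(?:http|ftp)s?://'
    r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'
    r'localhost|'
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
    r'(?::\d+)?'
    r'(?:/?|[/?]\S+)$', re.IGNORECASE)

def looks_like_url(value):
    """Hypothetical helper: True if value is a str matching the pattern."""
    return isinstance(value, str) and URL_RE.match(value) is not None

candidates = [
    "http://localhost:8000/admin",   # True: localhost with port and path
    "ftps://files.example.com/pub",  # True: ftp/ftps schemes are allowed
    "http://google",                 # False: no dot-separated domain
    "http://999.999.999.999",        # True: octets are not range-checked
    532,                             # False: not a string
]
for candidate in candidates:
    print(repr(candidate), looks_like_url(candidate))
```

The fourth input shows a limitation: the pattern accepts any one- to three-digit groups as an IP address, so a match does not guarantee a routable host.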