The Python Oracle

How to check if a string in Python is in ASCII?

--

Chapters
00:00 Question
00:34 Accepted answer (Score 229)
00:44 Answer 2 (Score 282)
01:35 Answer 3 (Score 183)
02:01 Answer 4 (Score 152)
02:28 Thank you

--

Full question
https://stackoverflow.com/questions/1963...

Question links:
[ord()]: http://docs.python.org/library/functions...

Answer 3 links:
[bpo32677]: https://bugs.python.org/issue32677
[.isascii()]: https://docs.python.org/3/library/stdtyp...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #string #unicode #ascii



ANSWER 1

Score 289


I think you are not asking the right question.

A string in Python has no property corresponding to ASCII, UTF-8, or any other encoding. The source of your string (a file you read, keyboard input, etc.) may have encoded a Unicode string as ASCII to produce your string, but that is where you need to go for an answer.

Perhaps the question you can ask is: "Is this string the result of encoding a Unicode string in ASCII?" You can answer this by trying:

try:
    # mystring is a byte string (Python 2 str, or bytes in Python 3)
    mystring.decode('ascii')
except UnicodeDecodeError:
    print("It was not an ASCII-encoded Unicode string")
else:
    print("It may have been an ASCII-encoded Unicode string")



ACCEPTED ANSWER

Score 234


def is_ascii(s):
    return all(ord(c) < 128 for c in s)
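
As a quick check of the function above, here is a minimal sketch that runs it on a few sample strings. The sample values are illustrative and do not come from the original answer.

```python
def is_ascii(s):
    """Return True if every character in s has a code point below 128."""
    return all(ord(c) < 128 for c in s)


print(is_ascii("Python"))  # True
print(is_ascii("naïve"))   # False, "ï" is U+00EF
print(is_ascii(""))        # True, all() of an empty iterable is True
```

Note that the empty string counts as ASCII here, which may or may not be what a caller wants.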



ANSWER 3

Score 190


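In Python 3, we can encode the string as UTF-8, then check whether the length stays the same. UTF-8 encodes each ASCII character as a single byte and every other character as two to four bytes, so the lengths match only if the original string is pure ASCII.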

def isascii(s):
    """Check if the characters in string s are in ASCII, U+0-U+7F."""
    return len(s) == len(s.encode())

To check, pass the test string:

>>> isascii("♥O◘♦♥O◘♦")
False
>>> isascii("Python")
True
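
The links listed for this answer also point to the built-in str.isascii() method, which is available from Python 3.7. The following is a rough sketch, assuming Python 3.7 or later, that compares the built-in with the length check; the name isascii_by_length and the sample list are chosen here for illustration.

```python
def isascii_by_length(s):
    """Length check from the answer: ASCII text keeps its length in UTF-8."""
    return len(s) == len(s.encode("utf-8"))


samples = ["Python", "♥O◘♦♥O◘♦", ""]
for text in samples:
    # str.isascii() requires Python 3.7 or later
    assert text.isascii() == isascii_by_length(text)
    print(repr(text), text.isascii())
```

Where it is available, the built-in is probably the simplest choice, since it needs no helper function and does not build an encoded copy of the string.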



ANSWER 4

Score 29


Vincent Marchetti has the right idea, but in Python 3 str objects no longer have a decode method. You can make the same test there with str.encode:

try:
    mystring.encode('ascii')
except UnicodeEncodeError:
    pass  # string is not ascii
else:
    pass  # string is ascii

Note that the exception to catch has also changed, from UnicodeDecodeError to UnicodeEncodeError, because the test now encodes rather than decodes.
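
One way to make this test reusable is to wrap it in a function that returns a boolean. This is only a sketch under that assumption; the name is_ascii_by_encoding and the sample strings are hypothetical and not part of the original answer.

```python
def is_ascii_by_encoding(text):
    """Return True if text (a Python 3 str) can be encoded as ASCII."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        # At least one character falls outside U+0000-U+007F
        return False
    return True


print(is_ascii_by_encoding("Python"))  # True
print(is_ascii_by_encoding("café"))    # False
```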