How to check if a string in Python is in ASCII?
--
Track title: CC G Dvořák's String Quartet No 12 Ame 2
--
Chapters
00:00 Question
00:34 Accepted answer (Score 229)
00:44 Answer 2 (Score 282)
01:35 Answer 3 (Score 183)
02:01 Answer 4 (Score 152)
02:28 Thank you
--
Full question
https://stackoverflow.com/questions/1963...
Question links:
[ord()]: http://docs.python.org/library/functions...
Answer 3 links:
[bpo32677]: https://bugs.python.org/issue32677
[.isascii()]: https://docs.python.org/3/library/stdtyp...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #string #unicode #ascii
#avk47
ANSWER 1
Score 289
I think you are not asking the right question.
A string in Python 2 (a byte string, type str) has no property corresponding to ASCII, UTF-8, or any other encoding. The source of your string (a file you read, keyboard input, etc.) may have produced it by encoding a Unicode string as ASCII, but that source is where you need to look for an answer.
Perhaps the question to ask is: "Is this string the result of encoding a Unicode string as ASCII?" You can answer that by trying:
try:
    mystring.decode('ascii')  # Python 2: mystring is a byte string (str)
except UnicodeDecodeError:
    print "It was not an ASCII-encoded Unicode string"
else:
    print "It may have been an ASCII-encoded Unicode string"
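In Python 3 the roles are split: text is str and encoded data is bytes, so the "was this produced by ASCII encoding?" test above applies to a bytes object. A minimal sketch, assuming your data arrives as bytes; the function name is_ascii_bytes is hypothetical:

```python
def is_ascii_bytes(data: bytes) -> bool:
    """Return True if data decodes cleanly as ASCII (every byte is below 0x80)."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        # At least one byte is 0x80 or higher, so this is not pure ASCII.
        return False
    return True


print(is_ascii_bytes(b"Python"))                      # True
print(is_ascii_bytes("caf\u00e9".encode("utf-8")))    # False: "é" becomes two bytes >= 0x80
```

On Python 3.7 and later, bytes.isascii() performs the same check without raising an exception.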
ACCEPTED ANSWER
Score 234
def is_ascii(s):
    # ASCII covers code points 0-127, so every character's ord() must be below 128.
    return all(ord(c) < 128 for c in s)
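On Python 3.7 and later, the built-in str.isascii() method performs the same check. A small sketch comparing the two on a few sample strings; the sample list is illustrative only, and the comparison is skipped on older interpreters:

```python
import sys


def is_ascii(s: str) -> bool:
    # Same check as the accepted answer: every code point must be below 128.
    return all(ord(c) < 128 for c in s)


# Illustrative inputs: empty, plain ASCII, "naïve", a heart symbol, a tab.
samples = ["", "Python", "na\u00efve", "\u2665 love", "tab\there"]

for text in samples:
    result = is_ascii(text)
    if sys.version_info >= (3, 7):
        # str.isascii() was added in Python 3.7; it should agree with is_ascii().
        assert result == text.isascii()
    # ascii() escapes non-ASCII characters so the output itself stays ASCII.
    print(ascii(text), result)
```

Both approaches return True for the empty string, since there are no characters that could fail the test.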
ANSWER 3
Score 190
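In Python 3, we can encode the string as UTF-8 and check whether the length stays the same. UTF-8 encodes code points U+0000 to U+007F as one byte each and every other code point as two to four bytes, so the length is unchanged only if the original string is pure ASCII.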
def isascii(s):
    """Check if the characters in string s are in ASCII, U+0-U+7F."""
    return len(s) == len(s.encode())
To check, call it on a test string:
>>> isascii("♥O◘♦♥O◘♦")
False
>>> isascii("Python")
True
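To see why the length comparison works, the sketch below prints how many UTF-8 bytes individual characters need. One caveat: s.encode() raises UnicodeEncodeError on a lone surrogate such as "\ud800", so the hypothetical isascii_by_length variant passes errors="surrogatepass", which makes such strings report False instead of raising. Both helper names are illustrative, not part of the original answer.

```python
def utf8_width(ch: str) -> int:
    """Number of bytes UTF-8 needs for the single character ch."""
    return len(ch.encode("utf-8"))


def isascii_by_length(s: str) -> bool:
    # "surrogatepass" encodes lone surrogates (e.g. "\ud800") as 3 bytes
    # instead of raising, so they make the lengths differ and yield False.
    return len(s) == len(s.encode("utf-8", "surrogatepass"))


# "A" is ASCII; "é", "♥" and "😀" need 2, 3 and 4 bytes respectively.
for ch in ["A", "\u00e9", "\u2665", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", utf8_width(ch))

print(isascii_by_length("Python"), isascii_by_length("\ud800"))
```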
ANSWER 4
Score 29
Vincent Marchetti has the right idea, but str.decode no longer exists in Python 3, where str is always Unicode text. In Python 3 you can make the same test with str.encode:
try:
    mystring.encode('ascii')
except UnicodeEncodeError:
    pass  # string is not ascii
else:
    pass  # string is ascii
Note the exception you want to catch has also changed from UnicodeDecodeError to UnicodeEncodeError.
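Wrapped in a function that returns a bool, the pattern might look like the following sketch; the name is_ascii_str is hypothetical:

```python
def is_ascii_str(s: str) -> bool:
    """Return True if s can be encoded as ASCII."""
    try:
        s.encode("ascii")
    except UnicodeEncodeError:
        # Some character lies outside U+0000 to U+007F.
        return False
    return True


print(is_ascii_str("Python"))        # True
print(is_ascii_str("na\u00efve"))    # False
```

Because the ASCII codec also raises UnicodeEncodeError for a lone surrogate such as "\ud800", this version reports it as non-ASCII rather than letting the exception escape.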