The Python Oracle

Under which circumstances do equal strings share the same reference?


Music by Eric Matyas
https://www.soundimage.org
Track title: Lost Jungle Looping

--

Chapters
00:00 Under Which Circumstances Do Equal Strings Share The Same Reference?
01:20 Accepted Answer Score 9
02:41 Answer 2 Score 5
03:20 Answer 3 Score 5
04:10 Thank you

--

Full question
https://stackoverflow.com/questions/1161...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #string #reference #immutability


ACCEPTED ANSWER

Score 9


The details of when strings are cached and reused are implementation-dependent, can change from Python version to Python version and cannot be relied upon. If you want to check strings for equality, use ==, not is.
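A minimal sketch of the difference, assuming nothing beyond built-in str operations: equality looks at the characters, while identity only tells you whether two names point at one object, which is exactly the implementation detail described above.

```python
literal = "hello world"
built = " ".join(["hello", "world"])  # same text, constructed at runtime

# Equality compares the characters, so this is True on every implementation.
print("equal:", literal == built)

# Identity compares object references; the result depends on the
# implementation and version, so it must not be used to test equality.
print("same object:", literal is built)
```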

In CPython (the most commonly used Python implementation), string literals consisting only of ASCII letters, digits, and underscores are interned, so if such a string literal occurs twice in the source code, both occurrences will end up pointing to the same string object. In Python 2.x, you can also call the built-in function intern() (moved to sys.intern() in Python 3) to force a particular string to be interned, but you shouldn't actually do so.
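One way to observe the literal-interning rule is a sketch that compiles the same identifier-like literal twice with eval(), so that each copy comes from a separate code object. It assumes CPython behaviour. Other interpreters, or future CPython versions, may print something different for the identity check.

```python
import platform

# Each eval() call compiles a separate code object containing the constant.
# CPython detail (not a language guarantee): string constants made only of
# ASCII letters, digits and underscores are interned when code is compiled.
first = eval("'spam_eggs_42'")
second = eval("'spam_eggs_42'")

print(platform.python_implementation())
print("equal:", first == second)          # always True
print("same object:", first is second)    # implementation-dependent
```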

Edit, regarding your actual aim of checking whether attributes are improperly shared between instances: this kind of check is only useful for mutable objects. For attributes of an immutable type, there is no semantic difference between shared and unshared objects. You could exclude immutable types from your tests by using

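import numbers

Immutable = (str, bytes, tuple, numbers.Number, frozenset)  # Python 2: basestring instead of str, bytes
# ...
if not isinstance(x, Immutable):    # Exclude types known to be immutable
    ...  # test x for improper sharing here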

Note that this would also exclude tuples that contain mutable objects. If you wanted to test those, you would need to recursively descend into tuples.
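A sketch of both ideas together: a recursive immutability test that descends into tuples (and frozensets), and a check that lists the attributes two instances hold by reference. The names ATOMIC_IMMUTABLE, is_deeply_immutable, shared_mutable_attributes and the Widget class are hypothetical, and treating None as immutable is an extra assumption not in the answer.

```python
import numbers

# Leaf types treated as immutable. This mirrors the answer's tuple in Python 3
# spelling (str and bytes replace Python 2's basestring); adding NoneType is
# an extra assumption of this sketch.
ATOMIC_IMMUTABLE = (str, bytes, numbers.Number, type(None))


def is_deeply_immutable(obj):
    """True if obj cannot be mutated, looking inside tuples and frozensets."""
    if isinstance(obj, ATOMIC_IMMUTABLE):
        return True
    if isinstance(obj, (tuple, frozenset)):
        # Recurse: a tuple that holds a list can still change through the list.
        return all(is_deeply_immutable(item) for item in obj)
    return False


def shared_mutable_attributes(first, second):
    """Names of instance attributes that both objects hold by reference and
    that could be mutated through either of them."""
    other = vars(second)
    return sorted(
        name
        for name, value in vars(first).items()
        if name in other and other[name] is value and not is_deeply_immutable(value)
    )


class Widget:
    """Hypothetical class used only to exercise the check."""

    def __init__(self, name, tags):
        self.name = name
        self.tags = tags


shared_tags = []
w1 = Widget("w", shared_tags)
w2 = Widget("w", shared_tags)
print(shared_mutable_attributes(w1, w2))  # ['tags']
```

Here both instances also share the same "w" string object, but it is skipped because sharing an immutable value is harmless; only the list is reported.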




ANSWER 2

Score 5


I think it's an implementation and optimization thing. If the strings are short, they can be (and often are) "shared", but you can't depend on that. Once you have longer strings, you can see that they are not the same.

In [2]: s1 = 'abc'
In [3]: s2 = 'abc'

In [4]: s1 is s2
Out[4]: True

With longer strings:

In [5]: s1 = 'abc this is much longer'
In [6]: s2 = 'abc this is much longer'

In [7]: s1 is s2
Out[7]: False

Use == to compare strings (not the is operator).

--

The OP's observation/hypothesis (in the comments) that this may be due to the number of tokens seems to be supported by the following:

In [12]: s1 = 'a b c'
In [13]: s2 = 'a b c'

In [14]: s1 is s2
Out[14]: False

Compare this with the initial 'abc' example above, where s1 is s2 gave True.
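The accepted answer's rule (only ASCII letters, digits and underscores) can be approximated with a small check. This is a hypothetical helper, looks_internable, that models a CPython implementation detail as described in that answer. It is not an API, and actual behaviour varies by version.

```python
import string

# Hypothetical helper approximating the CPython rule described in the accepted
# answer: only literals made entirely of ASCII letters, digits and underscores
# are interned automatically. Real CPython behaviour varies by version.
NAME_CHARS = frozenset(string.ascii_letters + string.digits + "_")


def looks_internable(literal):
    """True if every character of literal is an ASCII letter, digit or underscore."""
    return all(ch in NAME_CHARS for ch in literal)


for text in ["abc", "a b c", "abc this is much longer", "abc_this_is_much_longer"]:
    print(repr(text), looks_internable(text))
```

Under this approximation the long underscore-joined string qualifies while the short 'a b c' does not. That hints that the spaces, rather than length, are what separate the examples above, although this is an inference from the accepted answer, not something these transcripts measured.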




ANSWER 3

Score 5


In CPython, as an implementation detail the empty string is shared, as are single-character strings whose codepoint is in the Latin-1 range. You should not depend on this, as it is possible to bypass this feature.
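A sketch that probes these cached strings by building each one in two different ways at runtime. It assumes CPython. On other interpreters any of the identity checks may be False, which is precisely why the answer warns against depending on them.

```python
import platform


def cpython_string_singletons():
    """Identity checks for strings that CPython caches as singletons.

    CPython implementation detail, not a language guarantee: other
    interpreters may return False for any of these.
    """
    return {
        # Both sides build an empty string at runtime.
        "empty": "".join([]) is str(),
        # 'a' (code point 97) obtained two different ways.
        "ascii_char": chr(97) is "abc"[0],
        # 'é' (code point 233) is still inside the Latin-1 range 0-255.
        "latin1_char": chr(0xE9) is bytes([0xE9]).decode("latin-1"),
    }


print(platform.python_implementation(), cpython_string_singletons())
```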

You can request that a string be interned using sys.intern(); this also happens automatically in some cases:

Normally, the names used in Python programs are automatically interned, and the dictionaries used to hold module, class or instance attributes have interned keys.
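A minimal sketch of explicit interning. Two equal strings assembled at runtime are passed through sys.intern(), which returns the single interned copy, so the results are one object on any conforming Python 3.

```python
import sys

# Two equal strings assembled at runtime; without interning they are
# usually separate objects.
left = "".join(["interned", " ", "value"])
right = " ".join(["interned", "value"])

# sys.intern() returns the one interned copy of each value, so equal inputs
# come back as the same object (the table entry stays alive while referenced).
left = sys.intern(left)
right = sys.intern(right)
print(left == right, left is right)  # True True
```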

sys.intern is exposed so that you can use it (after profiling!) for performance:

Interning strings is useful to gain a little performance on dictionary lookup – if the keys in a dictionary are interned, and the lookup key is interned, the key comparisons (after hashing) can be done by a pointer compare instead of a string compare.
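A sketch of the dictionary case under stated assumptions: intern_keys is a hypothetical helper that copies a dict with interned str keys, and the lookup key is interned before use. sys.intern() rejects str subclasses, so the helper only interns exact str keys. Whether this actually speeds anything up should be confirmed by profiling, as the answer says.

```python
import sys


def intern_keys(mapping):
    """Hypothetical helper: copy a dict so that every exact-str key is interned."""
    return {(sys.intern(k) if type(k) is str else k): v for k, v in mapping.items()}


config = intern_keys({"timeout_seconds": 30, "retry limit": 3})

# A lookup key built at runtime and then interned is the very object stored
# in the dict, so after the hash check the key comparison can succeed on
# identity alone.
lookup = sys.intern(" ".join(["retry", "limit"]))
print(config[lookup])  # 3

stored_key = next(k for k in config if k == lookup)
print(stored_key is lookup)  # True
```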

Note that intern is a builtin in Python 2.
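For code that must run on both Python 2 and Python 3, one common pattern (a sketch, not the only option) is to fall back to sys.intern when the built-in name is missing:

```python
try:
    intern  # Python 2: intern() is a built-in function
except NameError:
    from sys import intern  # Python 3: it lives in the sys module

word = intern("".join(["py", "thon"]))
print(word is intern("python"))  # True
```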