Uncommon behaviour of the is operator in Python
Full question
https://stackoverflow.com/questions/5088...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
ACCEPTED ANSWER
Score 8
What you're seeing is an optimization in the CPython compiler (which compiles your source code into the bytecode that the interpreter runs). Whenever the same immutable constant value is used in several different places within a chunk of code that is compiled in one step, the compiler will try to use a reference to the same object for each place.
So if you do multiple assignments on the same line in an interactive session, you'll get two references to the same object, but you won't if you use two separate lines:
>>> x = 257; y = 257  # multiple statements on the same line are compiled in one step
>>> print(x is y)     # prints True
>>> x = 257
>>> y = 257
>>> print(x is y)     # prints False this time, since the assignments were compiled separately
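The same effect can be reproduced without an interactive shell by calling compile() and exec() directly. This is a minimal sketch that assumes CPython; the function names are hypothetical, and the separately compiled case is only typically False (other interpreters or builds may behave differently).

```python
# Minimal sketch, assuming CPython: these results come from compiler
# optimizations and are not guaranteed by the language.

def compiled_together() -> bool:
    """Compile both assignments in one step, like one line at the prompt."""
    namespace: dict = {}
    exec(compile("x = 257; y = 257", "<one-step>", "exec"), namespace)
    return namespace["x"] is namespace["y"]


def compiled_separately() -> bool:
    """Compile each assignment on its own, like two separate prompts."""
    namespace: dict = {}
    exec(compile("x = 257", "<step-1>", "exec"), namespace)
    exec(compile("y = 257", "<step-2>", "exec"), namespace)
    return namespace["x"] is namespace["y"]


if __name__ == "__main__":
    print("compiled together:  ", compiled_together())
    print("compiled separately:", compiled_separately())
```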
Another place this optimization comes up is in the body of a function. The whole function body will be compiled together, so any constants used anywhere in the function can be combined, even if they're on separate lines:
def foo():
    x = 257
    y = 257
    return x is y  # returns True in CPython, since both names refer to one constant
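One way to confirm that the compiler stored a single 257 for the whole function body is to inspect the code object's co_consts tuple. A sketch assuming CPython; the exact contents of co_consts differ between versions, so only the number of 257 entries is meaningful here.

```python
def foo():
    x = 257
    y = 257
    return x is y


# The function body is compiled as one unit, so 257 is stored only once
# in the function's constant table and both assignments load that object.
print(foo())                              # True in CPython
print(foo.__code__.co_consts)             # exact contents vary between versions
print(foo.__code__.co_consts.count(257))  # a single 257 entry
```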
While it's interesting to investigate optimizations like this one, you should never rely upon this behavior in your normal code. Different Python interpreters, and even different versions of CPython may do these optimizations differently or not at all. If your code depends on a specific optimization, it may be completely broken for somebody else who tries to run it on their own system.
As an example, the two assignments on the same line that I show in my first code block above don't result in two references to the same object when I run them in the interactive shell inside Spyder (my preferred IDE). I have no idea why that specific situation doesn't work the same way as in a conventional interactive shell, but the different behavior is my fault, since my code relies upon implementation-specific behavior.
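In practice, the portable choice is to compare values with == and to reserve is for checks where identity really is the point, such as comparing against None or a sentinel object. A small sketch; the function names and the _MISSING sentinel are hypothetical.

```python
_MISSING = object()  # hypothetical sentinel: a unique object, compared by identity


def same_number(a: int, b: int) -> bool:
    # Value comparison: correct on every interpreter, however ints are cached.
    return a == b


def lookup(mapping: dict, key, default=_MISSING):
    value = mapping.get(key, _MISSING)
    # Identity is the right test here: only our own sentinel object is _MISSING.
    if value is _MISSING:
        if default is _MISSING:
            raise KeyError(key)
        return default
    return value


print(same_number(257, int("257")))  # True on any interpreter
print(lookup({"a": 1}, "b", None))   # None
```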
ANSWER 2
Score 2
After discussion and testing in various versions, the following conclusions can be drawn.
Python compiles code in blocks. Which statements Python compiles together in one block can depend on the syntax used, the Python version, the operating system and the distribution, so different results may be obtained.
The general rule, from the official documentation:
"The current implementation keeps an array of integer objects for all integers between -5 and 256."
Therefore:
a = 256
id(a)
Out[2]: 1997190544
id(256)
Out[3]: 1997190544 # int actually stored once within Python
a = 257
id(a)
Out[5]: 2365489141456
id(257)
Out[6]: 2365489140880 # literal, temporary: as you can see, the ids differ
id(257)
Out[7]: 2365489142192 # literal, temporary: it gets a new id every time
                      # since it is not pre-stored
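The cache described in the quoted documentation can be probed with ints that are created at run time (here via int(str(n))), so that no shared literal constant is involved. A minimal sketch assuming a standard CPython build, where you would expect True only for the values from -5 to 256; is_cached is a hypothetical helper name.

```python
def is_cached(n: int) -> bool:
    # Build two int objects independently at run time; they are the same
    # object only if the interpreter hands back a shared, pre-allocated int.
    return int(str(n)) is int(str(n))


for n in (-6, -5, 0, 256, 257):
    print(n, is_cached(n))
```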
The part below returns False in Python 3.6.3 |Anaconda custom (64-bit)| (default, Oct 17 2017, 23:26:12) [MSC v.1900 64 bit (AMD64)]
a = 257; a is 257
Out[8]: False
But:
a = 257; print(a is 257); a = 258; print(a is 257)
True
False
As is evident, what Python treats as "one block" is not something to rely on: it can be swayed by how the code is written (on a single line or not), as well as by the version, operating system and distribution used.
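One plausible explanation for the IPython-based results above (the Anaconda console here, and Spyder's shell in the accepted answer) is that IPython compiles each top-level statement separately, even when several share one line. Treat that as an assumption, not a documented fact. The sketch below only emulates the two strategies with the standard ast module, assuming CPython; the function names are hypothetical.

```python
import ast

SOURCE = "a = 257; b = 257"


def run_as_one_block(source: str) -> bool:
    namespace: dict = {}
    # The whole source is one compilation unit, so both 257s can share a constant.
    exec(compile(source, "<block>", "exec"), namespace)
    return namespace["a"] is namespace["b"]


def run_statement_by_statement(source: str) -> bool:
    namespace: dict = {}
    tree = ast.parse(source)
    for statement in tree.body:
        # Wrap each top-level statement in its own module and compile it alone,
        # even though both statements were written on the same line.
        module = ast.Module(body=[statement], type_ignores=[])
        exec(compile(module, "<statement>", "exec"), namespace)
    return namespace["a"] is namespace["b"]


print("one block:         ", run_as_one_block(SOURCE))
print("statement by stmt: ", run_statement_by_statement(SOURCE))
```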
ANSWER 3
Score 2
Generally speaking, numbers outside the range -5 to 256 will not necessarily get the optimization that is applied to numbers within that range. However, Python is free to apply other optimizations as appropriate. In your case, you're seeing that the same literal value used multiple times on one line is stored in a single memory location, no matter how many times it's used on that line. Here are some other examples of this behavior:
>>> s = 'a'; s is 'a'
True
>>> s = 'asdfghjklzxcvbnmsdhasjkdhskdja'; s is 'asdfghjklzxcvbnmsdhasjkdhskdja'
True
>>> x = 3.14159; x is 3.14159
True
>>> t = 'a' + 'b'; t is 'a' + 'b'
True
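These results can be explained by the constants the compiler produces for such a line. A sketch assuming CPython 3.x: 'a' + 'b' is folded into a single 'ab' constant at compile time, and the repeated float literal is stored once. It assigns to separate names instead of writing s is 'a', because Python 3.8 and later emit a SyntaxWarning for is with a literal.

```python
code = compile(
    "s = 'a' + 'b'; t = 'a' + 'b'; x = 3.14159; y = 3.14159",
    "<answer-3>",
    "exec",
)

# 'a' + 'b' has already been folded into 'ab' before the code ever runs.
print(code.co_consts)

namespace: dict = {}
exec(code, namespace)
print(namespace["s"] is namespace["t"])  # True in CPython: one folded constant
print(namespace["x"] is namespace["y"])  # True in CPython: one float constant
```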
>>> 
ANSWER 4
Score 2
From the Python 2 docs:
The operators is and is not test for object identity: x is y is true if and only if x and y are the same object. x is not y yields the inverse truth value.
From the Python 3 docs:
The operators is and is not test for object identity: x is y is true if and only if x and y are the same object. Object identity is determined using the id() function. x is not y yields the inverse truth value.
So the key to understanding the tests you've run in the REPL console is to use the id() function accordingly. Here's an example that shows what's going on behind the curtains:
>>> a=256
>>> id(a);id(256);a is 256
2012996640
2012996640
True
>>> a=257
>>> id(a);id(257);a is 257
36163472
36162032
False
>>> a=257;id(a);id(257);a is 257
36162496
36162496
True
>>> a=12345;id(a);id(12345);a is 12345
36162240
36162240
True
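One caveat when using id() this way: the Python documentation states that two objects with non-overlapping lifetimes may have the same id() value, so comparing the ids of short-lived temporaries, such as the results of separate id(257) calls, can mislead. A minimal sketch of a safe comparison; same_object is a hypothetical helper name.

```python
def same_object(a: object, b: object) -> bool:
    # Both arguments are alive for the whole call, so their lifetimes overlap
    # and equal ids really do mean "the same object" (equivalent to a is b).
    return id(a) == id(b)


first = [257]
second = [257]
print(same_object(first, first))   # True: one list, two references
print(same_object(first, second))  # False: equal values, different objects
print(first == second)             # True: value equality
```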
That said, a good way to understand what's going on behind the curtains with this type of snippet is usually to use dis.dis or dis.disco. For instance, let's look at what this snippet compiles to:
import dis
import textwrap
dis.disco(compile(textwrap.dedent("""\
    a=256
    a is 256
    a=257
    a is 257
    a=257;a is 257
    a=12345;a is 12345\
"""), '', 'exec'))
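The output (from a CPython 3 version before 3.9; newer versions use IS_OP instead of COMPARE_OP for is, and other details of the listing differ) would be: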
  1           0 LOAD_CONST               0 (256)
              2 STORE_NAME               0 (a)
  2           4 LOAD_NAME                0 (a)
              6 LOAD_CONST               0 (256)
              8 COMPARE_OP               8 (is)
             10 POP_TOP
  3          12 LOAD_CONST               1 (257)
             14 STORE_NAME               0 (a)
  4          16 LOAD_NAME                0 (a)
             18 LOAD_CONST               1 (257)
             20 COMPARE_OP               8 (is)
             22 POP_TOP
  5          24 LOAD_CONST               1 (257)
             26 STORE_NAME               0 (a)
             28 LOAD_NAME                0 (a)
             30 LOAD_CONST               1 (257)
             32 COMPARE_OP               8 (is)
             34 POP_TOP
  6          36 LOAD_CONST               2 (12345)
             38 STORE_NAME               0 (a)
             40 LOAD_NAME                0 (a)
             42 LOAD_CONST               2 (12345)
             44 COMPARE_OP               8 (is)
             46 POP_TOP
             48 LOAD_CONST               3 (None)
             50 RETURN_VALUE
As we can see, in this case the bytecode doesn't tell us very much: lines 3-4 are basically the "same" instructions as line 5. Because the whole snippet was compiled in one step, every 257 is loaded from the same constant (LOAD_CONST 1), so both is checks involving 257 would be True here, unlike the separate REPL prompts above. So my recommendation, once again, is to use id() smartly so you'll know what is will compare. If you want to know exactly which optimizations CPython performs, I'm afraid you'd need to dig into its source code.
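The observation that every load of 257 refers to the same constant can also be checked programmatically with dis.get_instructions. A minimal sketch assuming CPython: opcode names differ between versions, so it matches any LOAD_CONST-style instruction and compares the constant-table index in each instruction's arg. It avoids a is 257 to sidestep the SyntaxWarning newer versions emit.

```python
import dis

SOURCE = "a = 257\nb = 257\na = 257; b = 257\n"

code = compile(SOURCE, "<answer-4>", "exec")

# Collect the constant-table index used by every instruction that loads 257.
indices = [
    instruction.arg
    for instruction in dis.get_instructions(code)
    if instruction.opname.startswith("LOAD_CONST") and instruction.argval == 257
]

print(indices)                    # the same index repeated, once per load
print(code.co_consts.count(257))  # 257 appears only once in the table
```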