Why isn't `str(1) is '1'` `True` in Python?


I'm not asking about the difference between the == and is operators! I'm asking about interning or something like that.

In Python 3.9.1,

>>> str(1) is '1'
<stdin>:1: SyntaxWarning: "is" with a literal. Did you mean "=="?
False
>>> '1' is '1'
<stdin>:1: SyntaxWarning: "is" with a literal. Did you mean "=="?
True

I found out that characters which match [a-zA-Z0-9_] are interned in Python. I understand why '1' is '1': Python stores the string '1' somewhere in memory and refers to that object whenever the literal '1' is used. And str(1) returns '1', so I think it should refer to the same address as the other literal '1's. Shouldn't str(1) is '1' also be True?

Answer by U13-Forward (best answer)

is checks object identity (references), not content. Also, str(1) is not a literal, so the string it returns is not interned.

But '1' is interned because it is a string literal, whereas str(1) builds a new string object at runtime. As you can see:

>>> a = '1'
>>> b = str(1)
>>> a
'1'
>>> b
'1'
>>> a is b
False
>>> id(a)
1603954028784
>>> id(b)
1604083776304
>>>

So the way to get the interned copy of b (the same object as a) is sys.intern:

>>> import sys
>>> a = '1'
>>> b = str(1)
>>> a is b
False
>>> a is sys.intern(b)
True
>>> 

As mentioned in the docs:

Enter string in the table of “interned” strings and return the interned string – which is string itself or a copy. Interning strings is useful to gain a little performance on dictionary lookup – if the keys in a dictionary are interned, and the lookup key is interned, the key comparisons (after hashing) can be done by a pointer compare instead of a string compare. Normally, the names used in Python programs are automatically interned, and the dictionaries used to hold module, class or instance attributes have interned keys.

Interned strings are not immortal; you must keep a reference to the return value of intern() around to benefit from it.
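The dictionary-key use case from the docs can be sketched as follows. This is a minimal example assuming CPython, with a hard-coded string standing in for text that would normally arrive at runtime; the helper name intern_words is hypothetical. Each word produced by str.split() is a new object, and passing it through sys.intern makes equal words share a single object. The list keeps references to the interned strings, which is what the docs mean by keeping the return value around.

```python
import sys

def intern_words(text):
    """Hypothetical helper: split text on whitespace and intern each word.

    str.split() creates a fresh str object for every word, so repeated
    words are equal but not necessarily the same object. sys.intern()
    returns one canonical object per distinct word.
    """
    return [sys.intern(word) for word in text.split()]

# The returned list holds references, so the interned strings stay alive.
words = intern_words("spam eggs spam spam eggs")

# Equal words are now the very same object.
print(words[0] is words[2] is words[3])  # True
print(len({id(w) for w in words}))       # 2: one object for "spam", one for "eggs"

# Using interned strings as dict keys; lookups with interned keys can be
# resolved by a pointer comparison after hashing, as the docs describe.
counts = {}
for w in words:
    counts[w] = counts.get(w, 0) + 1
print(counts)  # {'spam': 3, 'eggs': 2}
```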

Note that in Python 2 intern() was a built-in function, but in Python 3 it was moved into the sys module as sys.intern.

Answer by Rustam Garayev

Since str() is a function call that is evaluated at runtime (not at compile time), the Python compiler does not know what str(1) is going to return. Literals and simple constant expressions, such as concatenating two string literals, are handled at compile time instead: '1' + '2' is folded into the constant '12', so the optimisation (string interning) can be applied to it. See the examples below:

>>> a = '12'
>>> b = '1' + '2'
>>> c = str(12)
>>> d = ''.join(['1', '2'])

>>> a is b
True
>>> a is c
False
>>> b is c
False
>>> a is d
False
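The compile-time folding described above can be checked without relying on object identity: compile each statement on its own (roughly what the interactive prompt does) and look at the constants stored in the resulting code object. This is a minimal sketch assuming CPython, whose compiler folds constant expressions such as '1' + '2'; the variable names are illustrative.

```python
# Compile each statement separately and inspect the constants the
# compiler recorded in the resulting code objects.
folded = compile("b = '1' + '2'", "<example>", "exec")
runtime = compile("c = str(12)", "<example>", "exec")

# '1' + '2' was folded into the single string constant '12' at compile time.
print('12' in folded.co_consts)   # True

# No string '12' exists yet: it is only created when str(12) runs.
print('12' in runtime.co_consts)  # False
```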
Answer by Masklinn

I found out that characters which match [a-zA-Z0-9_] are interned in Python

You "found out" wrong, I fear.

CPython automatically interns string literals that look like identifiers, as well as symbol names (e.g. module names, class names, method names, ...).
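The interning of symbol names can be observed directly. The sketch below relies on the CPython behaviour described in the sys.intern documentation (the dictionaries holding class attributes have interned keys); the class Spam, its attribute eggs, and the variable names are hypothetical. It builds an equal attribute name at runtime and checks that sys.intern hands back the very object stored as the key.

```python
import sys

class Spam:
    eggs = 1  # the attribute name "eggs" comes from the source code

# Build an equal string at runtime; it starts out as a separate object.
runtime_name = "".join(["eg", "gs"])

# Find the key object actually stored in the class namespace.
stored_key = next(k for k in vars(Spam) if k == runtime_name)

print(runtime_name == stored_key)              # True: same characters
print(sys.intern(runtime_name) is stored_key)  # True on CPython: the stored key is the interned copy
```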

And str(1) returns '1', so I think it should refer to the same address as the other literal '1's. Shouldn't str(1) is '1' also be True?

No. Interning is an explicit deduplication operation, and CPython only exposes an API that interns a string unconditionally: there is currently no way to swap a string for its interned version only if one already exists, short of messing with the interpreter's internals by hand.

This means that if str() itself interned its output, it would have to intern every result it produced, which is probably undesirable: while CPython does not "leak" interned strings (they don't live forever), increasing the size of the interning map adds to its maintenance and management costs.
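The difference between the operation CPython offers (intern, i.e. insert-or-get) and the missing one (use the interned copy only if it already exists) can be illustrated with a small pure-Python pool. This is a hypothetical sketch: the InternPool class and its intern and lookup methods are illustrative names, not a CPython API, and a plain dict stands in for the interpreter's interning table. Only intern grows the table, which is the cost the answer is concerned about.

```python
class InternPool:
    """Hypothetical user-level interning table (not CPython's own)."""

    def __init__(self):
        self._table = {}  # maps each string to its canonical object

    def intern(self, s):
        """Insert-or-get, like sys.intern: return the pooled copy,
        adding s first if no equal string is pooled yet."""
        return self._table.setdefault(s, s)

    def lookup(self, s):
        """Get-only (no public CPython equivalent): return the pooled
        copy if one exists, otherwise s itself, without growing the pool."""
        return self._table.get(s, s)

    def __len__(self):
        return len(self._table)


pool = InternPool()
canonical = pool.intern("".join(["1", "2"]))

other = str(12)
print(pool.lookup(other) is canonical)  # True: an equal string was already pooled
print(len(pool))                        # 1

fresh = str(34)
print(pool.lookup(fresh) is fresh)      # True: not pooled, returned unchanged
print(len(pool))                        # 1: lookup never inserts
```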