I have recently been learning the HotSpot JVM. While studying the string constant pool and the String.intern() method, I ran into a very strange situation. After reading many answers I still can't explain it, so I'm posting it here for discussion.
public static void main(String[] args) {
    String s1 = new String("12") + new String("21");
    s1.intern();
    String s2 = "1221";
    System.out.println(s1 == s2); // true
}

public static void main(String[] args) {
    String s1 = new String("12") + new String("21");
    // s1.intern();
    String s2 = "1221";
    System.out.println(s1 == s2); // false
}
These results are from Java 8.
So the only difference between the two snippets is whether s1.intern() is called.
Here is the documentation of the intern method:
When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.
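Both branches of that contract can be observed directly. The following is only a minimal sketch: the class and method names are made up for illustration, and it assumes that a string built from a random UUID is not already in the pool. The first result also assumes HotSpot on JDK 7 or later, where the pool can hold the original heap object.

```java
import java.util.UUID;

class InternContractDemo {
    // Builds a string at run time; by assumption no equal string is pooled yet.
    static String freshString() {
        return "intern-contract-" + UUID.randomUUID();
    }

    // "Otherwise" branch: nothing equal is pooled, so this very object is
    // added to the pool and returned.
    static boolean firstInternReturnsSameObject() {
        String s = freshString();
        return s.intern() == s;
    }

    // "Already contains" branch: an equal string is pooled, so intern()
    // returns the pooled object rather than the receiver.
    static boolean laterInternReturnsPooledObject() {
        String first = freshString();
        String copy = new String(first); // equal content, different object
        String pooled = first.intern();
        return copy.intern() == pooled && copy.intern() != copy;
    }

    public static void main(String[] args) {
        System.out.println(firstInternReturnsSameObject());   // true on HotSpot JDK 7+
        System.out.println(laterInternReturnsPooledObject()); // true
    }
}
```

Note that in both cases the receiver object itself is never modified; only the pool's contents and the returned reference differ.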
Here is my understanding:
- By inspecting the bytecode, we can find "12", "21", and "1221" in the class file's constant pool.
- When the class is loaded, the constant pool in the class file is loaded into the run-time constant pool, so the string pool contains "12", "21", and "1221".
- new String("12") creates a String instance on the heap, which is different from the "12" in the string pool. The same goes for new String("21").
- The "+" operator is compiled into a StringBuilder and calls to its append and toString methods, as can be seen in the bytecode.
- toString calls new String, so s1 is a String instance "1221" on the heap.
- s1.intern() looks in the string pool, finds that "1221" is already there, and so does nothing. By the way, the return value is not used, so the call has no effect on s1.
- String s2 = "1221" just loads the "1221" instance from the string pool. In the bytecode this is ldc #11, where #11 is the index of "1221" in the constant pool.
- The "==" operator compares the addresses of reference types. s1 points to the instance on the heap and s2 points to the instance in the string pool. How can these two be equal?
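The parts of this reasoning that concern object identity (literals are shared, new String(...) and "+" each produce a separate object) can be checked in isolation. This is just a sketch with hypothetical class and method names, and it deliberately avoids the "1221" literal so that it says nothing yet about when the pool is filled.

```java
class HeapVersusPoolDemo {
    // Two occurrences of the same literal refer to one shared String object.
    static boolean sameLiteralTwice() {
        String a = "12";
        String b = "12";
        return a == b;
    }

    // new String(...) allocates a separate object, even for pooled content.
    static boolean newStringIsLiteral() {
        String literal = "12";
        String copy = new String("12");
        return copy == literal;
    }

    // Concatenating non-constant operands yields a new object every time.
    static boolean concatTwiceSameObject() {
        String first = new String("12") + new String("21");
        String second = new String("12") + new String("21");
        return first == second;
    }

    // The two concatenation results still have equal content.
    static boolean concatTwiceEqualContent() {
        String first = new String("12") + new String("21");
        String second = new String("12") + new String("21");
        return first.equals(second);
    }

    public static void main(String[] args) {
        System.out.println(sameLiteralTwice());        // true
        System.out.println(newStringIsLiteral());      // false
        System.out.println(concatTwiceSameObject());   // false
        System.out.println(concatTwiceEqualContent()); // true
    }
}
```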
My questions:
- What exactly do s1 and s2 point to?
- Why does calling intern() change the behavior, even when the return value is not used?
Here are my guesses:
The string pool is not initialized when the class is loaded. Some answers say that s1.intern() is the first time "1221" is put into the string pool. But then how do we explain that "1221" is in the constant pool of the class file? Is there any specification of when strings are loaded into the string pool?
Another explanation is that intern just saves a reference to the instance on the heap. But then the references s1 and s2 are still different: s1 points to the heap, s2 points to the string pool, and the string pool points to the heap. A reference is not the same as a reference to a reference.
I am the questioner.
Thanks to the discussion with @Sweeper and @user16320675, I have a new understanding of this problem, which I share here.
The errors were in the second and sixth points of my understanding above: the string pool is not populated when the class is loaded.
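s1.intern() is the first operation that adds "1221" to the string pool. After that, String s2 = "1221" behaves differently depending on whether "1221" already exists in the string pool: if it does, s2 receives the pooled reference (here, s1 itself); if not, a new String instance is created and added to the pool.
In order to better explain this problem, let me first define the key concepts involved.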
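As a quick check before the definitions, both cases can be reproduced in a single program by giving each case its own literal. This is a sketch under assumptions: the class, field, and method names are invented here, the strings "intern-first" and "literal-only" are assumed not to be pooled by anything else in the JVM, and the first result relies on HotSpot (JDK 7 or later) resolving string literals lazily, on first use.

```java
class LazyLiteralDemo {
    // Each experiment runs exactly once, during class initialization: a second
    // run would find its literal already resolved and no longer show the effect.
    static final boolean INTERN_BEFORE_LITERAL = internBeforeLiteral();
    static final boolean LITERAL_WITHOUT_INTERN = literalWithoutIntern();

    private static boolean internBeforeLiteral() {
        String s1 = new String("intern-") + new String("first");
        s1.intern();                // no equal string pooled yet, so s1 itself is pooled
        String s2 = "intern-first"; // literal resolved now; the pooled s1 is reused
        return s1 == s2;
    }

    private static boolean literalWithoutIntern() {
        String s1 = new String("literal-") + new String("only");
        String s2 = "literal-only"; // s1 was never pooled, so s2 is a different object
        return s1 == s2;
    }

    public static void main(String[] args) {
        System.out.println(INTERN_BEFORE_LITERAL);  // true on HotSpot JDK 7+
        System.out.println(LITERAL_WITHOUT_INTERN); // false
    }
}
```

The results are cached in fields on purpose: calling internBeforeLiteral() a second time would create a new s1 while the literal still refers to the first one, so the comparison would then be false.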
Key concepts
Cause of the error
The error comes from misunderstanding the relationship between the string pool and the constant pool (hereafter I use "constant pool" and "run-time constant pool" interchangeably).
Although it is usually called the string constant pool, the string pool is not part of the constant pool, so it is not populated when a class is loaded. In JDK 6, both the string pool and the constant pool were located in the permanent generation, which made them look related. In JDK 7, however, the string pool was moved to the heap, and it remains there in JDK 8. It is less a part of the constant pool than a part of the String class: it can be thought of as a private member of String, even though it does not appear in the String source code.
Once a String instance in the string pool has been created, its character data cannot be changed. Any operation that appears to modify an existing String instance produces a new String instance instead. Because its entries behave like constants, the pool is usually called the string constant pool. To avoid confusing the string pool with the constant pool, I try to say "string pool" rather than "string constant pool".
Another concept that is easily confused with the string pool is the CONSTANT_String_info entry in the constant pool. String literals are stored in the class file as Unicode sequences and are loaded into the run-time constant pool when the class is loaded. This is fundamentally different from the string pool: CONSTANT_String_info only describes a Unicode sequence, while the string pool holds String instances. A String instance contains not only the Unicode sequence but also other fields, such as hash, and the String class provides many methods that cannot be invoked on a CONSTANT_String_info entry. The corresponding String instance can be created by passing the Unicode sequence from CONSTANT_String_info to a String constructor.
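The difference between a raw character sequence and a pooled String instance can be imitated in ordinary Java. The sketch below is only an analogy, not what the JVM does internally: a hard-coded UTF-8 byte array stands in for the sequence stored in the class file, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class LiteralFromBytesDemo {
    // Stand-in for the stored sequence: the UTF-8 bytes of "1221".
    static String fromUtf8() {
        byte[] utf8 = {0x31, 0x32, 0x32, 0x31};
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // The constructed instance has the same content as the literal.
    static boolean contentMatchesLiteral() {
        return fromUtf8().equals("1221");
    }

    // It is a separate heap object, not the pooled one.
    static boolean sameObjectAsLiteral() {
        String built = fromUtf8();
        String literal = "1221";
        return built == literal;
    }

    // Looking it up in the pool leads back to the literal's instance.
    static boolean internMatchesLiteral() {
        String built = fromUtf8();
        String literal = "1221";
        return built.intern() == literal;
    }

    public static void main(String[] args) {
        System.out.println(contentMatchesLiteral()); // true
        System.out.println(sameObjectAsLiteral());   // false
        System.out.println(internMatchesLiteral());  // true
    }
}
```

The byte array by itself has no hash field and no String methods; only the instance built from it can be compared, interned, or stored in the string pool.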