How Is `operator new` Implemented At The Linker Level?


Per the C++ documentation, the operator new/etc. functions are:

implicitly declared in each translation unit[, and t]he program is ill-formed [...] if more than one replacement is provided in the program for any of the replaceable allocation function[s].

That is, if the programmer does not define their own operator new, then the C++ standard library must provide it. Conversely, if they do provide it, then the standard library mustn't.

I understand this behavior, but what I don't understand is how this works at a linker level.

The C++ standard library is compiled before the program that uses it, which is where the programmer is deciding whether to define their own operator new. Thus, the C++ standard library does need to speculatively provide an operator new somewhere. But then, when the user's program is linked against the standard library, if they did define their own allocation function, then that would result in multiple definitions and run afoul of the documentation above (in practice, you get a linker error about duplicate symbols, thank goodness).

Conversely, if the standard library doesn't define the operator new, then of course the user would have to do so, or else any call to it will result in missing symbols (and so a linker error the other way).


Quimby On

At least for ELF format, you can define symbols as weak using __attribute__((weak)).

If both a weak and a normal (strong) definition of a symbol are present during linking, the normal one takes priority. So weak symbols can act as defaults or fallbacks.

The standard library can define operator new as weak. That gives the userspace application an opportunity to override it; if the application does not, the library's weak definition is used. Either way, both the undefined reference error and the multiple definition error are avoided.
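As a minimal sketch of the weak-default idea, assuming GCC or Clang targeting ELF: the names library_default_id, which_definition and check_default are hypothetical, invented for illustration. The weak definition below is used unless some other object file in the link supplies a normal (strong) definition of the same name.

```cpp
#include <cassert>

// Hypothetical library-side default. The weak attribute (a GCC/Clang
// extension) marks this definition as a fallback that a strong definition
// in another object file may replace at link time.
extern "C" __attribute__((weak)) int library_default_id() {
    return 1;  // 1 stands for "the library's default definition"
}

// Hypothetical caller: reports which definition the linker bound.
int which_definition() {
    return library_default_id();
}

// With no strong definition anywhere in the link, the weak default is used.
void check_default() {
    assert(which_definition() == 1);
}
```

Linking in a second source file that defines `extern "C" int library_default_id() { return 2; }` without the weak attribute should make which_definition() return 2, with no duplicate-symbol error, at least with the GNU toolchain.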

The same mechanism works with system calls usually, quite handy for mocking during unit tests.

Ted Lyngmo On

I can't refer to anything in the standard because I don't think there is anything in there about how it should be done.

Here's however an idea of how it could be done:

When linking, references to declared but undefined functions are put in a list of unresolved symbols until the next object file / library is parsed.

Since new is implicitly declared in all TUs, that's what'll happen if a reference to new is encountered before its definition during linking.

Once a definition is found, that's the one it'll use for all those unresolved references - and the one in the standard library will be ignored. So there will be two definitions: One user defined and one in the standard library, but this is an exception from the one definition rule and the standard merely says what the outcome should be. The word used is replaces:

C++23: 6.7.5.5.1 General [basic.stc.dynamic.general]:

  1. The library provides default definitions for the global allocation and deallocation functions. Some global allocation and deallocation functions are replaceable (17.6.3); these are attached to the global module (10.1). A C++ program shall provide at most one definition of a replaceable allocation or deallocation function. Any such function definition replaces the default version provided in the library (16.4.5.6).
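To make "replaces" concrete, here is a small sketch of a program-provided replacement, under the assumption that it is the only replacement in the program. The counter g_new_calls and the helper functions are hypothetical names used only for illustration. The matching operator delete replacements are included so that memory obtained from std::malloc is always released with std::free.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counter, not part of any library.
static std::size_t g_new_calls = 0;

// Replacement for the replaceable throwing scalar allocation function.
void* operator new(std::size_t sz) {
    ++g_new_calls;
    if (sz == 0) {
        sz = 1;  // std::malloc(0) may return a null pointer on success
    }
    if (void* p = std::malloc(sz)) {
        return p;
    }
    throw std::bad_alloc{};
}

// Matching replacements for the unsized and sized scalar deallocation functions.
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Counts how many times the replacement runs for one direct allocation.
std::size_t calls_for_one_allocation() {
    const std::size_t before = g_new_calls;
    void* p = ::operator new(16);
    ::operator delete(p);
    return g_new_calls - before;
}

// If the replacement is in effect, exactly one call is counted.
void check_replacement_is_used() {
    assert(calls_for_one_allocation() == 1);
}
```

The program never names the library's version: defining the function with the same signature is all that is needed, and the toolchain has to arrange for every caller, including the standard library itself, to reach this definition.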
Mike Kinghan On

I understand this behavior, but what I don't understand is how this works at a linker level.

"Knocking out"

If you ask anything about "a linker", an answer needs the prefatory caution that linkers are not regulated. Nor are the binary file formats that linkers deal in - ELF, PE, COFF, etc. There are no formal standards for linkage comparable to the ISO programming language standards; no such thing as a "conforming" linker. Language implementers just have to get the compilers they create to emit object code in a target format in such a way that the language definition can be satisfied within the modus operandi of the system linkers that link that format, or at least the ones the compilers support[1].

That being said, the linker behavior you want to understand is basic and customary. The way in which a linker enables the C++ standard library's overloads of operator new to be replaced by user-defined overloads is the same way in which it enables any symbol (a function or something else) that is defined in a default library (the C++ standard library or some other one) that you link into a program to be redefined with another preferred definition that you supply elsewhere in the linkage. This is a commonplace practice, often described as "knocking out" or "overriding" a library definition with a user definition. (I eschew "overriding" because knocking out has nothing whatever to do with the C++ keyword override.)

In linkage, knocking out is as old as libraries and only got beyond very simple explanation when dynamic libraries and dynamic linkage came on the scene. It's reasonable to abstract what may be called the Usual Logic of symbol resolution in program linkage for contemporary PC/server OSes and show how it enables knocking out. I will do that next, then as extra reading for fellow Linux/GCC detail buffs I'll demonstrate knocking out with the Linux GCC toolchain for the case in point, the C++ function void* operator new(std::size_t), which implements the throwing scalar overload of operator new. I'll demonstrate cases for both routine dynamic linkage and static linkage.

Best get some slippery terminology clear

This is partly because the clarity is just needed; partly to clarify what distinctions are being set aside when I explicitly set some of them aside for a while. My distinct definitions of (un)resolved and (un)defined may depart from preconceptions.

  • static linker: The linker we've just been calling "the linker" so far - the buildtime linker. It slices up object files and knits together the pieces to make a linked binary file (program or dynamic library) targ as completely as possible with those resources. But the object files are typically insufficient to finish the job at buildtime. Usually dynamic libraries will also be required at runtime. The static linker has to be informed about them, but it can't slice and knit them together with targ. Dynamic libraries, by their purpose and design, can't be linked until runtime. What the static linker can do with them is drop notes into targ that will suffice for the dynamic linker to finish the linkage in the context of starting a program that requires targ. When we loosely say that a dynamic library "is linked" at buildtime, we just mean that it's input to the linkage and the static linker drops those notes in targ.

  • dynamic linker: This linker comes into play when the OS wants to run a program prog and the notes that the static linker has dropped therein reveal that it depends on dynamic libraries. The dynamic linker loads those dynamic dependencies, then their dynamic dependencies recursively. It knits them all together with prog to compose a complete runnable program, which is then allowed to run.

  • static linkage: This is done by the static linker, but it's not the only or default sort of linkage the static linker does. It's a linkage that bans dynamic libraries, so any libraries it needs must be static ones. Static linkage has to be requested by an explicit linker option. You'll request it only when you want targ to have no dynamic dependencies and are prepared to pay the price of a bloated targ.

  • dynamic linkage: Ambiguous! When we're talking about the dynamic linker, dynamic linkage is just what it does. When we're talking about the static linker, a dynamic linkage is a linkage that allows dynamic libraries. Dynamic linkage is the default for the static linker!

  • dynamically linked (program|library): A program or library that has been linked by the dynamic linker? No! It's a program or library that's the product of a dynamic linkage by the static linker. So it is a program or library that can be linked by the dynamic linker.

  • static library: A library that has been statically linked? No! A static library isn't a product of linkage at all. It's just a package of object files put together by a file-archiving tool, with a symbol-to-object file look-up table packaged in along with them like another file.

  • (external|public|global) symbol: A symbol that the compiler marks up to be visible at least to the static linker.

  • dynamic symbol: An external symbol that the compiler marks up to be visible to the dynamic linker as well.

This one matters a lot -

  • (un)defined symbol: A symbol sym is defined at a point in the (buildtime or runtime) linkage of targ if the linker has bound all uses of sym to the same runtime meaning. Otherwise sym is undefined[2]. A definition of sym exists in the program if and only if sym is defined.

And so does this one -

  • (un)resolved symbol: A symbol sym is resolved at a point in the (buildtime or runtime) linkage of targ if the linker has determined that sym can be defined. Otherwise sym is unresolved. As long as sym is unresolved the linker keeps searching for a definition. Once sym is resolved the linker stops searching. That a symbol is defined implies that it is resolved, but not conversely.

  • (un)resolved reference of a symbol: A reference of a symbol sym is a use of sym in code, so a reference of sym is resolved, or not, accordingly as sym is resolved or not[3].

A simplifying abstraction of symbol resolution of program linkage.

We're abstracting upon the default behaviour of actual linkers for routine linkages.

We can get away with viewing symbol resolution in program linkage as a single iterative process spanning buildtime linkage and runtime linkage.

The output of this process, if it succeeds, is a program prog in which all referenced symbols are defined. The inputs of the process are a list of files:

  • an initial non-empty part whose members may be object files, static libraries or dynamic libraries, in any mixture of those types; then
  • a possibly empty part consisting only of dynamic libraries: all the recursive dynamic dependencies of the dynamic libraries in the initial part, in the order that the dynamic linker would load them.

We can set aside the following distinctions for the time being:

  • static (linkage|linker) v. runtime (linkage|linker). We'll just talk about the linker and linkage.
  • external symbol v. dynamic symbol. Since these are the only kinds of symbol visible to linkage, we'll just call them all symbols[4].

Now for the Usual Logic of symbol resolution in our abstraction.

  • To produce prog, the linker consumes the input files in supplied order. Let file be one of those. file contains symbol information visible to the linker that identifies the symbols for which it provides a definition and the symbols it references but does not define. After file is consumed, updated state-of-the-linkage information is carried forward, sufficient for the linker's next needs. This includes a ToDo list of symbols sym that are referenced but not yet defined, specifying:

    • Whether sym is resolved.
    • If resolved, the identity of a dynamic library that provides the first definition of sym that the linker found.

    For each file:

  • If file is an object file:

    • file is linked unconditionally into prog, bringing with it 0 or more fresh undefined symbol references and 0 or more fresh symbol definitions.
    • The linkage of file introduces into prog all of the undefined symbol references and definitions from file. The linker can't pick and choose among them[5].
    • If a definition of a ToDo symbol sym is found with these additions then sym becomes resolved and defined by that definition, whether or not it was already resolved.
  • If file is a static library: It is a package of object files: its symbol information is their symbol information.

    • For each ToDo symbol sym, the package is searched for an object file that provides a definition of sym.
    • The first object file found, if any, that provides a definition is extracted from the package and input to the linkage just like an object file input explicitly. Thus sym is resolved and defined.
    • The package is then iteratively searched for further symbol resolutions until it yields none.
    • Object files within the package that do not provide any needed definitions might as well not exist.
  • If file is a dynamic library:

    • For each ToDo symbol sym, file's symbol information is searched for sym.
    • If sym appears there as defined, it becomes resolved but not defined. It is carried forward in the ToDo list as resolved, along with the identity of file.
  • If at any point object files are linked into prog that would cause it to contain multiple definitions of the same symbol, a multiple definition linkage error results, and does not result otherwise[6].

  • At end of linkage:

    • If the ToDo list contains any unresolved symbols, an unresolved symbol error results[7].
    • Otherwise, each sym remaining in the ToDo list has been resolved by a definition provided by a known dynamic library lib: sym then becomes defined by lib's definition[8].

Note that per the Usual Logic, it is possible for sym to be resolved at most twice, and defined at most once. sym will be resolved twice if it is first resolved, but not defined, with a definition furnished by a dynamic library, and later in the linkage is resolved again, and defined, by a definition from a linked object file (which might have been extracted from a static library).
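The Usual Logic can be sketched as a toy model in C++. This is only an illustration of the abstraction above, not how any real linker is written: every type and function name (Kind, Unit, Input, Linkage, provider_of, check_knock_outs) is hypothetical, and the model assumes that only unresolved ToDo symbols are searched for in libraries. It also ignores the references that dynamic libraries make themselves.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Toy model only: every type and function name here is hypothetical.
enum class Kind { Object, StaticLib, DynamicLib };

// Symbol information of one object file or one dynamic library.
struct Unit {
    std::string name;
    std::set<std::string> defines;
    std::set<std::string> references;
};

struct Input {
    Kind kind;
    std::vector<Unit> units;  // more than one only for a static library
};

class Linkage {
    std::map<std::string, std::string> defined_;   // sym -> defining object file
    std::map<std::string, std::string> resolved_;  // sym -> dynamic library
    std::set<std::string> todo_;                   // referenced, not yet defined

    bool unresolved(const std::string& sym) const {
        return todo_.count(sym) != 0 && resolved_.count(sym) == 0;
    }
    bool provides_unresolved(const Unit& u) const {
        for (const auto& sym : u.defines)
            if (unresolved(sym)) return true;
        return false;
    }
    void link_object(const Unit& obj) {
        for (const auto& sym : obj.defines) {
            if (defined_.count(sym))
                throw std::runtime_error("multiple definition of " + sym);
            defined_[sym] = obj.name;
            todo_.erase(sym);
            resolved_.erase(sym);  // knocks out an earlier dynamic library
        }
        for (const auto& sym : obj.references)
            if (!defined_.count(sym)) todo_.insert(sym);
    }

public:
    void consume(const Input& in) {
        if (in.kind == Kind::Object) {
            link_object(in.units.at(0));
        } else if (in.kind == Kind::StaticLib) {
            std::set<std::string> extracted;
            for (bool progress = true; progress;) {  // repeat until no new member is needed
                progress = false;
                for (const auto& member : in.units) {
                    if (extracted.count(member.name)) continue;
                    if (!provides_unresolved(member)) continue;
                    link_object(member);
                    extracted.insert(member.name);
                    progress = true;
                }
            }
        } else {  // dynamic library: resolve, but do not define
            const Unit& lib = in.units.at(0);
            for (const auto& sym : lib.defines)
                if (unresolved(sym)) resolved_[sym] = lib.name;
        }
    }
    // End of linkage: maps each symbol to its provider, or throws.
    std::map<std::string, std::string> finish() const {
        auto result = defined_;
        for (const auto& sym : todo_) {
            const auto it = resolved_.find(sym);
            if (it == resolved_.end())
                throw std::runtime_error("unresolved symbol " + sym);
            result[sym] = it->second;
        }
        return result;
    }
};

std::string provider_of(const std::vector<Input>& inputs, const std::string& sym) {
    Linkage linkage;
    for (const auto& in : inputs) linkage.consume(in);
    return linkage.finish().at(sym);
}

void check_knock_outs() {
    const Input main_o{Kind::Object, {{"main.o", {"main"}, {"_Znwm"}}}};
    const Input new_v2_o{Kind::Object, {{"new_v2.o", {"_Znwm"}, {}}}};
    const Input libnews_a{Kind::StaticLib,
                          {{"new_v1.o", {"_Znwm"}, {}}, {"new_v2.o", {"_Znwm"}, {}}}};
    const Input libstdcxx_so{Kind::DynamicLib, {{"libstdc++.so", {"_Znwm"}, {}}}};
    assert(provider_of({main_o, libstdcxx_so}, "_Znwm") == "libstdc++.so");
    assert(provider_of({main_o, libstdcxx_so, new_v2_o}, "_Znwm") == "new_v2.o");
    assert(provider_of({main_o, libnews_a, libstdcxx_so}, "_Znwm") == "new_v1.o");
}
```

In the model, an object file's definition always wins over a dynamic library's, whichever comes first, while two linked object files that define the same symbol raise the multiple definition error.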

Four variations of knocking out

From the Usual Logic as described, it follows that:

  1. Where a definition of sym is provided by an explicitly linked object file obj, obj is unconditionally linked, sym is resolved and defined.

    1. Any later provider that is a library, static or dynamic, is knocked out - because the linker will not even search it for sym.
    2. Any preceding provider that is a dynamic library is knocked out - because it can at most resolve sym, not define it.
  2. Where the first provider of sym is a static library member arc(obj), obj is extracted and linked. Then 1.1 kicks in.

  3. Where the first provider of sym is a shared library, sym is resolved, but not defined as long as there are more input files. Any later provider that is a library, static or dynamic, is knocked out - because the linker will not even search it for sym.


Extra: Linux GCC examples of knocking out operator new(std::size_t)

From now on the distinctions of terminology that I set aside early are back in force, and I call a dynamic library a shared library, in GNU/Linux usage (a.k.a Dynamic Shared Object, DSO).

I'm using:

$ g++ --version
g++ (Ubuntu 13.2.0-4ubuntu3) 13.2.0

The system static linker is

$ ld --version
GNU ld (GNU Binutils for Ubuntu) 2.41

And the system dynamic linker is:

$ ld.so --version
ld.so (Ubuntu GLIBC 2.38-1ubuntu6.1) stable release version 2.38.

Some source files:

main.cpp

Calls operator new(std::size_t) and prints 42 on success.

#include <iostream>

// extern void * operator new(std::size_t); - implicitly declared

int main(void){
    int *pi = new int(42);
    std::cout << *pi << std::endl;
    delete pi; 
    return 0;
}

new_v1.cpp

A knock-out implementation of operator new(std::size_t).

#include <cstdio>
#include <cstdlib>
#include <new>

void* operator new(std::size_t sz)
{
    std::printf("operator new v1: new(size_t), size = %zu\n", sz);
    if (sz == 0) {
        ++sz;
    }
    if (void *ptr = std::malloc(sz)) {
        return ptr;
    }
    throw std::bad_alloc{};
}

new_v2.cpp

Another knock-out implementation of operator new(std::size_t).

#include <cstdio>
#include <cstdlib>
#include <new>

void* operator new(std::size_t sz)
{
    std::printf("operator new v2: new(size_t), size = %zu\n", sz);
    if (sz == 0) {
        ++sz;
    }
    if (void *ptr = std::malloc(sz)) {
        return ptr;
    }
    throw std::bad_alloc{};
}

The sole difference between new_v1.cpp and new_v2.cpp is that the former defines operator new(std::size_t sz) so as to log its invocation on the console as operator new v1, whereas the latter will log operator new v2.

Prepare some linkage resources

Regular object files compiled from main.cpp, new_v1.cpp and new_v2.cpp:

$ g++ -c -o main.o -Wall -Wextra -pedantic main.cpp
$ g++ -c -o new_v1.o -Wall -Wextra -pedantic new_v1.cpp
$ g++ -c -o new_v2.o -Wall -Wextra -pedantic new_v2.cpp

Position-independent object files for linkage in shared libraries, compiled from new_v1.cpp and new_v2.cpp:

$ g++ -c -fPIC -o new_v1_r.o -Wall -Wextra -pedantic new_v1.cpp
$ g++ -c -fPIC -o new_v2_r.o -Wall -Wextra -pedantic new_v2.cpp

Shared libraries linked respectively from new_v1_r.o and new_v2_r.o:

$ g++ -shared -o libnew_v1.so new_v1_r.o
$ g++ -shared -o libnew_v2.so new_v2_r.o

Finally a static library containing new_v1.o and new_v2.o:

$ ar rcs libnews.a new_v1.o new_v2.o  # `ar` is the GNU archiver

The object files are packaged in the order inserted:

$ ar t libnews.a
new_v1.o
new_v2.o

and operator new(std::size_t) is defined in each of them:

$ nm --defined-only -C libnews.a | grep 'operator new'
0000000000000000 T operator new(unsigned long)
0000000000000000 T operator new(unsigned long)

How is operator new(std::size_t) identified to the linker?

I need to know the mangled function name to which the C++ compiler translates operator new(std::size_t) for the linker, because I want to trace it in linkages.

Here are the raw undefined symbols in main.o:

$ nm -u main.o
                 U _GLOBAL_OFFSET_TABLE_
                 U _ZdlPvm
                 U _ZNSolsEi
                 U _ZNSolsEPFRSoS_E
                 U _Znwm
                 U _ZSt21ios_base_library_initv
                 U _ZSt4cout
                 U _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
                 

Nothing there is obviously our operator, so let's look again with demangling turned on:

$ nm -uC main.o
                 U _GLOBAL_OFFSET_TABLE_
                 U operator delete(void*, unsigned long)
                 U std::ostream::operator<<(int)
                 U std::ostream::operator<<(std::ostream& (*)(std::ostream&))
                 U operator new(unsigned long)
                 U std::ios_base_library_init()
                 U std::cout
                 U std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&)

operator new(unsigned long) is our function; its mangled name is _Znwm.
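The same translation can be done from inside a program. The following is a small sketch that assumes a GCC or Clang toolchain providing the <cxxabi.h> header with abi::__cxa_demangle; the wrapper names demangle and check_demangle are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Returns the demangled form of an Itanium C++ ABI symbol name, or the
// input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with
    // malloc, so the caller releases it with free.
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return mangled;
    }
    std::string result(text);
    std::free(text);
    return result;
}

// The symbol traced in the linkages that follow.
void check_demangle() {
    assert(demangle("_Znwm") == "operator new(unsigned long)");
}
```

On this x86_64 target std::size_t is unsigned long, which is why the mangled name ends in m; on a target with a different std::size_t the mangled name would differ.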

Baseline static and dynamic linkages, no knock-outs

We'll link a program prog using main.o that invokes the default operator new from libstdc++. We'll ask the linker to -trace-symbol=_Znwm so we can see where it's referenced and resolved. Link it dynamically per default, then again statically:

$ g++ -o prog main.o -Wl,-trace-symbol=_Znwm    # dynamic linkage
/usr/bin/ld: main.o: reference to _Znwm  # g++ ultimately delegates linkage to `ld`
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/13/libstdc++.so: definition of _Znwm ...
...[cut]...

_Znwm is referenced but not defined in main.o, as we know. A definition was provided by libstdc++.so. But let's check it out in the symbol tables of prog:

$ readelf --wide --syms prog
    Symbol table '.dynsym' contains 13 entries:
       Num:    Value          Size Type    Bind   Vis      Ndx Name
         ...[cut]...
         3: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND _Znwm@GLIBCXX_3.4 (3)
         ...[cut]...

    Symbol table '.symtab' contains 45 entries:
       Num:    Value          Size Type    Bind   Vis      Ndx Name
         ...[cut]...
        31: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND _Znwm@GLIBCXX_3.4
         ...[cut]...
     

_Znwm is undefined (Ndx = UND) in both the dynamic symbol table .dynsym and the global symbol table .symtab, although it was resolved by a definition found in shared library libstdc++.so identified by _Znwm@GLIBCXX_3.4. (The suffix @GLIBCXX_3.4 is a dynamic symbol versioning suffix that the linker parses.)

Since _Znwm was resolved but not defined, it could be defined in a different linkage even after libstdc++.so was consumed, knocking out the default definition "retrospectively". But as things are, the default definition will be bound at runtime by ld.so, resulting in:

$ ./prog
42

We can view the notes that ld has dropped in prog so that ld.so knows what its immediate dynamic dependencies are. They are in the dynamic section of prog:

$ readelf -d prog | egrep \(Dynamic\|Tag\|NEEDED\)
Dynamic section at offset 0x2d98 contains 28 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 

And we can see the recursively exhaustive list of dynamic dependencies that ld.so arrives at:

$ ld.so --list ./prog
    linux-vdso.so.1 (0x00007fff9d743000)
    libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f49d5800000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f49d5400000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f49d5715000)
    /lib64/ld-linux-x86-64.so.2 => ld.so (0x00007f49d5ac2000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f49d5a87000)
    

(The ld.so --list ./prog command is normally invoked via ldd ./prog).

Next, the static linkage:

$ g++ -static -o prog main.o -Wl,-trace-symbol=_Znwm    # static linkage
/usr/bin/ld: main.o: reference to _Znwm
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/13/libstdc++.a(new_op.o): definition of _Znwm
...[cut]...

This time, the provider of _Znwm is the archive member libstdc++.a(new_op.o). Here's the global symbol table of prog (a statically linked program has no dynamic symbol table), filtered for _Znwm:

$ readelf --wide --syms prog | egrep \(Symbol\|Ndx\|GLOBAL.*_Znwm\)
Symbol table '.symtab' contains 7447 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
  2593: 0000000000405340    53 FUNC    GLOBAL DEFAULT    7 _Znwm
 

This time _Znwm is defined (Ndx = a number), because it's provided by a linked object file. But the default definition could still be knocked out by one from another object file linked before libstdc++.a was reached. The program behaves just the same:

$ ./prog
42

Now I'll do linkages that demonstrate each of the four knock-out variations, making two of them static linkages and two of them dynamic. I'll also make one of them illustrate knockout chaining.

Knock-out type 1.1: (static linkage)

The first provider of _Znwm is an object file new_v1.o that provides definition V1. The default definition provided later by libstdc++.a is knocked out.

$ g++ -static -o prog main.o new_v1.o -Wl,-trace-symbol=_Znwm
/usr/bin/ld: main.o: reference to _Znwm
/usr/bin/ld: new_v1.o: definition of _Znwm
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/13/libstdc++.a(numeric_members.o): reference to _Znwm
...[cut]...

Proof:

$ ./prog
operator new v1: new(size_t), size = 4
42

libstdc++.a was not searched for _Znwm because it was already defined, but many object files were linked from libstdc++.a to define other symbols for the program, plenty of them containing references to _Znwm. The first such archive member - listed - was libstdc++.a(numeric_members.o).

Knock-out type 1.2: (dynamic linkage)

The first provider of _Znwm is the shared standard library libstdc++.so. But an explicitly linked object file new_v2.o provides definition V2 later. The preceding default definition from libstdc++.so is knocked out.

$ g++ -o prog main.o -l stdc++ new_v2.o -Wl,-trace-symbol=_Znwm
/usr/bin/ld: main.o: reference to _Znwm
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/13/libstdc++.so: definition of _Znwm
/usr/bin/ld: new_v2.o: definition of _Znwm

Proof:

$ ./prog
operator new v2: new(size_t), size = 4
42

Knock-out type 2: (static linkage)

The first provider of _Znwm is an archive member libnews.a(new_v1.o) that provides definition V1. The default definition provided later by libstdc++.a is knocked out.

$ g++ -static -o prog main.o -L . -l news  -Wl,-trace-symbol=_Znwm
/usr/bin/ld: main.o: reference to _Znwm
/usr/bin/ld: ./libnews.a(new_v1.o): definition of _Znwm
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/13/libstdc++.a(monetary_members.o): reference to _Znwm
...[cut]...

Proof:

$ ./prog
operator new v1: new(size_t), size = 4
42

The difference between this example and example 1.1 is merely that the linker linked archive member libnews.a(new_v1.o) because it was searching for a definition when libnews.a was reached, not because it was obliged to link an explicit object file.

The V2 definition provided by libnews.a(new_v2.o) might as well not exist.

Knock-out type 3 with knockout chaining: (dynamic linkage)

For this one we'll throw more than the minimum into the linkage, (a) to stress that the V2 definition provided first knocks out all later library definitions - user-defined or default, whether in a static or shared library, (b) to illustrate knockout chaining.

The first provider of _Znwm is a shared library libnew_v2.so. All later library definitions are knocked out:

  • A later V1 definition provided by shared library libnew_v1.so
  • The standard library definition provided later by static library libstdc++.a
  • The standard library definition provided later still by shared library libstdc++.so, which is linked after all the rest by default.

But also, for other symbols, the standard definitions provided by libstdc++.a knock out the standard definitions provided later by libstdc++.so.

$ g++ -o prog main.o -L . -l new_v2 -l new_v1 -l:libstdc++.a -Wl,-rpath=$(pwd),-trace-symbol=_Znwm
/usr/bin/ld: main.o: reference to _Znwm
/usr/bin/ld: ./libnew_v2.so: definition of _Znwm
/usr/bin/ld: ./libnew_v1.so: reference to _Znwm
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/13/libstdc++.a(monetary_members.o): reference to _Znwm
...[cut]...

From:

$ ./prog
operator new v2: new(size_t), size = 4
42

it's clear that the -l new_v2 definition knocked out the -l new_v1 definition, the -l:libstdc++.a definition, and the definition from libstdc++.so.

But the static standard library libstdc++.a then knocked out everything else from the later implicit libstdc++.so. This chained knock-out has no visible runtime effect on the behaviour of our particular program. But we can detect it by filtering for some of the other standard overloads of operator new in the (demangled) symbol tables of prog:

$ readelf -W --syms prog | egrep \(\Symbol\|GLOBAL.*_Zn\|Ndx\) | c++filt
Symbol table '.dynsym' contains 4061 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
    34: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND operator new(unsigned long)
  1158: 00000000000fa530    30 FUNC    GLOBAL DEFAULT   16 operator new[](unsigned long, std::nothrow_t const&)
  3558: 00000000000fa520     9 FUNC    GLOBAL DEFAULT   16 operator new[](unsigned long)
Symbol table '.symtab' contains 5120 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
  1730: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND operator new(unsigned long)
  4346: 00000000000fa520     9 FUNC    GLOBAL DEFAULT   16 operator new[](unsigned long)
  4389: 00000000000fa530    30 FUNC    GLOBAL DEFAULT   16 operator new[](unsigned long, std::nothrow_t const&)
  

Our own V2 operator new(unsigned long) appears as undefined (Ndx = UND), because it was resolved in shared library libnew_v2.so. But its two operator new[] colleagues are defined (Ndx = a number). That's because they were defined in object files linked from libstdc++.a, not resolved by libstdc++.so.

Let's cause a multiple definition error to finish!

$ g++ -o prog main.o new_v2.o new_v1.o
/usr/bin/ld: new_v1.o: in function `operator new(unsigned long)':
new_v1.cpp:(.text+0x0): multiple definition of `operator new(unsigned long)'; new_v2.o:new_v2.cpp:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status

or alternatively:

$ g++ -o prog main.o -l:libstdc++.a new_v1.o
/usr/bin/ld: new_v1.o: in function `operator new(unsigned long)':
new_v1.cpp:(.text+0x0): multiple definition of `operator new(unsigned long)'; /usr/lib/gcc/x86_64-linux-gnu/13/libstdc++.a(new_op.o):(.text._Znwm+0x0): first defined here
collect2: error: ld returned 1 exit status

The only way to cause a multiple definition error at link time (as opposed to compiletime) is to require the linker to link two object files into the program that both provide definitions of the same symbol: in the first linkage, new_v2.o + new_v1.o; in the second libstdc++.a(new_op.o) + new_v1.o. The only definitions that get into the program at buildtime are the ones that get in from object files. The ones finally resolved in shared libraries at buildtime are just the ones that aren't knocked out, and they don't get in until runtime.


  1. Of course, compiler development exerts an influence on linker development, but since a linker is blind to source languages and purposed to serve any and all compilers in a system, the influence is in the realm of nudge.

  2. If sym is a data symbol its assigned meaning is a data value. If sym is a function symbol then its assigned meaning is the execution of a particular code sequence (the compiler having arranged that any arguments required per the function signature are correctly set up on entry).

  3. A mere mention of sym (in compiler terminology, a declaration) isn't a use: a use evaluates the meaning of sym.

  4. We're ignoring weak symbols, which are an ELF-specific wrinkle not relevant here.

  5. Say, by taking just the definitions that will shrink the ToDo list: that way, the ToDo list would never get populated at all.

  6. Compiletime multiple definition errors are also possible, but we don't get to linkage until they're fixed.

  7. Different linkers may diagnose an unresolved symbol error in different words. The Microsoft linker reports "unresolved external symbol sym". The GNU Linux linker reports "undefined reference to sym".

  8. So what happens if a dynamic library libB both references sym and provides a definition, but sym becomes defined in a linkage by an earlier dynamic library libA whose definition is found first: does libA's definition knock out libB's internally as well as externally? Are libB's own references of sym bound to libA's definition or its own? It depends on the tool chain. For the GNU/Linux ELF linker, libA's definition wins by default; but the prior linkage of libB itself could be varied to spurn any external definition for its own references. For a Microsoft PE binary the compiler preempts the choice: a definition of sym can be dynamically exported or dynamically imported but not both; so if libB exports its own definition it cannot be knocked out internally.