Nested generic syntax ambiguity >>


Apparently, C# is as susceptible to the '>>' lexer dilemma as C++ is.

This C# code is perfectly valid; it compiles and runs just fine:

var List = new Dummy("List");
var Nullable = new Dummy("Nullable");
var Guid = new Dummy("Guid");

var x = List<Nullable<Guid>> 10;
var y =  List<Nullable<Guid>> .Equals(10,20);

You'd have to overload '<' and '>>' operators for the Dummy class above.
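
For illustration, here is a minimal sketch of what such a Dummy class might look like. The class body, the `Name` property, and the `NestedGenericDemo.Run` wrapper are assumptions made for this example; the question does not show them. Note that C# requires '<' and '>' to be overloaded as a pair, and that comparison operators may return a non-bool type, which is what lets the comparisons chain.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Dummy type (not shown in the question): its operators only
// record which operation was applied, so the parse order becomes visible.
public sealed class Dummy
{
    public string Name { get; }
    public Dummy(string name) => Name = name;

    // '<' and '>' must be declared together; returning Dummy allows chaining.
    public static Dummy operator <(Dummy a, Dummy b) => new Dummy($"({a.Name} < {b.Name})");
    public static Dummy operator >(Dummy a, Dummy b) => new Dummy($"({a.Name} > {b.Name})");

    // The right operand of a user-defined '>>' is an int.
    public static Dummy operator >>(Dummy a, int shift) => new Dummy($"({a.Name} >> {shift})");

    public override string ToString() => Name;
}

public static class NestedGenericDemo
{
    public static (string X, bool Y) Run()
    {
        var List = new Dummy("List");
        var Nullable = new Dummy("Nullable");
        var Guid = new Dummy("Guid");

        // Followed by a literal: read as comparisons and a shift on the locals.
        var x = List<Nullable<Guid>> 10;

        // Followed by '.': read as the generic type List<Nullable<Guid>>,
        // so this calls the static object.Equals(object, object).
        var y = List<Nullable<Guid>> .Equals(10, 20);

        return (x.ToString(), y);
    }
}
```

With this sketch, `x` records the operator applications in the order the compiler chose, and `y` is a plain bool coming from `object.Equals`.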

But the compiler manages to work out that in the 'x' case the code refers to the List, Nullable and Guid local variables, while in the 'y' case it suddenly decides to treat them as the names of well-known types.

Here's a bit more detailed description with another example: http://mihailik.blogspot.co.uk/2012/05/nested-generics-c-can-be-stinky.html

The question is: how does the C# compiler decide whether 'a<b<c>>' is an arithmetic expression or a generic type/method?

Surely it doesn't make multiple passes over the text of the program until one succeeds, or does it? That would require unbounded look-ahead, and a very complex one too.

Oleg Mihailik On BEST ANSWER

I've been directed to section 7.6.4.2 of the C# language spec:

http://download.microsoft.com/download/0/B/D/0BDA894F-2CCD-4C2C-B5A7-4EB1171962E5/CSharp%20Language%20Specification.htm

The productions for simple-name (§7.6.2) and member-access (§7.6.4) can give rise to ambiguities in the grammar for expressions.

...

If a sequence of tokens can be parsed (in context) as a simple-name (§7.6.2), member-access (§7.6.4), or pointer-member-access (§18.5.2) ending with a type-argument-list (§4.4.1), the token immediately following the closing > token is examined. If it is one of

( ) ] } : ; , . ? == != | ^

then the type-argument-list is retained as part of the simple-name, member-access or pointer-member-access and any other possible parse of the sequence of tokens is discarded. Otherwise, the type-argument-list is not considered to be part of the simple-name, member-access or pointer-member-access, even if there is no other possible parse of the sequence of tokens. Note that these rules are not applied when parsing a type-argument-list in a namespace-or-type-name (§3.8).

So an ambiguity can indeed arise when a type-argument-list is involved, and they've got a cheap way to resolve it: looking one token ahead.

It's still unbounded look-ahead at the character level, because there might be a megabyte's worth of comments between '>>' and the following token, but at least the rule is more or less clear. And most importantly, there is no need for speculative deep parsing.
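
As a rough sketch of the quoted rule (not the actual compiler implementation), the decision can be expressed as a single set lookup on the next token. The names `TypeArgumentRule` and `KeepTypeArgumentList` are made up for this example, and the token is assumed to come from a lexer that has already skipped whitespace and comments.

```csharp
using System;
using System.Collections.Generic;

public static class TypeArgumentRule
{
    // The tokens listed in the quoted spec paragraph.
    static readonly HashSet<string> KeepTokens = new HashSet<string>
    {
        "(", ")", "]", "}", ":", ";", ",", ".", "?", "==", "!=", "|", "^"
    };

    // Returns true when the candidate type-argument-list should be retained,
    // judged only by the token that immediately follows the closing '>'.
    public static bool KeepTypeArgumentList(string tokenAfterClosingAngle)
        => KeepTokens.Contains(tokenAfterClosingAngle);
}
```

For the two statements in the question, the token after the closing '>' is `10` in the 'x' case (not in the set, so the generic reading is dropped) and `.` in the 'y' case (in the set, so the generic reading is kept).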

JotaBe On

EDIT: I insist there's no ambiguity: In your example there's no ambiguity at all. This could never be evaluated as a List<Guid?>. The context (the extra 10) shows the compiler how to interpret it.

var x = List<Nullable<Guid>> 10;

Would the compiler compile this?:

var x = List<Guid?> 10;

It's clear it wouldn't. So I'm still looking for the ambiguity.

OTOH, the second expression:

var y =  List<Nullable<Guid>> .Equals(10,20);

has to be evaluated as a List<Guid?>, because you're invoking the .Equals method. Again, this cannot be interpreted in any other way.

There is no paradox at all. The compiler parses it perfectly. I'm still wondering what the paradox is.

You're making a big mistake. The compiler interprets whole expressions, and uses the language grammar to understand them. It doesn't look at a fragment of code, like you're doing, without taking the rest of the expression into account.

These expressions are parsed according to C# grammar. And the grammar is clear enough to interpret the code correctly. I.e. in

var x = List<Nullable<Guid>> 10;

It's clear that 10 is a literal. If you follow the grammar you'll find this: 10 is a literal, so it's a primary-no-array-creation-expression, which is a primary-expression, which is a unary-expression, which is a multiplicative-expression, which is an additive-expression. An additive-expression on the right side of a >> can only appear as part of a shift-expression, so the left side of the >> must itself be parsed as a shift-expression (here, an additive-expression), and so on.
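
To make that derivation concrete, here is a toy recursive-descent parser covering only the relational and shift levels of the grammar. It is a simplified sketch, not how the C# compiler is implemented: all names are hypothetical, operands are limited to identifiers and integer literals, and '>>' is lexed greedily as one token for brevity.

```csharp
using System;
using System.Collections.Generic;

public static class ShiftGrammarSketch
{
    // Splits the input into identifiers, integer literals, and <, >, >>.
    public static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();
        int i = 0;
        while (i < text.Length)
        {
            char c = text[i];
            if (char.IsWhiteSpace(c)) { i++; continue; }
            if (char.IsLetterOrDigit(c))
            {
                int start = i;
                while (i < text.Length && char.IsLetterOrDigit(text[i])) i++;
                tokens.Add(text.Substring(start, i - start));
            }
            else if (c == '>' && i + 1 < text.Length && text[i + 1] == '>')
            {
                tokens.Add(">>");
                i += 2;
            }
            else if (c == '<' || c == '>')
            {
                tokens.Add(c.ToString());
                i++;
            }
            else throw new FormatException($"Unexpected character '{c}'");
        }
        return tokens;
    }

    // Parses the text and returns a fully parenthesized form of the tree.
    public static string Parse(string text)
    {
        var tokens = Tokenize(text);
        int pos = 0;
        string tree = ParseRelational(tokens, ref pos);
        if (pos != tokens.Count) throw new FormatException("Unexpected trailing tokens");
        return tree;
    }

    // relational-expression: shift-expression (('<' | '>') shift-expression)*
    static string ParseRelational(List<string> t, ref int pos)
    {
        string left = ParseShift(t, ref pos);
        while (pos < t.Count && (t[pos] == "<" || t[pos] == ">"))
        {
            string op = t[pos++];
            string right = ParseShift(t, ref pos);
            left = $"({left} {op} {right})";
        }
        return left;
    }

    // shift-expression: operand ('>>' operand)*
    static string ParseShift(List<string> t, ref int pos)
    {
        string left = ParseOperand(t, ref pos);
        while (pos < t.Count && t[pos] == ">>")
        {
            pos++;
            string right = ParseOperand(t, ref pos);
            left = $"({left} >> {right})";
        }
        return left;
    }

    // Stands in for everything from additive-expression down to a literal or name.
    static string ParseOperand(List<string> t, ref int pos)
    {
        if (pos >= t.Count || t[pos] == "<" || t[pos] == ">" || t[pos] == ">>")
            throw new FormatException("Expected an operand");
        return t[pos++];
    }
}
```

Calling `ShiftGrammarSketch.Parse("List<Nullable<Guid>> 10")` groups `Guid >> 10` first and then applies the two '<' comparisons from left to right, because shift binds tighter than the relational operators.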

If you were able to find a different way to use the grammar and get a different result for the same expression, then, I would have to agree with you, but let me disagree!

Finally, this code:

  • is very confusing for humans
  • is absolutely clear and unambiguous to the compiler

Because:

  • we humans identify patterns by picking out fragments of the text that look familiar to us, like List<Nullable<Guid>>, and interpret them the way we want
  • compilers don't interpret the code like we do, picking out familiar fragments like List<Nullable<Guid>>. They take the whole expression and match it against the language grammar.