Bending .NET - Compiling 65,536 Programs with Roslyn to Find Valid Identifier Separator char's... then just use `SyntaxFacts.IsValidIdentifier` 🤦‍

or how to find invalid human batteries so they can be flushed.

In this post, part of the Bending .NET series, I look at compiling a complete list of valid and invalid C# identifier separators by using Roslyn. That is, by being lazy: instead of checking the Unicode specification, I just try every single character.

[Image: sloth. Source: Wikimedia]

Encoding information in file or directory names is a simple way of keeping details about a given file, for example the conditions or origins of an image or an ONNX model. For an ONNX model, it might be something as simple as the version of the ground truth it was trained on. Assuming, of course, that you version your ground truth. Defining a simple schema for this encoding using simple separators, e.g. _ and =, makes it easy to parse in a position-independent manner, like M=Surface_V=1.2.1.onnx. These file formats might allow embedding this information inside the file in some way, but sometimes it is just easier to have it directly in the file name.
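To make the position-independent parsing concrete, here is a minimal sketch. The helper name ParseFileName is hypothetical, and it assumes fields are separated by _ and that each field is a key=value pair; it is only one way such a schema could be read, not a definitive implementation.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: splits "M=Surface_V=1.2.1.onnx" into key/value pairs.
// Assumes '_' separates fields and the first '=' separates key from value.
static Dictionary<string, string> ParseFileName(string fileName)
{
    // Only the last extension (".onnx") is removed, so dots in values survive.
    var name = Path.GetFileNameWithoutExtension(fileName);
    var result = new Dictionary<string, string>();
    foreach (var part in name.Split('_', StringSplitOptions.RemoveEmptyEntries))
    {
        var pair = part.Split('=', 2);
        if (pair.Length == 2) { result[pair[0]] = pair[1]; }
    }
    return result;
}

var parsed = ParseFileName("M=Surface_V=1.2.1.onnx");
Console.WriteLine(parsed["M"]); // Surface
Console.WriteLine(parsed["V"]); // 1.2.1
```

Since each field carries its own key, the order of the fields in the file name does not matter.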

Sometimes it is also nice to be able to embed these assets into .NET assemblies and have them exposed as simple properties. Preferably properties that have the same name as the file. However, C# naturally has stricter rules for identifier names than for file names, which raises the question: what chars are valid separators in a C# identifier name?

The above link says:

  • Identifiers must start with a letter or underscore (_).
  • Identifiers may contain Unicode letter characters, decimal digit characters, Unicode connecting characters, Unicode combining characters, or Unicode formatting characters. For more information on Unicode categories, see the Unicode Category Database. You can declare identifiers that match C# keywords by using the @ prefix on the identifier. The @ is not part of the identifier name. For example, @if declares an identifier named if. These verbatim identifiers are primarily for interoperability with identifiers declared in other languages.
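The "may contain" rule quoted above can be sketched with nothing but the base class library. This is a rough illustration, assuming the rule maps directly onto the UnicodeCategory enum; the function name IsIdentifierPartChar is my own, and I have not verified that it agrees with the compiler for every char.

```csharp
using System;
using System.Globalization;

// Sketch: is a single UTF-16 char allowed inside (not at the start of) an identifier?
static bool IsIdentifierPartChar(char c) =>
    char.GetUnicodeCategory(c) switch
    {
        // Unicode letter characters (Lu, Ll, Lt, Lm, Lo, Nl)
        UnicodeCategory.UppercaseLetter or
        UnicodeCategory.LowercaseLetter or
        UnicodeCategory.TitlecaseLetter or
        UnicodeCategory.ModifierLetter or
        UnicodeCategory.OtherLetter or
        UnicodeCategory.LetterNumber => true,
        // Decimal digit characters (Nd)
        UnicodeCategory.DecimalDigitNumber => true,
        // Connecting characters (Pc), e.g. the underscore
        UnicodeCategory.ConnectorPunctuation => true,
        // Combining characters (Mn, Mc)
        UnicodeCategory.NonSpacingMark or
        UnicodeCategory.SpacingCombiningMark => true,
        // Formatting characters (Cf)
        UnicodeCategory.Format => true,
        _ => false,
    };

Console.WriteLine(IsIdentifierPartChar('_')); // True
Console.WriteLine(IsIdentifierPartChar('=')); // False
```

Looping this over all 65,536 char values would give a category-based list to compare against whatever the compiler itself accepts.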

Now this is a perfectly fine answer, but I was just too lazy to go scouring through the Unicode Category Database 🦥. What I wanted was a single table of valid identifier separators. I googled for answers but came up short within a reasonable time, so instead I came up with this Program.cs (after some iterations as discussed below):

using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

var findByCompile = false;
var encoding = Encoding.UTF8;
Console.OutputEncoding = encoding;
Action<string> log = t => { Console.WriteLine(t); Trace.WriteLine(t); };
// Cache metadata reference since this reduces time significantly
var metadataReferences = new[] { MetadataReference.CreateFromFile(
    typeof(object).Assembly.Location) };

var validSeparators = new List<char>();
var invalidSeparators = new List<char>();

var validFileNameChars = new List<char>();
var invalidFileNameChars = Path.GetInvalidFileNameChars();
Array.Sort(invalidFileNameChars);

var stopwatch = Stopwatch.StartNew();
for (int i = char.MinValue; i <= char.MaxValue;)
{
    var c = (char)i;

    (IsValidSeparator(c) ? validSeparators : invalidSeparators).Add(c);

    if (Array.BinarySearch(invalidFileNameChars, c) < 0)
    { validFileNameChars.Add(c); }

    if (++i % 4096 == 0 && findByCompile) { log($"Compiled {i:D5} programs"); }
}
var elapsed_ms = stopwatch.ElapsedMilliseconds;

Write(validSeparators);
Write(invalidSeparators);
Write(validFileNameChars);
Write(invalidFileNameChars);

var totalCount = validSeparators.Count + invalidSeparators.Count;
log($"Found {validSeparators.Count} valid identifier separator chars.");
log($"Found {invalidSeparators.Count} invalid identifier separator chars.");
log($"Found {validFileNameChars.Count} valid file name chars.");
log($"Checked {totalCount} chars in {elapsed_ms} ms or " +
    $"{elapsed_ms / (double)totalCount:F3} ms per char.");

bool IsValidSeparator(char c) => findByCompile ? DoesCompile(c)
    : SyntaxFacts.IsValidIdentifier(Identifier(c));

bool DoesCompile(char c)
{
    var program = $"var {Identifier(c)} = 42;";
    var syntaxTree = CSharpSyntaxTree.ParseText(program);
    var compilation = CSharpCompilation.Create("assemblyName",
        new[] { syntaxTree }, metadataReferences,
        new CSharpCompilationOptions(OutputKind.ConsoleApplication));
    using var dllStream = new MemoryStream();
    var emitResult = compilation.Emit(dllStream);
    return emitResult.Success;
}

static string Identifier(char c) => $"_{c}_";

void Write(IReadOnlyList<char> chars,
    [CallerArgumentExpression("chars")] string fileName = "")
{
    const string baseDir = "../../../../";
    File.WriteAllText(baseDir + $"{fileName}.csv", ToCsv(chars), encoding);
    File.WriteAllText(baseDir + $"{fileName}.txt", ToTable(chars), encoding);
}

static string ToCsv(IReadOnlyList<char> chars) => string.Join(Environment.NewLine,
    new[] { CsvHeader() }.Concat(chars.Select(c => CsvLine(c))));
static string CsvHeader() => "Decimal,Hex,Identifier";
static string CsvLine(char c) => $"{(int)c:D5},0x{(int)c:X4},{Identifier(c)}";

static string ToTable(IReadOnlyList<char> chars) => string.Join(Environment.NewLine,
    new[] { TableHeader() }.Concat(chars.Select(c => TableLine(c))));
static string TableHeader() => $"|Decimal|Hex|Identifier|{Environment.NewLine}|-:|-:|-|";
static string TableLine(char c) => $"|{(int)c:D5}|`0x{(int)c:X4}`|`{Identifier(c)}`|";

For completeness, here is the accompanying ValidIdentifierSeparators.csproj:

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net6.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="Microsoft.CodeAnalysis.CSharp" Version="4.3.0" />
  </ItemGroup>
</Project>

As can be seen, this references Microsoft.CodeAnalysis.CSharp (Roslyn) as a NuGet package.

The program is fairly self-explanatory. As the astute reader might have observed, though, and as happens in any software development, you might start out going down the “wrong” rabbit hole. I certainly did, by first defining a complete single-line top-level statement program like var _{c}_ = 42;, where c is a given character, and then using Roslyn to compile that program and check whether this succeeds using CSharpCompilation.Emit. The char is carefully enclosed in underscores _: the leading underscore handles the fact that the first character of an identifier is more restricted, and the trailing underscore handles the case where the char being tested is considered white space.

Hence, I ended up compiling 65,536 programs to check whether a given identifier was valid or not. This took about 300 s, or about 5 minutes. Faster than scouring through the Unicode database, I am sure 😅 Still, this seemed a bit slow… A quick profiling session revealed that:

MetadataReference.CreateFromFile(typeof(object).Assembly.Location)

took up most of the time. This basically parses and builds a “metadata reference” to the assembly containing object, which is the only assembly needed for the var _{c}_ = 42; program. This is why I moved it to the top of the program, so it only happens once. This brought the run time down to ~22 s, or 14-15x faster:

Found 50683 valid identifier separator chars.
Found 14853 invalid identifier separator chars.
Found 65495 valid file name chars.
Checked 65536 chars in 22076 ms or 0.337 ms per char.

Later, I finally found out that Roslyn has the very nice SyntaxFacts class with the obvious method to use, namely IsValidIdentifier. Using this cut the run time down to 5 ms, or 60,000x faster than the initial program. 😆

Found 50683 valid identifier separator chars.
Found 14853 invalid identifier separator chars.
Found 65495 valid file name chars.
Checked 65536 chars in 5 ms or 0.000 ms per char.

I’ve kept both approaches in the above program, which also finds valid file name chars for completeness. Without all this (and in particular all the output-related code), you could simply have written the quick program below to trace valid separator chars. 🙄

using System.Diagnostics;
using Microsoft.CodeAnalysis.CSharp;

for (int i = char.MinValue; i <= char.MaxValue; i++)
{
    var c = (char)i;
    var identifier = $"_{c}_";
    if (SyntaxFacts.IsValidIdentifier(identifier))
    {
        Trace.WriteLine(c);
    }
}

Compiled lists of valid and invalid separator chars and file name chars can be found in the table below. Note that Jekyll (used for this blog) can’t handle the large markdown files, so the files have been given an extra .txt extension.

|CSV|Markdown as txt|
|-|-|
|validSeparators.csv.txt|validSeparators.md.txt|
|validFileNameChars.csv.txt|validFileNameChars.md.txt|
|invalidSeparators.csv.txt|invalidSeparators.md.txt|

For reference, the next sections list the first valid separators up to decimal 511 and the first invalid separators. It is noteworthy, and of course natural, that most “ascii” non-digit/non-alpha characters are invalid as identifier separators, _ being the exception. Below I have selected a few valid chars that might be used as separators, but these have the issue of being easily confused with their “normal” ascii range look-alikes. Of course, this assumes you use English as a language, so it is probably abusing the letters of other languages. There is another issue though…

Decimal Hex Identifier
00095 0x005F ___
00170 0x00AA _ª_
00448 0x01C0 _ǀ_
00449 0x01C1 _ǁ_
00450 0x01C2 _ǂ_
00451 0x01C3 _ǃ_
00713 0x02C9 _ˉ_
00714 0x02CA _ˊ_
00715 0x02CB _ˋ_
00716 0x02CC _ˌ_
00717 0x02CD _ˍ_
00718 0x02CE _ˎ_
00719 0x02CF _ˏ_
00720 0x02D0 _ː_
00721 0x02D1 _ˑ_
00748 0x02EC _ˬ_
00926 0x039E _Ξ_
01994 0x07CA _ߊ_
01997 0x07CD _ߍ_

… and that is that some characters display as a square or rectangular box, or as a box with a dot, question mark or “x” inside, depending on the system or browser used, as can be seen below in screenshots from a desktop browser and a phone. On the phone, not all characters are displayed. I am no expert on the matter, but as far as I know this is a font issue. So the end result is that it is very hard to find any good separator char other than underscore _ in C# identifiers. However, you could then use multiples of it for separation instead, e.g. M_Surface__V_1.2.1.onnx.

[Screenshots: desktop browser (left) and phone (right)]
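As a rough illustration of the multiple-underscore idea, the sketch below reads such a name back into its fields. The helper name ParseUnderscoreName is hypothetical, and it assumes a double underscore separates fields while the first single underscore in a field separates key from value, so neither keys nor values may themselves contain a double underscore.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: fields are separated by "__", key and value by the first "_".
static Dictionary<string, string> ParseUnderscoreName(string fileName)
{
    // Only the last extension (".onnx") is removed, so "1.2.1" stays intact.
    var name = Path.GetFileNameWithoutExtension(fileName);
    var result = new Dictionary<string, string>();
    foreach (var field in name.Split("__", StringSplitOptions.RemoveEmptyEntries))
    {
        var pair = field.Split('_', 2);
        if (pair.Length == 2) { result[pair[0]] = pair[1]; }
    }
    return result;
}

var parsed = ParseUnderscoreName("M_Surface__V_1.2.1.onnx");
Console.WriteLine(parsed["M"]); // Surface
Console.WriteLine(parsed["V"]); // 1.2.1
```

Note that the dots in the version are fine in a file name but would still need handling before the name could be used as a C# identifier.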

Some Valid Identifier Separators

Decimal Hex Identifier
00048 0x0030 _0_
00049 0x0031 _1_
00050 0x0032 _2_
00051 0x0033 _3_
00052 0x0034 _4_
00053 0x0035 _5_
00054 0x0036 _6_
00055 0x0037 _7_
00056 0x0038 _8_
00057 0x0039 _9_
00065 0x0041 _A_
00066 0x0042 _B_
00067 0x0043 _C_
00068 0x0044 _D_
00069 0x0045 _E_
00070 0x0046 _F_
00071 0x0047 _G_
00072 0x0048 _H_
00073 0x0049 _I_
00074 0x004A _J_
00075 0x004B _K_
00076 0x004C _L_
00077 0x004D _M_
00078 0x004E _N_
00079 0x004F _O_
00080 0x0050 _P_
00081 0x0051 _Q_
00082 0x0052 _R_
00083 0x0053 _S_
00084 0x0054 _T_
00085 0x0055 _U_
00086 0x0056 _V_
00087 0x0057 _W_
00088 0x0058 _X_
00089 0x0059 _Y_
00090 0x005A _Z_
00095 0x005F ___
00097 0x0061 _a_
00098 0x0062 _b_
00099 0x0063 _c_
00100 0x0064 _d_
00101 0x0065 _e_
00102 0x0066 _f_
00103 0x0067 _g_
00104 0x0068 _h_
00105 0x0069 _i_
00106 0x006A _j_
00107 0x006B _k_
00108 0x006C _l_
00109 0x006D _m_
00110 0x006E _n_
00111 0x006F _o_
00112 0x0070 _p_
00113 0x0071 _q_
00114 0x0072 _r_
00115 0x0073 _s_
00116 0x0074 _t_
00117 0x0075 _u_
00118 0x0076 _v_
00119 0x0077 _w_
00120 0x0078 _x_
00121 0x0079 _y_
00122 0x007A _z_
00170 0x00AA _ª_
00173 0x00AD _­_
00181 0x00B5 _µ_
00186 0x00BA _º_
00192 0x00C0 _À_
00193 0x00C1 _Á_
00194 0x00C2 _Â_
00195 0x00C3 _Ã_
00196 0x00C4 _Ä_
00197 0x00C5 _Å_
00198 0x00C6 _Æ_
00199 0x00C7 _Ç_
00200 0x00C8 _È_
00201 0x00C9 _É_
00202 0x00CA _Ê_
00203 0x00CB _Ë_
00204 0x00CC _Ì_
00205 0x00CD _Í_
00206 0x00CE _Î_
00207 0x00CF _Ï_
00208 0x00D0 _Ð_
00209 0x00D1 _Ñ_
00210 0x00D2 _Ò_
00211 0x00D3 _Ó_
00212 0x00D4 _Ô_
00213 0x00D5 _Õ_
00214 0x00D6 _Ö_
00216 0x00D8 _Ø_
00217 0x00D9 _Ù_
00218 0x00DA _Ú_
00219 0x00DB _Û_
00220 0x00DC _Ü_
00221 0x00DD _Ý_
00222 0x00DE _Þ_
00223 0x00DF _ß_
00224 0x00E0 _à_
00225 0x00E1 _á_
00226 0x00E2 _â_
00227 0x00E3 _ã_
00228 0x00E4 _ä_
00229 0x00E5 _å_
00230 0x00E6 _æ_
00231 0x00E7 _ç_
00232 0x00E8 _è_
00233 0x00E9 _é_
00234 0x00EA _ê_
00235 0x00EB _ë_
00236 0x00EC _ì_
00237 0x00ED _í_
00238 0x00EE _î_
00239 0x00EF _ï_
00240 0x00F0 _ð_
00241 0x00F1 _ñ_
00242 0x00F2 _ò_
00243 0x00F3 _ó_
00244 0x00F4 _ô_
00245 0x00F5 _õ_
00246 0x00F6 _ö_
00248 0x00F8 _ø_
00249 0x00F9 _ù_
00250 0x00FA _ú_
00251 0x00FB _û_
00252 0x00FC _ü_
00253 0x00FD _ý_
00254 0x00FE _þ_
00255 0x00FF _ÿ_
00256 0x0100 _Ā_
00257 0x0101 _ā_
00258 0x0102 _Ă_
00259 0x0103 _ă_
00260 0x0104 _Ą_
00261 0x0105 _ą_
00262 0x0106 _Ć_
00263 0x0107 _ć_
00264 0x0108 _Ĉ_
00265 0x0109 _ĉ_
00266 0x010A _Ċ_
00267 0x010B _ċ_
00268 0x010C _Č_
00269 0x010D _č_
00270 0x010E _Ď_
00271 0x010F _ď_
00272 0x0110 _Đ_
00273 0x0111 _đ_
00274 0x0112 _Ē_
00275 0x0113 _ē_
00276 0x0114 _Ĕ_
00277 0x0115 _ĕ_
00278 0x0116 _Ė_
00279 0x0117 _ė_
00280 0x0118 _Ę_
00281 0x0119 _ę_
00282 0x011A _Ě_
00283 0x011B _ě_
00284 0x011C _Ĝ_
00285 0x011D _ĝ_
00286 0x011E _Ğ_
00287 0x011F _ğ_
00288 0x0120 _Ġ_
00289 0x0121 _ġ_
00290 0x0122 _Ģ_
00291 0x0123 _ģ_
00292 0x0124 _Ĥ_
00293 0x0125 _ĥ_
00294 0x0126 _Ħ_
00295 0x0127 _ħ_
00296 0x0128 _Ĩ_
00297 0x0129 _ĩ_
00298 0x012A _Ī_
00299 0x012B _ī_
00300 0x012C _Ĭ_
00301 0x012D _ĭ_
00302 0x012E _Į_
00303 0x012F _į_
00304 0x0130 _İ_
00305 0x0131 _ı_
00306 0x0132 _IJ_
00307 0x0133 _ij_
00308 0x0134 _Ĵ_
00309 0x0135 _ĵ_
00310 0x0136 _Ķ_
00311 0x0137 _ķ_
00312 0x0138 _ĸ_
00313 0x0139 _Ĺ_
00314 0x013A _ĺ_
00315 0x013B _Ļ_
00316 0x013C _ļ_
00317 0x013D _Ľ_
00318 0x013E _ľ_
00319 0x013F _Ŀ_
00320 0x0140 _ŀ_
00321 0x0141 _Ł_
00322 0x0142 _ł_
00323 0x0143 _Ń_
00324 0x0144 _ń_
00325 0x0145 _Ņ_
00326 0x0146 _ņ_
00327 0x0147 _Ň_
00328 0x0148 _ň_
00329 0x0149 _ʼn_
00330 0x014A _Ŋ_
00331 0x014B _ŋ_
00332 0x014C _Ō_
00333 0x014D _ō_
00334 0x014E _Ŏ_
00335 0x014F _ŏ_
00336 0x0150 _Ő_
00337 0x0151 _ő_
00338 0x0152 _Œ_
00339 0x0153 _œ_
00340 0x0154 _Ŕ_
00341 0x0155 _ŕ_
00342 0x0156 _Ŗ_
00343 0x0157 _ŗ_
00344 0x0158 _Ř_
00345 0x0159 _ř_
00346 0x015A _Ś_
00347 0x015B _ś_
00348 0x015C _Ŝ_
00349 0x015D _ŝ_
00350 0x015E _Ş_
00351 0x015F _ş_
00352 0x0160 _Š_
00353 0x0161 _š_
00354 0x0162 _Ţ_
00355 0x0163 _ţ_
00356 0x0164 _Ť_
00357 0x0165 _ť_
00358 0x0166 _Ŧ_
00359 0x0167 _ŧ_
00360 0x0168 _Ũ_
00361 0x0169 _ũ_
00362 0x016A _Ū_
00363 0x016B _ū_
00364 0x016C _Ŭ_
00365 0x016D _ŭ_
00366 0x016E _Ů_
00367 0x016F _ů_
00368 0x0170 _Ű_
00369 0x0171 _ű_
00370 0x0172 _Ų_
00371 0x0173 _ų_
00372 0x0174 _Ŵ_
00373 0x0175 _ŵ_
00374 0x0176 _Ŷ_
00375 0x0177 _ŷ_
00376 0x0178 _Ÿ_
00377 0x0179 _Ź_
00378 0x017A _ź_
00379 0x017B _Ż_
00380 0x017C _ż_
00381 0x017D _Ž_
00382 0x017E _ž_
00383 0x017F _ſ_
00384 0x0180 _ƀ_
00385 0x0181 _Ɓ_
00386 0x0182 _Ƃ_
00387 0x0183 _ƃ_
00388 0x0184 _Ƅ_
00389 0x0185 _ƅ_
00390 0x0186 _Ɔ_
00391 0x0187 _Ƈ_
00392 0x0188 _ƈ_
00393 0x0189 _Ɖ_
00394 0x018A _Ɗ_
00395 0x018B _Ƌ_
00396 0x018C _ƌ_
00397 0x018D _ƍ_
00398 0x018E _Ǝ_
00399 0x018F _Ə_
00400 0x0190 _Ɛ_
00401 0x0191 _Ƒ_
00402 0x0192 _ƒ_
00403 0x0193 _Ɠ_
00404 0x0194 _Ɣ_
00405 0x0195 _ƕ_
00406 0x0196 _Ɩ_
00407 0x0197 _Ɨ_
00408 0x0198 _Ƙ_
00409 0x0199 _ƙ_
00410 0x019A _ƚ_
00411 0x019B _ƛ_
00412 0x019C _Ɯ_
00413 0x019D _Ɲ_
00414 0x019E _ƞ_
00415 0x019F _Ɵ_
00416 0x01A0 _Ơ_
00417 0x01A1 _ơ_
00418 0x01A2 _Ƣ_
00419 0x01A3 _ƣ_
00420 0x01A4 _Ƥ_
00421 0x01A5 _ƥ_
00422 0x01A6 _Ʀ_
00423 0x01A7 _Ƨ_
00424 0x01A8 _ƨ_
00425 0x01A9 _Ʃ_
00426 0x01AA _ƪ_
00427 0x01AB _ƫ_
00428 0x01AC _Ƭ_
00429 0x01AD _ƭ_
00430 0x01AE _Ʈ_
00431 0x01AF _Ư_
00432 0x01B0 _ư_
00433 0x01B1 _Ʊ_
00434 0x01B2 _Ʋ_
00435 0x01B3 _Ƴ_
00436 0x01B4 _ƴ_
00437 0x01B5 _Ƶ_
00438 0x01B6 _ƶ_
00439 0x01B7 _Ʒ_
00440 0x01B8 _Ƹ_
00441 0x01B9 _ƹ_
00442 0x01BA _ƺ_
00443 0x01BB _ƻ_
00444 0x01BC _Ƽ_
00445 0x01BD _ƽ_
00446 0x01BE _ƾ_
00447 0x01BF _ƿ_
00448 0x01C0 _ǀ_
00449 0x01C1 _ǁ_
00450 0x01C2 _ǂ_
00451 0x01C3 _ǃ_
00452 0x01C4 _DŽ_
00453 0x01C5 _Dž_
00454 0x01C6 _dž_
00455 0x01C7 _LJ_
00456 0x01C8 _Lj_
00457 0x01C9 _lj_
00458 0x01CA _NJ_
00459 0x01CB _Nj_
00460 0x01CC _nj_
00461 0x01CD _Ǎ_
00462 0x01CE _ǎ_
00463 0x01CF _Ǐ_
00464 0x01D0 _ǐ_
00465 0x01D1 _Ǒ_
00466 0x01D2 _ǒ_
00467 0x01D3 _Ǔ_
00468 0x01D4 _ǔ_
00469 0x01D5 _Ǖ_
00470 0x01D6 _ǖ_
00471 0x01D7 _Ǘ_
00472 0x01D8 _ǘ_
00473 0x01D9 _Ǚ_
00474 0x01DA _ǚ_
00475 0x01DB _Ǜ_
00476 0x01DC _ǜ_
00477 0x01DD _ǝ_
00478 0x01DE _Ǟ_
00479 0x01DF _ǟ_
00480 0x01E0 _Ǡ_
00481 0x01E1 _ǡ_
00482 0x01E2 _Ǣ_
00483 0x01E3 _ǣ_
00484 0x01E4 _Ǥ_
00485 0x01E5 _ǥ_
00486 0x01E6 _Ǧ_
00487 0x01E7 _ǧ_
00488 0x01E8 _Ǩ_
00489 0x01E9 _ǩ_
00490 0x01EA _Ǫ_
00491 0x01EB _ǫ_
00492 0x01EC _Ǭ_
00493 0x01ED _ǭ_
00494 0x01EE _Ǯ_
00495 0x01EF _ǯ_
00496 0x01F0 _ǰ_
00497 0x01F1 _DZ_
00498 0x01F2 _Dz_
00499 0x01F3 _dz_
00500 0x01F4 _Ǵ_
00501 0x01F5 _ǵ_
00502 0x01F6 _Ƕ_
00503 0x01F7 _Ƿ_
00504 0x01F8 _Ǹ_
00505 0x01F9 _ǹ_
00506 0x01FA _Ǻ_
00507 0x01FB _ǻ_
00508 0x01FC _Ǽ_
00509 0x01FD _ǽ_
00510 0x01FE _Ǿ_
00511 0x01FF _ǿ_

Some Invalid Identifier Separators

Decimal Hex Identifier
00032 0x0020 _ _
00033 0x0021 _!_
00034 0x0022 _"_
00035 0x0023 _#_
00036 0x0024 _$_
00037 0x0025 _%_
00038 0x0026 _&_
00039 0x0027 _'_
00040 0x0028 _(_
00041 0x0029 _)_
00042 0x002A _*_
00043 0x002B _+_
00044 0x002C _,_
00045 0x002D _-_
00046 0x002E _._
00047 0x002F _/_
00058 0x003A _:_
00059 0x003B _;_
00060 0x003C _<_
00061 0x003D _=_
00062 0x003E _>_
00063 0x003F _?_
00064 0x0040 _@_
00091 0x005B _[_
00092 0x005C _\_
00093 0x005D _]_
00094 0x005E _^_
00096 0x0060 backtick
00123 0x007B _{_
00124 0x007C _|_
00125 0x007D _}_
00126 0x007E _~_
00127 0x007F __
  • What characters are allowed in C# class name? This is what initially led me down the wrong rabbit hole, since the top answer, among others, is the one quoted below, and I definitely didn’t want to go through those. 😅

    Essentially, any unicode character (including unicode escapes) in the character classes Lu, Ll, Lt, Lm, Lo, Nl, Mn, Mc, Nd, Pc, and Cf. The first character is an exception and it must be a letter (classes Lu, Ll, Lt, Lm, or Lo) or an underscore. Also, if the identifier is a keyword, you must stick an @ in front of it. The @ is optional otherwise.

  • Is there a method in C# to check if a string is a valid identifier [duplicate] I didn’t find this at first, and although it is marked as a duplicate, the second answer is exactly what I was looking for:
    
    var isValid = Microsoft.CodeAnalysis.CSharp.
        SyntaxFacts.IsValidIdentifier("I'mNotValid");
    Console.WriteLine(isValid);     // False
    
  • How to determine if a string is a valid variable name? This is the question/answer that the above was considered a duplicate of, but it does not mention SyntaxFacts.
2022.10.03