Code inspection: Use UTF-8 string literal

UTF-8 is one of the most widely used character encodings, particularly on the internet. In .NET, however, the char and string types represent their values in UTF-16. Obtaining the UTF-8 representation of a string therefore takes an additional step, such as calling System.Text.Encoding.UTF8.GetBytes(), which performs the conversion at runtime. To avoid this runtime cost, some developers perform the encoding in advance and embed the resulting byte array in the source code as follows:

        
// "HTTP/1.1 "
private static ReadOnlySpan<byte> HttpVersion11Bytes =>
  new byte[] { 0x48, 0x54, 0x54, 0x50, 0x2f, 0x31, 0x2e, 0x31, 0x20 };

C# 11 introduces UTF-8 string literals, a simpler way to represent UTF-8 strings in source code. The compiler performs the encoding, so no conversion happens at runtime:

        
// Notice the 'u8' suffix after the string literal
private static ReadOnlySpan<byte> HttpVersion11Bytes => "HTTP/1.1 "u8;
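As a quick sanity check, the following minimal sketch compares the three representations discussed so far. It assumes a C# 11 (or later) project that uses top-level statements. The local names manual, runtime, and literal are illustrative and do not come from the inspection itself.

```csharp
using System;
using System.Text;

// Hand-encoded bytes for "HTTP/1.1 " (the pre-C# 11 workaround).
ReadOnlySpan<byte> manual =
    new byte[] { 0x48, 0x54, 0x54, 0x50, 0x2f, 0x31, 0x2e, 0x31, 0x20 };

// Encoded at runtime from a UTF-16 string.
ReadOnlySpan<byte> runtime = Encoding.UTF8.GetBytes("HTTP/1.1 ");

// Encoded by the compiler (requires C# 11 or later).
ReadOnlySpan<byte> literal = "HTTP/1.1 "u8;

Console.WriteLine(manual.SequenceEqual(literal));   // True
Console.WriteLine(runtime.SequenceEqual(literal));  // True
Console.WriteLine(literal.Length);                  // 9
```

All three spans hold the same nine bytes, so replacing either older form with the u8 literal should not change the data your code works with.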

This inspection helps you recognize existing ways of representing UTF-8 strings and replace them with the new language feature, which improves the readability of your code.

It also detects calls to Encoding.UTF8.GetBytes() with string literal arguments and helps convert them to UTF-8 string literals. This improves readability and also enhances performance by eliminating the need for runtime encoding.

            
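// Before: the string is encoded at runtime on every property access
private static ReadOnlySpan<byte> HttpVersion11Bytes =>
  Encoding.UTF8.GetBytes("HTTP/1.1 ");

// After: the bytes are produced by the compiler
private static ReadOnlySpan<byte> HttpVersion11Bytes =>
  "HTTP/1.1 "u8;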
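When applying the transformation, it can help to keep in mind that a UTF-8 string literal has the type ReadOnlySpan<byte> rather than byte[], and that its length is measured in bytes. The sketch below illustrates both points. It assumes C# 11 or later with top-level statements, and the variable names are hypothetical.

```csharp
using System;

// A u8 literal is a ReadOnlySpan<byte>; call ToArray() where an API requires byte[].
// Note that ToArray() allocates a new array on each call.
byte[] asArray = "HTTP/1.1 "u8.ToArray();
Console.WriteLine(asArray.Length);  // 9

// Length counts encoded bytes, not characters:
// U+00E9 (e with acute accent) takes two bytes in UTF-8.
ReadOnlySpan<byte> cafe = "caf\u00E9"u8;
Console.WriteLine(cafe.Length);     // 5
```

If the surrounding code stores the result in a byte[] field or variable, the copy made by ToArray() may offset part of the performance gain, so keeping the value as a ReadOnlySpan<byte> is usually preferable where the consuming API allows it.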

Last modified: 08 May 2024