Implementing a lexer
ReSharper requires a custom language to create a lexer that implements (at least) the ILexer interface.
The IBuffer is given to the lexer in the constructor.
Clients of the lexer will follow these steps:
1. Call Start to get the lexer to recognise the first token.
2. Retrieve the current token type from the TokenType property, which will be a singleton instance of a language-specific class that derives from TokenNodeType (see the guide on Token Node Types for more details).
3. Use the TokenStart and TokenEnd properties to retrieve the offsets of the token start and end in the text buffer. This is required because the token type is a singleton instance, and therefore cannot contain details about the location and length of the token itself. The start offset is inclusive and the end offset is exclusive, just like TextRange (e.g. given "Hello world", the text range (0, 5) returns "Hello").
4. Call the Advance method repeatedly to move to the next token, which will update the TokenType, TokenStart and TokenEnd properties with the information about the current token and location.
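The steps above can be sketched as a small client loop. This is a minimal, self-contained illustration, not SDK code: ToyLexer and ToyTokenType are hypothetical stand-ins that expose only the members named in the text, and treating a null TokenType as the end-of-input signal is an assumption made for the sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for TokenNodeType: one shared instance per kind of token.
public sealed class ToyTokenType
{
    public static readonly ToyTokenType Word = new ToyTokenType("WORD");
    public static readonly ToyTokenType Space = new ToyTokenType("SPACE");
    public string Name { get; }
    private ToyTokenType(string name) { Name = name; }
}

// Hypothetical lexer exposing only the members named in the text above.
public sealed class ToyLexer
{
    private readonly string myText;
    public ToyLexer(string text) { myText = text; }

    public ToyTokenType TokenType { get; private set; }
    public int TokenStart { get; private set; }   // inclusive
    public int TokenEnd { get; private set; }     // exclusive

    public void Start() { TokenStart = TokenEnd = 0; Advance(); }

    public void Advance()
    {
        TokenStart = TokenEnd;
        // Assumption for this sketch: a null token type marks the end of input.
        if (TokenStart >= myText.Length) { TokenType = null; return; }
        bool space = char.IsWhiteSpace(myText[TokenStart]);
        int end = TokenStart;
        while (end < myText.Length && char.IsWhiteSpace(myText[end]) == space) end++;
        TokenEnd = end;
        TokenType = space ? ToyTokenType.Space : ToyTokenType.Word;
    }
}

public static class LexerClient
{
    // Walks the whole text and records "KIND:text" for every token.
    public static List<string> Dump(string text)
    {
        var result = new List<string>();
        var lexer = new ToyLexer(text);
        lexer.Start();                                   // step 1
        while (lexer.TokenType != null)                  // step 2
        {
            // Step 3: the token type carries no location, so slice the buffer by offsets.
            string tokenText = text.Substring(lexer.TokenStart, lexer.TokenEnd - lexer.TokenStart);
            result.Add(lexer.TokenType.Name + ":" + tokenText);
            lexer.Advance();                             // step 4
        }
        return result;
    }
}
```

Calling LexerClient.Dump("Hello world") yields three entries; the first token spans offsets 0 to 5, matching the TextRange example above.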
The CurrentPosition property is a lexer-specific object that encapsulates the information required by the lexer to save and restore the current location. The LexerStateCookie class can be used by parsers to make it easy to roll back to a specific state in the lexer. It implements the IDisposable interface, so it can be used in a using statement.
This can be used to implement lookahead: retrieve a number of tokens ahead, then roll back to the current position (see Lexer Utility Methods for more details).
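The save-and-restore pattern behind lookahead might look like the sketch below. It does not use the SDK's LexerStateCookie; PositionCookie and CharLexer are hypothetical types written for this example, assuming only what the text states: the position is captured up front and restored when the cookie is disposed.

```csharp
using System;

// Hypothetical one-character-per-token lexer; only the position handling matters here.
public sealed class CharLexer
{
    private readonly string myText;
    private int myOffset;
    public CharLexer(string text) { myText = text; }

    public void Start() { myOffset = 0; }
    public void Advance() { if (myOffset < myText.Length) myOffset++; }
    public bool AtEnd { get { return myOffset >= myText.Length; } }
    public char Current { get { return myText[myOffset]; } }

    // Untyped position, as described for CurrentPosition (the int is boxed here).
    public object CurrentPosition
    {
        get { return myOffset; }
        set { myOffset = (int)value; }
    }
}

// Hypothetical stand-in for a state cookie: saves on creation, restores on Dispose.
public sealed class PositionCookie : IDisposable
{
    private readonly CharLexer myLexer;
    private readonly object mySaved;

    public PositionCookie(CharLexer lexer)
    {
        myLexer = lexer;
        mySaved = lexer.CurrentPosition;
    }

    public void Dispose() { myLexer.CurrentPosition = mySaved; }
}

public static class Lookahead
{
    // Returns up to `count` upcoming characters without consuming them.
    public static string Peek(CharLexer lexer, int count)
    {
        using (new PositionCookie(lexer))
        {
            var chars = new char[count];
            int n = 0;
            while (n < count && !lexer.AtEnd)
            {
                chars[n++] = lexer.Current;
                lexer.Advance();
            }
            return new string(chars, 0, n);
        } // Dispose runs here and rolls the lexer back to the saved position
    }
}
```

Because the rollback lives in Dispose, the lexer is restored even if the lookahead code returns early or throws.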
Strongly typed lexers
The ILexer interface exposes CurrentPosition as an object, to allow lexers maximum flexibility for storing state about the current position: the lexer can return any object it wishes. However, if the lexer wishes to return a value type, this adds boxing allocations, so the lexer can also implement ILexer<TState>.
This hides (shadows) the CurrentPosition property so that it is of type TState instead of object, which allows a value type to be returned without boxing allocations. For example, if the lexer only requires an integer position (such as a caching lexer), it can implement ILexer<int> and avoid boxing the int.
Similarly, the lexer can implement its state object as a struct, and return it as a strongly typed item.
The struct is copied by value to the caller of CurrentPosition, and no boxing allocations take place.
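A minimal sketch of the shadowing arrangement follows. The interfaces here are hypothetical mirrors of the untyped and typed shapes described above, reduced to the one property under discussion; ToyLexerState and its fields are invented for illustration and are not SDK types.

```csharp
using System;

// Hypothetical mirror of the untyped shape: the position is an object.
public interface IPositionedLexer
{
    object CurrentPosition { get; set; }
}

// Hypothetical mirror of the typed shape: `new` shadows the object-typed property.
public interface IPositionedLexer<TState> : IPositionedLexer
{
    new TState CurrentPosition { get; set; }
}

// Hypothetical state: a value type, so the typed property hands back a copy.
public struct ToyLexerState
{
    public int Offset;
    public int NestingDepth;
}

public sealed class ToyStatefulLexer : IPositionedLexer<ToyLexerState>
{
    private ToyLexerState myState;

    // Stand-in for real lexing work that changes the state.
    public void MoveTo(int offset, int depth)
    {
        myState.Offset = offset;
        myState.NestingDepth = depth;
    }

    // Strongly typed access: the struct is copied, nothing is boxed.
    public ToyLexerState CurrentPosition
    {
        get { return myState; }
        set { myState = value; }
    }

    // Untyped access still works, but boxes the struct on every read.
    object IPositionedLexer.CurrentPosition
    {
        get { return myState; }
        set { myState = (ToyLexerState)value; }
    }
}
```

Callers that hold the concrete type or the generic interface get the struct directly; only callers that go through the untyped interface pay for boxing.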
Incremental lexing
ReSharper includes infrastructure for incremental lexing, that is, lexing only the parts of a file that change and reusing existing tokens for the rest of the file. Most of the work is handled by a caching lexer, and is covered in more detail in the section on incremental parsing.
The custom language lexer can implement the ILexerEx and IIncrementalLexer interfaces.
These interfaces expose the lexer state as a uint value. If the lexer is built with CsLex, this state can be the yy_lexical_state value, which is used to decide when specific regular expression rules are applied. Alternatively, it can be used as a lookup into other (static) values, or used to encode more state information into the bits of the uint (the C# lexer uses this strategy to encode a stack of items).
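To illustrate how a stack of items can be packed into the bits of a uint, here is one possible encoding. It is an assumption made for this example and not the layout used by the C# lexer: PackedStack is a hypothetical helper that stores each item in 4 bits, restricts items to the values 1 to 15, and uses 0 to mean an empty stack.

```csharp
using System;

// Hypothetical helper: packs a small stack of 4-bit items (values 1..15) into a uint.
// Items are never 0, so a packed value of 0 unambiguously means "empty stack".
public static class PackedStack
{
    private const int BitsPerItem = 4;
    private const uint ItemMask = 0xF;
    public const int MaxDepth = 32 / BitsPerItem;   // 8 items fit in 32 bits

    // Shifts existing items up and stores the new item in the low bits.
    public static uint Push(uint state, uint item)
    {
        if (item == 0 || item > ItemMask) throw new ArgumentOutOfRangeException(nameof(item));
        if (Depth(state) == MaxDepth) throw new InvalidOperationException("stack is full");
        return (state << BitsPerItem) | item;
    }

    // Top of the stack, or 0 if the stack is empty.
    public static uint Peek(uint state) { return state & ItemMask; }

    // Drops the top item by shifting it out of the low bits.
    public static uint Pop(uint state) { return state >> BitsPerItem; }

    // Counts items by shifting until nothing is left.
    public static int Depth(uint state)
    {
        int depth = 0;
        while (state != 0) { depth++; state >>= BitsPerItem; }
        return depth;
    }
}
```

Because the whole stack is a single uint, it can be cached per token and handed back to the lexer later without any allocation.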
The IIncrementalLexer interface has a Start method, which allows the lexer to start from an arbitrary point in the text buffer, without having to lex the preceding part of the file first. It takes a start and end offset, and also the uint state value returned from ILexerEx.LexerStateEx. These values will have been cached from a previous scan of the text buffer. The CachingLexer classes implement this.
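The following sketch shows why the cached state matters when restarting mid-buffer. QuoteLexer is a hypothetical toy, not an implementation of the SDK interfaces: it borrows the LexerStateEx name and the three-argument Start shape from the text, treats every character as a token, and uses state 1 to mean "inside a double-quoted string".

```csharp
using System;
using System.Text;

// Hypothetical lexer: each character is a token, and a double quote toggles
// between state 0 (code) and state 1 (inside a string).
public sealed class QuoteLexer
{
    private readonly string myText;
    private int myEnd;
    private uint myState;
    public QuoteLexer(string text) { myText = text; }

    public int TokenStart { get; private set; }
    public bool AtEnd { get { return TokenStart >= myEnd; } }
    public bool InString { get; private set; }             // classification of the current token
    public uint LexerStateEx { get { return myState; } }   // state in effect at the current token

    public void Start() { Start(0, myText.Length, 0u); }

    // Resume from an arbitrary offset, given the state cached for that offset.
    public void Start(int startOffset, int endOffset, uint state)
    {
        TokenStart = startOffset;
        myEnd = endOffset;
        myState = state;
        Classify();
    }

    public void Advance()
    {
        if (AtEnd) return;
        if (myText[TokenStart] == '"') myState ^= 1u;   // a quote flips the state for what follows
        TokenStart++;
        Classify();
    }

    private void Classify()
    {
        InString = !AtEnd && (myState == 1u || myText[TokenStart] == '"');
    }
}

public static class IncrementalDemo
{
    // Renders the remaining tokens as 'S' (string) or 'c' (code).
    public static string Classify(QuoteLexer lexer)
    {
        var sb = new StringBuilder();
        for (; !lexer.AtEnd; lexer.Advance()) sb.Append(lexer.InString ? 'S' : 'c');
        return sb.ToString();
    }
}
```

A client would record TokenStart and LexerStateEx during a full scan, then pass a cached pair to Start to re-lex from that offset. Restarting inside a string with state 0 instead of the cached state 1 would misclassify the remaining characters, which is the problem the state value solves.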
More details are in the section on incremental parsing.