Overview

Tree structure

Thanks to the simple and regular structure of XML, the XML PSI tree is fairly small, and pretty easy to understand. However, the PSI tree should not be confused with a DOM. The PSI tree is an abstract syntax tree, and models all of the syntax in an XML file, while an XML DOM is a (light) abstraction on top of the file. A DOM lists elements and attributes, while an abstract syntax tree will also list whitespace, and the constructs used to create an element (e.g. the opening and closing tags).

A good example here is that the text contents of an XML element is split into text tokens and whitespace nodes. Even if the owning XML element has the xml:space="preserve" attribute set, the tree doesn't change, and the whitespace nodes are still added. In other words, the semantic nature of the xml:space attribute isn't reflected in the abstract syntax tree.

Since XML is a hierarchical structure, care should also be taken not to confuse child nodes in the tree with children in the XML DOM. For example, the children of an XML element in the PSI tree include the nodes used to construct the opening and closing tags, as well as the inner nodes of the elements (whitespace, child tags, etc).

See the Tree Nodes reference section for more details of the XML tree nodes.

Tree nodes

Each node in the XML tree derives from IXmlTreeNode:

public interface IXmlTreeNode :
  ITreeNode
{
  XmlTokenTypes XmlTokenTypes { get; }

  TReturn AcceptVisitor<TContext, TReturn>(IXmlTreeVisitor<TContext, TReturn> visitor, TContext context);
}

The XmlTokenTypes property provides access to the token node type instances assigned to the token nodes when building the tree. These instances are the values returned from ITreeNode.NodeType, and ITokenNode.GetTokenType(). It can be used to compare against the token node type reported by a token node in order to find a particular token type (e.g. look for tokens of type XmlTokenTypes.COMMENT_START). It is also used by the XML file parser to create token nodes via the token node type instances' Create method. This might return a derived instance for some XML languages, such as XAML.

The AcceptVisitor method implements the Visitor pattern. It will call the strongly typed Visit method of the IXmlTreeVisitor instance passed in.

Tokens

All tokens in the tree derive from IXmlToken, which in turn derives from IXmlTreeNode. Remember that a token is a leaf element in the tree - it doesn't have any children, and is used to build the higher level tree nodes (e.g. an xml tag contains the tokens for <, the tag name identifier, > and so on).

public interface IXmlToken :
  IXmlTreeNode,
  ITokenNode,
  ITreeNode
{
  new XmlTokenNodeType GetTokenType();
}

See also: IXmlTreeNode

The ITokenNode.GetTokenType method is hidden by a new method of the same name, but that returns an instance of XmlTokenNodeType, rather than TokenNodeType. XmlTokenNodeType derives from the TokenNodeType abstract class and adds a single property, that returns the XmlTokenTypes class that lists all the known token node types. This class can be a derived instance for some XML languages, such as XAML.

public XmlTokenTypes XmlTokenTypes { get; }

File node

The root node of an XML file is IXmlFile:

public interface IXmlFile :
  IFile,
  IXmlTagContainer,
  IXmlDocumentNode,
  IXmlTreeNode,
  ITreeNode
{
  TreeNodeCollection<IProcessingInstruction> ProcessingInstructions { get; }
  XmlElementTypes XmlElementTypes { get; }
}

See also: IProcessingInstruction, IXmlDocumentNode, IXmlTagContainer, IXmlTreeNode

The XmlElementTypes property provides access to the composite node type instances assigned to the composite, non-leaf nodes when building the tree. These instances are the values returned from ITreeNode.NodeType. It can be used to compare against the node type of any node, to look for nodes of a certain type, and is also used by the XML file parser to create element nodes via the element node type instances' Create method. The type of XmlElementTypes might be a derived instance, depending on the XML language.

The IXmlFile directly exposes a collection of processing instructions. This only returns processing instructions that implement IProcessingInstruction, and despite its name, the IXmlProcessingInstruction that represents the XML declaration doesn't get returned in this collection. To get at the XML declaration, you need to get it directly from the child nodes:

var xmlDeclarations = file.Children<IXmlProcessingInstruction>();

The inherited IXmlTagContainer interface provides child tag access and manipulation functions.

It also implements the IXmlDocumentNode marker interface.

XML tags

An XML tag is represented with the IXmlTag interface. This is where it is important not to confuse child tags with child nodes. The IXmlTag interface represents the span from the opening < character to the closing > character of an XML element. Its child nodes in the tree split this span further, into a header, an optional footer and the content nodes.

The header is represented with the IXmlTagHeader interface, and covers the span from the opening < character to the closing > of the opening part of the XML element. E.g. <foo baz="quux"> only, and not including the closing element </foo>. This means that the header span also includes attributes, and the tree nodes representing the attributes are indeed child nodes of the tag header.

The footer is the closing element (</foo>), which may not be included in the tree if the XML element is self-closed (<foo />).

The inner child nodes are either text (IXmlFloatingTextTokenNode, which also includes whitespace), or other nodes such as IXmlCData, IXmlEntityTokenNode or IXmlTag.

public interface IXmlTag :
  IXmlTreeNode,
  IXmlTagContainer,
  ITreeNode
{
  IXmlTagFooter Footer { get; }
  IXmlTagHeader Header { get; }
  string InnerText { get; }
  TreeNodeCollection<IXmlToken> InnerTextTokens { get; }
  string InnerValue { get; }
  ITreeRange InnerXml { get; }
  bool IsEmptyTag { get; }

  TXmlAttribute AddAttributeAfter<TXmlAttribute>(TXmlAttribute attribute, IXmlAttribute anchor);
  TXmlAttribute AddAttributeBefore<TXmlAttribute>(TXmlAttribute attribute, IXmlAttribute anchor);

  void RemoveAttribute(IXmlAttribute attribute);
}

See also: IXmlAttribute, IXmlTagContainer, IXmlTagFooter, IXmlTagHeader, IXmlToken, IXmlTreeNode

XML attributes

Attributes are accessible via the IXmlAttributeContainer interface, mostly implemented by IXmlTagHeader, but also by the XML declaration IXmlProcessingInstruction.

public interface IXmlAttributeContainer :
  IXmlTreeNode,
  ITreeNode
{
  TreeNodeCollection<IXmlAttribute> Attributes { get; }
  TreeNodeEnumerable<IXmlAttribute> AttributesEnumerable { get; }
  string ContainerName { get; }
}

See also: IXmlAttribute, IXmlTreeNode

Methods to add and remove attributes are available on IXmlTag.

Processing Instructions

Processing instructions are represented with the IProcessingInstruction interface. This is a simple interface, providing access to the target name, and a string representing the unparsed content.

DTD

Derived XML languages

Manipulating the tree

The XML PSI implementation includes several navigator classes. As usual, the naming follows a convention - the classes are named after the tree node you are trying to navigate to, to find, suffixed with 'Navigator', for example, XmlTagNavigator. And the methods are named after the tree node you are navigating from, e.g. XmlTagNavigator.GetByTagHeader.

Utilities

The XML PSI implementation includes a couple of utility classes, some for manipulation of the tree, and some for helper methods for handling references within XML files to CLR types (e.g. web.config files)

Last modified: 29 March 2023

ReSharper Platform SDK Help