Design a data structure for Hierarchical Tree Form data and its related operations in .NET C#
Sometimes you find yourself in a need to deal with Hierarchical Tree Form data. In simple words, this is data presented into parent-child nodes.
In such situations, you might sometimes struggle with the complexity of the implementation especially when dealing with a huge amount of data.
In this article, I would provide you with one of the possible designs for such cases. However, you need to keep in mind that, as usual, having a generic design could provide abstracted capabilities but it might lack specific enhancements for specific scenarios.
Therefore, I recommend that you deal with the design you would find in this article as a mind opener. You would need to study it, learn from it, get some ideas from it, and finally adapt it to your own needs.
Main Goals
These are the main goals of this design:
Represent Hierarchical data into a tree data structure.
Have the capability of creating nodes in isolation.
Have the capability of attaching and detaching nodes.
Have the capability of searching nodes with an appropriate performance.
Don’t lose the edge of strongly typed data.
Therefore, while moving across our design, we need to keep in mind these requirements.
The Design
First, we need to keep in mind that the data itself could be any business entity. However, the Hierarchical structure is something else. This is the structure controlling the relation between nodes, how to add, how to remove, how to search,… and so on.
Payload
What we can notice here:
This is the main entity representing the business entity to be wrapped into our Hierarchical structure.
We didn’t add here any members but for sure you can add some if you think that this is common between all your business entities.
NodeType
What we can notice here:
This is an enum representing the type of the node.
We have two node types only; Structural and Leaf.
Structural node is the node which only provides info about the Hierarchical structure. In simple words, it is like a folder or a node which would have child node(s).
Leaf node is the node which would only provide the business data. In simple words, it is the node that wraps the data and would not have any children.
INode
What we can notice here:
This is the interface representing the common members of any node whether it is Structural or Leaf.
Id is the unique identifier of the node. This should be absolute unique. I implemented it as a string but for sure you are free to change it as per your needs.
Name is the name of the node. This should at some point be used for the presentation layer.
PathPart is the name to be used to represent the path of the node.
Parent is the parent node. Its type is IStructuralNode as it could not be a Leaf because a Leaf would never have a child.
IStructuralNode
What we can notice here:
We defined the delegate ChildNodeAddedEventHandler to represent a handler of an event to be fired when a node is added as a child under another node.
We also defined the delegate ChildNodeRemovedEventHandler to represent a handler of an event to be fired when a node is removed from the children of another node.
We defined IStructuralNode interface which extends INode.
It has the two events; ChildNodeAdded and ChildNodeRemoved.
It has a read-only collection of children nodes.
Its node type by default would be Structural.
It has a method to add child.
It has a method to remove child.
It also has a method to return a flat collection of all nested child nodes.
ILeafNode and ILeafNode<TPayload>
What we can notice here:
The ILeafNode is extending the INode.
It has the Payload property which returns the wrapped data of type Payload.
It hides the parent Type and defaults it to Leaf.
The ILeafNode<out TPayload> hides the parent Payload and defines a typed one.
Node
What we can notice here:
It implements INode.
The implementation is simple.
If you like, you can extend this to implement IEquatable<Node>.
We also allow the user to set the Parent directly through the Parent property. However, the implementation is delegated to the AddChild and RemoveChild methods.
LeafNode<TPayload>
What we can notice here:
It inherits from Node and implements ILeafNode<TPayload>.
Worth to mention here is that in the constructor implementation, if a parent is set, we call the AddChild on the parent passing in this.
StructuralNode
Before analyzing this code, we need to understand how it is going to work.
As we said, nodes should be able to be created in isolation. This means that the end user should be able to create a node without setting any parent or even children.
Then he should be able to attach this node as a child to another node. Also, he should be able to attach other nodes as children to the children of this node,… and so on.
For all these actions, our code should keep the structure and relations between nodes intact.
Also, we want to make it easy to search for nodes. A special case would be searching by Id. This should be as fast as we can.
Therefore, keeping this in mind, let’s analyze the code.
What we can notice here:
In the constructor, if a collection of children is set, we call the AddChild method passing in each child of the children.
Also, if a parent is set, we call the AddChild on the parent passing in this.
To make it easy to search for a node, we defined Dictionary<string, INode> m_Catalog. This is a catalog where we keep references to all the nodes -even the nested ones- below the current node.
To keep this catalog always synchronized, the current node subscribes to the ChildNodeAdded and ChildNodeRemoved events of every direct child nodes at the moment they are added.
Following the same rule, the current node unsubscribes from the ChildNodeAdded and ChildNodeRemoved events of every direct child nodes at the moment they are removed.
Also, we need to keep in mind that whenever a node is added to the current node as a direct child, this node might have other nested subtree. In this case, the current catalog must be updated with all the nested children of the subtree as well.
So, having this said, let’s have a look on the implementation of AddChild(INode child).
child.Parent?.RemoveChild(child); makes sure that the new child is detached from its old parent before attaching it to the new parent which is the current node.
m_Catalog.Add(child.Id, child); adds the new child to the catalog.
m_Children.Add(child); adds the child to the Children collection.
OnDirectChildNodeAdded(child); fires the ChildNodeAdded event to notify the direct parent that a new child is added and eventually new updates should be applied to its catalog.
Then we check if the new child is a Structural node because in this case we need to listen to the changes that could be applied to it.
So, if this is true, we loop on the new child’s flat children, add them one by one to the current catalog and also fire the ChildNodeAdded event but this time with the proper values of node and added arguments. This is done to notify the direct parent with the nested updates as well. Note that the direct parent, would also do the same and notify its direct parent,… and so on.
As we said, the current node should subscribe to the ChildNodeAdded and ChildNodeRemoved events of the new child so that it is always updated. This is what we do on structuralNode.ChildNodeAdded += AddHandler; and structuralNode.ChildNodeRemoved += RemoveHandler;.
child.Parent = this; sets the parent of the child node to the current node.
Now, let’s have a look on the implementation of RemoveChild(INode child).
m_Catalog.Remove(child.Id); removes the child from the catalog.
m_Children.Remove(child); removes the child from the Children collection.
OnDirectChildNodeRemoved(child); fires the ChildNodeRemoved event to notify the direct parent that a child is removed and eventually new updates should be applied to its catalog.
Then we check if the new child is a Structural node because in this case we need to listen to the changes that could be applied to it.
So, if this is true, we loop on the child’s flat children, remove them one by one from the current catalog and also fire the ChildNodeRemoved event but this time with the proper values of node and added arguments. This is done to notify the direct parent with the nested updates as well. Note that the direct parent, would also do the same and notify its direct parent,… and so on.
As we said, the current node should unsubscribe from the ChildNodeAdded and ChildNodeRemoved events of the child to be removed. This is what we do on structuralNode.ChildNodeAdded -= AddHandler; and structuralNode.ChildNodeRemoved -= RemoveHandler;.
child.Parent = null; resets the parent to null.
I know that this part was tricky, but, if you just go through it again, you will find it not that complicated.
TreeNavigator
This is like a Helper class which provides extra capabilities like searching the nodes, evaluating full paths, finding node depth,… and so on.
Internally, it is making a good use of the well defined and established structure we defined on our nodes classes.
I didn’t abstract this class but for sure you can do so and you can extend it with as many capabilities as you need.
Final Words
I was going to write a sample application to show you how efficient our design is, but, after a second thought, I decided to leave this to you as an exercise.
So, I know that when you look at this design you might feel that it is too hard to understand. However, you need to keep in mind how complex the job is in the first place.
As I said at the beginning, you need to use the design in this article as a mind opener, give it some thought, learn from it, drop what you don’t need, adapt it, enhance it, and finally start using it. This would always be the right way to learn and gain new skills.
That’s it, hope you found reading this article as interesting as I found writing it.