Learn HAP: HTML Traversing using Agility Pack C#

Introduction

In the previous session, Learn HTML Manipulation using HTML Agility Pack, we saw that HTML content can be changed as needed using HTML Agility Pack (HAP). Now it is time to look at another useful technique: HTML traversing using HTML Agility Pack in C#.

Before HAP, working with HTML in C# was tedious because it meant combining several of the built-in .NET classes. Once you are familiar with the techniques explained here, parsing the HTML DOM is no longer a big deal.


At times, there will be necessity to access a particular element and there is only few possibilities to access it from the DOM based on the information about the HTML content present with us. In such cases, learning different methods for html traverse C# can be really helpful. Below are the important ones while doing HTML traversing c#.

HTML Traversing using Agility Pack C#

#1 Child Nodes

Take a close look at the sample code mentioned below.

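using System;
using HtmlAgilityPack;

var html = @"<body>
   <h1>Here is showing how to list child nodes</h1>
   <h1>Technology Crowds</h1>
   <h2>www.TechnologyCrowds.com</h2>
</body>";

var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);

// Locate the <body> element with an XPath query
var htmlBody = htmlDoc.DocumentNode.SelectSingleNode("//body");

// ChildNodes holds every direct child, including whitespace text nodes
HtmlNodeCollection childNodes = htmlBody.ChildNodes;
foreach (var node in childNodes)
{
	// Print only element children and skip the text nodes between the tags
	if (node.NodeType == HtmlNodeType.Element)
	{
		Console.WriteLine(node.OuterHtml);
	}
}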
We retrieve the child nodes of the body using the ChildNodes property, a public member of HtmlAgilityPack.HtmlNode that returns an HtmlNodeCollection containing every direct child of the node. The collection includes text nodes (such as the whitespace between tags) as well as elements, which is why the loop checks NodeType == HtmlNodeType.Element before printing. In this case, we are traversing the body of the HTML.

Output


<h1>Here is showing how to list child nodes</h1>  
<h1>Technology Crowds</h1>
<h2>www.TechnologyCrowds.com</h2>
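To see why the loop filters on NodeType, here is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced, that counts the text nodes inside ChildNodes and then lists only the element children. The helper name ElementChildren is hypothetical and introduced only for this illustration; it is a plain LINQ filter over ChildNodes, not part of HAP.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class ChildNodeDemo
{
    // Hypothetical helper: only the element children of a node, skipping text and comment nodes
    public static HtmlNode[] ElementChildren(HtmlNode parent) =>
        parent.ChildNodes.Where(n => n.NodeType == HtmlNodeType.Element).ToArray();

    public static void Main()
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml("<body>\n  <h1>Technology Crowds</h1>\n  <h2>www.TechnologyCrowds.com</h2>\n</body>");
        var body = htmlDoc.DocumentNode.SelectSingleNode("//body");

        // ChildNodes also contains the whitespace between the tags as text nodes
        int textNodes = body.ChildNodes.Count(n => n.NodeType == HtmlNodeType.Text);
        Console.WriteLine($"All child nodes: {body.ChildNodes.Count}, text nodes: {textNodes}");

        // Only the element children are printed here
        foreach (var element in ElementChildren(body))
        {
            Console.WriteLine(element.Name);
        }
    }
}
```

Filtering with LINQ keeps the traversal logic in one expression, so it can be reused wherever only elements matter.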

#2 First Child

In many situations you only need the first child of a particular node in the HTML DOM, and the FirstChild property addresses exactly that requirement. Take a close look at the code.

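using System;
using HtmlAgilityPack;

// The markup has no whitespace between tags, so the first child is the <h1> element
var html = @"<body><h1>Demonstrating how to list first and last child</h1><h2>Technology Crowds</h2><h2>www.TechnologyCrowds.com</h2></body>";
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);
var htmlBody = htmlDoc.DocumentNode.SelectSingleNode("//body");

// FirstChild returns the first direct child node, or null if there are no children
HtmlNode firstChild = htmlBody.FirstChild;
Console.WriteLine(firstChild.OuterHtml);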
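FirstChild is a public property of HtmlAgilityPack.HtmlNode, so you can access it from anywhere in your application. It returns the first child of the given node, which may be a text or comment node rather than an element, or null if the node has no children.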

Output

<h1>Demonstrating how to list first and last child</h1>
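The example above works because the markup has no whitespace between tags. If the same HTML is formatted across several lines, FirstChild returns the whitespace text node before <h1>. The sketch below skips such nodes with a hypothetical FirstElementChild helper built on LINQ's FirstOrDefault (it is not a HAP API); it assumes the HtmlAgilityPack package is referenced.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class FirstChildDemo
{
    // Hypothetical helper: first child that is an element, or null if there is none
    public static HtmlNode FirstElementChild(HtmlNode parent) =>
        parent.ChildNodes.FirstOrDefault(n => n.NodeType == HtmlNodeType.Element);

    public static void Main()
    {
        // A formatted version of the markup above, with newlines between the tags
        var html = @"<body>
    <h1>Demonstrating how to list first and last child</h1>
    <h2>Technology Crowds</h2>
    <h2>www.TechnologyCrowds.com</h2>
</body>";
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);
        var body = htmlDoc.DocumentNode.SelectSingleNode("//body");

        // FirstChild is now the whitespace text node before <h1>
        Console.WriteLine(body.FirstChild.NodeType);

        // The helper skips text nodes and lands on the <h1> element
        Console.WriteLine(FirstElementChild(body)?.OuterHtml);
    }
}
```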

#3 Last Child

You can jump straight to the last child of a particular node in the HTML DOM by using the LastChild property of HtmlAgilityPack.HtmlNode. This public property returns the last child of the node, or null if it has none. The example code here shows how to retrieve the last child of the body node.

using System;
using HtmlAgilityPack;

var html = @"<body><h1>Demonstrating how to list first and last child</h1><h2>Technology Crowds</h2><h2>www.TechnologyCrowds.com</h2></body>";
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);
var htmlBody = htmlDoc.DocumentNode.SelectSingleNode("//body");

// LastChild returns the last direct child node (here the final <h2>)
HtmlNode lastChild = htmlBody.LastChild;
Console.WriteLine(lastChild.OuterHtml);

Output

<h2>www.TechnologyCrowds.com</h2>
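Like FirstChild, LastChild can return a text node: if the markup ends with a newline before </body>, that newline is the last child. As a minimal sketch, the hypothetical LastElementChild helper below starts at LastChild and steps backwards with PreviousSibling (another public HtmlNode property) until it reaches an element, returning null when the node has no element children. It assumes the HtmlAgilityPack package is referenced.

```csharp
using System;
using HtmlAgilityPack;

public static class LastChildDemo
{
    // Hypothetical helper: walks backwards from LastChild to the last element child
    public static HtmlNode LastElementChild(HtmlNode parent)
    {
        HtmlNode current = parent.LastChild; // null when the node has no children
        while (current != null && current.NodeType != HtmlNodeType.Element)
        {
            current = current.PreviousSibling;
        }
        return current;
    }

    public static void Main()
    {
        var html = @"<body>
    <h1>Demonstrating how to list first and last child</h1>
    <h2>Technology Crowds</h2>
    <h2>www.TechnologyCrowds.com</h2>
</body>";
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);
        var body = htmlDoc.DocumentNode.SelectSingleNode("//body");

        // LastChild is the newline text node before </body>
        Console.WriteLine(body.LastChild.NodeType);

        // The helper skips it and returns the final <h2> element
        Console.WriteLine(LastElementChild(body)?.OuterHtml);
    }
}
```

Walking with PreviousSibling stops at the first element it meets, so it does not need to scan the whole ChildNodes collection.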

#4 Next Sibling

In HTML terminology, siblings are elements that share the same parent node. CSS distinguishes two kinds of sibling relationships: adjacent (the element immediately after another) and general (any element that follows another under the same parent). HAP provides equivalent navigation in C# through the NextSibling property, a public member of HtmlNode that returns the node immediately following the current one under the same parent, or null if there is none. Because that next node is often a whitespace text node, the example below keeps following NextSibling and prints only the element nodes it meets.

using System;
using HtmlAgilityPack;

var html = @"<body>
<h1>Here demonstrating how to get next sibling</h1>
<h2>Technology Crowds</h2>
<h3>www.TechnologyCrowds.com</h3>
</body>";

var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);

// Start from the <h1> element inside <body>
var node = htmlDoc.DocumentNode.SelectSingleNode("//body/h1");

HtmlNode sibling = node.NextSibling;

// Walk forward through all following siblings until NextSibling returns null
while (sibling != null)
{
	// Skip the whitespace text nodes between the headings
	if (sibling.NodeType == HtmlNodeType.Element)
	{
		Console.WriteLine(sibling.OuterHtml);
	}

	sibling = sibling.NextSibling;
}

Output

<h2>Technology Crowds</h2>
<h3>www.TechnologyCrowds.com</h3> 
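The loop above visits every following element sibling, which corresponds to the general relationship. To mirror the adjacent relationship as well, here is a minimal sketch with two hypothetical helpers, NextElementSibling and FollowingElementSiblings (illustration names, not HAP APIs), built only on NextSibling and NodeType. It assumes the HtmlAgilityPack package is referenced.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class SiblingDemo
{
    // Adjacent sibling (like CSS "h1 + *"): the first element after the node, skipping text nodes
    public static HtmlNode NextElementSibling(HtmlNode node)
    {
        HtmlNode current = node.NextSibling;
        while (current != null && current.NodeType != HtmlNodeType.Element)
        {
            current = current.NextSibling;
        }
        return current;
    }

    // General siblings (like CSS "h1 ~ *"): every element after the node under the same parent
    public static List<HtmlNode> FollowingElementSiblings(HtmlNode node)
    {
        var result = new List<HtmlNode>();
        for (HtmlNode s = NextElementSibling(node); s != null; s = NextElementSibling(s))
        {
            result.Add(s);
        }
        return result;
    }

    public static void Main()
    {
        var html = @"<body>
<h1>Here demonstrating how to get next sibling</h1>
<h2>Technology Crowds</h2>
<h3>www.TechnologyCrowds.com</h3>
</body>";
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);
        var h1 = htmlDoc.DocumentNode.SelectSingleNode("//body/h1");

        Console.WriteLine("Adjacent: " + NextElementSibling(h1)?.Name);
        foreach (var sibling in FollowingElementSiblings(h1))
        {
            Console.WriteLine("General: " + sibling.Name);
        }
    }
}
```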

#5 ParentNode

HTML follows the DOM model, in which nodes can contain other nodes, so we can classify them as parent nodes and child nodes. A typical example of a parent node is the body element: the elements inside it are its child nodes. In HAP, when you already have a child element and need to get to its parent, the ParentNode property of HtmlAgilityPack.HtmlNode is the right tool. This public property returns the node that directly contains the given node, or null if the node has no parent (as with the document node itself).
The following HTML Agility Pack C# example shows how to use it.

using System;
using HtmlAgilityPack;

var html = @"<body>
<h1>Here demonstrating how to get the parent node</h1>
<h2>Technology Crowds</h2>
<h3>www.TechnologyCrowds.com</h3>
</body>";

var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);

// Start from a known child element: the <h1> inside <body>
var node = htmlDoc.DocumentNode.SelectSingleNode("//body/h1");

// ParentNode returns the element that directly contains the <h1>
HtmlNode parent = node.ParentNode;

if (parent != null)
{
	// Name is the tag name of the parent element
	Console.WriteLine(parent.Name);
}

Output

body
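Because ParentNode returns null once it passes the top of the tree, you can follow it repeatedly to list all ancestors of a node. The sketch below assumes the HtmlAgilityPack package is referenced; AncestorNames is a hypothetical helper name. The last entry it collects is HAP's document node, whose Name is "#document".

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ParentNodeDemo
{
    // Hypothetical helper: names of all ancestors, nearest first, following ParentNode until it is null
    public static List<string> AncestorNames(HtmlNode node)
    {
        var names = new List<string>();
        for (HtmlNode p = node.ParentNode; p != null; p = p.ParentNode)
        {
            names.Add(p.Name);
        }
        return names;
    }

    public static void Main()
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml("<html><body><div><h1>Technology Crowds</h1></div></body></html>");
        var h1 = htmlDoc.DocumentNode.SelectSingleNode("//h1");

        // Prints the chain of ancestors from the nearest one up to the document node
        Console.WriteLine(string.Join(" > ", AncestorNames(h1)));
    }
}
```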

Anjan Kant

Outstanding journey in Microsoft technologies (ASP.NET, C#, SQL programming, WPF, Silverlight, WCF, etc.), client-side technologies (AngularJS, KnockoutJS, JavaScript, Ajax calls, JSON) and hybrid apps. I love to devote my free time to writing, blogging, social networking and an adventurous life.
