How to Find Text by class name using Html Agility Pack C#

Introduction

HtmlAgilityPack is a great tool. In the session on HTML Traversing using Agility Pack C#, we saw how it can traverse the entire HTML content of a web page in C#, and that session also showed how easily HTML content can be manipulated. This tutorial covers another important technique for anyone who wants full control over HTML content: finding elements, and their text, by CSS class name. If you are completely new to this topic, see Learn to Install HTML agility pack and Load an HTML Document to learn how to install HtmlAgilityPack.

No direct method! Get access indirectly!

In HTML and CSS, a class names a shared style declaration block that applies only to the elements whose class attribute includes it. So, at times, one needs access to exactly those DOM elements that have a particular CSS class. HtmlAgilityPack has no single built-in method that returns all the elements with a given class, and getting the syntax right by combining the existing methods takes some patience.

Multiple ways, but one simple and efficient solution

One way to obtain the elements that share a CSS class is to run a regular expression over the raw HTML and extract the matching elements. The drawback is that regex matching can be slow when the compiled version is not used. Apart from that, regular expressions for HTML are tricky to get right and can make the application bulky if the pattern is not chosen carefully. A simpler and more effective approach uses HtmlAgilityPack's own DOM traversal, as follows:
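For comparison, here is roughly what the regex approach looks like with .NET's System.Text.RegularExpressions. This is a minimal sketch, not a recommended implementation: the class name float, the RegexClassDemo type and the InnerTexts helper are hypothetical, and the pattern assumes double-quoted class attributes on non-nested div elements. Nested divs, single quotes or a > inside an attribute value would break it, which is exactly the fragility described above. RegexOptions.Compiled is set because the Regex instance is built once and reused.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexClassDemo
{
    // Hypothetical pattern: matches <div ... class="... float ..."> and captures
    // the content up to the first </div>. The (?:[^"]*\s)? and (?:\s[^"]*)?
    // groups make "float" match only as a whole, space-separated class name.
    private static readonly Regex DivWithFloatClass = new Regex(
        "<div[^>]*\\bclass=\"(?:[^\"]*\\s)?float(?:\\s[^\"]*)?\"[^>]*>(.*?)</div>",
        RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the captured inner content of every matching div.
    public static List<string> InnerTexts(string html)
    {
        var result = new List<string>();
        foreach (Match m in DivWithFloatClass.Matches(html))
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }

    public static void Main()
    {
        string html = "<div class=\"float\">A</div><div class=\"box float\">B</div><div class=\"floating\">C</div>";
        Console.WriteLine(string.Join(", ", InnerTexts(html)));
    }
}
```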

Step #1

Create a local HtmlAgilityPack HtmlDocument variable, doc, to hold the loaded HTML (the loading methods are covered in the installation tutorial mentioned in the introduction).

Step #2

Load the web page that needs to be traversed into doc by passing its URL to HtmlWeb.Load.
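Steps 1 and 2 can be sketched as follows, assuming the HtmlAgilityPack NuGet package is referenced. The LoadingDemo type and its FromString and FromUrl helpers are hypothetical names. HtmlDocument.LoadHtml parses markup that is already in memory, which is handy for trying things out without network access, while HtmlWeb.Load downloads the page and returns a new HtmlDocument, so there is no need to construct one first.

```csharp
using System;
using HtmlAgilityPack;

public static class LoadingDemo
{
    // Parses HTML that is already held in a string (no network access).
    public static HtmlDocument FromString(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc;
    }

    // Downloads and parses a live page; HtmlWeb.Load returns the document.
    public static HtmlDocument FromUrl(string url)
    {
        var web = new HtmlWeb();
        return web.Load(url);
    }

    public static void Main()
    {
        HtmlDocument doc = FromString("<div class=\"float\">Hello</div>");
        Console.WriteLine(doc.DocumentNode.InnerText);
    }
}
```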

Step #3

To get all the elements that have a particular CSS class applied (say float, as an example here), start from doc.DocumentNode and call the Descendants method with an empty argument list. This returns every node below the document root, which the next step filters.
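To see what Descendants() with no arguments actually returns, the sketch below prints the type and name of every node in a small inline document. The markup, the DescendantsDemo type and the CountElements helper are hypothetical. Note that text nodes (named #text) are returned alongside elements; they carry no class attribute, so a class filter simply skips them.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class DescendantsDemo
{
    // Counts only element nodes; Descendants() also yields text and comment nodes.
    public static int CountElements(HtmlDocument doc) =>
        doc.DocumentNode.Descendants().Count(n => n.NodeType == HtmlNodeType.Element);

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<div><span class=\"float\">One</span><span>Two</span></div>");

        // Walk every node below the document root, in document order.
        foreach (HtmlNode node in doc.DocumentNode.Descendants())
        {
            Console.WriteLine($"{node.NodeType} {node.Name}");
        }

        Console.WriteLine(CountElements(doc));
    }
}
```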

Step #4

Next, keep only the elements whose class attribute contains the value float.
In short, the code extracts all the elements present in the web page and keeps only those whose class attribute includes the target class. In the sample below, the target class is mw-jump-link on Wikipedia's Main Page, and the matching elements carry the text Jump to navigation and Jump to search.

Output

Jump to navigation
Jump to search

using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public class Program
{
    public static void Main()
    {
        // download and parse the page; HtmlWeb.Load returns a ready HtmlDocument
        HtmlWeb web = new HtmlWeb();
        HtmlAgilityPack.HtmlDocument doc = web.Load("https://en.wikipedia.org/wiki/Main_Page");

        // keep only the nodes whose class list contains "mw-jump-link"
        IEnumerable<HtmlNode> nodes = doc.DocumentNode.Descendants().Where(n => n.HasClass("mw-jump-link"));

        foreach (HtmlNode item in nodes)
        {
            // display the text of each matching element
            Console.WriteLine(item.InnerText);
        }
    }
}
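The Wikipedia sample depends on a live page, so here is an offline sketch of the same filter against inline markup, assuming the HtmlAgilityPack NuGet package is referenced. The ClassLookupDemo type, its helpers and the markup are hypothetical. It shows that HasClass compares whole, space-separated class names (so "box float" matches float but "floating" does not), that a naive exact comparison of the class attribute misses multi-class elements, and, as an alternative technique rather than the method used above, the standard XPath 1.0 idiom for matching one class token via SelectNodes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class ClassLookupDemo
{
    // LINQ filter as in the sample above: HasClass checks each class name.
    public static List<string> ByHasClass(HtmlDocument doc, string className) =>
        doc.DocumentNode.Descendants()
           .Where(n => n.HasClass(className))
           .Select(n => n.InnerText)
           .ToList();

    // XPath 1.0 alternative: pad the normalized class string with spaces and
    // look for " name ". Assumes className contains no quote characters.
    public static List<string> ByXPath(HtmlDocument doc, string className)
    {
        string xpath = $"//*[contains(concat(' ', normalize-space(@class), ' '), ' {className} ')]";
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        // By default SelectNodes returns null, not an empty collection, when nothing matches.
        return nodes == null ? new List<string>() : nodes.Select(n => n.InnerText).ToList();
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<div class=\"float\">A</div><div class=\"box float\">B</div><div class=\"floating\">C</div>");

        Console.WriteLine(string.Join(", ", ByHasClass(doc, "float")));
        Console.WriteLine(string.Join(", ", ByXPath(doc, "float")));

        // A naive exact comparison misses the element whose class is "box float".
        int exact = doc.DocumentNode.Descendants()
            .Count(n => n.GetAttributeValue("class", "") == "float");
        Console.WriteLine(exact);
    }
}
```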

Anjan kant

Outstanding journey in Microsoft Technologies (ASP.Net, C#, SQL Programming, WPF, Silverlight, WCF etc.) and client-side technologies (AngularJS, KnockoutJS, JavaScript, Ajax calls, JSON, hybrid apps etc.). I love to devote my free time to writing, blogging, social networking and an adventurous life.
