Learn HAP: How to use XPath using HTML Agility Pack?

How to extract text from HTML tags using XPath HTMLAgility method?

Introduction

XPath (XML Path Language) is a query language for navigating the elements and attributes of an XML or HTML document. It is a W3C Recommendation and a major element of the XSLT standard, and it uses a path-like syntax to identify and navigate nodes in a document. Before going through this article, you may want to read the previous article on what HTML Agility Pack is to refresh the basics.


XPath also provides more than 200 built-in functions that can be used inside path expressions. A path expression selects a single node or a node-set in a document. In this article we use one to extract text from a web page in the xPathByHTMLAgility() method.
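To make the difference between a single node and a node-set concrete, here is a minimal sketch. It assumes the HtmlAgilityPack NuGet package is installed; the class name, the method names, and the sample markup are hypothetical and only stand in for a real page.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class NodeSelectionSketch
{
    // Hypothetical markup standing in for a downloaded page
    public const string SampleHtml =
        "<html><body><h1 id='title'>Sample post</h1>" +
        "<ul><li>First</li><li>Second</li></ul></body></html>";

    // Single node: the first element matching the expression, or null
    public static string HeadingText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode heading = doc.DocumentNode.SelectSingleNode("//h1[@id='title']");
        return heading?.InnerText;
    }

    // Node-set: every matching element; SelectNodes returns null when nothing matches
    public static List<string> ListItems(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var texts = new List<string>();
        HtmlNodeCollection items = doc.DocumentNode.SelectNodes("//ul/li");
        if (items != null)
        {
            foreach (HtmlNode item in items)
                texts.Add(item.InnerText);
        }
        return texts;
    }

    public static void Main()
    {
        Console.WriteLine(HeadingText(SampleHtml));
        Console.WriteLine(string.Join(", ", ListItems(SampleHtml)));
    }
}
```

Loading the markup from a string with LoadHtml keeps the sketch independent of any live website; the same two selection calls work on a document returned by HtmlWeb.Load.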


Components in XPath

XPath provides functions for string values, numeric values, booleans, date and time comparison, node handling, sequence handling, and much more. Today, XPath expressions can also be used in JavaScript, XML Schema, Java, PHP, Python, C and C++, and many other languages.

XPath 1.0, XPath 2.0, and XPath 3.0 have each been published as W3C Recommendations.
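As an illustration of functions inside path expressions, the sketch below filters nodes with starts-with(), contains(), and last(). It deliberately sticks to XPath 1.0 functions, on the assumption that HTML Agility Pack evaluates expressions through the .NET XPath 1.0 engine; the class name, helper method, and sample markup are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class XPathFunctionSketch
{
    // Hypothetical markup used only for this illustration
    public const string SampleHtml =
        "<html><body><div class='post body'>" +
        "<a href='https://example.com/a'>Secure</a><a href='/local'>Local</a></div>" +
        "<ul><li>One</li><li>Two</li><li>Three</li></ul></body></html>";

    // Returns the InnerText of every node matched by the given expression
    public static List<string> SelectTexts(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var texts = new List<string>();
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes != null)
        {
            foreach (HtmlNode node in nodes)
                texts.Add(node.InnerText);
        }
        return texts;
    }

    public static void Main()
    {
        // String function: links whose href starts with "https://"
        Console.WriteLine(string.Join(", ", SelectTexts(SampleHtml, "//a[starts-with(@href, 'https://')]")));
        // String function: links inside a div whose class attribute contains "post"
        Console.WriteLine(string.Join(", ", SelectTexts(SampleHtml, "//div[contains(@class, 'post')]/a")));
        // Position function: the last list item
        Console.WriteLine(string.Join(", ", SelectTexts(SampleHtml, "//ul/li[last()]")));
    }
}
```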

Using XPath with the HtmlDocument class

Here we use XPath with HTML Agility Pack to scrape web pages and extract the information we need.

XPath Demo to ‘Extract text using XPath’

XPath using HTML Agility method

Step #1: Create an HtmlWeb object

HtmlWeb web = new HtmlWeb();

Step #2: Create an HtmlDocument object

HtmlDocument doc = new HtmlDocument();

Step #3: Load the document to run the XPath expression against

doc = web.Load("https://www.technologycrowds.com/2019/10/compute-sha-256-hash-using-csharp-for-effective-secruity.html");

Step #4: Extract the text using an XPath expression

var _extractText = doc.DocumentNode.SelectSingleNode("/html/body/div[5]/div/div/div/div[1]/div/div/div[2]/div[1]/div[2]/article/div[2]/div/div[2]").InnerText;

Step #5: The complete method demonstrating XPath

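// XPath method (needs "using System;" and "using HtmlAgilityPack;" at the top of the file)
static void xPathByHTMLAgility()
{
 HtmlWeb web = new HtmlWeb();
 HtmlDocument doc = new HtmlDocument();

 // Download and parse the page
 doc = web.Load("https://www.technologycrowds.com/2019/10/compute-sha-256-hash-using-csharp-for-effective-secruity.html");

 // SelectSingleNode returns null when the XPath matches nothing, so check before reading InnerText
 var node = doc.DocumentNode.SelectSingleNode("/html/body/div[5]/div/div/div/div[1]/div/div/div[2]/div[1]/div[2]/article/div[2]/div/div[2]");
 var _extractText = node != null ? node.InnerText : "No node matched the XPath expression.";
 Console.WriteLine(_extractText);
}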
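The absolute path used in the method above depends on the exact nesting of the page, so it can stop matching as soon as the layout changes. One possible alternative is a relative expression anchored on an attribute. The sketch below shows the idea on inline markup; the class name post-body, the sample HTML, and the helper names are assumptions made for illustration, not the structure of the actual page.

```csharp
using System;
using HtmlAgilityPack;

public static class RelativeXPathSketch
{
    // Hypothetical markup; the real page's class names are not known here
    public const string SampleHtml =
        "<html><body><div><div class='sidebar'>Links</div>" +
        "<article><div class='post-body'> Article text </div></article>" +
        "</div></body></html>";

    // Returns the trimmed text of the first match, or null if nothing matches
    public static string ExtractText(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node?.InnerText.Trim();
    }

    public static void Main()
    {
        // Relative expression anchored on an attribute instead of div[5]/div/...
        string text = ExtractText(SampleHtml, "//article//div[contains(@class, 'post-body')]");
        Console.WriteLine(text ?? "(no node matched the XPath)");
    }
}
```

Returning null for a missing node leaves it to the caller to decide how to report the failure, instead of throwing a NullReferenceException in the middle of a scrape.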

Step #6: Final output using XPath

Conclusion

XPath is an essential tool for web scraping, data mining, or extracting text from a specific web page with HTML Agility Pack. For more information, get in touch with us or browse our free video library.


Anjan kant

Outstanding journey in Microsoft Technologies (ASP.Net, C#, SQL Programming, WPF, Silverlight, WCF etc.), client side technologies AngularJS, KnockoutJS, Javascript, Ajax Calls, Json and Hybrid apps etc. I love to devote free time in writing, blogging, social networking and adventurous life
