How to parse HTML table using HTML Agility Pack C#

How to parse an HTML table with HTML Agility Pack in C# for data mining and web scraping

Introduction

Copying tabular data from web pages by hand is time-consuming, and the ready-made table structure may not suit your needs. Parsing the HTML table with HTML Agility Pack in C# saves that time and lets you shape the extracted data however you want. Alongside extracting text, images, favicons, and meta information, parsing HTML tables is a common web scraping and data mining task.


How to parse HTML table using HTML Agility Pack C#

Step #1

Declare a function to parse the HTML table using HTML Agility Pack.

Step #2

Now create an HtmlDocument object from the HtmlAgilityPack namespace.

Step #3

Now load the HTML table through the doc.LoadHtml() method. For example, for a table with 3 rows and 1 column, the code is as follows:

doc.LoadHtml(@"<table id=""TC""><tr><th>Name</th></tr><tr><td>Technology</td></tr><tr><td>Crowds</td></tr></table>");

Step #4

Use LINQ to parse the HTML table by declaring a variable, HTMLTableTRList, and filling it with the following query:
var HTMLTableTRList = from table in doc.DocumentNode.SelectNodes("//table").Cast<HtmlNode>()
                      from row in table.SelectNodes("tr").Cast<HtmlNode>()
                      from cell in row.SelectNodes("th|td").Cast<HtmlNode>()
                      select new { Table_Name = table.Id, Cell_Text = cell.InnerText };
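One thing to watch in this step: SelectNodes returns null rather than an empty collection when the XPath expression matches nothing, so the query above throws on a page that has no table, or on a table whose rows are not direct children (for example, rows wrapped in an explicit tbody). A minimal sketch of a guarded version could look like this. The names TableCellReader, ExtractCells, and NodesOrEmpty are hypothetical helpers for illustration, not part of HTML Agility Pack.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TableCellReader
{
    // Hypothetical helper: wraps SelectNodes so a missing match yields an empty sequence instead of null.
    private static IEnumerable<HtmlNode> NodesOrEmpty(HtmlNode node, string xpath)
    {
        return node.SelectNodes(xpath) ?? Enumerable.Empty<HtmlNode>();
    }

    // Hypothetical helper: returns (table id, cell text) pairs, or an empty list when nothing matches.
    public static List<KeyValuePair<string, string>> ExtractCells(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Same shape as the query in this step, with every SelectNodes call guarded.
        var cells = from table in NodesOrEmpty(doc.DocumentNode, "//table")
                    from row in NodesOrEmpty(table, "tr")
                    from cell in NodesOrEmpty(row, "th|td")
                    select new KeyValuePair<string, string>(table.Id, cell.InnerText);

        return cells.ToList();
    }
}
```

Calling TableCellReader.ExtractCells on HTML without any table returns an empty list, so the display loop simply prints nothing instead of failing.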

Step #5

Now display the parsed HTML table with a foreach statement as follows.
foreach(var cell in HTMLTableTRList) 
{
 Console.WriteLine("{0}: {1}", cell.Table_Name, cell.Cell_Text);
}
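The loop above prints one cell per line, which loses the row structure once a table has more than one column. As a rough sketch, assuming you want one list of cell texts per table row, the cells can be grouped by their parent row instead. TableRowReader and ReadRows are hypothetical names used only for this illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TableRowReader
{
    // Hypothetical helper: returns one list of trimmed cell texts per <tr> found inside any table.
    public static List<List<string>> ReadRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var rows = new List<List<string>>();

        // SelectNodes returns null when nothing matches, so fall back to an empty sequence.
        IEnumerable<HtmlNode> rowNodes =
            doc.DocumentNode.SelectNodes("//table//tr") ?? Enumerable.Empty<HtmlNode>();

        foreach (HtmlNode rowNode in rowNodes)
        {
            IEnumerable<HtmlNode> cellNodes =
                rowNode.SelectNodes("th|td") ?? Enumerable.Empty<HtmlNode>();

            // Keep the cells of this row together, in document order.
            rows.Add(cellNodes.Select(c => c.InnerText.Trim()).ToList());
        }

        return rows;
    }
}
```

Each row can then be printed on a single line, for example with Console.WriteLine(string.Join(" | ", row)) inside a foreach over the result. Note that "//table//tr" also matches rows of nested tables, which may or may not be what you want.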

Step #6

All the table data is then displayed, one cell per line, each prefixed with the table id.
I hope this gives readers the information they need on how to parse an HTML table using HTML Agility Pack in C#.

using System;
using HtmlAgilityPack;
using System.Linq;
     
public class Program
{
 // description: Showing here how to parse complex web HTML table using HTML Agility Pack C#
 public static void Main()
 {
  //declare object of HtmlDocument
  HtmlDocument doc = new HtmlDocument();
  doc.LoadHtml(@"<table id=""TC""><tr><th>Name</th></tr><tr><td>Technology</td></tr><tr><td>Crowds</td></tr></table>");
  
  // Using LINQ to parse HTML table smartly 
  var HTMLTableTRList = from table in doc.DocumentNode.SelectNodes("//table").Cast<HtmlNode>()
            from row in table.SelectNodes("tr").Cast<HtmlNode>()
            from cell in row.SelectNodes("th|td").Cast<HtmlNode>()
            select new {Table_Name = table.Id, Cell_Text = cell.InnerText};

  // now showing output of parsed HTML table
  foreach(var cell in HTMLTableTRList) 
  {
      Console.WriteLine("{0}: {1}", cell.Table_Name, cell.Cell_Text);
  }
 }
}
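The program above parses a hard-coded string. For actual web scraping, the document usually comes from a live page, which HTML Agility Pack can fetch through its HtmlWeb class. The following is only a sketch: RemoteTableDemo, CountTables, and Run are hypothetical names, and the URL you pass in is your own choice.

```csharp
using System;
using HtmlAgilityPack;

public static class RemoteTableDemo
{
    // Counts the <table> elements in an already loaded document.
    public static int CountTables(HtmlDocument doc)
    {
        // SelectNodes returns null when no table is present.
        HtmlNodeCollection tables = doc.DocumentNode.SelectNodes("//table");
        return tables == null ? 0 : tables.Count;
    }

    // Downloads the page at the given URL (an assumption: any reachable page) and reports its tables.
    public static void Run(string url)
    {
        var web = new HtmlWeb();
        HtmlDocument doc = web.Load(url);
        Console.WriteLine("Tables found: {0}", CountTables(doc));
    }
}
```

Once the document is loaded this way, the same LINQ query from the program above can be run against doc.DocumentNode unchanged.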

Anjan kant

An outstanding journey in Microsoft technologies (ASP.NET, C#, SQL programming, WPF, Silverlight, WCF, etc.) and client-side technologies such as AngularJS, KnockoutJS, JavaScript, Ajax calls, JSON, and hybrid apps. I love to devote my free time to writing, blogging, social networking, and an adventurous life.
