Extract Links From Web Page using HTML Agility Pack C#

Introduction

If you have read my previous web scraping post, this one continues the discussion by showing how to extract links from a web page using HTML Agility Pack. Digital marketing professionals, SEO professionals, researchers, analysts and others can use this technique to collect backlinks or other useful links for a specific purpose. Like text extraction, data scraping, favicon extraction, image capture, data mining and meta information extraction, link extraction is a valuable activity for online advertising professionals.
Extract Links From Web Page using HTML Agility Pack

Step #1

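Declare a function that takes the page URL as a string parameter: ExtractLinksFromWebPageusingHTMLAgilityPack(string URL). The complete listing further down uses the shorter name ExtractHref for the same function.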

Step #2

Then create an HtmlWeb object, which downloads and parses web pages: HtmlWeb web = new HtmlWeb().

Step #3

Declare the HTML document variable (doc) of type HtmlAgilityPack.HtmlDocument to hold the parsed page.

Step #4

Load the page into doc by calling web.Load(URL), which returns the parsed document.

Step #5

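Now extract all the links on the page by looping over every anchor element that has an href attribute: foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]")). Note that SelectNodes returns null when no element matches, so check the result before looping.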

Step #6

Inside the loop, read the href attribute of each link through link.Attributes["href"]. Its Value property holds the link target.

Step #7

Finally, call the function with the URL of the target web page, for example ExtractLinksFromWebPageusingHTMLAgilityPack("http://www.technologycrowds.com").
With these steps you should be able to extract links from a web page using HTML Agility Pack. The complete program follows.

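using System;
using HtmlAgilityPack;

public class Program
{
    public static void Main()
    {
        // Call the method with the page to scan
        ExtractHref("https://technologycrowds.com");
    }

    static void ExtractHref(string URL)
    {
        // Download and parse the page; HtmlWeb.Load returns the parsed document
        HtmlWeb web = new HtmlWeb();
        HtmlAgilityPack.HtmlDocument doc = web.Load(URL);

        // Select every <a> element that has an href attribute.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links == null)
        {
            Console.WriteLine("No links found.");
            return;
        }

        foreach (HtmlNode link in links)
        {
            HtmlAttribute att = link.Attributes["href"];

            // Sample filter: print only href values containing the letter "a".
            // Replace or remove this condition to suit your own needs.
            if (att.Value.Contains("a"))
            {
                Console.WriteLine(att.Value);
            }
        }
    }
}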
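
The program above prints href values exactly as they appear in the page, so relative links, in-page anchors and mailto: links come out mixed together. When collecting backlinks it is often handy to turn them into absolute web URLs first. Here is a minimal sketch of one way to do that with System.Uri. The LinkResolver class, the ToAbsoluteLinks method and the sample href values are hypothetical names chosen for illustration and are not part of HTML Agility Pack or the original listing.

```csharp
using System;
using System.Collections.Generic;

public static class LinkResolver
{
    // Hypothetical helper (not from the original post): converts raw href
    // values into unique absolute http/https URLs, resolved against pageUrl.
    public static List<string> ToAbsoluteLinks(string pageUrl, IEnumerable<string> hrefs)
    {
        var baseUri = new Uri(pageUrl);
        var seen = new HashSet<string>();
        var result = new List<string>();

        foreach (string href in hrefs)
        {
            // Ignore empty href attributes
            if (string.IsNullOrWhiteSpace(href))
                continue;

            // Resolve relative values such as "contact.html" against the page URL
            if (!Uri.TryCreate(baseUri, href.Trim(), out Uri resolved))
                continue;

            // Skip mailto:, javascript: and other non-web schemes
            if (resolved.Scheme != Uri.UriSchemeHttp && resolved.Scheme != Uri.UriSchemeHttps)
                continue;

            // Drop the #fragment so in-page anchors collapse into one entry
            string url = resolved.GetLeftPart(UriPartial.Query);

            // Keep the first occurrence of each URL, in page order
            if (seen.Add(url))
                result.Add(url);
        }

        return result;
    }
}

public class Demo
{
    public static void Main()
    {
        // Sample href values standing in for att.Value from the loop above
        var hrefs = new[]
        {
            "contact.html", "#top", "mailto:someone@example.com",
            "https://example.com/page", "contact.html"
        };

        foreach (string url in LinkResolver.ToAbsoluteLinks("https://technologycrowds.com/blog/post", hrefs))
        {
            Console.WriteLine(url);
        }
    }
}
```

To combine it with the program above, one option is to add each att.Value to a List<string> inside the foreach loop and pass that list, together with the URL argument, to ToAbsoluteLinks once the loop has finished.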

Anjan kant

Outstanding journey in Microsoft technologies (ASP.NET, C#, SQL programming, WPF, Silverlight, WCF etc.), client-side technologies such as AngularJS, KnockoutJS, JavaScript, Ajax calls and JSON, and hybrid apps. I love to devote my free time to writing, blogging, social networking and an adventurous life.
