HAP: How to extract favicon from website using HTML Agility Pack

Overview

The previous article, HtmlAgility Search By Text, covered one of the important utilities of HTML Agility Pack. This article covers another useful task: getting the favicon declared in a web page. If you are new to HTML Agility Pack, start with What is HTML Agility Pack, since there are quite a few topics to get acquainted with. For a quick recap of node navigation, see HTML Traversing using Agility Pack.

What is a favicon?

In short, a favicon is a small icon that identifies a website in the browser. A web designer supplies it as part of the page UI, and the browser displays it next to the address bar and, if the user has bookmarked the page, beside the entry in the bookmarks list. In browsers that support tabs, the icon also appears beside the page title on the tab.

How to extract the favicon?

Typically, a web page declares its favicon in the HTML markup through a link element in the head. To extract the favicon from a website, you therefore need a working knowledge of HTML manipulation with HTML Agility Pack, which the earlier tutorials in this series cover. Follow the steps below to extract the favicon.

Step #1

Store the website URL in a local variable.

Step #2

Next, declare a variable of type HtmlWeb and use its Load method to load the document into another variable.

Step #3

Declare a favicon variable and initialize it to null cast to dynamic; the cast is needed because var cannot infer a type from a bare null.

Step #4

Get the node that declares the favicon by using SelectSingleNode as shown below.
var el = htmlDoc.DocumentNode.SelectSingleNode("/html/head/link[@rel='icon' and @href]");
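
The XPath above matches only a link whose rel attribute is exactly 'icon' and which sits directly under /html/head. Some pages declare the icon as rel="shortcut icon" or use different letter casing. A more tolerant query could look like the sketch below. The FindIconHref helper and the inline sample markup are assumptions made for illustration; they are not part of the original tutorial.

```csharp
using System;
using HtmlAgilityPack;

public class Program
{
    // XPath 1.0 has no lower-case() function, so translate() folds @rel to lower case.
    // Padding with spaces lets " icon " match a whole token such as "icon" or
    // "shortcut icon" without also matching "apple-touch-icon".
    private const string IconXPath =
        "//link[@href and contains(concat(' ', " +
        "translate(normalize-space(@rel), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), " +
        "' '), ' icon ')]";

    // Hypothetical helper: returns the first matching href, or null when none is declared.
    public static string FindIconHref(string markup)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(markup);
        var el = doc.DocumentNode.SelectSingleNode(IconXPath);
        return el?.GetAttributeValue("href", "");
    }

    public static void Main()
    {
        // Inline sample markup, used here instead of a live page
        var sample = "<html><head><link rel=\"Shortcut Icon\" href=\"/favicon.ico\"></head><body></body></html>";
        Console.WriteLine(FindIconHref(sample) ?? "no icon link found");
    }
}
```

The trade-off is readability: the stricter query in this step is easier to follow, while the tolerant one copes with more real-world markup.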

Step #5

If the node is not null, get the value of its href attribute.
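
The href value is not always an absolute URL; a page may declare it as /favicon.ico or images/icon.png. A minimal sketch of turning such a value into an absolute URL with System.Uri follows. The ResolveFavicon helper and the example.com URLs are hypothetical names chosen for illustration.

```csharp
using System;

public class Program
{
    // Hypothetical helper: resolves an href (absolute, root-relative,
    // path-relative, or protocol-relative) against the URL of the page.
    public static string ResolveFavicon(string pageUrl, string href)
    {
        var baseUri = new Uri(pageUrl);
        return new Uri(baseUri, href).AbsoluteUri;
    }

    public static void Main()
    {
        // Root-relative href
        Console.WriteLine(ResolveFavicon("https://www.technologycrowds.com/", "/favicon.ico"));
        // Path-relative href, resolved against the folder of the page
        Console.WriteLine(ResolveFavicon("https://www.example.com/blog/post.html", "img/icon.png"));
        // Protocol-relative href, which inherits the scheme of the page
        Console.WriteLine(ResolveFavicon("https://www.example.com/", "//cdn.example.com/icon.png"));
    }
}
```

An href that is already absolute passes through this helper unchanged, so it can be applied to every extracted value.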

Step #6

Display the resulting information on the console or use it for your own purposes.

Refer to the code below for the complete logic behind extracting the favicon from a web page.

using System;
using System.Net;
using HtmlAgilityPack;

public class Program
{
 public static void Main()
 {
  // website URL
  var html = @"https://www.TechnologyCrowds.com/";
  // enable TLS 1.2, which older .NET Framework versions do not use by default
  ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12 | SecurityProtocolType.Tls11 | SecurityProtocolType.Tls;
  
  // declare htmlweb and load html document
  HtmlWeb web = new HtmlWeb();
  var htmlDoc = web.Load(html);
  
  var favicon = (dynamic)null;
  // extracting icon
  var el = htmlDoc.DocumentNode.SelectSingleNode("/html/head/link[@rel='icon' and @href]");
  if (el != null)
  {
   favicon = el.Attributes["href"].Value;
   
   // showing output here
   Console.WriteLine(Convert.ToString(favicon));
  }
 }
}

Output

https://www.technologycrowds.com/favicon.ico
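
The program prints nothing when the page declares no icon link. Browsers conventionally request /favicon.ico from the site root in that case, so a fallback along the following lines may be useful. This is a sketch under that assumption; the GetFaviconUri helper, the example.com URL, and the inline markup are hypothetical and not part of the original code.

```csharp
using System;
using HtmlAgilityPack;

public class Program
{
    // Hypothetical helper: returns the declared icon if present,
    // otherwise the conventional /favicon.ico at the site root.
    public static Uri GetFaviconUri(HtmlDocument doc, Uri pageUri)
    {
        var el = doc.DocumentNode.SelectSingleNode("/html/head/link[@rel='icon' and @href]");
        var href = el?.GetAttributeValue("href", "");
        return string.IsNullOrWhiteSpace(href)
            ? new Uri(pageUri, "/favicon.ico")
            : new Uri(pageUri, href);
    }

    public static void Main()
    {
        var pageUri = new Uri("https://www.example.com/articles/page.html");

        // Inline markup with no icon link, used here instead of a live page
        var doc = new HtmlDocument();
        doc.LoadHtml("<html><head><title>No icon declared</title></head><body></body></html>");

        Console.WriteLine(GetFaviconUri(doc, pageUri));
    }
}
```

The fallback URL is only a guess at where the icon might live; a request to it can still return 404, so check the response before relying on it.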

Conclusion

Explore the other features of HTML Agility Pack in the other tutorials in this series.

Anjan kant

Outstanding journey in Microsoft Technologies (ASP.Net, C#, SQL Programming, WPF, Silverlight, WCF etc.), client side technologies AngularJS, KnockoutJS, Javascript, Ajax Calls, Json and Hybrid apps etc. I love to devote free time in writing, blogging, social networking and adventurous life
