Grab all images from Website using HTML Agility Pack C#

Introduction

After extracting text with XPath in our previous article, it’s time to grab all images from a website using HTML Agility Pack in C#. It's not a big deal: only a little of the syntax changes. XPath (XML Path Language) can also be used here to navigate to particular elements and attributes in an XML or HTML document.

XPath is a W3C Recommendation and a major element of the XSLT standard, and it is widely used for web scraping. It uses a "path-like" syntax to identify and navigate to individual nodes in an XML or HTML document.
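To make the "path-like" syntax concrete, here is a minimal sketch that runs one XPath expression against a small inline HTML string with HTML Agility Pack. The class name XPathBasics, the helper ImageSources, and the sample markup are assumptions made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class XPathBasics
{
    // Hypothetical helper: returns the src value of every <img> that has a src attribute.
    public static List<string> ImageSources(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string>();

        // "//img[@src]" reads as: any <img> element, at any depth, that has a src attribute.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//img[@src]");
        if (nodes == null)
        {
            return result;
        }

        foreach (var node in nodes)
        {
            result.Add(node.GetAttributeValue("src", ""));
        }
        return result;
    }

    public static void Main()
    {
        // Sample markup (made up): two images with a src, one without.
        const string html =
            "<html><body>" +
            "<div><img src='a.png'><img alt='no source'></div>" +
            "<p><img src='b.jpg'></p>" +
            "</body></html>";

        foreach (var src in ImageSources(html))
        {
            Console.WriteLine(src);
        }
    }
}
```

The null check matters in practice: a page without any matching element would otherwise cause a NullReferenceException in the loop.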

How to use XPath to grab all images from Website using HTML Agility Pack C#?

XPath expressions select individual nodes or node-sets in an HTML or XML document, and the language comes with a large library of built-in functions (more than 200 in XPath 2.0 and later). Here we use it to locate the image elements in the HTML or XML document behind a given URL and extract them through the ExtractAllImages() method.


Important Components in XPath

XPath provides functions for string values, numeric values, booleans, date and time comparison, node handling, sequence handling, and more. Apart from HTML and C#, XPath expressions can also be used from many other languages and technologies, such as XML Schema, JavaScript, Java, C, Python, PHP, and C++.
XPath 1.0, 2.0, 3.0, and 3.1 are all W3C Recommendations. HTML Agility Pack builds on .NET's System.Xml.XPath, which implements XPath 1.0, so only the 1.0 function set is available from it.
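As an illustration of XPath functions inside predicates, the following sketch filters images with contains() and starts-with(), both of which belong to XPath 1.0. The class XPathFunctionDemo, the helper SourcesMatching, and the sample URLs are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class XPathFunctionDemo
{
    // Hypothetical helper: evaluates an XPath expression that selects <img>
    // elements and returns the src value of each match.
    public static List<string> SourcesMatching(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string>();
        var nodes = doc.DocumentNode.SelectNodes(xpath); // null when nothing matches
        if (nodes == null)
        {
            return result;
        }

        foreach (var node in nodes)
        {
            result.Add(node.GetAttributeValue("src", ""));
        }
        return result;
    }

    public static void Main()
    {
        // Sample markup (made up for the example).
        const string html =
            "<div>" +
            "<img src='https://cdn.example.com/logo.png'>" +
            "<img src='/images/photo.jpg'>" +
            "<img src='https://cdn.example.com/banner.jpg'>" +
            "</div>";

        // contains(): images whose src contains ".jpg"
        foreach (var src in SourcesMatching(html, "//img[contains(@src, '.jpg')]"))
        {
            Console.WriteLine("jpg:   " + src);
        }

        // starts-with(): images whose src starts with "https://"
        foreach (var src in SourcesMatching(html, "//img[starts-with(@src, 'https://')]"))
        {
            Console.WriteLine("https: " + src);
        }
    }
}
```

Predicates can be combined with `and` / `or`, for example `//img[starts-with(@src, 'https://') and contains(@src, '.jpg')]`.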

We use the HtmlDocument class with XPath

HTML Agility Pack's HtmlDocument class represents the parsed page. We query it to scrape websites and capture the images or pictures we need.

XPath demo program: ‘Grab image using XPath’

XPath through HTML Agility Pack

Step #1: Create an HtmlWeb object as follows

// create the HtmlWeb object that downloads and parses pages (requires: using HtmlAgilityPack;)
HtmlWeb web = new HtmlWeb();

Step #2: Load the document from the website URL

// download and parse the page, reusing the HtmlWeb object from Step #1
var document = web.Load("https://www.technologycrowds.com/2019/12/net-core-web-api-tutorial.html");

Step #3: Now apply a LINQ query to select the images on the page

// use LINQ to collect the src attribute of every <img> element (requires: using System.Linq;)
// images without a src attribute are skipped
var ImageURLs = document.DocumentNode.Descendants("img")
   .Select(e => e.GetAttributeValue("src", null))
   .Where(s => !String.IsNullOrEmpty(s));
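The src values collected this way are returned exactly as written in the page, so they may be relative paths. A minimal sketch of resolving them against the page URL with System.Uri could look like this; the class ImageUrlResolver, the method ToAbsolute, and the example.com URLs are assumptions for illustration, and skipping inline data: images is a design choice of this sketch.

```csharp
using System;
using System.Collections.Generic;

public static class ImageUrlResolver
{
    // Hypothetical helper: turns raw src values into absolute URLs,
    // using the URL of the page they were found on as the base.
    public static List<string> ToAbsolute(string pageUrl, IEnumerable<string> sources)
    {
        var baseUri = new Uri(pageUrl);
        var result = new List<string>();

        foreach (var src in sources)
        {
            if (string.IsNullOrWhiteSpace(src))
            {
                continue; // nothing to resolve
            }
            if (src.StartsWith("data:", StringComparison.OrdinalIgnoreCase))
            {
                continue; // inline image data, not a downloadable URL
            }

            // Handles absolute, root-relative, path-relative and
            // protocol-relative ("//host/...") values alike.
            if (Uri.TryCreate(baseUri, src, out Uri absolute))
            {
                result.Add(absolute.AbsoluteUri);
            }
        }
        return result;
    }

    public static void Main()
    {
        // Made-up page URL and src values for the example.
        var sources = new[]
        {
            "/img/a.png",
            "b.jpg",
            "//cdn.example.com/c.gif",
            "data:image/png;base64,AAAA"
        };

        foreach (var url in ToAbsolute("https://example.com/blog/post.html", sources))
        {
            Console.WriteLine(url);
        }
    }
}
```

In the article's flow, the ImageURLs sequence from Step #3 and the URL passed to Load in Step #2 would be the natural arguments.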

Step #4: Print the extracted image URLs

// print the image URLs one by one; empty values were already filtered out in Step #3
foreach (var item in ImageURLs)
{
    Console.WriteLine(item);
}

Step #5: The complete method
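A minimal sketch of how Steps #1 to #4 might be assembled into a single ExtractAllImages() method. The method signature, the ImageExtractor class name, and the ExtractFromDocument helper are assumptions made for illustration, not the author's original listing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class ImageExtractor
{
    // Assumed signature: takes a page URL and returns the src of every image on it.
    public static List<string> ExtractAllImages(string url)
    {
        // Steps #1 and #2: create the HtmlWeb object and load the page.
        var web = new HtmlWeb();
        HtmlDocument document = web.Load(url);

        return ExtractFromDocument(document);
    }

    // Hypothetical helper, split out so the query can be run on an
    // already-parsed document without any network access.
    public static List<string> ExtractFromDocument(HtmlDocument document)
    {
        // Step #3: LINQ query over all <img> elements, skipping empty src values.
        return document.DocumentNode.Descendants("img")
            .Select(e => e.GetAttributeValue("src", null))
            .Where(s => !String.IsNullOrEmpty(s))
            .ToList();
    }

    public static void Main()
    {
        // Step #4: print each image URL found on the page used in this article.
        var images = ExtractAllImages(
            "https://www.technologycrowds.com/2019/12/net-core-web-api-tutorial.html");

        foreach (var item in images)
        {
            Console.WriteLine(item);
        }
    }
}
```

Calling ToList() runs the query once and returns a concrete list, so the caller can enumerate the results several times without re-evaluating the LINQ pipeline.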

Anjan kant

Outstanding journey in Microsoft technologies (ASP.NET, C#, SQL programming, WPF, Silverlight, WCF, etc.) and client-side technologies such as AngularJS, KnockoutJS, JavaScript, Ajax calls, JSON, and hybrid apps. I love to devote my free time to writing, blogging, social networking, and an adventurous life.
