Scraping (or extracting) metadata from a webpage means reading the data stored in the page's HTML tags, such as the title, the description and the Open Graph tags. This metadata helps users and search engines learn more about the webpage.
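To make the idea concrete, here is a minimal sketch of what reading metadata out of HTML tags can look like. The function name extractMetaTags and the sample HTML are made up for illustration. The regular expression only handles simple meta tags with double-quoted attributes, written with property or name first and content second, so a real scraper would more likely use a proper HTML parser or a service like the one described in the rest of this post.

```javascript
// Hypothetical helper: collect <meta property|name="..." content="..."> pairs
// from an HTML string into a plain object keyed by the property/name value.
function extractMetaTags(html) {
  const tags = {};
  const pattern = /<meta\s+(?:property|name)="([^"]+)"\s+content="([^"]*)"\s*\/?>/gi;
  for (const match of html.matchAll(pattern)) {
    tags[match[1]] = match[2];
  }
  return tags;
}

// Made-up sample page used only to exercise the helper.
const sampleHtml = `
<html><head>
  <title>Example page</title>
  <meta property="og:title" content="Example page">
  <meta property="og:description" content="A short summary of the page.">
  <meta name="description" content="Fallback description.">
</head><body></body></html>`;

const tags = extractMetaTags(sampleHtml);
console.log(tags['og:title']);
console.log(tags['og:description']);
```

Hand-rolled parsing like this tends to break on real-world pages (single quotes, reordered attributes, JavaScript-rendered content), which is one reason to hand the job to a dedicated API.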
This post explains the steps to extract metadata using OpenGraph.io and Node.
Refer to the opengraph.io website and choose the pricing plan that best fits your requirements. A free plan is also available as of writing this tutorial.
Create an account on opengraph.io and get the app ID.
The next few steps explain how to use the app ID in Node to retrieve the metadata.
Install the opengraph-io library:
npm install opengraph-io
Next, configure the library with your app ID and create a function that takes a webpage URL and returns its metadata:
const og = require('opengraph-io')({
  appId: 'abcxyz', // App ID from opengraph.io
  cacheOk: true, // If a cached result is available, use it for speed
  useProxy: false, // Proxies help avoid being blocked and can bypass CAPTCHAs
  maxCacheAge: 432000000, // The maximum cache age to accept
  acceptLang: 'en-US,en;q=0.9', // Language to present to the site
  fullRender: false // When true, JS is executed while rendering, to deal with JS-dependent sites
});
// og.getSiteInfo already returns a promise, so it can be returned directly:
// it resolves with the metadata and rejects with the error.
function scrapeMetaData(webpage) {
  return og.getSiteInfo(webpage);
}
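As a usage sketch, the snippet below shows one way to call such a function and pull a few fields out of the result. It assumes the response groups its results under hybridGraph, openGraph and htmlInferred objects with title, description and image fields; that shape is an assumption here and should be checked against a real response. The names summarizeMetadata, printSummary and stubClient are hypothetical, and the stub stands in for the configured og client so the example runs without an app ID or network access.

```javascript
// Hypothetical helper: pick the first available value for each field.
// Assumes results are grouped under hybridGraph, openGraph and htmlInferred;
// verify these names against a real response before relying on them.
function summarizeMetadata(metadata) {
  const sources = [metadata.hybridGraph, metadata.openGraph, metadata.htmlInferred]
    .filter(Boolean);
  const pick = (field) => {
    const source = sources.find((s) => s[field]);
    return source ? source[field] : null;
  };
  return {
    title: pick('title'),
    description: pick('description'),
    image: pick('image'),
  };
}

// Stand-in for the configured `og` client so the sketch runs offline.
const stubClient = {
  getSiteInfo: (url) => Promise.resolve({
    openGraph: { title: 'Example page' },
    htmlInferred: { title: 'Example', description: `Inferred from ${url}` },
  }),
};

async function printSummary(client, webpage) {
  try {
    const metadata = await client.getSiteInfo(webpage);
    const summary = summarizeMetadata(metadata);
    console.log(summary);
    return summary;
  } catch (err) {
    // Network failures, an invalid app ID or quota errors would land here.
    console.error('Could not fetch metadata:', err.message);
    return null;
  }
}

printSummary(stubClient, 'https://example.com');
```

With the real client, the same call would be printSummary(og, 'https://example.com'), which sends the request to the API instead of the stub.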
I hope you enjoyed this tutorial on how to extract metadata from webpages.