Opengraph tags – Scraping metadata from sites – Node and OpenGraph IO

Scraping metadata from a webpage means reading it out of the page's HTML tags, mainly the meta tags in the head. This metadata helps users and search engines learn more about the webpage.
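To make the idea concrete, here is a minimal sketch of pulling Open Graph tags out of an HTML string by hand. The function name extractOgTags and the sample HTML are made up for illustration, and the regular expressions are deliberately naive; a real scraper should use a proper HTML parser, which is part of what a service like OpenGraph.io takes care of.

```javascript
// Illustrative only: a naive regex-based extractor for <meta property="og:..."> tags.
// extractOgTags is a hypothetical helper, not part of any library.
function extractOgTags(html) {
    const tags = {};
    const metaPattern = /<meta\s+[^>]*>/gi;
    for (const [tag] of html.matchAll(metaPattern)) {
        // Attribute order varies between sites, so look each one up separately.
        const property = /property\s*=\s*["'](og:[^"']+)["']/i.exec(tag);
        const content = /content\s*=\s*["']([^"']*)["']/i.exec(tag);
        if (property && content) {
            tags[property[1]] = content[1];
        }
    }
    return tags;
}

// Made-up sample page with two Open Graph tags and one ordinary meta tag.
const sampleHtml = `
<html><head>
<meta property="og:title" content="Example Page">
<meta content="https://example.com/cover.png" property="og:image" />
<meta name="description" content="Not an Open Graph tag">
</head></html>`;

console.log(extractOgTags(sampleHtml));
```

Only the two og: tags end up in the result; the plain description tag is skipped because it has no og: property.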

This post will explain the steps to extract metadata using Opengraph IO and Node.

Please refer to the opengraph.io website and choose the pricing plan that best fits your requirements. They also offer a free plan as of the time of writing.

Create an account on opengraph.io and get your app ID.

The next few steps explain how to use the app ID in Node to retrieve the metadata.

Install the opengraph-io library:

npm install opengraph-io

Next, create a function that takes the URL of a webpage and returns its metadata, using a client configured with your app ID:

// Configure the client once; these options apply to every request it makes.
const og = require('opengraph-io')({
    appId: 'abcxyz', // App ID from opengraph.io
    cacheOk: true, // Use a cached result if one is available, for speed
    useProxy: false, // Proxies help avoid being blocked and can bypass CAPTCHAs
    maxCacheAge: 432000000, // Maximum cache age to accept, in milliseconds (5 days)
    acceptLang: 'en-US,en;q=0.9', // Language to present to the site
    fullRender: false // Set to true to execute JavaScript when rendering, for JS-dependent sites
});

// getSiteInfo already returns a promise, so it can be returned directly
// instead of being wrapped in a new Promise.
function scrapeMetaData(webpage) {
    return og.getSiteInfo(webpage);
}
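Once scrapeMetaData is in place, calling it might look like the sketch below. The helper names summarizeMetadata and printSummary are made up for this example, and the hybridGraph field is an assumption about the shape of the response, so log the full result for your own account before relying on it. A stand-in scraper is used here so the sketch runs without an app ID or network access; in real use you would pass scrapeMetaData instead.

```javascript
// Picks a few common fields out of a result object.
// Assumption: the result carries a `hybridGraph` object with title,
// description and image; verify against an actual response.
function summarizeMetadata(metadata) {
    const graph = (metadata && metadata.hybridGraph) || {};
    return {
        title: graph.title || null,
        description: graph.description || null,
        image: graph.image || null
    };
}

// `scrape` is any function that returns a promise of metadata,
// for example the scrapeMetaData function above.
async function printSummary(scrape, webpage) {
    try {
        const summary = summarizeMetadata(await scrape(webpage));
        console.log(summary);
        return summary;
    } catch (err) {
        // Network problems or an invalid app ID would surface here as a rejected promise.
        console.error('Could not fetch metadata for ' + webpage + ':', err.message);
        return null;
    }
}

// Stand-in scraper with made-up data, used only to keep this sketch runnable offline.
const fakeScrape = (webpage) => Promise.resolve({
    hybridGraph: { title: 'Example Page', description: 'A sample page', url: webpage }
});

printSummary(fakeScrape, 'https://example.com');
```

Handling the rejection in one place keeps a failed lookup from crashing the rest of the program.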

I hope this tutorial helped you learn how to extract metadata from webpages.
