What is Web Mining?

Web mining is the application of data mining techniques and specialized algorithms to extract information from web content, web documents, web services, server logs and hyperlinks. Discovering usage patterns in web data helps explain how web-based applications are actually used. With web mining, your organization can obtain both structured and unstructured data from sources such as page content, server logs, websites and browser activity.

Types of information that web mining can discover

There are three basic types of information that can be discovered through web mining:

  1. Web graphs (data extracted from the links between people, pages and other data)
  2. Web content (data extracted from inside web documents and pages)
  3. Web activity (data extracted from web browsers and server logs)
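
As a rough illustration of the first and third types, the sketch below builds a small web graph from page links and counts page requests from server log lines. It uses only the Python standard library. The page contents, log lines, and helper names (LinkCollector, build_web_graph, count_page_hits) are hypothetical and invented for this example, and the log lines are assumed to follow the Common Log Format.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def build_web_graph(pages):
    """Web graph: map each page URL to the list of URLs it links to."""
    graph = {}
    for url, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        graph[url] = collector.links
    return graph


# Matches the request part of a log line, e.g. "GET /home HTTP/1.1".
# Assumption: lines follow the Common Log Format.
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) [^"]*"')


def count_page_hits(log_lines):
    """Web activity: count requests per path from server log lines."""
    hits = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits


# Hypothetical sample data, invented for illustration only.
pages = {
    "/home": '<a href="/products">Products</a> <a href="/about">About</a>',
    "/products": '<a href="/home">Home</a>',
}
log_lines = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /home HTTP/1.1" 200 2326',
    '203.0.113.9 - - [10/Oct/2023:13:56:01 +0000] "GET /products HTTP/1.1" 200 1804',
    '203.0.113.5 - - [10/Oct/2023:13:57:12 +0000] "GET /home HTTP/1.1" 200 2326',
]

graph = build_web_graph(pages)
hits = count_page_hits(log_lines)
print(graph)
print(hits.most_common(1))
```

A real project would typically resolve relative links, remove duplicates, and handle malformed log lines. This sketch skips those concerns to keep the idea visible.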

Web mining can give you quick access to information such as competitive intelligence, pricing analysis, business intelligence, brand reputation and brand popularity.

How is web content extracted?

Web content is usually extracted in the following four steps:

  1. Content is collected from the web
  2. Usable data is extracted from formatted data (HTML, PDF, etc.)
  3. Data is classified, rated, clustered, filtered and sorted
  4. The results of the analysis are turned into useful information, like a search index or report
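
The four steps above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a production pipeline. The sample pages, the stop-word list, and the helper names (TextExtractor, extract_text, tokenize, build_index) are assumptions made for this example. Step 1 is represented by pages that are already in memory, and step 3 is reduced to simple filtering and sorting rather than full classification or clustering.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Step 2: keep visible text, skip <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def extract_text(html):
    """Return the visible text of an HTML page with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


# Hypothetical stop-word list used for the filtering part of step 3.
STOP_WORDS = {"a", "and", "for", "in", "is", "of", "the"}


def tokenize(text):
    """Step 3 (simplified): lowercase, split into words, filter stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [word for word in words if word not in STOP_WORDS]


def build_index(pages):
    """Step 4: turn the results into a small search index (word -> URLs)."""
    index = defaultdict(set)
    for url, html in pages.items():
        for word in tokenize(extract_text(html)):
            index[word].add(url)
    return {word: sorted(urls) for word, urls in sorted(index.items())}


# Step 1 stand-in: hypothetical pages assumed to be already collected.
pages = {
    "/widgets": "<html><head><style>p {color: red}</style></head>"
                "<body><h1>Widgets</h1><p>Pricing and reviews of widgets.</p></body></html>",
    "/gadgets": "<html><body><h1>Gadgets</h1><p>Pricing of gadgets.</p>"
                "<script>var x = 1;</script></body></html>",
}

index = build_index(pages)
for word, urls in index.items():
    print(word, urls)
```

Looking up a word in the resulting index, for example index["pricing"], returns the pages that mention it, which is the kind of search index the last step describes.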

How is web mining different from data mining?

Web mining differs from data mining in the following ways:

  • Scale: In traditional data mining, processing 1 million records from a database would be a huge task. In web mining, even 10 million pages would not be a big number.
  • Access: In the data mining of corporate information, the data is private and requires access rights. In web mining, the data is generally public and rarely requires access rights.
  • Structure: In traditional data mining, information is obtained from a database, which provides some level of explicit structure. Web mining, on the other hand, processes semi-structured and unstructured data from web pages, where the content is often obscured by HTML markup.

Importance of web mining in pharmaceutical research

The web offers access to a vast amount of information in the fields of biology and chemistry. This is a boon for pharmaceutical companies looking for chemical/biological databases. Extracting dynamic or static data from these heterogeneous databases can be difficult, as you would need custom-built search engines along with indexing mechanisms. If your pharma company opts for web mining, on the other hand, you can get much faster access to the information that you need.

