A Visual Webpage Information Extraction Framework For Competitive Intelligence System

Zhiwei Zhang
Wenbo Qin
Haifeng Xu

Abstract

Webpage information extraction is a core capability of competitive intelligence systems. This paper presents the design and implementation of a visual webpage information extraction module for a competitive intelligence system, from the perspective of research and development (R&D) technology and its practical application. The study first defines the objectives and requirements of webpage information extraction, based on the practical needs of competitive intelligence systems. After assessing the strengths and weaknesses of existing theories and methods for extracting webpage text information, the paper proposes a visual method for extracting webpage text information. It then describes the overall architecture of the proposed module and, on that basis, details the extraction template, rule generation, optimization techniques, and the extraction algorithm used in visual webpage information extraction. Confirmatory experiments, together with an analysis of their results, are used to evaluate the system's effectiveness and practicality. The results indicate that the system meets the webpage information extraction needs of competitive intelligence systems and supports the R&D work in which the authors are engaged.

Article Details

Section
Special Issue - Efficient Scalable Computing based on IoT and Cloud Computing