Copyright Protection and Risk Assessment Based on Information Extraction and Machine Learning: The Case of Online Literary Works


Xudong Lin

Abstract

With the proliferation of digital platforms, the dissemination of literary works faces unprecedented challenges, particularly copyright infringement and unauthorized use. This study introduces a framework for copyright protection and risk assessment tailored to online literary works. The framework combines CNN-based information extraction (IE) with machine learning (ML) algorithms to identify, classify, and protect literary content against copyright violations. First, we describe a novel IE methodology, based on a CNN and a decision tree, that systematically harvests metadata and textual content from online repositories. This process detects and indexes online literary works and extracts pertinent features such as authorship, publication date, and textual patterns. After extraction, natural language processing (NLP) is used to analyze and compare content, flagging potential copyright infringement where a work shows significant textual overlap or stylistic similarity with registered works. Second, we introduce a risk assessment model built with supervised machine learning. The model is trained on a labelled dataset of copyrighted and non-copyrighted works together with known cases of copyright infringement. From the extracted features, it estimates the probability of infringement and assigns each work to a high, medium, or low risk category. This stratification allows stakeholders to prioritize enforcement actions and allocate resources efficiently. The study also compares several ML algorithms, including decision trees, support vector machines, and neural networks, to determine the most effective approach for copyright protection in the literary domain. We evaluate the models on accuracy, precision, recall, and F1-score, with emphasis on their capacity to generalize and operate in dynamic, real-world environments.
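As a rough illustration of the overlap detection and three-level risk stratification described in the abstract, the following minimal sketch scores a candidate text against a registered work and maps the score to a risk category. It is not the study's method: it swaps in a simple word n-gram Jaccard similarity for the CNN-based extraction and the supervised model, and the function names (`shingles`, `overlap_score`, `risk_tier`) and the thresholds 0.6 and 0.3 are hypothetical choices made only for this example.

```python
import re


def shingles(text, n=3):
    """Return the set of word n-grams (shingles) found in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_score(candidate, registered, n=3):
    """Jaccard similarity between the two shingle sets, in [0, 1]."""
    a, b = shingles(candidate, n), shingles(registered, n)
    if not a or not b:
        # Texts shorter than n words have no shingles to compare.
        return 0.0
    return len(a & b) / len(a | b)


def risk_tier(score, high=0.6, medium=0.3):
    """Map a score to a risk category (thresholds are assumed, not from the study)."""
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"


if __name__ == "__main__":
    registered = "the quick brown fox jumps over a sleeping cat"
    candidate = "the quick brown fox jumps over the lazy dog"
    score = overlap_score(candidate, registered)
    print(round(score, 2), risk_tier(score))
```

In a pipeline closer to the one the abstract outlines, the score passed to `risk_tier` would presumably be the infringement probability produced by the trained classifier rather than a raw overlap ratio, and the thresholds would be tuned on the labelled dataset.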

Article Details

Section
Special Issue - Evolutionary Computing for AI-Driven Security and Privacy: Advancing the state-of-the-art applications