Copyright Protection and Risk Assessment Based on Information Extraction and Machine Learning: The Case of Online Literary Works

Xudong Lin

doi:10.12694/scpe.v25i5.3002

Authors

Xudong Lin, Educational and Scientific Institute of International Relations, Taras Shevchenko National University of Kyiv, Kyiv, Ukraine, 01033

DOI:

https://doi.org/10.12694/scpe.v25i5.3002

Keywords:

risk assessment, copyright protection, information extraction

Abstract

With the proliferation of digital platforms, the dissemination of literary works faces unprecedented challenges, particularly copyright infringement and unauthorized use. This study introduces a framework for copyright protection and risk assessment tailored to online literary works. The framework combines CNN-based information extraction (IE) techniques with machine learning (ML) algorithms to identify, classify, and protect literary content against copyright violations. First, we present a novel IE methodology, based on a CNN and a decision tree, that systematically harvests metadata and textual content from online repositories. This process detects and indexes online literary works, extracting features such as authorship, publication date, and textual patterns. After extraction, natural language processing (NLP) is used to analyze and compare content, flagging potential copyright infringement by identifying significant overlaps and stylistic similarities with registered works. Next, we introduce a risk assessment model developed through supervised machine learning. The model is trained on a labelled dataset comprising copyrighted and non-copyrighted works, along with known cases of copyright infringement. From the extracted features, the model estimates the probability of infringement and assigns each case to a high, medium, or low risk category. This stratification allows stakeholders to prioritize enforcement actions and resources efficiently. The study further explores several ML algorithms, including decision trees, support vector machines, and neural networks, to determine the most effective approach for copyright protection in the literary domain. We evaluate the models on accuracy, precision, recall, and F1-score, emphasizing their capacity to generalize and operate in dynamic, real-world environments.
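
The abstract does not specify how overlaps are scored, where the risk boundaries lie, or how the metrics are computed. The following is only a minimal sketch of those three steps under stated assumptions: word-trigram Jaccard similarity stands in for the NLP comparison (it is not the CNN and decision-tree method of the study), the tier thresholds 0.7 and 0.4 are hypothetical, and all function names are illustrative.

```python
from typing import List, Set, Tuple


def shingles(text: str, n: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams (shingles) of the lowercased text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_score(candidate: str, registered: str, n: int = 3) -> float:
    """Jaccard similarity of the two shingle sets (assumed stand-in metric)."""
    a, b = shingles(candidate, n), shingles(registered, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def risk_tier(probability: float, high: float = 0.7, medium: float = 0.4) -> str:
    """Map an infringement probability to a tier; thresholds are hypothetical."""
    if probability >= high:
        return "high"
    if probability >= medium:
        return "medium"
    return "low"


def precision_recall_f1(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float]:
    """Binary precision, recall, and F1, with label 1 meaning 'infringing'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

In a fuller pipeline, the overlap score would be one feature among many fed to a trained classifier, and `risk_tier` would be applied to that classifier's predicted probability rather than to the raw similarity.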
