A Study on Fast English Sentence Retrieval based on Simhash and Vector Space Model TF-IDF in an E-learning Environment


Yuehua Li
Xinxin Guan

Abstract

With the rapid development of the Internet in the digital information age, the volume of online data grows exponentially. In today's online learning environment, fast retrieval of English sentences plays a crucial role in the teaching and learning of modern English. Current case-based machine translation methods can parse sentences in depth and use only similar instances in the original corpus for matching and replacement. However, they remain limited in retrieval speed and similarity calculation. This study proposes an improved Simhash algorithm that introduces a substitution cost for synonym replacement and combines Term Frequency-Inverse Document Frequency (TF-IDF) weights with lexical weights to calculate sentence-to-sentence similarity. The results showed that the improved Simhash algorithm reached a maximum RI of 98.9%, an improvement of 1.4% over the traditional Simhash algorithm. Its minimum misclassification rate was only 1.1%, a reduction of 1.4% compared with the traditional algorithm. The runtime of the improved Simhash algorithm was only 0.71 s per sentence without synonym processing and 1.82 s with synonym processing, whereas the TF-IDF method alone took 71.82 s and 98.11 s in the same two cases. The improved Simhash algorithm, which combines TF-IDF weight, part-of-speech weight, and substitution cost, achieved an average accuracy of 92.87%, a recall rate of 88.7%, and an F1 score of 92.87% across the two calculations. These results indicate that the improved Simhash algorithm retrieves English sentences quickly and with high accuracy, providing reliable technical support for the English learning field.
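To illustrate the general idea of weighting Simhash with TF-IDF, the following is a minimal Python sketch, not the authors' implementation. It assumes whitespace tokenization, a smoothed IDF, and MD5-based 64-bit token hashes; the function names (`idf_table`, `simhash`, `hamming`) are hypothetical. The part-of-speech weights and the synonym substitution cost described in the abstract are omitted, since the abstract does not specify how they are computed.

```python
import hashlib
import math
from collections import Counter


def tokenize(sentence):
    # Assumption: lowercase whitespace tokenization, no stemming or POS tagging.
    return sentence.lower().split()


def idf_table(corpus):
    # Smoothed inverse document frequency, treating each sentence as a document.
    n = len(corpus)
    df = Counter()
    for sent in corpus:
        df.update(set(tokenize(sent)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def token_hash(token, bits=64):
    # Assumption: the leading bytes of an MD5 digest serve as the token hash.
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[: bits // 8], "big")


def simhash(sentence, idf, bits=64):
    # Each token votes on every bit position with its TF-IDF weight.
    tf = Counter(tokenize(sentence))
    total = sum(tf.values())
    default_idf = max(idf.values(), default=1.0)  # unseen terms treated as rare
    acc = [0.0] * bits
    for token, count in tf.items():
        weight = (count / total) * idf.get(token, default_idf)
        h = token_hash(token, bits)
        for i in range(bits):
            acc[i] += weight if (h >> i) & 1 else -weight
    # A bit of the fingerprint is set where the weighted vote is positive.
    fingerprint = 0
    for i, value in enumerate(acc):
        if value > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    # Number of differing bits; smaller means more similar sentences.
    return bin(a ^ b).count("1")
```

In such a scheme, retrieval compares short fixed-length fingerprints by Hamming distance rather than comparing full TF-IDF vectors, which is the usual reason a Simhash-based approach runs faster than TF-IDF cosine similarity alone.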

Article Details

Section
Special Issue - Transformative Horizons: The Role of AI and Computers in Shaping Future Trends of Education