Journal of Instrument Engineering

Formation of the core of documents in Internet monitoring systems under resource constraints

https://doi.org/10.17586/0021-3454-2022-65-11-826-832

Abstract

This paper considers the development of open-type Internet monitoring systems that draw on an unlimited number of sources while operating with limited data storage capacity. The aim of the work is to form a set of documents of the minimum required size (the document core) that satisfies the requirements of representativeness and topic variability when monitoring the Internet. To formalize and solve this problem, a set-theoretic model of the document core is developed. The proposed approach is distinguished by its use of a preemptive algorithm that keeps only relevant documents in the database within the available storage volume. Results of an experiment on real data are presented that confirm the applicability of the developed model. The approach can be used in a number of practical tasks, in particular for searching the Internet for information (documents, pages) when the a priori information needed for a keyword search is unavailable.
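The abstract does not specify the algorithm itself, so the following is only a minimal sketch of the general idea of a storage-bounded document set with displacement, not the authors' method. It assumes that "preemptive" means that less relevant documents are evicted when the storage limit is exceeded, and that each document carries a numeric relevance score; the class `DocumentCore`, its methods, and the scoring are all hypothetical names introduced for illustration.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Entry:
    # Only `relevance` takes part in ordering, so the heap root is
    # always the least relevant document currently stored.
    relevance: float
    doc_id: str = field(compare=False)
    size: int = field(compare=False)


class DocumentCore:
    """Hypothetical bounded store: evicts the least relevant documents first."""

    def __init__(self, capacity):
        self.capacity = capacity  # assumed storage limit, in arbitrary size units
        self.used = 0
        self._heap = []

    def add(self, doc_id, size, relevance):
        """Insert a document and return the ids evicted to stay within capacity."""
        heapq.heappush(self._heap, Entry(relevance, doc_id, size))
        self.used += size
        evicted = []
        # Displace the least relevant documents until the store fits again.
        # The newly added document may itself be displaced.
        while self.used > self.capacity:
            victim = heapq.heappop(self._heap)
            self.used -= victim.size
            evicted.append(victim.doc_id)
        return evicted

    def doc_ids(self):
        """Return the ids of the documents currently held, sorted by id."""
        return sorted(entry.doc_id for entry in self._heap)


core = DocumentCore(capacity=10)
core.add("a", 4, 0.9)
core.add("b", 4, 0.2)
print(core.add("c", 4, 0.5))  # the least relevant document is displaced
print(core.doc_ids())
```

A min-heap keyed on relevance makes each displacement logarithmic in the number of stored documents. A real system would also need the representativeness and topic-variability criteria named in the abstract, which this sketch does not model.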

About the Authors

S. V. Kuleshov
St. Petersburg Federal Research Center of the RAS
Russian Federation

Sergey V. Kuleshov, Dr. Sci., Professor; St. Petersburg Institute for Informatics and Automation of the RAS, Research Automation Laboratory; Chief Researcher

St. Petersburg



A. A. Zaytseva
St. Petersburg Federal Research Center of the RAS
Russian Federation

Alexandra A. Zaytseva, PhD; St. Petersburg Institute for Informatics and Automation of the RAS, Research Automation Laboratory; Senior Researcher

St. Petersburg



A. Yu. Aksenov
St. Petersburg Federal Research Center of the RAS
Russian Federation

Alexey Yu. Aksenov, PhD; St. Petersburg Institute for Informatics and Automation of the RAS, Research Automation Laboratory; Senior Researcher

St. Petersburg




For citations:


Kuleshov S.V., Zaytseva A.A., Aksenov A.Yu. Formation of the core of documents in Internet monitoring systems under resource constraints. Journal of Instrument Engineering. 2022;65(11):826-832. (In Russ.) https://doi.org/10.17586/0021-3454-2022-65-11-826-832

This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 0021-3454 (Print)
ISSN 2500-0381 (Online)