Analysis of Statistical Characteristics of Artificially Generated Texts
https://doi.org/10.17586/0021-3454-2024-67-11-958-968
Abstract
This paper considers a new trend: the creation of content with artificial intelligence tools and technologies. The active adoption of artificial intelligence for data generation increases the share of artificially generated data, which must be identified automatically to prevent errors such as unreliable or misleading information. Approaches to identifying text produced with neural network technologies are proposed, including heuristic rules based on how the volume of an abstract depends on the abstracting threshold. These rules allow text documents to be evaluated automatically in monitoring and search systems that process large volumes of unstructured data. The results lay a technological basis for a wide range of practical solutions that provide intelligent support for the collective behavior of participants in human-machine communities, by developing theoretical and technological foundations for processing unstructured data.
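The dependence of abstract volume on the abstracting threshold can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the sentence scoring rule (mean relative word frequency), the function names `sentence_scores` and `abstract_volume_curve`, and the definition of volume as the fraction of sentences retained. The sketch only computes the volume-versus-threshold curve; how the shape of that curve is turned into a heuristic detection rule is left to the paper itself.

```python
import re
from collections import Counter


def sentence_scores(text):
    """Split text into sentences and score each one.

    Assumed scoring rule (hypothetical, not the authors' method):
    the mean relative frequency of the words in the sentence.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words_per_sentence = [re.findall(r"\w+", s.lower()) for s in sentences]
    counts = Counter(w for words in words_per_sentence for w in words)
    top = max(counts.values()) if counts else 1
    scores = []
    for words in words_per_sentence:
        if words:
            scores.append(sum(counts[w] / top for w in words) / len(words))
        else:
            scores.append(0.0)
    return sentences, scores


def abstract_volume_curve(text, thresholds):
    """Return (threshold, volume) pairs.

    Volume is the fraction of sentences whose score is at least the
    threshold, i.e. the relative size of an extractive abstract.
    """
    sentences, scores = sentence_scores(text)
    n = len(sentences)
    curve = []
    for t in thresholds:
        kept = sum(1 for s in scores if s >= t)
        curve.append((t, kept / n if n else 0.0))
    return curve


if __name__ == "__main__":
    sample = "The cat sat. The cat ran. A dog barked loudly."
    for threshold, volume in abstract_volume_curve(sample, [0.0, 0.6, 1.1]):
        print(f"threshold={threshold:.1f} volume={volume:.2f}")
```

By construction the curve is non-increasing in the threshold. Under this assumed setup, the curves of different documents could then be compared, for example by how sharply the volume drops.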
About the Authors
S. V. Kuleshov
Russian Federation
Sergey V. Kuleshov — Dr. Sci., Professor; St. Petersburg Institute for Informatics and Automation of the RAS, Laboratory of Automation of Scientific Research, Chief Researcher
A. A. Zaytseva
Russian Federation
Alexandra A. Zaytseva — PhD; St. Petersburg Institute for Informatics and Automation of the RAS, Laboratory of Automation of Scientific Research, Senior Researcher
A. Yu. Aksenov
Russian Federation
Alexey Yu. Aksenov — PhD; St. Petersburg Institute for Informatics and Automation of the RAS, Laboratory of Automation of Scientific Research, Senior Researcher
Review
For citations:
Kuleshov S.V., Zaytseva A.A., Aksenov A.Yu. Analysis of Statistical Characteristics of Artificially Generated Texts. Journal of Instrument Engineering. 2024;67(11):958-968. (In Russ.) https://doi.org/10.17586/0021-3454-2024-67-11-958-968