Analysis of Statistical Characteristics of Artificially Generated Texts

S. V. Kuleshov; A. A. Zaytseva; A. Yu. Aksenov

doi:10.17586/0021-3454-2024-67-11-958-968

Abstract

The paper considers a new trend: the creation of content using artificial intelligence tools and technologies. The active adoption of artificial intelligence technologies for data generation is increasing the share of artificially generated data, which must be identified automatically to prevent errors (unreliable or misleading information). Approaches to identifying text produced with neural network technologies are proposed, including heuristic rules based on how the volume of an abstract (an automatically produced summary) depends on the abstracting threshold. This criterion allows text documents to be evaluated automatically in monitoring and search systems that process large volumes of unstructured data. The results lay a technological foundation for a wide range of practical solutions that provide intellectual support for the collective behavior of participants in human-machine communities, through the development of theoretical and technological foundations for processing unstructured data.
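The abstract does not spell out how the abstracting criterion is computed, so the following is only a minimal sketch of the general idea: it measures how the relative volume of an extractive abstract changes as the selection threshold rises. The word-frequency sentence weighting, the function names, and the threshold values are all assumptions made for illustration and are not taken from the paper.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sentence_scores(sentences):
    """Score each sentence by the mean normalized frequency of its words.

    Assumption: this frequency-based weight is a stand-in for whatever
    sentence weighting the paper actually uses. Scores lie in [0, 1].
    """
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    counts = Counter(w for words in tokenized for w in words)
    top = max(counts.values(), default=1)
    scores = []
    for words in tokenized:
        if not words:
            scores.append(0.0)
        else:
            scores.append(sum(counts[w] / top for w in words) / len(words))
    return scores


def abstract_volume_curve(text, thresholds):
    """Return (threshold, volume) pairs.

    Volume is the share of characters that remain when only sentences
    scoring at or above the threshold are kept in the abstract.
    """
    sentences = split_sentences(text)
    scores = sentence_scores(sentences)
    total = sum(len(s) for s in sentences)
    curve = []
    for t in thresholds:
        kept = sum(len(s) for s, sc in zip(sentences, scores) if sc >= t)
        curve.append((t, kept / total if total else 0.0))
    return curve


if __name__ == "__main__":
    sample = "The cat sat on the mat. The dog barked. Birds fly south in winter!"
    for threshold, volume in abstract_volume_curve(sample, [0.0, 0.25, 0.5, 0.75]):
        print(f"threshold={threshold:.2f} volume={volume:.2f}")
```

Comparing the shape of such curves across documents is one way a heuristic rule could be formulated; the specific rule and threshold values used by the authors are not given in this abstract.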

Keywords

internet documents, artificial neural networks, large language model, Internet resources, artificial intelligence methods, data generation

About the Authors

S. V. Kuleshov

St. Petersburg Federal Research Center of the RAS
Russian Federation

Sergey V. Kuleshov — Dr. Sci., Professor; St. Petersburg Institute for Informatics and Automation of the RAS, Laboratory of Automation of Scientific Research, Chief Researcher

A. A. Zaytseva

St. Petersburg Federal Research Center of the RAS
Russian Federation

Alexandra A. Zaytseva — PhD; St. Petersburg Institute for Informatics and Automation of the RAS, Laboratory of Automation of Scientific Research, Senior Researcher

A. Yu. Aksenov

St. Petersburg Federal Research Center of the RAS
Russian Federation

Alexey Yu. Aksenov — PhD; St. Petersburg Institute for Informatics and Automation of the RAS, Laboratory of Automation of Scientific Research, Senior Researcher

For citations:

Kuleshov S.V., Zaytseva A.A., Aksenov A.Yu. Analysis of Statistical Characteristics of Artificially Generated Texts. Journal of Instrument Engineering. 2024;67(11):958-968. (In Russ.) https://doi.org/10.17586/0021-3454-2024-67-11-958-968

This work is licensed under a Creative Commons Attribution 4.0 License.

ISSN 0021-3454 (Print)
ISSN 2500-0381 (Online)
