Anomaly Searching in Text Sequences
Mgr. Abdulwahed Faraj Ramadan Almarimi and doc. RNDr. Gabriela Andrejková, CSc.
Institute of Computer Science, Faculty of Science, UPJŠ
Analyzing a text whose authors are unknown remains an interesting problem; it can be approached with methods of data analysis and data mining as well as with structural analysis. This paper presents a system of modified Self-Organizing Maps that works on probabilistic sequences built from a text. The sequences were built from letters and from words as n-grams, 1 < n < 5. The system is trained on the input sequences, and after training it identifies text parts with anomalies using a cumulative error and a complex analysis. On the tested texts the system was successful: it uncovered the composition of the texts.
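As an illustration of what a probabilistic sequence built from n-grams might look like, the following minimal sketch maps each n-gram position in a text to the relative frequency of that n-gram in the whole text. This is an assumption made for illustration only: the abstract does not define how the sequences are constructed, and the function names `ngrams` and `probability_sequence` are hypothetical. The Self-Organizing Map training and the cumulative error are not shown.

```python
from collections import Counter


def ngrams(items, n):
    """Return all consecutive n-grams over items, each as a tuple."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def probability_sequence(items, n):
    """Replace each n-gram by its relative frequency in the whole text.

    Assumption: "probabilistic sequence" is read here as the sequence of
    relative n-gram frequencies; the paper may define it differently.
    """
    grams = ngrams(items, n)
    counts = Counter(grams)
    total = len(grams)
    return [counts[g] / total for g in grams]


# Letter-level sequence (n = 2) over a toy string.
letter_seq = probability_sequence(list("abracadabra"), 2)

# Word-level sequence (n = 2) over a toy sentence.
word_seq = probability_sequence("the cat and the cat".split(), 2)
```

Under this reading, the values n = 2, 3, 4 permitted by 1 < n < 5 would each yield one sequence per text, at the letter level and at the word level.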