Seminar of the Institute of Computer Science


Anomaly Searching in Text Sequences

Mgr. Abdulwahed Faraj Ramadan Almarimi and doc. RNDr. Gabriela Andrejková, CSc.
Institute of Computer Science, Faculty of Science, UPJŠ

18 May 2016 (Wednesday) at 12:55
SA1C03 (P/03), Faculty of Science, UPJŠ


Analyzing a text whose authors are unknown remains an interesting problem; it can be approached with methods of data analysis and data mining as well as with structural analysis. The paper presents a system of modified Self-Organizing Maps that works on probabilistic sequences built from a text. The sequences are built from letters and from words as n-grams, 1 < n < 5. The system is trained on the input sequences and, after training, identifies the parts of the text that contain anomalies using a cumulative error and a complex analysis. On the tested texts the system was successful: it uncovered the composition of the texts.
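
To make the idea of sequences built from letter and word n-grams more concrete, here is a minimal Python sketch. It rests on an assumption: "probabilistic sequence" is read as the sequence obtained by replacing each n-gram with its relative frequency in the whole text. The abstract does not give the exact construction, so the function names `ngrams` and `probability_sequence` and the sample text are hypothetical, and the Self-Organizing Map stage is not shown.

```python
from collections import Counter


def ngrams(items, n):
    """Return all contiguous n-grams of a sequence as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def probability_sequence(items, n):
    """Replace each n-gram by its relative frequency in the whole text.

    Assumption: this is one plausible reading of 'probabilistic sequence';
    the paper may define the construction differently.
    """
    grams = ngrams(items, n)
    if not grams:
        return []
    counts = Counter(grams)
    total = len(grams)
    return [counts[g] / total for g in grams]


text = "abab abab"          # hypothetical sample text
letters = list(text)        # letter-level units (spaces included)
words = text.split()        # word-level units

# The abstract restricts n to 1 < n < 5, i.e. n = 2, 3 or 4.
letter_seq = probability_sequence(letters, 2)
word_seq = probability_sequence(words, 2)
```

Sequences like `letter_seq` could then serve as numeric input to a map-based model; positions where the model's error accumulates would be candidates for anomalous text parts, though how the cumulative error is computed is left to the paper.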