Artificial Intelligence - Australian Case Studies

Document worth reading: “Short Text Topic Modeling Techniques, Applications, and Performance: A Survey”

Inferring discriminative and coherent latent topics from short texts is a critical and fundamental task, since many real-world applications require semantic understanding of short texts. Traditional long text topic modeling algorithms (e.g., PLSA and LDA), which are based on word co-occurrences, cannot solve this problem very well, since only very limited word co-occurrence information is available in short texts. Therefore, short text topic modeling, which aims to overcome the problem of sparseness in short texts, has attracted much attention from the machine learning research community in recent years. In this survey, we conduct a comprehensive review of various short text topic modeling techniques proposed in the literature. We present three categories of methods based on Dirichlet multinomial mixture, global word co-occurrences, and self-aggregation, with examples of representative approaches in each category and an analysis of their performance on various tasks. We develop the first comprehensive open-source library, called STTM, for use in Java; it integrates all surveyed algorithms within a unified interface, along with benchmark datasets, to facilitate the development of new methods in this research field. Finally, we evaluate these state-of-the-art methods on many real-world datasets and compare their performance against one another and against long text topic modeling algorithms.

Short Text Topic Modeling Techniques, Applications, and Performance: A Survey
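The sparsity problem the abstract describes can be illustrated with a minimal sketch. The two toy documents and the helper name `cooccurrence_pairs` are assumptions made up for illustration; they are not taken from the survey or from the STTM library. The sketch only counts how many distinct word pairs a single document contributes, which is the raw signal that co-occurrence-based models such as LDA rely on.

```python
from itertools import combinations


def cooccurrence_pairs(tokens):
    """Return the set of unordered pairs of distinct words in one document.

    Hypothetical helper for illustration only.
    """
    vocabulary = sorted(set(tokens))
    return set(combinations(vocabulary, 2))


# Toy documents (invented for this example): a headline-length text
# and a longer sentence on the same subject.
short_doc = "apple unveils new phone".split()
long_doc = (
    "apple unveils new phone with faster chip better camera "
    "longer battery life and lower price for students"
).split()

short_pairs = cooccurrence_pairs(short_doc)
long_pairs = cooccurrence_pairs(long_doc)

# A document with n distinct words yields n * (n - 1) / 2 pairs,
# so the evidence available per document shrinks quadratically
# as documents get shorter.
print(len(short_pairs), len(long_pairs))
```

With 4 distinct words the short document yields 6 pairs, while the 17-word document yields 136. This gap is one way to see why methods that pool evidence across documents, such as the global word co-occurrence and self-aggregation categories named in the abstract, are attractive for short texts.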