To find categories in a document, Selko uses text classification. A model is trained on a set of labelled example sentences and can then be used to find similar sentences in new documents.
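The train-then-match idea can be illustrated with a deliberately simple sketch. It assumes a bag-of-words representation and a nearest-neighbour rule: a new sentence gets the category of the most similar labelled example, measured by cosine similarity. This is not Selko's model. The class and function names (`NearestExampleClassifier`, `vectorize`, `cosine`) and the example sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(text: str) -> Counter:
    """Represent a sentence as a bag-of-words term-frequency vector."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class NearestExampleClassifier:
    """Hypothetical classifier: label = category of the most similar training example."""

    def __init__(self) -> None:
        self.examples: list[tuple[Counter, str]] = []

    def fit(self, sentences: list[str], categories: list[str]) -> "NearestExampleClassifier":
        # "Training" in this sketch just stores the vectorized labelled examples.
        self.examples = [(vectorize(s), c) for s, c in zip(sentences, categories)]
        return self

    def predict(self, sentence: str) -> tuple[str, float]:
        # Compare the new sentence with every stored example and keep the best match.
        query = vectorize(sentence)
        best_category, best_score = None, -1.0
        for vector, category in self.examples:
            score = cosine(query, vector)
            if score > best_score:
                best_category, best_score = category, score
        return best_category, best_score


# Usage with made-up example sentences and categories.
train_sentences = [
    "The invoice is due within 30 days of delivery.",
    "Payment must be made by bank transfer.",
    "Either party may terminate this agreement with notice.",
    "This contract ends automatically after two years.",
]
train_labels = ["payment", "payment", "termination", "termination"]

clf = NearestExampleClassifier().fit(train_sentences, train_labels)
category, score = clf.predict("The agreement may be terminated by either party.")
print(category)  # prints "termination"
```

A real system would replace word counts with learned sentence representations, which is where the transfer learning described below comes in; the nearest-example rule is used here only because it makes "find similar sentences" concrete.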
Companies usually have little domain-specific labelled data for the documents they want to classify, while machine learning models typically need large amounts of training data to make reliable predictions. Selko Analytics uses transfer learning to overcome this limitation and can classify text with only 100 labelled examples per category.
For best results, Selko combines transfer learning with careful fine-tuning of the transferred model before making predictions on unseen text. Predictions can be multi-class, multi-label, multi-class multi-label, or hierarchical.
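The difference between the prediction types can be sketched by how per-category scores are turned into a final answer. The sketch below assumes a model has already produced a score between 0 and 1 for each category; the scores, category names, the 0.5 threshold and the function names are all hypothetical. It covers the multi-class, multi-label and hierarchical cases, and decodes the hierarchy with a greedy top-down walk, which is one common approach but not necessarily the one Selko uses.

```python
def predict_multiclass(scores: dict[str, float]) -> str:
    """Multi-class: exactly one category, the highest-scoring one."""
    return max(scores, key=scores.get)


def predict_multilabel(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Multi-label: every category whose score reaches the threshold (possibly none)."""
    return sorted(c for c, s in scores.items() if s >= threshold)


def predict_hierarchical(tree: dict[str, dict], scores: dict[str, float]) -> list[str]:
    """Hierarchical: greedily pick the best-scoring child at each level, root to leaf."""
    path = []
    level = tree
    while level:
        best = max(level, key=lambda c: scores.get(c, 0.0))
        path.append(best)
        level = level[best]  # descend into the chosen node's children
    return path


# Hypothetical scores for one sentence.
scores = {"payment": 0.7, "termination": 0.6, "confidentiality": 0.1}
print(predict_multiclass(scores))  # prints "payment"
print(predict_multilabel(scores))  # prints ['payment', 'termination']

# Hypothetical two-level category tree and per-node scores.
tree = {"contract": {"payment": {}, "termination": {}}, "hr": {"hiring": {}}}
node_scores = {"contract": 0.8, "hr": 0.2, "payment": 0.3, "termination": 0.7, "hiring": 0.9}
print(predict_hierarchical(tree, node_scores))  # prints ['contract', 'termination']
```

In the hierarchical example, "hiring" has the highest raw score but is never chosen, because the greedy walk first commits to "contract" at the top level and only considers that node's children.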