Predictive Intelligence - several questions about the way it works
11-04-2024 03:04 AM - edited 01-07-2025 02:14 AM
Hi everyone,
I am used to writing Python scripts for NLP tasks. I am currently trying out Predictive Intelligence (PI), and I have several questions about how it works.
Preprocessing
How are the following normalization steps handled?
- Lowercasing
- Accented characters
- Special characters
- Stemming/Lemmatization
- When dealing with multiple input text fields (namely Short Description, Description, and Additional Informations), these fields sometimes contain exactly the same content. Is there a way to add a condition such as: if the aforementioned fields contain the same piece of text, then PI only takes one of them into account?
- Concerning stop words, is it good practice to add named entities, such as names of people or locations?
- With the removal of Word Corpus for Similarity and Clustering, what is the point of having different embedding techniques (Universal Sentence Encoder for Similarity/Clustering, and Doc2Vec/TF-IDF/GloVe for Classification)?
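To make the preprocessing questions concrete, here is a minimal sketch of what I would normally do by hand in Python. It only illustrates the behaviour I am asking about and says nothing about what PI does internally; the function names `normalize` and `merge_fields` are my own, and stemming/lemmatization is left out because it would need an external library.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents, and replace special characters with spaces."""
    text = text.lower()
    # Decompose accented characters (e.g. "é" -> "e" + combining accent),
    # then drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Replace anything that is not an ASCII letter, digit, or whitespace.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Collapse runs of whitespace.
    return re.sub(r"\s+", " ", text).strip()


def merge_fields(*fields):
    """Concatenate text fields, keeping only one copy of identical content."""
    seen = set()
    kept = []
    for field in fields:
        norm = normalize(field or "")
        if norm and norm not in seen:
            seen.add(norm)
            kept.append(norm)
    return " ".join(kept)


if __name__ == "__main__":
    print(normalize("Réseau  VPN: échec!"))
    print(merge_fields("VPN down", "vpn down!", "User cannot connect"))
```

Here two fields count as "the same" when they are identical after normalization, which is one possible definition among others.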
Machine Learning Pipelines
Can you confirm that the models used are still the following?
- Logistic Regression, Decision Trees, and Random Forests for Classification.
- k-Nearest Neighbors (k-NN) and Cosine Similarity for Similarity.
- k-Means, DBSCAN and HDBSCAN for Clustering.
- Linear Regression and Support Vector Regression (SVR) for Regression.
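For context, this is what I understand by "k-NN with Cosine Similarity" when I do it myself in Python. It is a toy sketch on bag-of-words counts, not PI's implementation (which, as far as I understand, works on sentence embeddings); `bag_of_words`, `cosine_similarity`, and `top_k_similar` are names I made up for the example.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Very rough vectorization: lowercase token counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query, corpus, k=2):
    """Return the k (score, document) pairs most similar to the query."""
    q = bag_of_words(query)
    scored = [(cosine_similarity(q, bag_of_words(doc)), doc) for doc in corpus]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


if __name__ == "__main__":
    corpus = [
        "vpn connection fails",
        "printer out of paper",
        "cannot connect to vpn",
    ]
    for score, doc in top_k_similar("vpn connection error", corpus):
        print(round(score, 3), doc)
```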
For Classification, I want to implement incident categorization, which means predicting both Category and Subcategory. Is it possible to predict both at the same time, or should I create two different models?
For Similarity, I can only apply filters to the Table, not to the Test Table. How can I work around this problem? Should I create a Database View to use as the Test Table?
For Clustering, the default model is k-Means, but I can't find any field to specify the number of clusters. Is this determined automatically by PI? If so, how?
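For comparison, when I run k-Means myself I usually try several values of k and keep the one with the best silhouette score. The sketch below shows that idea in plain Python on toy 2-D points; it is only my assumption of one way the number of clusters could be chosen automatically, not a description of what PI does. All function names are mine, and the farthest-point initialization is just a simple way to keep the example deterministic.

```python
import math


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=50):
    """Plain k-means with a deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters score 0."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == other) / labels.count(other)
            for other in clusters
            if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def best_k(points, candidates):
    """Pick the candidate k whose clustering has the highest silhouette."""
    return max(candidates, key=lambda k: silhouette(points, kmeans(points, k)))


if __name__ == "__main__":
    blobs = [(0, 0), (0, 1), (1, 0), (1, 1),
             (10, 10), (10, 11), (11, 10), (11, 11),
             (20, 0), (20, 1), (21, 0), (21, 1)]
    print(best_k(blobs, range(2, 6)))
```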
Thank you in advance!
Regards,
Julien
Labels: Predictive Intelligence