I have a dataset like the one below (only a couple of rows and some of the columns are shown):
id summary description01 decription02 recommendation connections total-experience_in_months
1 Experienced and seasoned Data warehouse IT professional with 10 Years of experience looking for a Technical Architecture Role,process optimization involved in technical planning 5 524 30
2 Working as a Technical Consultant at SAS Institute. Specialties: Business Intelligence, Data Warehousing, Data modeling, SAS Platform Administration,Worked for Discover Cards financial service for a transition 5 4000 40
I want to extract features from the text columns, and I am using the tf-idf approach for that.
This is what I am trying:
from sklearn.feature_extraction.text import TfidfVectorizer
# min_df=1 keeps every term; recent scikit-learn versions reject an integer min_df of 0
tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 3), min_df=1, stop_words='english')
# Calculate tf-idf for the summary column (first row only)
tfidf_matrix = tf.fit_transform(raw_data['summary'][:1])
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
feature_names = tf.get_feature_names_out()
print(len(feature_names))
feature_names[50:70]
dense = tfidf_matrix.todense()
Now I have a dense matrix representation for my first text column, summary (computed on the first row only).
My question is: how do I combine this with the rest of the features in my dataset so that I can use everything in my model?
Do I need to merge all the text columns into a single column and then calculate the tf-idf values, or do I need to calculate them for each text column separately?
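To make the second option concrete, here is a minimal sketch of fitting one TfidfVectorizer per text column and stacking the results next to the numeric columns with scipy.sparse.hstack. The small DataFrame is a made-up stand-in for raw_data (shortened text, only some columns), and the names text_columns, numeric_columns, blocks and vectorizers are hypothetical. This is one possible approach under those assumptions, not necessarily the best one for this dataset.

```python
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-in for raw_data; the text is shortened and made up for illustration.
raw_data = pd.DataFrame({
    "summary": ["data warehouse professional",
                "technical consultant business intelligence"],
    "description01": ["process optimization",
                      "worked on a financial services transition"],
    "recommendation": [5, 5],
    "connections": [524, 4000],
})

text_columns = ["summary", "description01"]        # hypothetical selection
numeric_columns = ["recommendation", "connections"]

# Fit one vectorizer per text column so each column keeps its own vocabulary.
blocks = []
vectorizers = {}
for col in text_columns:
    vec = TfidfVectorizer(analyzer="word", ngram_range=(1, 3), stop_words="english")
    blocks.append(vec.fit_transform(raw_data[col]))
    vectorizers[col] = vec

# Add the numeric columns as one more sparse block, then stack column-wise.
blocks.append(csr_matrix(raw_data[numeric_columns].to_numpy(dtype=float)))
features = hstack(blocks).tocsr()

print(features.shape)  # (number of rows, total tf-idf terms + numeric columns)
```

The resulting features matrix stays sparse and can be passed to most scikit-learn estimators directly. Merging all text columns into one string per row and fitting a single vectorizer would instead share one vocabulary across columns, so the model could no longer tell which column a term came from. Numeric columns on very different scales (such as connections) may also need scaling before being mixed with tf-idf values.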
I referred to the link below: