I have a set of feature vectors for sentences I have obtained using:
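    import json
    import sys

    from sklearn.feature_extraction.text import CountVectorizer

    # Load the sentence -> region -> value mapping from the JSON file given on the command line
    with open(sys.argv[1]) as trainingSentences:
        sentence2region2value = json.loads(trainingSentences.read())

    # Clean each sentence with sentence_to_words (defined elsewhere) and rejoin the words
    train_wordlist = []
    for sentence, locations in sentence2region2value.iteritems():
        train_wordlist.append(" ".join(sentence_to_words(sentence, True)))

    # Bag-of-words counts, limited to the 5000 most frequent words
    vectorizer = CountVectorizer(analyzer="word",
                                 tokenizer=None,
                                 preprocessor=None,
                                 stop_words=None,
                                 max_features=5000)
    train_data_features = vectorizer.fit_transform(train_wordlist)
    train_data_features = train_data_features.toarray()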
I also want to attach a label to each of these 492 feature vectors so that I can train a logistic regression. This "prediction" label is stored in the sentence2region2value dictionary, which is structured like this:
    {sentence: Y
        {parsedsentence: Z
            {prediction: X,
             location-values: {"Qatar": [32,221,31]}, {"Dubai": [12,123,421]}, .....}
Currently I am trying to use this:
    for prediction in sentence2region2value["sentence"]["parsedsentence"].iteritems():
        for i in train_data_features:
            train_data_features[i] = np.append(train_data_features[i], np.array(prediction))
But it isn't working. Any ideas?
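One possible direction, offered only as a sketch: scikit-learn estimators take the labels as a separate argument to fit(X, y), so the labels do not have to be appended to the feature rows. What matters is that label i belongs to the same sentence as feature row i, which is easiest to guarantee by collecting both in the same loop. The sample data and the build_training_data helper below are hypothetical, and the sketch assumes each sentence maps directly to a dict with a "prediction" key, which may not match the real nesting of sentence2region2value.

```python
# Hypothetical sample data. The real layout of sentence2region2value is an
# assumption: here each sentence maps to a dict holding a "prediction" key.
sentence2region2value = {
    "first example sentence": {"prediction": 1},
    "second example sentence": {"prediction": 0},
    "third example sentence": {"prediction": 1},
}


def build_training_data(data):
    """Collect sentences and labels in one pass so that row i of the
    feature matrix and labels[i] always refer to the same sentence."""
    wordlist, labels = [], []
    for sentence, info in data.items():
        # The original code would call sentence_to_words(sentence, True) here;
        # lowercasing stands in for it to keep the sketch self-contained.
        wordlist.append(sentence.lower())
        labels.append(info["prediction"])
    return wordlist, labels


train_wordlist, train_labels = build_training_data(sentence2region2value)

# With scikit-learn installed, the labels are then passed separately:
#   X = vectorizer.fit_transform(train_wordlist)
#   LogisticRegression().fit(X, train_labels)
```

Because train_wordlist and train_labels are filled in the same iteration, their order matches regardless of how the dictionary happens to be ordered.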