Initializing Seq2seq embedding with pretrained word2vec

I am interested in initializing the tensorflow seq2seq implementation with pretrained word2vec.

I have looked at the code. It seems the embedding is created here:

with tf.variable_scope(scope or "embedding_attention_decoder"):
  with tf.device("/cpu:0"):
    embedding = tf.get_variable("embedding", [num_symbols, cell.input_size])

How do I change this to initialize it with pretrained word2vec?


2015-11-22 skw

To use a pre-trained word2vec model for tokenizing, you can change the tokenizer in tensorflow/models/rnn/translate/data_utils.py. Lines 187-190 of data_utils.py:

if tokenizer: 
    words = tokenizer(sentence) 
else: 
    words = basic_tokenizer(sentence)

By default these lines use basic_tokenizer. You can write a tokenizer method that uses a pre-trained word2vec model to tokenize the sentences.
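A minimal sketch of what such a tokenizer could look like. Everything here is an assumption for illustration, not code from data_utils.py: the helper name make_word2vec_tokenizer is hypothetical, sentences are assumed to arrive as str, lowercasing is assumed to match the word2vec vocabulary, and "_UNK" is assumed to be the unknown-word token.

```python
import re

# Hypothetical helper, not part of data_utils.py.
# Splits a sentence into runs of word characters and single punctuation marks.
_WORD_SPLIT = re.compile(r"\w+|[^\w\s]")


def make_word2vec_tokenizer(known_words, unk_token="_UNK"):
    """Return a tokenizer(sentence) -> list of tokens.

    known_words: iterable of words that have a pretrained vector,
                 e.g. the word list of a loaded word2vec model.
    Tokens missing from known_words are replaced by unk_token.
    """
    known = set(known_words)

    def tokenizer(sentence):
        # Assumption: the word2vec vocabulary is lowercase.
        tokens = _WORD_SPLIT.findall(sentence.lower())
        return [t if t in known else unk_token for t in tokens]

    return tokenizer


# Toy vocabulary standing in for the word2vec model's word list.
tokenizer = make_word2vec_tokenizer({"the", "cat", "sat", "."})
print(tokenizer("The cat sat on the mat."))
```

The returned function takes a sentence and returns a list of tokens, so it has the same shape as the tokenizer(sentence) call in the lines quoted above.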


2015-11-27 00:40:25

I think you already got your answer on the mailing list, but I am putting it here for posterity.

https://groups.google.com/a/tensorflow.org/forum/#!topic/discuss/bH6S98NpIJE

You can initialize it randomly and afterwards run: session.run(embedding.assign(my_word2vec_matrix))

This will override the init values.

This seems to work for me. I believe trainable=False is needed to keep the values fixed?

import tensorflow as tf

# load word2vec model (say from gensim); load_model stands in for your own loader
model = load_model(FILENAME, binary=True)

# embedding matrix
X = model.syn0
print(type(X)) # numpy.ndarray
print(X.shape) # (vocab_size, embedding_dim)

# start interactive session
sess = tf.InteractiveSession()

# set embeddings: random init, excluded from training so the values stay fixed
embeddings = tf.Variable(tf.random_uniform(X.shape, minval=-0.1, maxval=0.1), trainable=False)

# initialize
sess.run(tf.initialize_all_variables())

# override inits with the pretrained vectors
sess.run(embeddings.assign(X))


2015-11-29 02:47:05 tokestermw

What about the index: model.index2word? How do you pass it to tensorflow? – vgoklani
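One possible way to handle the index question, sketched here under assumptions and not taken from the thread: rather than passing the word list to tensorflow, reorder the word2vec rows so that row i of the matrix belongs to token id i of the seq2seq vocabulary, then assign that matrix. The function name build_aligned_embeddings, the toy vocabularies, and the choice of small random vectors for tokens without a pretrained vector are all illustrative.

```python
import numpy as np


def build_aligned_embeddings(seq2seq_vocab, w2v_words, w2v_matrix, seed=0):
    """Return a (len(seq2seq_vocab), dim) float32 matrix.

    Row i holds the word2vec vector of seq2seq_vocab[i]. Tokens with no
    pretrained vector keep a small random vector in [-0.1, 0.1).
    """
    # Map each word2vec word to its row in w2v_matrix.
    w2v_row = {word: i for i, word in enumerate(w2v_words)}
    rng = np.random.RandomState(seed)
    dim = w2v_matrix.shape[1]
    aligned = rng.uniform(-0.1, 0.1, size=(len(seq2seq_vocab), dim))
    aligned = aligned.astype(np.float32)
    for token_id, token in enumerate(seq2seq_vocab):
        row = w2v_row.get(token)
        if row is not None:
            aligned[token_id] = w2v_matrix[row]
    return aligned


# Toy stand-ins for the word2vec word list (e.g. model.index2word)
# and the word2vec matrix (e.g. model.syn0).
w2v_words = ["cat", "dog", "fish"]
w2v_matrix = np.arange(6, dtype=np.float32).reshape(3, 2)

# Toy seq2seq vocabulary: position in the list is the token id.
seq2seq_vocab = ["_PAD", "_GO", "_EOS", "_UNK", "dog", "cat"]

aligned = build_aligned_embeddings(seq2seq_vocab, w2v_words, w2v_matrix)
print(aligned.shape)
```

The resulting array has one row per seq2seq token id, so it could be used as the matrix passed to embedding.assign(...) in the answer above, provided its shape matches the embedding variable.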
