NEWS

Changes in v0.6.0

Upgrade textmodel_doc2vec() to train the distributed memory (DM) and distributed bag-of-words (DBOW) models.
Add as.textmodel_doc2vec() to create document vectors as weighted averages of word vectors.
Add the layer argument to as.matrix() to choose between word and document vectors.
The normalize argument is now defunct in textmodel_word2vec().

Changes in v0.5.1

Add the normalize argument to textmodel_doc2vec() and pass it to as.matrix().
Add the weights argument to textmodel_doc2vec() to adjust the salience of words in the document vectors.
Add the include_data argument to textmodel_word2vec() to save the original tokens object.

Changes in v0.5.0

Add the model argument to textmodel_word2vec() to update existing models.
The normalize argument is moved from textmodel_word2vec() to as.matrix(). The original argument is deprecated and set to FALSE by default.
Remove weights().
Improve the structure of C++ code.

Changes in v0.4.0

Add the tolower argument, set to TRUE by default, to lower-case tokens.
Allow x to be a quanteda tokens_xptr object to improve efficiency.

Changes in v0.3.0

Save docvars in the textmodel_doc2vec objects.
Set the vectors of empty documents to zero in the textmodel_doc2vec objects.
Add probability() to compute the probability of words.

Changes in v0.2.0

Rename word2vec(), doc2vec() and lsa() to textmodel_word2vec(), textmodel_doc2vec() and textmodel_lsa() respectively.
Simplify the C++ code to make maintenance easier.
Add the normalize argument to word2vec() to enable or disable word vector normalization.
Add weights() to extract back-propagation weights.
Make analogy() convert a formula to a named character vector.
Improve the stability of word2vec() when verbose = TRUE.

Changes in v0.1.0

Fork https://github.com/bnosac/word2vec and change the package name to wordvector.
Replace a list of character vectors with quanteda’s tokens object as the input object.
Recreate word2vec() with new argument names and object structures.
Create lsa() to train word vectors using Latent Semantic Analysis.
Add similarity() and analogy() functions using proxyC.
Add data_corpus_news2014, which contains 20,000 news summaries, as package data.