Italian Word Embeddings

Two sets of word embeddings, each trained on a different corpus:

  • itWaC: a billion-word corpus built from the Web by limiting the crawl to the .it domain, using medium-frequency words from the Repubblica corpus and basic Italian vocabulary lists as seeds.
  • Twitter: 46,935,207 tweets.

The word embeddings are 128-dimensional and were generated with word2vec.
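
As an illustration of how such vectors might be used, here is a minimal sketch that parses word2vec's plain-text format (a "vocabulary-size dimension" header line, then one word and its values per line) and compares two words by cosine similarity. It assumes the embeddings are available in that text format, which this description does not state. The function names and the toy 3-dimensional sample are hypothetical stand-ins for the real 128-dimensional files.

```python
import io
import math


def load_word2vec_text(handle):
    """Parse word2vec text format: a 'count dim' header, then 'word v1 ... vN' lines."""
    header = handle.readline().split()
    dim = int(header[1])  # header[0] is the vocabulary size, unused here
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return dim, vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy data (assumption): 3 words, 3 dimensions instead of the real 128.
SAMPLE = "3 3\nroma 1.0 0.0 0.0\nmilano 0.8 0.6 0.0\npizza 0.0 0.0 1.0\n"

dim, vectors = load_word2vec_text(io.StringIO(SAMPLE))
print(dim, sorted(vectors))
print(cosine(vectors["roma"], vectors["milano"]))
```

With a real file, io.StringIO(SAMPLE) would be replaced by an opened file handle. If the files turn out to be in word2vec's binary format instead, a library loader such as gensim's KeyedVectors.load_word2vec_format with binary=True would be the more practical route.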

More info: Italian Word Embeddings