SimilEx – Sentence Similarity

SimilEx is the first Italian dataset comprising 2,112 sentence pairs manually annotated for semantic similarity. Of these, 907 pairs are further enriched with free-form, human-written explanations that justify the assigned similarity scores. The sentence pairs are derived from a collection of novels translated into Italian from the late 19th century. The dataset also includes the results of a stylistic analysis of the paired sentences and their corresponding explanations.
