Synonym and Hypernym-Hyponym Relation Mining
Keywords: synonym/synonymous/alias, antonym, hypernym/hyperonym/hypernymy, hyponym/hyponymy; synonym/hypernym/hypernym-hyponym extraction/detection/discovery/identification/generation
Because hypernymy and hyponymy are converse relations (swapping the order of the two terms turns one into the other), the literature generally uses "hypernym". (A practical reason to prefer hyperonym is that hypernym is in its spoken form hard to distinguish from hyponym in most dialects of English.)
Based on https://thesaurus.altervista.org/
Node similarity based on a semantic co-occurrence network
Methods for looking up synonyms/hypernyms/hyponyms with WordNet
Rawan N. Al-Matham and Hend S. Al-Khalifa. “SynoExtractor: A Novel Pipeline for Arabic Synonym Extraction Using Word2Vec Word Embeddings” Complexity(2021): n. pag.
Prathyusha Senthil Kumar et al. “Mickey Mouse is not a Phrase: Improving Relevance in E-Commerce with Multiword Expressions” Proceedings of the 10th Workshop on Multiword Expressions (MWE)(2014): n. pag.
Atzori M, Balloccu S. Fully-unsupervised embeddings-based hypernym discovery[J]. Information, 2020, 11(5): 268.
Its related-work section is fairly detailed;
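As a rough illustration of the co-occurrence-network idea listed above, the sketch below links words that appear in the same sentence and scores two words by the Jaccard overlap of their neighbor sets. The function names, the sentence-level window, and the toy corpus are all assumptions made for this example; the linked article may build the network and measure similarity differently.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(sentences):
    """Link two words whenever they appear in the same sentence."""
    graph = defaultdict(set)
    for sentence in sentences:
        words = set(sentence.lower().split())
        for a, b in combinations(sorted(words), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def neighbor_jaccard(graph, u, v):
    """Jaccard overlap of the neighbor sets of u and v (each excluding the other)."""
    nu = graph.get(u, set()) - {v}
    nv = graph.get(v, set()) - {u}
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


# Toy corpus, invented purely for illustration.
corpus = [
    "buy cheap laptop online",
    "buy cheap notebook online",
    "laptop battery replacement",
    "notebook battery replacement",
    "fresh banana smoothie recipe",
]
graph = build_cooccurrence_graph(corpus)
sim_laptop_notebook = neighbor_jaccard(graph, "laptop", "notebook")
sim_laptop_banana = neighbor_jaccard(graph, "laptop", "banana")
```

Words that share most of their contexts ("laptop" and "notebook" here) score high even though they never co-occur directly, which is the signal a synonym miner looks for. On real data the raw neighbor sets would likely need frequency weighting and stop-word filtering.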
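Blogs
Mining synonyms from Wikipedia redirects; Kaggle code available;
The associated Kaggle code;
The associated Wikipedia data
How to parse Wikipedia data; explains how to parse the raw Wikipedia data into the Kensho version of the data; Kaggle code available;
Papers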
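The redirect-based approach mentioned in the blog entries above can be sketched in a few lines: every title that redirects to a page is treated as a candidate synonym of that page's title. This sketch assumes the dump has already been parsed into (redirect title, target title) pairs; the function name and the sample pairs are hypothetical, and the linked Kaggle code may work differently.

```python
from collections import defaultdict


def synonyms_from_redirects(redirects):
    """Group redirect source titles under the page they redirect to."""
    groups = defaultdict(set)
    for source, target in redirects:
        if source != target:  # skip self-redirects
            groups[target].add(source)
    return {target: sorted(sources) for target, sources in groups.items()}


# Hypothetical (redirect title, target title) pairs, for illustration only.
redirect_pairs = [
    ("NYC", "New York City"),
    ("Big Apple", "New York City"),
    ("USA", "United States"),
    ("New York City", "New York City"),
]
synonym_table = synonyms_from_redirects(redirect_pairs)
```

Redirects also cover misspellings, abbreviations, and loosely related topics, so the resulting groups are best treated as candidates to be filtered rather than as clean synonym sets.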
Cheng T, Lauw H W, Paparizos S. Entity synonyms for structured web search[J]. IEEE transactions on knowledge and data engineering, 2011, 24(10): 1862-1875.
Microsoft; Click Similarity (ClickSim)
Cheng T, Lauw H W, Paparizos S. Fuzzy matching of web queries to structured data[C]//2010 IEEE 26th International Conference on Data Engineering (ICDE 2010). IEEE, 2010: 713-716.
The paper that first proposed ClickSim;
Turney P D. Mining the web for synonyms: PMI-IR versus LSA on TOEFL[C]//European conference on machine learning. Springer, Berlin, Heidelberg, 2001: 491-502.
Document Similarity (DocSim)
Chakrabarti K, Chaudhuri S, Cheng T, et al. A framework for robust discovery of entity synonyms[C]//Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. 2012: 1384-1392.
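Microsoft; entity synonyms; based on click data; vertical search domains (e-commerce/video); how to use synonyms in vertical search; proposes two similarity measures: Pseudo Document Similarity (PseudoDocSim, which improves on ClickSim and DocSim) and Query Context Similarity (QCSim, which compensates for the shortcomings of ClickSim and DocSim);
Resources
A Linux command-line tool that returns synonyms by calling an externally provided API;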
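To make the click-based idea behind ClickSim concrete, here is a minimal sketch that scores two query strings by the Jaccard overlap of the documents users clicked for each. This is only one simple instantiation, assumed for illustration; the papers above define their own measures and thresholds, and the function name and the click log below are invented.

```python
def click_similarity(clicks_a, clicks_b):
    """Jaccard overlap between two sets of clicked document IDs."""
    union = clicks_a | clicks_b
    return len(clicks_a & clicks_b) / len(union) if union else 0.0


# Invented click log: query string -> set of clicked document IDs.
click_log = {
    "canon eos 350d": {"doc1", "doc2", "doc3"},
    "digital rebel xt": {"doc2", "doc3", "doc4"},
    "banana bread": {"doc9"},
}

sim_synonyms = click_similarity(
    click_log["canon eos 350d"], click_log["digital rebel xt"]
)
sim_unrelated = click_similarity(
    click_log["canon eos 350d"], click_log["banana bread"]
)
```

Two strings that lead users to largely the same documents are likely to name the same entity. Sparse click data for rare queries is one known weakness of this kind of signal.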