同义与上下位关系挖掘
关键词: synonym/synonymous/aliase (同义词), antonym (反义词), hypernym/hyperonym/hypernymy (上位词), hyponym/hyponymy (下位词) synonym/hypernym/hypernym-hyponym extraction (抽取)/detection (检测)/discovery (发现)/identification (识别)/generation (生成)
因为上位和下位关系是可以通过调整顺序互换的, 文献中一般使用 hypernym. (A practical reason to prefer hyperonym is that hypernym is in its spoken form hard to distinguish from hyponym in most dialects of English.)
资源
smallwat3r/synonym: CLI tool to find synonyms in 15 different languages.
基于 https://thesaurus.altervista.org/
参考
The Power of WordNet and How to Use It in Python - XRDSXRDS
利用 WordNet 查找 Synonym/Hypernym/Hyponym 的方法
Rawan N. Al-Matham and Hend S. Al-Khalifa. “SynoExtractor: A Novel Pipeline for Arabic Synonym Extraction Using Word2Vec Word Embeddings” Complexity(2021): n. pag.
Prathyusha Senthil Kumar et al. “Mickey Mouse is not a Phrase: Improving Relevance in E-Commerce with Multiword Expressions” Proceedings of the 10th Workshop on Multiword Expressions (MWE)(2014): n. pag.
上下位
Atzori M, Balloccu S. Fully-unsupervised embeddings-based hypernym discovery[J]. Information, 2020, 11(5): 268.
相关工作介绍比较详细;
同义词
博客
How to Build a Smart Synonyms Model | by Patrick O'Neill | Kensho Blog
基于 Wikipedia 重定向挖掘同义词; 有 Kaggle 代码;
kdwd_aliases_and_disambiguation | Kaggle
关联的 Kaggle 代码;
Kensho Derived Wikimedia Dataset | Kaggle
关联的 Wikipedia 数据
Introducing the Kensho Derived Wikimedia Dataset | by Gabriel Altay | Kensho Blog
Wikipedia 数据解析方法; 介绍如何将原始 Wikipedia 数据解析成 Kensho 版本的数据; 有 Kaggle 代码;
论文
Cheng T, Lauw H W, Paparizos S. Entity synonyms for structured web search[J]. IEEE transactions on knowledge and data engineering, 2011, 24(10): 1862-1875.
微软; Click Similarity (ClickSim)
Cheng T, Lauw H W, Paparizos S. Fuzzy matching of web queries to structured data[C]//2010 IEEE 26th International Conference on Data Engineering (ICDE 2010). IEEE, 2010: 713-716.
最早提出 ClickSim 的论文;
Turney P D. Mining the web for synonyms: PMI-IR versus LSA on TOEFL[C]//European conference on machine learning. Springer, Berlin, Heidelberg, 2001: 491-502.
Document Similarity (DocSim)
Chakrabarti K, Chaudhuri S, Cheng T, et al. A framework for robust discovery of entity synonyms[C]//Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. 2012: 1384-1392.
微软; 实体同义词 (entity synonyms); 基于点击数据; 垂类搜索领域 (电商/视频); 如何在垂搜中使用同义词; 提出 Pseudo Document Similarity (PseudoDocSim, 改进 ClickSim 和 DocSim) 和 Query Context Similarit (QCSim, 弥补 ClickSim 和 DocSim 的缺陷) 两种相似度计算方法;
资源
smallwat3r/synonym: CLI tool to find synonyms in 15 different languages.
一个 Linux 命令行工具, 通过调用 Thesaurus 提供的 API 返回同义词;
Last updated