Synonym and Hypernym-Hyponym Relation Mining
Keywords: synonym/synonymous/alias, antonym, hypernym/hyperonym/hypernymy, hyponym/hyponymy; synonym/hypernym/hypernym-hyponym extraction/detection/discovery/identification/generation
Because a hypernym relation turns into a hyponym relation simply by swapping the order of the two terms, the literature generally refers to the relation as hypernym. (A practical reason to prefer hyperonym is that hypernym is, in its spoken form, hard to distinguish from hyponym in most dialects of English.)
smallwat3r/synonym: CLI tool to find synonyms in 15 different languages.
Based on https://thesaurus.altervista.org/
The Power of WordNet and How to Use It in Python - XRDS
How to look up synonyms, hypernyms, and hyponyms with WordNet.
Rawan N. Al-Matham and Hend S. Al-Khalifa. “SynoExtractor: A Novel Pipeline for Arabic Synonym Extraction Using Word2Vec Word Embeddings” Complexity (2021): n. pag.
Prathyusha Senthil Kumar et al. “Mickey Mouse is not a Phrase: Improving Relevance in E-Commerce with Multiword Expressions” Proceedings of the 10th Workshop on Multiword Expressions (MWE) (2014): n. pag.
Atzori M, Balloccu S. Fully-unsupervised embeddings-based hypernym discovery[J]. Information, 2020, 11(5): 268.
The related-work section is fairly detailed.
Blogs
How to Build a Smart Synonyms Model | by Patrick O'Neill | Kensho Blog
Mines synonyms from Wikipedia redirects; Kaggle code is available.
kdwd_aliases_and_disambiguation | Kaggle
The associated Kaggle code.
Kensho Derived Wikimedia Dataset | Kaggle
The associated Wikipedia data.
Introducing the Kensho Derived Wikimedia Dataset | by Gabriel Altay | Kensho Blog
Wikipedia data parsing: describes how the raw Wikipedia data is parsed into the Kensho version of the dataset; Kaggle code is available.
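The redirect-based idea noted for the Kensho synonyms post can be illustrated with a small sketch: every title that redirects to a page is treated as a candidate alias of that page. This is only a minimal illustration under stated assumptions, not the blog's actual pipeline; the function name `synonyms_from_redirects`, the `(source_title, target_title)` input format, and the toy data are all hypothetical.

```python
from collections import defaultdict


def synonyms_from_redirects(redirects):
    """Group redirect source titles under the page they redirect to.

    `redirects` is an iterable of (source_title, target_title) pairs
    (a hypothetical input format). Returns a dict mapping each target
    title to a sorted list of candidate aliases.
    """
    aliases = defaultdict(set)
    for source, target in redirects:
        source, target = source.strip(), target.strip()
        # Skip empty titles and redirects that differ from the target only by case.
        if source and target and source.lower() != target.lower():
            aliases[target].add(source)
    return {target: sorted(names) for target, names in aliases.items()}


# Toy redirect pairs, made up for illustration.
redirects = [
    ("NYC", "New York City"),
    ("The Big Apple", "New York City"),
    ("New York city", "New York City"),
    ("Big Blue", "IBM"),
]
synonym_sets = synonyms_from_redirects(redirects)
print(synonym_sets)
```

In practice the candidate aliases would probably still need filtering, since redirects also cover misspellings and related-but-not-synonymous titles.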
Papers
Cheng T, Lauw H W, Paparizos S. Entity synonyms for structured web search[J]. IEEE transactions on knowledge and data engineering, 2011, 24(10): 1862-1875.
Microsoft; Click Similarity (ClickSim).
Cheng T, Lauw H W, Paparizos S. Fuzzy matching of web queries to structured data[C]//2010 IEEE 26th International Conference on Data Engineering (ICDE 2010). IEEE, 2010: 713-716.
The paper that first proposed ClickSim.
Turney P D. Mining the web for synonyms: PMI-IR versus LSA on TOEFL[C]//European conference on machine learning. Springer, Berlin, Heidelberg, 2001: 491-502.
Document Similarity (DocSim)
Chakrabarti K, Chaudhuri S, Cheng T, et al. A framework for robust discovery of entity synonyms[C]//Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. 2012: 1384-1392.
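Microsoft; entity synonyms; based on click data; vertical search domains (e-commerce/video); how to use synonyms in vertical search; proposes two similarity measures: Pseudo Document Similarity (PseudoDocSim, which improves on ClickSim and DocSim) and Query Context Similarity (QCSim, which compensates for the shortcomings of ClickSim and DocSim).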
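To make the two baseline signals in these notes more concrete, here is a rough sketch of a click-based and a document-based similarity. It does not reproduce the exact definitions from the papers above: `click_overlap` assumes a plain Jaccard overlap of clicked-document sets as a stand-in for a ClickSim-style score, and `doc_pmi` computes pointwise mutual information from co-occurrence counts in a tiny in-memory corpus, whereas Turney's PMI-IR relies on search-engine hit counts. All names and data are hypothetical.

```python
import math


def click_overlap(clicks_a, clicks_b):
    """Jaccard overlap between the sets of documents clicked for two queries.

    Assumption: a simplified stand-in for a click-based similarity.
    """
    a, b = set(clicks_a), set(clicks_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def doc_pmi(term_a, term_b, documents):
    """PMI (base 2) of two terms, counting co-occurrence at document level.

    `documents` is a list of token sets. Returns -inf when the terms
    never appear in the same document.
    """
    n = len(documents)
    has_a = sum(1 for d in documents if term_a in d)
    has_b = sum(1 for d in documents if term_b in d)
    has_both = sum(1 for d in documents if term_a in d and term_b in d)
    if has_both == 0:
        return float("-inf")
    return math.log2(has_both * n / (has_a * has_b))


# Toy click log: query -> ids of clicked documents (made up).
click_log = {
    "ms excel": {"d1", "d2", "d3"},
    "microsoft excel": {"d1", "d2", "d4"},
    "excel tutorial": {"d5"},
}
# Toy corpus: each document is a set of tokens (made up).
docs = [
    {"big", "large", "house"},
    {"big", "large", "dog"},
    {"big", "cat"},
    {"small", "cat"},
]

sim_alias = click_overlap(click_log["ms excel"], click_log["microsoft excel"])
sim_other = click_overlap(click_log["ms excel"], click_log["excel tutorial"])
pmi_big_large = doc_pmi("big", "large", docs)
pmi_big_small = doc_pmi("big", "small", docs)
print(sim_alias, sim_other, pmi_big_large, pmi_big_small)
```

Both scores fail in the obvious ways: a rare query with few clicks gives a noisy overlap, and terms that never co-occur get no usable PMI. These are the kinds of gaps that the notes say PseudoDocSim and QCSim are meant to address.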
Resources
smallwat3r/synonym: CLI tool to find synonyms in 15 different languages.
A Linux command-line tool that returns synonyms by calling the API provided by Thesaurus.