# 同义与上下位关系挖掘

![last modify](https://img.shields.io/static/v1?label=last%20modify\&message=2023-01-12%2015%3A42%3A31\&color=yellowgreen\&style=flat-square)

* [资源](#资源)
* [参考](#参考)
  * [上下位](#上下位)
  * [同义词](#同义词)

> *关键词:* *synonym/synonymous/aliase (同义词), antonym (反义词), hypernym/hyperonym/hypernymy (上位词), hyponym/hyponymy (下位词)*\
> \&#xNAN;*synonym/hypernym/hypernym-hyponym extraction (抽取)/detection (检测)/discovery (发现)/identification (识别)/generation (生成)*
>
> > 因为上位和下位关系是可以通过调整顺序互换的, 文献中一般使用 hypernym. (A practical reason to prefer hyperonym is that hypernym is in its spoken form hard to distinguish from hyponym in most dialects of English.)

## 资源

* [smallwat3r/synonym: CLI tool to find synonyms in 15 different languages.](https://github.com/smallwat3r/synonym)

  > 基于 <https://thesaurus.altervista.org/>
* [tigerchen52/synonym\_detection: Mining synonyms from unstructured and semi-structured data](https://github.com/tigerchen52/synonym_detection)

  > 基于语义共现网络的节点相似度

## 参考

* [The Power of WordNet and How to Use It in Python - XRDSXRDS](https://blog.xrds.acm.org/2017/07/power-wordnet-use-python/)

  > 利用 WordNet 查找 Synonym/Hypernym/Hyponym 的方法
* [How to Build a Smart Synonyms Model | by Patrick O'Neill | Kensho Blog](https://blog.kensho.com/how-to-build-a-smart-synonyms-model-1d525971a4ee)

  > [kdwd\_aliases\_and\_disambiguation | Kaggle](https://www.kaggle.com/code/kenshoresearch/kdwd-aliases-and-disambiguation/notebook)\
  > [Introducing the Kensho Derived Wikimedia Dataset | by Gabriel Altay | Kensho Blog](https://blog.kensho.com/announcing-the-kensho-derived-wikimedia-dataset-5d1197d72bcf)
* Rawan N. Al-Matham and Hend S. Al-Khalifa. “SynoExtractor: A Novel Pipeline for Arabic Synonym Extraction Using Word2Vec Word Embeddings” Complexity(2021): n. pag.
* Prathyusha Senthil Kumar et al. “Mickey Mouse is not a Phrase: Improving Relevance in E-Commerce with Multiword Expressions” Proceedings of the 10th Workshop on Multiword Expressions (MWE)(2014): n. pag.

### 上下位

* Atzori M, Balloccu S. Fully-unsupervised embeddings-based hypernym discovery\[J]. Information, 2020, 11(5): 268.

  > 相关工作介绍比较详细;

### 同义词

**博客**

* [How to Build a Smart Synonyms Model | by Patrick O'Neill | Kensho Blog](https://blog.kensho.com/how-to-build-a-smart-synonyms-model-1d525971a4ee)

  > 基于 Wikipedia 重定向挖掘同义词; 有 Kaggle 代码;

  * [kdwd\_aliases\_and\_disambiguation | Kaggle](https://www.kaggle.com/code/kenshoresearch/kdwd-aliases-and-disambiguation#Disambiguation-candidate-examples)

    > 关联的 Kaggle 代码;

    * [Kensho Derived Wikimedia Dataset | Kaggle](https://www.kaggle.com/datasets/kenshoresearch/kensho-derived-wikimedia-data)

      > 关联的 Wikipedia 数据
  * [Introducing the Kensho Derived Wikimedia Dataset | by Gabriel Altay | Kensho Blog](https://blog.kensho.com/announcing-the-kensho-derived-wikimedia-dataset-5d1197d72bcf)

    > Wikipedia 数据解析方法; 介绍如何将原始 Wikipedia 数据解析成 Kensho 版本的数据; 有 Kaggle 代码;

**论文**

* Cheng T, Lauw H W, Paparizos S. Entity synonyms for structured web search\[J]. IEEE transactions on knowledge and data engineering, 2011, 24(10): 1862-1875.

  > 微软; Click Similarity (ClickSim)

  * Cheng T, Lauw H W, Paparizos S. Fuzzy matching of web queries to structured data\[C]//2010 IEEE 26th International Conference on Data Engineering (ICDE 2010). IEEE, 2010: 713-716.

    > 最早提出 ClickSim 的论文;
* Turney P D. Mining the web for synonyms: PMI-IR versus LSA on TOEFL\[C]//European conference on machine learning. Springer, Berlin, Heidelberg, 2001: 491-502.

  > Document Similarity (DocSim)
* Chakrabarti K, Chaudhuri S, Cheng T, et al. A framework for robust discovery of entity synonyms\[C]//Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. 2012: 1384-1392.

  > 微软; 实体同义词 (entity synonyms); 基于点击数据; 垂类搜索领域 (电商/视频); 如何在垂搜中使用同义词;\
  > 提出 **Pseudo Document Similarity** (PseudoDocSim, 改进 ClickSim 和 DocSim) 和 **Query Context Similarit** (QCSim, 弥补 ClickSim 和 DocSim 的缺陷) 两种相似度计算方法;

**资源**

* [smallwat3r/synonym: CLI tool to find synonyms in 15 different languages.](https://github.com/smallwat3r/synonym)

  > 一个 Linux 命令行工具, 通过调用 [Thesaurus](https://thesaurus.altervista.org/) 提供的 API 返回同义词;
