Extracting Linguistic Relations from Word Embeddings and Language Models

We wish to understand the meaning encoded by natural language in a symbolic and computable way. Because natural language is too complex to represent in that form directly, we instead study how linguistic relationships are represented within word embedding models and language models. Previous and state-of-the-art (SOTA) approaches to natural language understanding (NLU) focus on end-to-end learning systems, which generate a pipeline of encodings that the humans who create them neither explore nor understand. In contrast, we directly analyzed and explored natural language embedding space as a step towards understanding natural language as a symbolic, self-referential system.

We went beyond the previous state of the art in human interpretability of word embedding space by discovering representations of grammar, abstract word classes, notions of abstraction, and global relationships between words, among other structures. We also began developing algorithms that combine these findings into usable systems for tasks such as bogus grammar detection, synonym and antonym classification, same-class term generation, and word definition checking and generation. Future work will take advantage of these discoveries to translate natural language into a symbolic language that is understandable to both humans and machines, opening the possibility of conversational language that is unambiguous, machine-interpretable, and able to act on data and objects in the real world.