Welcome to the Knowledge Bases Hub. This page links out to all of our experiments and case studies.
What are information hierarchies? Information hierarchies are a way to organize information so that it sets a clear context and conveys the relationship between entities and the overall context. A hierarchy is a clear signal of which information is most important, and it can help clarify the proper context.
Hierarchies can be organized into folders, sub-folders, etc. This is a parent-child type of relationship.
Well-structured websites have a proper URL hierarchy that conveys the correct meaning, context, and relationship between entities.
Consider the difference between:
Example one implies that there are multiple events in multiple cities. Example two shows only the city, which does not mean much on its own, and example three conveys that there are multiple events in NYC.
The first example is the strongest. It has a ROOT of Conferences; the ROOT is the main topic of the website. The second level of the hierarchy, or SEED, is the year, and the final NODE is the city. Organizing by year is much easier than organizing by city, which is why the year sits at a higher level of the hierarchy.
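As a rough illustration of the ROOT, SEED, and NODE levels, the sketch below labels the path segments of a URL. The URL `https://example.com/conferences/2024/nyc` and the function name `label_path` are made up for this illustration; they are not the examples the article originally compared.

```python
from urllib.parse import urlparse

# Level names follow the article's terminology (ROOT > SEED > NODE).
LEVELS = ("ROOT", "SEED", "NODE")

def label_path(url: str) -> dict[str, str]:
    """Map each path segment of a URL to its hierarchy level, in order."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return dict(zip(LEVELS, segments))

# Hypothetical URL, used only to show the three levels.
print(label_path("https://example.com/conferences/2024/nyc"))
# A city-only URL (also hypothetical) fills just one level, so the
# topic and the year are lost.
print(label_path("https://example.com/nyc"))
```

With the full path, every level carries part of the context. With the city alone, nothing in the URL says what the page is about.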
The goal is to create a topical map that includes all of the relevant entities within the proper contextual hierarchies, so that the LLM can quickly identify the context. A great topical map gives the LLM a lot of detail to go on, which makes it easier to answer detailed questions.
The Hierarchy Test will test these ideas out to see to what degree they make a difference with LLMs. We build two chatbots for this test. The chatbots have the same information; the main difference is how the hierarchy is organized.
1) Bot 1: The bot is trained on the following pages:
2) Bot 2: The bot is trained on a single page
Documents: Context & Semantics
Does the way information is organized within a document matter? Semantics help LLMs identify the main context of the document and the relationships between entities. These relationships inform the LLM of the proper way to connect concepts and the topic, which later helps give rise to meaning.
Macro & Micro
The overall context of a document is set by the Macro and Micro Semantics. The Macro Context is the overarching topic of the document and is organized by the H1, H2, H3, and H4 tags. These tags are in order of importance and set up the Macro Context. The H1 tag is the title of the document and represents its overall Macro Context. The H2 tags are the main topics within the document and support the overall thesis. The H3 tags are sub-topics of the H2 tags.
Consider the difference between:
1. Macro Context: H1, H2, H3, H4
2. Micro Context: Definitions, questions, phrases, word order within each heading.
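To make the Macro Context concrete, here is a minimal sketch that pulls the H1 to H4 headings out of an HTML document and prints them as an indented outline. It assumes the document is available as HTML. The class name `HeadingOutline`, the helper `macro_outline`, and the sample markup are all hypothetical, chosen only for this illustration.

```python
from html.parser import HTMLParser

class HeadingOutline(HTMLParser):
    """Collect (level, text) pairs for h1-h4 tags in document order."""

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None   # heading level currently open, if any
        self._buffer = []    # text pieces seen inside the open heading

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3", "h4"):
            self._level = int(tag[1])
            self._buffer = []

    def handle_data(self, data):
        if self._level is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.outline.append((self._level, "".join(self._buffer).strip()))
            self._level = None

def macro_outline(html: str) -> list:
    """Return the heading outline (the Macro Context) of an HTML string."""
    parser = HeadingOutline()
    parser.feed(html)
    parser.close()
    return parser.outline

# Made-up sample markup; paragraph text is ignored by the outline.
sample = """
<h1>Conferences</h1>
<p>Intro paragraph.</p>
<h2>2024 Events</h2>
<h3>NYC</h3>
"""

for level, text in macro_outline(sample):
    print("  " * (level - 1) + text)
```

Everything the outline skips (the paragraph text under each heading) is where the Micro Context lives.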
Well-written documents are generally well organized and easy to follow, read, and understand. They typically have a hierarchical structure that allows the reader to go deeper into a topic. Topics are laid out in a logical and coherent manner. Topics often have sub-topics and supporting information. Well-written documents are able to answer our questions.
Consider the question below. We broke it down into its elements so you can gain insight into how an LLM reads it. Answering questions like this is the promise of LLMs.
What are the best bikes [knowledge domain] for [functional word] short boys [contextual domain]?
What are the most useful diets [knowledge domain] for [functional word] children with insomnia [contextual domain] for kids under six [contextual layers]?
If "children" is well defined or the document has contextual layers, then an LLM can answer a detailed question like this.
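The hand-labeled breakdown above can also be written as a small data structure. This is only a sketch of the labeling used in this article, not a description of how an LLM actually parses text. The names `Segment`, `plain`, and `annotated` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    text: str   # the words of the question
    role: str   # the label this article assigns to those words

# The second example question, split by hand exactly as labeled above.
question = [
    Segment("What are the most useful diets", "knowledge domain"),
    Segment("for", "functional word"),
    Segment("children with insomnia", "contextual domain"),
    Segment("for kids under six", "contextual layers"),
]

def plain(segments: list) -> str:
    """Rebuild the question as a reader would type it."""
    return " ".join(s.text for s in segments) + "?"

def annotated(segments: list) -> str:
    """Rebuild the question with each segment followed by its label."""
    return " ".join(f"{s.text} [{s.role}]" for s in segments) + "?"

print(plain(question))
print(annotated(question))
```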
What is the overall structure of a good document?
Macro Context: H1 Title Tag
10% Summary: Extractive & Abstractive Summary
60% Main Topics: H2 Tags & Micro Context: Definitions, paragraphs, etc
30% Supplementary Context: H2 & H3 Subtopics, related topics, synonyms, antonyms, etc
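As a quick worked example of the 10/60/30 split, the sketch below turns the percentages into word counts for a document of a given length. It assumes the percentages refer to the share of the total word count, which the breakdown above does not state. The names `SHARES` and `word_budget` are hypothetical.

```python
# Percentages from the breakdown above, kept as whole numbers so the
# arithmetic stays exact.
SHARES = {"summary": 10, "main_topics": 60, "supplementary": 30}

def word_budget(total_words: int) -> dict[str, int]:
    """Split a total word count across the three sections (rounded down)."""
    return {name: total_words * pct // 100 for name, pct in SHARES.items()}

# A 1,500-word article, chosen only as an example length.
print(word_budget(1500))
# {'summary': 150, 'main_topics': 900, 'supplementary': 450}
```

Because each share is rounded down, the parts can add up to slightly less than the total when the length is not a multiple of 10.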
We will test two articles that have the same information. Article one will have all of the attributes shared above. Article two will be missing most of those attributes. Each article will be used to train a bot, and we can all play around with the differences.
1) Bot 1: Doc has the following
H1 Title Tag
H2 & H3
2) Bot 2: Doc is designed to mirror a poorly structured article
All tags converted to Paragraph Format
No supplementary content
No definitions or questions
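One way the "all tags converted to paragraph format" step could be done is sketched below, assuming the Bot 1 article is stored as HTML. The function name `flatten_headings` and the sample markup are hypothetical. This is a regex-based sketch, not a full HTML parser, and it is not necessarily how the test documents were actually prepared.

```python
import re

# Matches opening and closing heading tags, <h1> through <h6>, with or
# without attributes. Tags such as <html> or <header> are not matched.
HEADING_TAG = re.compile(r"<(/?)h[1-6]\b[^>]*>", re.IGNORECASE)

def flatten_headings(html: str) -> str:
    """Replace every heading tag with a plain paragraph tag."""
    return HEADING_TAG.sub(lambda m: f"<{m.group(1)}p>", html)

# Made-up markup standing in for the structured article.
structured = "<h1>Title</h1><h2 class='x'>Topic</h2><p>Body</p>"
print(flatten_headings(structured))
# <p>Title</p><p>Topic</p><p>Body</p>
```

The text is unchanged. Only the heading markup that carries the Macro Context is removed, and any attributes on the heading tags are dropped along with it.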