A Brief History of Chatbots
The first chatbot, ELIZA, was created in 1966 by Joseph Weizenbaum. However, it wasn't until the turn of the century that researchers started to revisit the development of social chatbots that could hold extended conversations with humans.
While in the past these systems were primarily used for academic or experimental purposes, their potential began to come into focus with the advent of voice assistants like Siri and Alexa in the 2010s.
Now, a decade later with ChatGPT, we have entered a new era of chatbots that are far more sophisticated, capable, and versatile than most of us could have imagined. They can be used for a variety of tasks, including writing poems, generating (mostly working) code, preparing you for a job interview, helping out with web accessibility, and so much more.
Chat Your Docs
Today we’ll take a look at how to use ChatGPT and similar language models to answer questions based on your documentation. This allows us to create a chatbot that can provide detailed, accurate responses to user queries, all while maintaining the dynamic and engaging nature of a conversation.
Imagine a chatbot that can answer any question about your company's policies, procedures, or products, based on your existing documentation. This means we no longer need to manage large datasets of question-and-answer pairs to anticipate every query from our users.
If all of this sounds complicated, don't worry. By the end of this article, you'll have a much better understanding of how these pieces come together and some resources to get started building your own DocBot.
Getting Started
We’ll first want to prepare our data for consumption, but let’s take a moment to learn a bit more about how these systems work with the data you give them.
When using ChatGPT, you are not just sending a single message and receiving a single response. Instead, the entire conversation is passed to the model, and only the newly generated text is returned to the user.
This is an important distinction because we must consider how models process and store the information they are given. Most importantly, we must understand that there is a limit on how much information a model can hold in its context at any one time.
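To make this concrete, here’s a minimal sketch of what a single request looks like when the full conversation history is sent along each time. It assumes the `openai` Python package with the classic module-level chat interface; the exact method names vary by package version.

```python
import openai  # assumes the openai Python package, v0.x-style interface

# The entire conversation so far is sent with every request.
conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is a context window?"},
    {"role": "assistant", "content": "It's the maximum number of tokens the model can consider at once."},
    {"role": "user", "content": "And how large is it for ChatGPT?"},  # the newest message
]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=conversation,
)

# Only the newly generated text comes back; we append it so the history keeps growing.
conversation.append(response["choices"][0]["message"])
```

Because the whole history rides along with every request, long conversations steadily eat into the space the model has available.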
Before we jump into the concept of context, let’s talk about tokens. Tokens are the building blocks of data that language models use. They can be as small as a single character, as large as a single word, or even larger.
For example, in the sentence "I love ice cream!", there are five tokens: "I", " love", " ice", " cream", and "!". Notice that the spaces are preserved and that the exclamation mark itself counts as a single token.
We can use an interactive tokenizer to see this in action.
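If you’d rather inspect tokens in code, the `tiktoken` library (the tokenizer used for OpenAI’s models) offers a quick way to do so. A minimal sketch:

```python
import tiktoken

# cl100k_base is the encoding used by gpt-3.5-turbo and gpt-4
encoding = tiktoken.get_encoding("cl100k_base")

tokens = encoding.encode("I love ice cream!")
print(len(tokens))                             # how many tokens the sentence uses
print([encoding.decode([t]) for t in tokens])  # the individual token strings
```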
A Little Context
This is important to understand because language models are trained with a maximum sequence length of tokens. This limit is commonly referred to as the "context length" or "context window", and it determines how much information the model can hold at runtime.
For example, the standard ChatGPT model has a context length of 4,096 tokens, which works out to roughly 3,000 words of English text. Once this limit is reached, the model can’t take in any more input without dropping or truncating earlier parts of the conversation.
We've recently started seeing larger context windows in language models. OpenAI just released a 16k context length version of ChatGPT, GPT-4 comes in 8k and 32k variants, and Anthropic’s Claude model has a version that can hold a staggering 100k tokens.
Eventually there may not be any context limits at all, but for now, we must keep this in mind as we integrate these systems.
Preparing the Data
The first step is to prepare your data in a format that is suitable for your language model to consume. The quality of the data used can significantly impact the performance of your chatbot. So, the saying "garbage in, garbage out" holds true for AI as well.
For instance, if your data set includes company documentation, you might want to remove any outdated information or irrelevant sections. If there are any inconsistencies in the way information is presented, such as different terms being used to refer to the same concept, you might want to standardize these to avoid confusion.
Another consideration is to convert to formats that use fewer tokens. For example, JSON files contain many extra tokens due to the syntax and formatting, so converting them to YAML files can save a lot of space.
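As a rough illustration, here’s a hypothetical sketch comparing the token counts of the same record serialized as JSON versus YAML, using `tiktoken` and `PyYAML`. The record itself is made up, and the exact savings will depend on your data.

```python
import json
import yaml
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

# A hypothetical documentation record
record = {
    "title": "Button",
    "description": "A clickable element used to trigger actions.",
    "props": [{"name": "variant", "type": "string", "default": "primary"}],
}

json_text = json.dumps(record, indent=2)
yaml_text = yaml.safe_dump(record, sort_keys=False)

print("JSON tokens:", len(encoding.encode(json_text)))
print("YAML tokens:", len(encoding.encode(yaml_text)))
```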
Chunks of Information
Now that we have all of our data lined up and ready to go, we need to make it more manageable for our language model.
The goal is to group the information into contextual chunks that contain enough information to give context on their own, but are also small enough for multiple closely related chunks to be fed to the language model as part of the response context.
The right approach will vary depending on the type of information, but a great library for handling this is LangChain, with its various TextSplitter classes.
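As a rough sketch, here’s how chunking might look with LangChain’s `RecursiveCharacterTextSplitter`. The file path, chunk size, and overlap here are placeholder assumptions worth tuning for your own content.

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split on paragraph and sentence boundaries where possible, keeping chunks
# small enough that several related ones can fit in a single prompt.
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,    # target size of each chunk, in characters
    chunk_overlap=50,  # overlap keeps context flowing between adjacent chunks
)

with open("docs/getting-started.md") as f:  # hypothetical documentation file
    chunks = splitter.split_text(f.read())

print(f"Created {len(chunks)} chunks")
```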
After you’ve decided on a chunking strategy and your data is split, we can now take the text chunks and create embeddings from them.
Talking Semantics
Embeddings are a way to represent words as vectors, which are lists of numbers in multidimensional space. You can explore an interactive example of this concept here.
By selecting one of the embeddings – single words in this case – you can see the closest related words. These words are given a score that indicates how close they are in this space, which allows the model to understand the relationships between different concepts.
This same logic holds true for larger bodies of text as well. For instance, "the dog chased the frisbee" would live in a similar vector space to "a puppy chased a disc." Although the two phrases don’t share a single word, the model understands that they are semantically similar.
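To make this a bit more concrete, here’s a hedged sketch of generating embeddings with OpenAI’s `text-embedding-ada-002` model and comparing the two phrases above with cosine similarity. It assumes the v0.x-style `openai` package; the third sentence is just an arbitrary unrelated example.

```python
import numpy as np
import openai  # assumes the openai Python package, v0.x-style interface

def embed(text: str) -> np.ndarray:
    """Return the embedding vector for a piece of text."""
    result = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(result["data"][0]["embedding"])

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

dog = embed("the dog chased the frisbee")
puppy = embed("a puppy chased a disc")
unrelated = embed("quarterly revenue grew by 4 percent")

print(cosine_similarity(dog, puppy))      # higher score: semantically similar
print(cosine_similarity(dog, unrelated))  # lower score: unrelated concepts
```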
Now that we have created our embeddings and gained a better understanding of how LLMs process language, we need to find a way to store this information to query later.
Storing the Knowledge
There are several options available for storing vectors, including some locally hosted options, but for ease of use and stability in production we will use Pinecone to store our vectors.
Once our store is set up, we can upload the embeddings created in the previous step, and we’re ready to make our first semantic search against our data. Our search will return related documents, along with their similarity scores and any provided metadata.
You’ll notice that even if the queries are unrelated to your data, they will still return results. However, we can use the similarity score to filter out low-ranking documents and keep only the best possible context for our model.
We can also attach metadata to our documents, such as the URL or page number, which we can later filter by or return as part of the response.
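Here’s a rough sketch of what that could look like with the Pinecone client: upserting our chunk embeddings with metadata, then querying and filtering by score. The index name, dimension, example chunks, and 0.75 threshold are all assumptions, and the client interface may differ slightly by version.

```python
import openai
import pinecone

pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")
pinecone.create_index("docbot", dimension=1536, metric="cosine")  # run once; 1536 matches ada-002
index = pinecone.Index("docbot")

def embed(text: str) -> list[float]:
    result = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return result["data"][0]["embedding"]

# Hypothetical chunks produced in the previous step, each with metadata we want back later
chunks = [
    {"id": "chunk-1", "text": "Buttons trigger actions when clicked...", "url": "/docs/button"},
    {"id": "chunk-2", "text": "Tokens are the building blocks of text...", "url": "/docs/tokens"},
]
index.upsert(vectors=[
    (c["id"], embed(c["text"]), {"text": c["text"], "url": c["url"]}) for c in chunks
])

# Semantic search: embed the user's question and find the closest chunks
results = index.query(vector=embed("How do buttons work?"), top_k=5, include_metadata=True)

# Keep only strong matches so low-quality context never reaches the model
context = [m["metadata"] for m in results["matches"] if m["score"] > 0.75]
```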
Talking to AI
With our query and related documents in hand, we can pass the highest scoring results to a language model to generate a response. We will use a prompt that instructs the language model to answer the user's question with the given context.
Here we can also tell the model to respond with something like “I don’t know, please provide more context” if it doesn’t have enough information to properly answer the question. This is an important step, because the model may hallucinate a response if it isn’t given instructions on how to handle cases where it lacks the proper information.
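Putting it together, the prompt might look something like the sketch below, again assuming the v0.x-style `openai` package. The prompt wording itself is just one example of many that could work.

```python
import openai

def answer(question: str, context_chunks: list[str]) -> str:
    """Answer a question using only the retrieved documentation chunks."""
    context = "\n\n".join(context_chunks)
    messages = [
        {
            "role": "system",
            "content": (
                "Answer the user's question using only the context below. "
                "If the context does not contain the answer, reply with "
                "\"I don't know, please provide more context.\"\n\n"
                f"Context:\n{context}"
            ),
        },
        {"role": "user", "content": question},
    ]
    response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
    return response["choices"][0]["message"]["content"]
```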
Along with the base response, we can append any of our metadata so that users can look up the source material behind the answer. This helps to further validate the accuracy of the response and provides insight into how the model is answering questions based on your documents.
Creating a Flywheel
One of the most powerful aspects of integrating AI into your systems is the ability to create a feedback loop. If set up correctly and cared for, this self-perpetuating cycle can continuously improve your chatbot and the experience for your users.
The basic process of the AI and user feedback loop involves collecting user queries and model responses, and then adding a mechanism for users to rate these responses. This feedback can be invaluable for understanding how well your chatbot is performing and identifying areas where improvements can be made.
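The mechanics of collecting that feedback can start out very simple. Here’s a hypothetical sketch that appends each rated exchange to a JSONL log you can review later; the file name and rating values are assumptions.

```python
import json
from datetime import datetime, timezone

def log_feedback(question: str, answer: str, sources: list[str], rating: str) -> None:
    """Append one rated exchange to a JSONL file for later review."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "answer": answer,
        "sources": sources,  # e.g. the metadata URLs returned with the response
        "rating": rating,    # e.g. "thumbs_up" or "thumbs_down"
    }
    with open("feedback.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")
```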
In addition to improving the chatbot's performance, this feedback loop can help provide insights into your documentation. If certain queries consistently result in poor responses, it may indicate that these areas of your documentation are unclear or lacking in detail. By addressing these issues, you can improve the quality of your knowledge base and ensure it provides the most accurate and helpful information possible.
By establishing this continuous refinement process, you can ensure your chatbot remains up-to-date, relevant, and increasingly efficient. This not only provides an ever-improving user experience but also helps your organization stay ahead in the fast-paced world of AI technology.
Closing Thoughts
In the age of information, the ability to quickly access and understand knowledge is critical. Leveraging AI chatbots like ChatGPT to interact with your knowledge base can significantly improve the user experience, making information more accessible and digestible.
At Knapsack, we're excited about AI's potential to transform how we interact with information. We're continuously exploring new ways to use AI to enhance workflows, accelerate learning, and improve user experiences. We look forward to sharing more of our work with you in the future.
In the meantime, be sure to check out our talk on AI at the Future of Design Systems Conference to learn more about how we see this transformative technology shaping the future.