Title: Utilizing Machine Learning to Answer Internal Documentation Questions with ChatGPT
In this article, we explore how to use ChatGPT to answer questions about internal company documentation. We walk through how ChatGPT responds based on the information it is given, how to keep its answers reliable, and how other organizations can implement a similar solution.
To use ChatGPT effectively, it is important to understand its interaction model. Context is central: it comprises the previous prompts, any information provided, and the model’s prior responses within a conversation.
An example conversation with ChatGPT illustrates the power of this approach:
Prompt:
User: I am the founder and CEO of thoughtbot. What is the name of the company I work for?
ChatGPT: The company you work for is thoughtbot.
From the above example, we can see that ChatGPT accurately uses the provided information to respond to the question.
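As a rough illustration of what context means in practice, a conversation can be modeled as a list of role-tagged messages that is resent in full with every request. The sketch below is an assumption about how one might represent this in Ruby; the helper name with_follow_up and the follow-up question are hypothetical and not part of the original article.

```ruby
# Hypothetical sketch: the conversation so far, as role-tagged messages.
# The "user"/"assistant" role names follow the common chat-API convention.
conversation = [
  { role: "user",
    content: "I am the founder and CEO of thoughtbot. " \
             "What is the name of the company I work for?" },
  { role: "assistant", content: "The company you work for is thoughtbot." }
]

# Returns a new history with the follow-up appended. The original array is
# left untouched, so earlier turns can be reused or trimmed independently.
def with_follow_up(history, question)
  history + [{ role: "user", content: question }]
end

# The whole history (the context) travels with the next request.
next_request = with_follow_up(conversation, "What is my job title?")
```

Because the earlier turns are included in next_request, the model can answer the follow-up from information supplied earlier in the conversation.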
However, generative models like ChatGPT have an important limitation: they can sometimes generate false information. To reduce this risk, prompts can be crafted to discourage fabricated responses, for example by instructing ChatGPT to say so when it does not have the answer.
With such instructions, ChatGPT is far less likely to introduce information unrelated to the provided context. This is especially important when the goal is factual answers drawn from internal documentation rather than from external sources.
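One way to phrase such an instruction is sketched below. The wording of the prompt, the helper name grounded_prompt, and the fallback sentence are all assumptions made for illustration; the exact prompt used at thoughtbot is not shown in this article.

```ruby
# Hypothetical fallback the model is asked to use when the context
# does not contain the answer.
FALLBACK_ANSWER = "I don't know."

# Wraps documentation excerpts and a question in instructions that tell
# the model to rely on the context only and to admit when it cannot answer.
def grounded_prompt(context, question)
  <<~PROMPT
    Answer the question using only the context below.
    If the answer is not in the context, reply exactly: #{FALLBACK_ANSWER}

    Context:
    #{context}

    Question: #{question}
  PROMPT
end

puts grounded_prompt("(documentation excerpts go here)", "(user question goes here)")
```

Asking for an exact fallback sentence also makes it easy for calling code to detect the "no answer" case with a simple string comparison.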
Now, let’s explore the steps involved in getting ChatGPT to answer questions based on internal documentation not already present in the model. The following structure outlines the general approach:
1. Perform a search to identify the most relevant documentation that potentially contains the answer.
2. Limit the context provided to ChatGPT by selecting only the pertinent information from the search results.
3. Compose a prompt using the restricted context, ensuring it falls within ChatGPT’s token limit.
4. Submit the prompt to ChatGPT and capture its response, which will contain the answer derived from internal documentation.
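The four steps above can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not the code from our Rails app: search, count_tokens, and ask_model are injected callables standing in for a real search engine, tokenizer, and chat API client, and the token limit and reserved headroom are illustrative numbers.

```ruby
TOKEN_LIMIT = 4096         # assumed context size of the model
RESERVED_FOR_ANSWER = 500  # assumed headroom left for the model's reply

# Step 2: keep the best-ranked documents until the token budget is spent.
# Stops at the first document that would overflow the budget.
def select_context(documents, budget, count_tokens)
  used = 0
  documents.take_while do |doc|
    used += count_tokens.call(doc)
    used <= budget
  end
end

# Step 3: compose the prompt from the restricted context.
def compose_prompt(context_docs, question)
  "Answer using only this context. If the answer is not there, say you " \
    "don't know.\n\nContext:\n#{context_docs.join("\n---\n")}\n\n" \
    "Question: #{question}"
end

def answer_question(question, search:, count_tokens:, ask_model:)
  documents = search.call(question) # step 1: ranked, most relevant first
  # Tokens used by the instructions and the question themselves.
  # Separators between documents are not counted, so this is approximate.
  overhead = count_tokens.call(compose_prompt([], question))
  budget = TOKEN_LIMIT - RESERVED_FOR_ANSWER - overhead
  context = select_context(documents, budget, count_tokens)
  ask_model.call(compose_prompt(context, question)) # step 4
end

# Usage with stubs: a word count stands in for a real tokenizer, and the
# "model" simply echoes the prompt it receives.
reply = answer_question(
  "(user question)",
  search: ->(_query) { ["(first search hit)", "(second search hit)"] },
  count_tokens: ->(text) { text.split.size },
  ask_model: ->(prompt) { prompt }
)
puts reply
```

Injecting the three callables keeps the selection logic testable without network access. In a real application, count_tokens would wrap an actual tokenizer and ask_model an actual API client.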
At thoughtbot, we have previously developed a custom internal search engine using Ruby on Rails and Elasticsearch. This search engine assists our team in finding the desired information across both internal and external documentation.
For those interested in implementing a similar solution, the first requirement is a searchable index of the documentation. A search engine such as Elasticsearch, or a specialized vector database such as Pinecone, can be a valuable tool for this purpose.
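As an illustration of the search step, the sketch below builds an Elasticsearch multi_match query body as a plain Ruby hash. The field names title and body and the default result size are assumptions about a hypothetical index, not a description of our own search engine.

```ruby
# Builds a query body for Elasticsearch's search API.
# "title" and "body" are hypothetical field names; adjust them to the
# mapping of the real index.
def documentation_search_body(question, size: 5)
  {
    size: size, # return only the top few hits to keep the context small
    query: {
      multi_match: {
        query: question,
        fields: ["title^2", "body"] # "^2" boosts matches in the title
      }
    }
  }
end

body = documentation_search_body("(user question)")
```

The resulting hash can be passed as the request body to whichever Elasticsearch client the application already uses.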
Most of the information that could be shared is already public, the important internal details come solely from our Handbook, and no sensitive information is transmitted to OpenAI. Under OpenAI’s terms of service, data submitted via the API is not used for training or model improvement unless it is explicitly shared for that purpose. Data sent through the API is retained for a maximum of 30 days for monitoring purposes, after which it is deleted unless legal requirements dictate otherwise. OpenAI states that it protects the data with security measures during this period.
Although the current approach satisfies our requirements, we remain vigilant and open to future changes. We may consider adopting an open-source, self-hosted model to eliminate reliance on third-party services entirely.
Below is the Ruby code, residing in our Rails app, that accomplishes the task of finding relevant documents, composing a prompt within token limits, submitting it to ChatGPT, and retrieving the response. This code utilizes our Elasticsearch search class, along with the tiktoken_ruby and ruby-openai Ruby gems to count tokens and interact with the ChatGPT API, respectively.
[Provided Ruby code omitted for brevity]
Feel free to reach out to us for further discussion on implementing this solution within your company or for exploring other ways to leverage ChatGPT and other machine learning models for your products.