Tokens are the units of text that a language model reads and writes. In Natural Language Processing (NLP), tokens are typically created by segmenting a sentence or a document into individual words or smaller units, such as word pieces and punctuation marks. This tokenization process allows Large Language Models (LLMs) to process and analyze large volumes of text data, making it possible to perform tasks such as language translation, text classification, sentiment analysis, and more.
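Generally, you won’t tokenize text yourself, but you should be aware that there are always many ways to tokenize a single sentence! Take “Let’s tokenize! Isn’t this easy?”: a tokenizer could split it on whitespace, separate out punctuation and contractions, or break it into even smaller pieces, and each choice produces a different sequence of tokens.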
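To make this concrete, here is a minimal sketch of three ways to split that sentence, using only the Python standard library. These three schemes are illustrative assumptions on our part; production LLM tokenizers typically use learned subword vocabularies and will split the text differently.

```python
import re

text = "Let's tokenize! Isn't this easy?"

def whitespace_tokens(s: str) -> list:
    # Split on runs of whitespace; punctuation stays attached to words.
    return s.split()

def word_punct_tokens(s: str) -> list:
    # Runs of word characters, or single punctuation characters.
    return re.findall(r"\w+|[^\w\s]", s)

def char_tokens(s: str) -> list:
    # Every character (including spaces) is its own token.
    return list(s)

print(whitespace_tokens(text))       # 5 tokens
print(word_punct_tokens(text))       # 11 tokens
print(len(char_tokens(text)))        # 32 tokens
```

The same sentence yields 5, 11, or 32 tokens depending on the scheme, which is why token counts are always tied to a specific tokenizer.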
The token limit is the maximum number of tokens that the model can process at once, counting the prompt and the completion together. Most LLMs have a token limit, and it is determined by the architecture of the model. Have you heard of GPT-4 or GPT-3.5? Each model has a different token limit, even though they all originate from OpenAI.
You can take a look here for token limits based on each model: https://platform.openai.com/docs/models/
As you can see, GPT-4 has a higher token limit than GPT-3.5.
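Because the limit covers the prompt and the completion together, every token spent on the prompt is one fewer available for the answer. A minimal sketch of that arithmetic follows; the limit of 4096 and the function name are assumptions for illustration, not the limit of any particular model.

```python
def remaining_completion_budget(token_limit: int, prompt_tokens: int) -> int:
    """Return how many tokens are left for the completion."""
    if prompt_tokens > token_limit:
        # The prompt alone does not fit; it must be shortened or chunked.
        raise ValueError("prompt exceeds the token limit")
    return token_limit - prompt_tokens

# Hypothetical numbers: a 4096-token limit and a 3000-token prompt.
print(remaining_completion_budget(4096, 3000))  # 1096
```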
As you can imagine, if you have a long document or a lot of text to summarize or run Q&A over, this can pose a problem. Two mitigation strategies can help.
Let’s say you want to summarize all Slack conversations that occurred in the last week. If you have a chatty team, the transcript will almost certainly exceed the token limit. You can overcome this by splitting up, or chunking, the conversation and summarizing each chunk. You can then summarize the chunk summaries to get a single overall summary.
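The chunk-then-summarize approach could be sketched as follows. This is a simplified illustration under a few assumptions: `count_tokens` approximates tokens by whitespace-separated words (a real tokenizer will count differently), and `summarize` is a hypothetical callable standing in for whatever LLM call you use.

```python
from typing import Callable, List

def count_tokens(text: str) -> int:
    # Assumption: whitespace words as a rough stand-in for real tokens.
    return len(text.split())

def chunk_messages(messages: List[str], max_tokens: int) -> List[List[str]]:
    """Greedily group messages so each chunk stays within max_tokens."""
    chunks, current, used = [], [], 0
    for msg in messages:
        n = count_tokens(msg)
        # Start a new chunk when adding this message would overflow.
        if current and used + n > max_tokens:
            chunks.append(current)
            current, used = [], 0
        current.append(msg)
        used += n
    if current:
        chunks.append(current)
    return chunks

def summarize_all(messages: List[str], max_tokens: int,
                  summarize: Callable[[str], str]) -> str:
    """Summarize each chunk, then summarize the summaries."""
    partial = [summarize("\n".join(chunk))
               for chunk in chunk_messages(messages, max_tokens)]
    if len(partial) == 1:
        return partial[0]
    return summarize("\n".join(partial))
```

Note that a single message longer than `max_tokens` still ends up in a chunk of its own, and the combined chunk summaries are assumed to fit within the limit; a more careful version would split oversized messages and repeat the summarizing step as needed.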
Tips:
To summarize and ask questions over a large corpus of text, another strategy is to create and then query a knowledge base built on your data. You can store each data point (this can be a conversation, a paragraph in a book, etc.) as an embedding. In simple terms, an embedding is a numerical vector that represents a text string and lets us measure how related two strings are. You can create these embeddings using the OpenAI APIs.
Learn more about embeddings here: https://platform.openai.com/docs/guides/embeddings/what-are-embeddings
You can store these embeddings in a vector database like Pinecone, Elasticsearch, etc. Then, given a question, you can embed the question, retrieve the top N stored embeddings ranked by a similarity score such as cosine similarity, and use the corresponding text as context when asking your question.
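The retrieval step can be sketched in a few lines of plain Python. The three-dimensional vectors and the in-memory `store` list below are made-up stand-ins for real embeddings and a real vector database, which would handle the search for you.

```python
import math
from typing import List, Sequence, Tuple

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as unrelated to everything.
    return dot / (norm_a * norm_b)

def top_n(query_vec: Sequence[float],
          store: List[Tuple[str, Sequence[float]]],
          n: int) -> List[Tuple[float, str]]:
    """Return the n (score, text) pairs most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]

# Toy data: the vectors are invented for illustration only.
store = [
    ("Deploys run from the main branch.", [0.9, 0.1, 0.0]),
    ("Lunch is at noon.", [0.0, 0.2, 0.9]),
    ("Rollbacks use the previous release tag.", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.2, 0.0]  # Pretend embedding of a deployment question.

for score, text in top_n(query, store, 2):
    print(round(score, 3), text)
```

With this toy data, the two deployment-related sentences rank above the unrelated one, and their text is what you would pass to the model as context.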
The prompt can be as simple as:
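One possible shape for such a prompt, sketched here as an assumption rather than the exact wording we use in production, is a short instruction followed by the retrieved context and the question. The function name and template text are hypothetical.

```python
from typing import List

def build_prompt(question: str, context_chunks: List[str]) -> str:
    """Combine retrieved context and a question into one prompt string."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n"
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

print(build_prompt(
    "How do we roll back a release?",
    ["Rollbacks use the previous release tag."],
))
```

Keep in mind that the context, the question, and the expected answer all count toward the token limit, so N should be chosen with that budget in mind.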
Tips:
We’ve learned these techniques while building QueryPal, a Slackbot that can help with all your DevOps needs. An ever-present issue in DevOps is ensuring that the on-call engineer has all the necessary context to solve their problem, which is a case tailor-made for building a knowledge base.
We hope you find these tips helpful. If you’re interested in learning more about how you can build products using LLMs, follow us on Twitter or LinkedIn, or just reach out to us at llm@querypal.com.