Letting Chatbots See Your Data: Coding framework LlamaIndex enables data interaction with LLMs

A new coding framework lets you pipe your own data into large language models. LlamaIndex streamlines the coding involved in enabling developers to summarize, reason over, and otherwise manipulate data from documents, databases, and apps using models like GPT-4.

Letting Chatbots See Your Data: Coding framework LlamaIndex enables data interaction with LLMs

A new coding framework lets you pipe your own data into large language models.

What’s new: LlamaIndex streamlines the coding involved in enabling developers to summarize, reason over, and otherwise manipulate data from documents, databases, and apps using models like GPT-4.

How it works: LlamaIndex is a free Python library that works with any large language model.

  • Connectors convert various file types into text that a language model can read. Over 100 connectors are available for unstructured files like PDFs, raw text, video, and audio; structured sources like Excel or SQL files; or APIs for apps such as Salesforce or Slack.
  • LlamaIndex divides the resulting text into chunks, embeds each chunk, and stores the embeddings in a database. Then users can call the language model to extract keywords, summarize, or answer questions about their data.
  • Users can prompt the language model using a description such as, “Given our internal wiki, write a one-page onboarding document for new hires.” LlamaIndex embeds the query, retrieves the best-matching embedding from the database, and sends both to the language model. Users receive the language model's response; in this case, a one-page onboarding document.

Behind the news: Former Uber research scientist Jerry Liu began building LlamaIndex (originally GPT Index) in late 2022 and co-founded a company around it earlier this year. The company, which recently received $8.5 million in seed funding, plans to launch an enterprise version later this year.

Why it matters: Developing bespoke apps that use a large language model typically requires building custom programs to parse private databases. LlamaIndex offers a more direct route.

We’re thinking: Large language models are exciting new tools for developing AI applications. Libraries like LlamaIndex and LangChain provide glue code that makes building complex applications much easier — early entries in a growing suite of tools that promise to make LLMs even more useful.