Handling document updates
- In RAG, we build the indexing pipeline from document sources, and in most cases those documents get updated over time.
- Here we look at how to handle document updates, i.e. keeping the vector database in sync with the latest documents.
- Strategies:
- Delete and reindex everything
- Update and reindex only what has changed
Sample: Let's generate some documents with data and handle updates
- Prompt for synthetic data
Generate KB-article-style text documents for an IT helpdesk.
All the documents should be in .txt format.
Generate at least 50 documents with readable names and give me a zip file.
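- Approach 1: Delete and reindex
- Drop everything in the vector store and rebuild the index from all current documents. This is simple, but the cost grows with the size of the corpus, since unchanged documents are re-embedded too.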
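A minimal sketch of the delete-and-reindex approach, using only the standard library. `ToyVectorStore`, `full_reindex`, and `demo_full_reindex` are hypothetical names introduced for illustration; the store keeps raw text in a dict where a real vector database would keep embeddings, and the file names are made up.

```python
import tempfile
from pathlib import Path


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a real vector database."""

    def __init__(self) -> None:
        # doc id -> text (a real store would hold embeddings instead)
        self.records: dict[str, str] = {}

    def delete_all(self) -> None:
        self.records.clear()

    def add(self, doc_id: str, text: str) -> None:
        self.records[doc_id] = text


def full_reindex(docs_dir: Path, store: ToyVectorStore) -> int:
    """Wipe the store, then index every .txt file in docs_dir again."""
    store.delete_all()
    for path in sorted(docs_dir.glob("*.txt")):
        store.add(path.name, path.read_text(encoding="utf-8"))
    return len(store.records)


def demo_full_reindex() -> tuple[int, int]:
    """Index two files, remove one at the source, then reindex from scratch."""
    with tempfile.TemporaryDirectory() as tmp:
        docs = Path(tmp)
        (docs / "reset_password.txt").write_text("Steps to reset a password.", encoding="utf-8")
        (docs / "vpn_setup.txt").write_text("How to configure the VPN.", encoding="utf-8")

        store = ToyVectorStore()
        first = full_reindex(docs, store)

        (docs / "vpn_setup.txt").unlink()  # a document disappears at the source
        second = full_reindex(docs, store)
        return first, second
```

Because the store is wiped on every run, deleted and modified documents are handled without any bookkeeping; the trade-off is that every unchanged document is processed again.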
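- Approach 2: Update only what has changed
- Detect which documents were added, modified, or deleted since the last run, and touch only those entries in the vector store.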
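To handle approach 2, we need to create a manifest of the document sources, for example a record of each file and a hash of its content, so that the next run can tell what has changed.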
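One way such a manifest could look, as a sketch under assumptions: each `.txt` file is mapped to the SHA-256 hash of its content, the manifest is saved as JSON, and the next run compares the saved manifest with a fresh one. The function names, the `manifest.json` file name, and the sample file names are all hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def build_manifest(docs_dir: Path) -> dict[str, str]:
    """Map each .txt file name to the SHA-256 hash of its contents."""
    return {
        p.name: hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(docs_dir.glob("*.txt"))
    }


def diff_manifests(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify files as added, changed, or deleted between two manifests."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "changed": sorted(k for k in new.keys() & old.keys() if new[k] != old[k]),
        "deleted": sorted(old.keys() - new.keys()),
    }


def demo_manifest_diff() -> dict[str, list[str]]:
    """Save a manifest, simulate source updates, and report what changed."""
    with tempfile.TemporaryDirectory() as tmp:
        docs = Path(tmp) / "docs"
        docs.mkdir()
        manifest_path = Path(tmp) / "manifest.json"

        (docs / "reset_password.txt").write_text("v1", encoding="utf-8")
        (docs / "vpn_setup.txt").write_text("v1", encoding="utf-8")
        manifest_path.write_text(json.dumps(build_manifest(docs)), encoding="utf-8")

        # Simulate updates at the source: one edit, one removal, one new file.
        (docs / "reset_password.txt").write_text("v2", encoding="utf-8")
        (docs / "vpn_setup.txt").unlink()
        (docs / "printer_setup.txt").write_text("v1", encoding="utf-8")

        old = json.loads(manifest_path.read_text(encoding="utf-8"))
        return diff_manifests(old, build_manifest(docs))
```

Only the files listed under `added` and `changed` would need to be embedded again, and the entries for `changed` and `deleted` files would be removed from the vector store first. Hashing content rather than relying on modification times avoids reindexing files that were touched but not actually edited.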

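- LangChain provides a RecordManager class that does exactly this: it keeps track of which documents have been written to the vector store, so unchanged ones can be skipped.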
