snexus · Oct 8, 2024 · Aug 3, 2024 · Aug 5, 2024
diff --git a/.env_template b/.env_template
@@ -1 +1,8 @@
-OPENAI_API_KEY=<<<YOUR_API_KEY>>>
+OPENAI_API_KEY=<<<YOUR_API_KEY>>>
+
+# Only needed when parsing images with Gemini
+GOOGLE_API_KEY=<<<YOUR_API_KEY>>>
+
+# Only needed when parsing tables with Azure Document Intelligence
+AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT=<< AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT >>
+AZURE_DOCUMENT_INTELLIGENCE_KEY=<< AZURE_DOCUMENT_INTELLIGENCE_KEY >>
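
Review note: the new template variables are optional, so calling code may want to check which of them were actually filled in before enabling image or table parsing. The following is a minimal sketch of such a check, not the project's own implementation. It assumes the variables are already loaded into the process environment; the feature names and the helpers `is_configured` and `available_features` are hypothetical and introduced only for illustration.

```python
import os

# Placeholder markers as they appear in .env_template. A value that still
# contains one of them is treated as "not configured".
PLACEHOLDER_MARKERS = ("<<", ">>")

# Hypothetical feature names mapped to the variables each one needs.
REQUIRED_VARS = {
    "openai": ["OPENAI_API_KEY"],
    "gemini_image_parsing": ["GOOGLE_API_KEY"],
    "azure_table_parsing": [
        "AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT",
        "AZURE_DOCUMENT_INTELLIGENCE_KEY",
    ],
}


def is_configured(value):
    """Return True if value is set, non-empty, and not a template placeholder."""
    if value is None:
        return False
    value = value.strip()
    if not value:
        return False
    return not any(marker in value for marker in PLACEHOLDER_MARKERS)


def available_features(env=None):
    """Map each feature name to True when all of its variables are configured."""
    env = os.environ if env is None else env
    return {
        feature: all(is_configured(env.get(name)) for name in names)
        for feature, names in REQUIRED_VARS.items()
    }


if __name__ == "__main__":
    for feature, ready in available_features().items():
        print(f"{feature}: {'ready' if ready else 'not configured'}")
```

A check like this lets the optional parsers be skipped cleanly when a user copies the template without replacing the placeholder values.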
diff --git a/.gitignore b/.gitignore
@@ -5,6 +5,7 @@ __pycache__/
 .env
 dev/*
 output_images/
+azuredoc_temp/
 
 # C extensions
 *.so
@@ -14,6 +15,7 @@ temp_data/
 *.npz
 *.db
 sample_templates/obsidian_conf_test.yaml
+sample_templates/test-templates/*
 .venv2
 
 # Distribution / packaging

diff --git a/README.md b/README.md
@@ -16,6 +16,10 @@ The purpose of this package is to offer a convenient question-answering (RAG) sy
     * Other common formats are supported by `Unstructured` pre-processor:
         * List of formats see [here](https://unstructured-io.github.io/unstructured/core/partition.html).
 
+* Supports table parsing via the open-source [gmft](https://github.com/conjuncts/gmft) library or Azure Document Intelligence.
+
+* Optionally supports image parsing via the Gemini API.
+
 * Supports multiple collection of documents, and filtering the results by a collection.
 
 * An ability to update the embeddings incrementally, without a need to re-index the entire document base.