Replies: 2 comments
We've changed our architecture so that all documents are initially converted to text by different Loaders (which handle PDF, CSV, and other conversions) in the add pipeline, so Cognify now uses only TextDocuments. Previously, this conversion happened during Cognify, using the different Document classes.
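To illustrate the flow described above, here is a minimal sketch, not Cognee's actual code. Every name in it (`LOADERS`, `add`, `classify`, the loader functions, and this `TextDocument` dataclass) is a hypothetical stand-in: format-specific loaders run in the add step, so the later classify step only ever sees text.

```python
import csv
import io
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict


@dataclass
class TextDocument:
    """Hypothetical stand-in for the only document type Cognify receives."""
    name: str
    text: str


def load_plain_text(raw: bytes) -> str:
    # Plain text only needs decoding.
    return raw.decode("utf-8")


def load_csv(raw: bytes) -> str:
    # Flatten each CSV row into one comma-separated line of text.
    rows = csv.reader(io.StringIO(raw.decode("utf-8")))
    return "\n".join(", ".join(row) for row in rows)


# Hypothetical loader registry keyed by file extension. A real system would
# register PDF, audio, and image loaders here in the same way.
LOADERS: Dict[str, Callable[[bytes], str]] = {
    ".txt": load_plain_text,
    ".csv": load_csv,
}


def add(name: str, raw: bytes) -> str:
    """Add step: pick a loader by extension and convert the file to text."""
    loader = LOADERS.get(Path(name).suffix.lower())
    if loader is None:
        raise ValueError(f"no loader registered for {name!r}")
    return loader(raw)


def classify(name: str, text: str) -> TextDocument:
    """Classify step: the input is already text, so the result is always a TextDocument."""
    return TextDocument(name=name, text=text)


doc = classify("table.csv", add("table.csv", b"x,y\n1,2\n"))
print(type(doc).__name__, repr(doc.text))
```

Under this assumption, format-specific handling lives entirely in the loaders, which is why format-specific document classes would no longer be needed at the classify step.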
Thank you for the clarification. That makes perfect sense.
I am looking at the data flow, specifically: Ingestion -> ... -> Classify documents -> ... -> Extract chunks -> ...
My understanding is that during the Ingestion step, various file types (such as PDF, audio, and image files) are saved into Cognee’s storage and then converted to text.
My confusion arises at the Classify documents step. It appears that this step classifies the converted text as a TextDocument. This has led me to wonder if the specific classes—such as PdfDocument, AudioDocument, and ImageDocument—are being utilized in this part of the flow.
I would like to know the intended role of these specific document classes. Are they used in a different part of the process, or am I misunderstanding some aspect of how they work?
This discussion was automatically pulled from Discord.