Cloudflare Developers•2w ago

Hi, is there a way of accessing each

Hi, is there a way of accessing each vector to add metadata tags to it? I used the standard R2 ingest but did not add any metadata, and I cannot seem to retrieve a vector by its ID. The ID appears to be a random value (a hash?). I see that each chunk returns the same basic format:
Chunk 4
Cosine Sim. 0.6107
Relevancy 0.0017
Metadata 450db445-638b-4bb9-a0e8-16af21562e23/291385e0-f9c0-4126-96b4-6b1e0c1dfab1/1997-011.pdf

Chunk 1
Cosine Sim. 0.6169
Relevancy 0.0006
Metadata 450db445-638b-4bb9-a0e8-16af21562e23/291385e0-f9c0-4126-96b4-6b1e0c1dfab1/1997-008.pdf

Chunk 3
Cosine Sim. 0.6140
Relevancy 0.0019
Metadata 450db445-638b-4bb9-a0e8-16af21562e23/291385e0-f9c0-4126-96b4-6b1e0c1dfab1/1997-012.pdf

So the metadata appears to be the same apart from the file name, and each result also carries a chunk number. Is there a way of determining what the two UUID values are? I assume the first one is the "processedUpToMutation" ID, but I'm not sure what the second value might be, or perhaps it is the other way around. I need to understand this to be able to retrieve the specific set of vector values for a given chunk.
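For reference, the metadata strings above can at least be split into their three slash-separated parts. This is only a minimal sketch in Python: the field names `first_id` and `second_id` are placeholders I made up, since what the two UUIDs actually represent is exactly the open question here, and `parse_metadata_path` is a hypothetical helper, not part of any Cloudflare API.

```python
from typing import NamedTuple


class ChunkSource(NamedTuple):
    # Placeholder names: the meaning of the two UUIDs is not known.
    first_id: str
    second_id: str
    filename: str


def parse_metadata_path(path: str) -> ChunkSource:
    """Split '<uuid>/<uuid>/<filename>' into its three components."""
    # maxsplit=2 keeps any further slashes inside the filename part.
    parts = path.split("/", 2)
    if len(parts) != 3:
        raise ValueError(f"expected '<id>/<id>/<filename>', got: {path!r}")
    return ChunkSource(*parts)


source = parse_metadata_path(
    "450db445-638b-4bb9-a0e8-16af21562e23/"
    "291385e0-f9c0-4126-96b4-6b1e0c1dfab1/1997-011.pdf"
)
print(source.filename)
```

Grouping results by `filename` this way shows which PDF a chunk came from, but it does not by itself give the vector's ID in the index.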

3 Replies

garvitg•2w ago

Hi @Stephen, could you please share some context about how you're using Vectorize? Are you ingesting data into Vectorize using AutoRAG?

StephenOP•2w ago

Hi, yes, I'm ingesting a series of .pdf files using AutoRAG with Vectorize, using the fast embedding option. This is fully automated, so I did not see any way of creating a UUID for each chunk as metadata. Yet the Vectorize database is clearly creating its own UUID for the filename and chunk value. All I need to do is access each vector chunk.

garvitg•2w ago

I think the #autorag channel would be able to help you better with some of these specifics, especially your questions about the file chunks.
