Xata (3mo ago)
MissS

Search duplication latency

I recently truncated a table and noticed that the search usage metrics haven't been updated. How long is the delay between updating the DB and the Elasticsearch update?
1 Reply
kostas (3mo ago)
Hi, search replication lag is typically a few milliseconds and only rarely spikes to a few seconds under very high write load. As a user, an easy way to verify that replication has caught up is to compare the number of records in Postgres with the number in OpenSearch using these queries (they can be run in the Playground):
// Record count from Postgres
const summarize_count = await xata.db.tablename.summarize({
  summaries: { total: { count: "*" } },
});
console.log(summarize_count);

// Record count from the search store (OpenSearch)
const agg_count = await xata.db.tablename.aggregate({ record_count: { count: "*" } });
console.log(agg_count);
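If you want to wait for the two stores to agree rather than check once, the comparison can be wrapped in a small polling helper. This is only a sketch: `CountFn`, `compareCounts`, and `waitForReplication` are hypothetical names, the timeout and interval defaults are arbitrary, and the result shapes in the commented-out wiring (`summaries[0].total`, `aggs.record_count`) are assumptions to verify against your SDK version.

```typescript
// Hypothetical helper: each CountFn returns a record count from one data store.
type CountFn = () => Promise<number>;

interface CountComparison {
  postgres: number;
  search: number;
  inSync: boolean;
}

// Pure comparison of the two counts.
function compareCounts(postgres: number, search: number): CountComparison {
  return { postgres, search, inSync: postgres === search };
}

// Poll both stores until the counts match or the timeout elapses.
// Returns the last comparison either way, so the caller can inspect the gap.
async function waitForReplication(
  postgresCount: CountFn,
  searchCount: CountFn,
  timeoutMs = 5000,
  intervalMs = 250,
): Promise<CountComparison> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const [pg, search] = await Promise.all([postgresCount(), searchCount()]);
    const result = compareCounts(pg, search);
    if (result.inSync || Date.now() >= deadline) return result;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Assumed wiring with the queries above (result shapes are assumptions):
// const result = await waitForReplication(
//   async () =>
//     (await xata.db.tablename.summarize({ summaries: { total: { count: "*" } } }))
//       .summaries[0].total,
//   async () =>
//     (await xata.db.tablename.aggregate({ record_count: { count: "*" } }))
//       .aggs.record_count,
// );
// console.log(result.inSync ? "caught up" : "still lagging", result);
```

Passing the two count sources in as functions keeps the helper independent of the client, so it can be tried with stubbed counts before pointing it at a real table.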
If the two counts are equal, the number of records in the two data stores matches. The data size reported in Usage metrics can be a different story, though. Lucene, the storage engine of OpenSearch, does not purge data immediately when it receives a delete event. It only marks the document as deleted. Once a substantial percentage of the documents in a shard are marked for deletion, it runs a merge that rewrites the segment and cleans up the deleted documents for real. This process is asynchronous and its timing is outside our control: when it triggers depends on OpenSearch's internals. It can be triggered manually if necessary, for example if there are billing concerns, in which case you can let us know at support@xata.io. We can also help double-check whether that explains the size discrepancy you noticed, so send us your workspace id and database:branch there and we will take a look.
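To illustrate why the record count can drop right away while the reported size does not, here is a toy model of mark-then-merge deletion. It is not Lucene's actual code: `ToySegment`, its methods, and the 0.3 threshold are all made up for illustration, and a real merge is scheduled by OpenSearch rather than called by hand.

```typescript
// Toy model only: NOT Lucene's implementation. The 0.3 threshold is arbitrary.
interface ToyDoc {
  id: string;
  bytes: number;
  deleted: boolean;
}

class ToySegment {
  private docs: ToyDoc[] = [];

  add(id: string, bytes: number): void {
    this.docs.push({ id, bytes, deleted: false });
  }

  // A delete only flips a flag; the document's bytes stay on disk.
  markDeleted(id: string): void {
    for (const doc of this.docs) {
      if (doc.id === id) doc.deleted = true;
    }
  }

  // What a count query sees: live documents only.
  liveCount(): number {
    return this.docs.filter((d) => !d.deleted).length;
  }

  // What a size metric sees: everything still stored, deleted or not.
  sizeOnDisk(): number {
    return this.docs.reduce((sum, d) => sum + d.bytes, 0);
  }

  deletedRatio(): number {
    return this.docs.length === 0 ? 0 : 1 - this.liveCount() / this.docs.length;
  }

  // Rewrite the segment without deleted documents, but only once enough
  // of them are marked. Returns true if a merge actually ran.
  maybeMerge(threshold = 0.3): boolean {
    if (this.deletedRatio() < threshold) return false;
    this.docs = this.docs.filter((d) => !d.deleted);
    return true;
  }
}
```

In this model, deleting a couple of documents lowers `liveCount()` immediately, while `sizeOnDisk()` stays the same until `maybeMerge()` crosses its threshold and rewrites the segment.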