From 191aac85a7ee177a1852af406e3cc6194aecfd3f Mon Sep 17 00:00:00 2001
From: Alexander Zierhut
Date: Sun, 26 Mar 2023 01:54:42 +0100
Subject: [PATCH 1/2] Add another data syncing strategy (Query Parsing)

---
 .../content/guide/syncing-data-into-typesense.md | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/docs-site/content/guide/syncing-data-into-typesense.md b/docs-site/content/guide/syncing-data-into-typesense.md
index 3a8cea55..7f307ad8 100644
--- a/docs-site/content/guide/syncing-data-into-typesense.md
+++ b/docs-site/content/guide/syncing-data-into-typesense.md
@@ -35,6 +35,15 @@ If you use an ORM, you can hook into callbacks provided by your ORM framework:
 1. In your ORM's `on_save` callback (might be called something else in your particular ORM), write the changes that need to be synced into Typesense into a temporary queue
 2. Every say 5s, have a scheduled job that reads all the changes from this queue and bulk imports them into Typesense.
 
+### Query Parsing
+
+If you use queries to interact with your database, you likely have a central function through which all of your queries are passed to your database. By using a query parser, you can determine which action is taken, as well as which table and fields are affected.
+
+1. Parse the query using a well-tested and established library for your specific language.
+2. Determine the action being taken. This is usually an INSERT, UPDATE or DELETE statement.
+3. If needed, determine the fields affected.
+4. Dispatch an event with the gathered information. A listener can then subscribe to relevant events and replicate those changes to Typesense.
+
 ### Using Airbyte
 
 [Airbyte](https://airbyte.com/why-airbyte) is an open source platform that lets you sync data between different sources and destinations, with just a few clicks.
@@ -46,10 +55,10 @@ Read more about how to deploy Airbyte, and set it up [here](https://airbytehq.gi
 ## Sync real-time changes
 
 In addition to the above, if you have a use case where you want to update some records in realtime, may be because you want a user's edit to a record to be immediately reflected in the search results (and not after say 10s or whatever your sync interval is in the above process),
-you can also use the Single Document Indexing API.
+you can also use the Single Document Indexing API each time a record change event happens. You may want to buffer these events in a queue for situations where real-time synchronization cannot be achieved due to, e.g., server load.
 
 Note however that the bulk import endpoint is much more performant and uses less CPU capacity, than the single document indexing endpoint for the same number of documents.
-So you want to try and use the bulk import endpoint as much as possible, even if that means reducing your sync interval for the process above to as less as say 2s.
+So you want to try and use the bulk import endpoint as much as possible, even if that means reducing your sync interval for the process above to as little as, say, 2s. When using the aforementioned buffering strategy, your consumer can then simply wait for up to 2s to gather events before importing them.
 
 ## Full re-indexing
 
@@ -90,4 +99,4 @@ You could also change the value of `healthy-read-lag` and `healthy-write-lag`
 parameter called `batch_size`. This controls server-side batching (how many documents from the import API call are processed, before the search queue is serviced), and you almost never want to change this value from the default.
-Instead, you want to do client-side batching, by controlling the number of documents in a single import API call and potentially do multiple API calls in parallel.
\ No newline at end of file
+Instead, you want to do client-side batching, by controlling the number of documents in a single import API call and potentially do multiple API calls in parallel.

From a3777afae719f4bea3a7b74a551d2c8736a3144f Mon Sep 17 00:00:00 2001
From: Alexander Zierhut
Date: Sun, 26 Mar 2023 03:16:29 +0200
Subject: [PATCH 2/2] Add advice on restoring state

---
 docs-site/content/guide/syncing-data-into-typesense.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/docs-site/content/guide/syncing-data-into-typesense.md b/docs-site/content/guide/syncing-data-into-typesense.md
index 7f307ad8..34c26f7a 100644
--- a/docs-site/content/guide/syncing-data-into-typesense.md
+++ b/docs-site/content/guide/syncing-data-into-typesense.md
@@ -100,3 +100,8 @@ In the import API call, you'll notice a