Search suggestion V2
[docs.google.com](https://docs.google.com/document/d/1luQG11sfsaunQ1nS7D_3QP6mvhakspzH/edit?usp=drivesdk&ouid=104689702589426582454&rtpof=true&sd=true)
Objective
This document explains how search suggestion V2 works, including its algorithm and implementation details.
Requirements
The API framework is FastAPI, and the search engine is Whoosh.
- Python 3.9
- Scripts and other requirements are stored at https://github.com/hungryhub-team/hh_fastapi
Algorithm
Indexing
Indexing is built from a series of processes, which are described in the following picture:
First stage – Redshift
In Redshift, a scheduler runs every hour to capture any updates to the data. Three tables are generated to store that data, each with its own purpose. We use search_dataset and search_dataset_misc for full indexing of restaurants and of locations/cuisines, respectively. search_dataset_update, on the other hand, holds only the hourly updates, which makes it suitable for partial indexing. Note that this table only updates the restaurant index and leaves the locations/cuisines index unchanged. The queries that create these tables are stored at https://github.com/hungryhub-team/hh_fastapi/tree/main/queries
Second stage – Whoosh indexing
After the dataset is built, the next step is indexing in Python. Initial indexing can be done by running this script [https://github.com/hungryhub-team/hh_fastapi/blob/main/indexing.py](<https://github.com/hungryhub-team/hh_fastapi/blob/main/indexing.py>). Indexing can also be triggered through the API by sending a POST request to http://{api_url}/reindex_search_suggestion?all=true for full indexing, or to http://{api_url}/reindex_search_suggestion?all=false for partial indexing.
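The reindex request described above can be sketched in Python as follows. This is a minimal illustration using only the standard library. The helper names build_reindex_url and trigger_reindex are hypothetical, and the sketch assumes the endpoint accepts a POST with an empty body and no authentication, which this document does not confirm.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_reindex_url(api_url: str, full: bool) -> str:
    """Build the reindex URL: all=true for full, all=false for partial indexing."""
    query = urlencode({"all": "true" if full else "false"})
    return f"http://{api_url}/reindex_search_suggestion?{query}"


def trigger_reindex(api_url: str, full: bool, timeout: float = 30.0) -> int:
    """Send a POST to the reindex endpoint and return the HTTP status code.

    Assumption: an empty request body and no auth headers are sufficient.
    """
    request = Request(build_reindex_url(api_url, full), data=b"", method="POST")
    with urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, trigger_reindex("localhost:8000", full=False) would request a partial reindex, where localhost:8000 is a placeholder for the real API host.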
Searching
The searching algorithm has quite a few steps, so we summarize it as follows:
- Pre-loading. When the FastAPI app starts up, we load the following apps, files, and code:
  - Whoosh index
  - Whoosh query parser definition and fuzzy parameter
  - Fastapi-cache with aioredis
  - Asyncpg with database engine
- Searching process. In a nutshell, search suggestion is a combination of several search processes, summarized as follows:
  - Clean whitespace and punctuation from the keyword.
  - Read synonyms from https://tinyurl/hh_synonym.
  - Read from the Whoosh index.
  - If the Whoosh index returns an empty result, query the DB directly.
  - If the keyword is found in neither the index nor the DB, search for the most similar result and return it in the did_you_mean section.

Please note that fuzzy matching and did_you_mean only work well for English keywords. The full picture of the algorithm is shown in Figure 2. Searching algorithm.
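The order of the searching steps can be sketched as the fallback chain below. This is only an illustration, not the service code: the dictionaries SYNONYMS, INDEX, and DATABASE are hypothetical in-memory stand-ins for the synonym sheet, the Whoosh index, and the database, and difflib is swapped in for the real similarity search. Lowercasing the keyword is also an assumption.

```python
import difflib
import string

# Hypothetical stand-ins for the synonym sheet, the Whoosh index, and the DB.
SYNONYMS = {"bbq": "barbecue"}
INDEX = {"barbecue": ["Bangkok Barbecue House"], "sushi": ["Sushi Garden"]}
DATABASE = {"ramen": ["Ramen Corner"]}


def clean_keyword(keyword: str) -> str:
    """Remove punctuation, collapse whitespace, and lowercase (an assumption)."""
    no_punct = keyword.translate(str.maketrans("", "", string.punctuation))
    return " ".join(no_punct.split()).lower()


def suggest(keyword: str) -> dict:
    """Run the chain: clean, apply synonym, index lookup, DB fallback, did_you_mean."""
    cleaned = clean_keyword(keyword)
    term = SYNONYMS.get(cleaned, cleaned)

    # First try the index; fall back to the DB only when the index is empty.
    results = INDEX.get(term, [])
    if not results:
        results = DATABASE.get(term, [])

    # When both lookups are empty, offer the closest known term instead.
    did_you_mean = []
    if not results:
        candidates = list(INDEX) + list(DATABASE)
        did_you_mean = difflib.get_close_matches(term, candidates, n=1, cutoff=0.6)

    return {"results": results, "did_you_mean": did_you_mean}
```

With this toy data, suggest("  BBQ!! ") resolves through the synonym to the index entry, suggest("ramen") falls through to the DB, and a misspelling such as suggest("susi") returns no results but fills did_you_mean with the nearest term.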
