matchFuzzy
Syntax
matchFuzzy(textCol, term, minimumSimilarity, prefixLength, [scoreColName])
Arguments
textCol The column to search, i.e., a column on which a text index has been set in a PKEY engine table.
term A STRING scalar specifying the term to search for. Only single-word searches are supported.
minimumSimilarity A DOUBLE scalar in the range [0, 1] specifying the minimum similarity a word must have to term to be returned as a search result.
prefixLength A non-negative integer specifying the length of the prefix that a matched word must share exactly with term. For example, prefixLength=1 with term="make" requires matched words to start with "m".
scoreColName (optional) A STRING scalar specifying the name of the search-score column in the output. The default is null, in which case no score column is output. The score reflects the degree of match within a partition; scores from different partitions are not comparable.
Details
This function is used in the where clause of a SQL statement to perform word-based fuzzy searches on a text-indexed column in the PKEY engine. Search results remain highly relevant even when term contains spelling errors or when words in the text do not match it exactly.
- When minimumSimilarity is set to 1, only words that exactly match term are returned, which is equivalent to the matchAny function.
- Only single-word searches are supported. If term contains multiple words, null values are returned.
- If prefixLength is greater than the length of term, it is automatically set to the length of term.
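A minimal sketch of the exact-match case, assuming the table pt created in the Examples section. Per the first note above, the two queries should return the same rows. prefixLength=0 is an illustrative choice so that no extra prefix constraint applies, and matchAny is called with positional arguments (text column, search term).

```dolphindb
// Exact-match fuzzy search: minimumSimilarity=1 keeps only words equal to term
select * from pt where matchFuzzy(textCol=remark, term="apple", minimumSimilarity=1, prefixLength=0)

// The equivalent matchAny query on the same text-indexed column
select * from pt where matchAny(remark, "apple")
```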
Examples
// Generate data for queries
stringColumn = ["There are some apples and oranges.","Mike likes apples.","Alice likes oranges.","Mike gives Alice an apple.","Alice gives Mike an orange.","John likes peaches, so he does not give them to anyone.","Mike, can you give me some apples?","Alice, can you give me some oranges?","Alice made apple pie."]
t = table([1,1,1,2,2,2,3,3,3] as id1, [1,2,3,1,2,3,1,2,3] as id2, stringColumn as remark)
if(existsDatabase("dfs://textDB")) dropDatabase("dfs://textDB")
db = database(directory="dfs://textDB", partitionType=VALUE, partitionScheme=[1,2,3], engine="PKEY")
pt = createPartitionedTable(dbHandle=db, table=t, tableName="pt", partitionColumns="id1",primaryKey=`id1`id2,indexes={"remark":"textindex(parser=english, full=false, lowercase=true, stem=true)"})
pt.tableInsert(t)
// Fuzzy search for "make"; matched words must start with "m" (prefixLength=1)
select * from pt where matchFuzzy(textCol=remark,term="make",minimumSimilarity=0.6,prefixLength=1)
id1 | id2 | remark |
---|---|---|
1 | 2 | Mike likes apples. |
2 | 1 | Mike gives Alice an apple. |
2 | 2 | Alice gives Mike an orange. |
3 | 1 | Mike, can you give me some apples? |
3 | 3 | Alice made apple pie. |
// Fuzzy search for "make"; matched words must start with "ma" (prefixLength=2); output the score in a column named "score"
select * from pt where matchFuzzy(textCol=remark,term="make",minimumSimilarity=0.6,prefixLength=2,scoreColName="score")
id1 | id2 | remark | score |
---|---|---|---|
3 | 3 | Alice made apple pie. | 0.7027325630187988 |
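The prefixLength adjustment described in Details can be checked the same way. In this sketch (again using pt from above), prefixLength=10 exceeds the 4-character length of "make", so it should be adjusted to 4 and the first query should behave like the second: a matched word must begin with the full string "make".

```dolphindb
// prefixLength (10) > length of "make" (4): automatically reduced to 4
select * from pt where matchFuzzy(textCol=remark, term="make", minimumSimilarity=0.6, prefixLength=10)

// Same search with prefixLength set explicitly to the term length
select * from pt where matchFuzzy(textCol=remark, term="make", minimumSimilarity=0.6, prefixLength=4)
```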