Let's say we have 1 million of rows like this:
import sqlite3
db = sqlite3.connect(':memory:')
c = db.cursor()
c.execute('CREATE TABLE mytable (id integer, description text)')
c.execute('INSERT INTO mytable VALUES (1, "Riemann")')
c.execute('INSERT INTO mytable VALUES (2, "All the Carmichael numbers")')
Background:
I know how to do this with Sqlite:
-
Find a row with a single-word query, up to a few spelling mistakes with the
spellfixmodule and Levenshtein distance (I have posted a detailed answer here about how to compile it, how to use it, ...):db.enable_load_extension(True) db.load_extension('./spellfix') c.execute('SELECT * FROM mytable WHERE editdist3(description, "Riehmand") < 300'); print c.fetchall() #Query: 'Riehmand' #Answer: [(1, u'Riemann')]With 1M rows, this would be super slow! As detailed here,
postgresqlmight have an optimization with this usingtrigrams. A fast solution, available with Sqlite, is to use aVIRTUAL TABLE USING spellfix:c.execute('CREATE VIRTUAL TABLE mytable3 USING spellfix1') c.execute('INSERT INTO mytable3(word) VALUES ("Riemann")') c.execute('SELECT * FROM mytable3 WHERE word MATCH "Riehmand"'); print c.fetchall() #Query: 'Riehmand' #Answer: [(u'Riemann', 1, 76, 0, 107, 7)], working! -
Find an expression with a query matching one or multiple words with FTS ("Full Text Search"):
c.execute('CREATE VIRTUAL TABLE mytable2 USING fts4(id integer, description text)') c.execute('INSERT INTO mytable2 VALUES (2, "All the Carmichael numbers")') c.execute('SELECT * FROM mytable2 WHERE description MATCH "NUMBERS carmichael"'); print c.fetchall() #Query: 'NUMBERS carmichael' #Answer: [(2, u'All the Carmichael numbers')]It is case insensitive and you can even use a query with two words in the wrong order, etc.: FTS is quite powerful indeed. But the drawback is that each of the query-keyword must be correctly spelled, i.e. FTS alone doesn't allow spelling mistakes.
Question:
How to do a Full Text Search (FTS) with Sqlite and also allow spelling mistakes? i.e. "FTS + spellfix" together
Example:
- row in the DB:
"All the Carmichael numbers" - query:
"NUMMBER carmickaeel"should match it!
How to do this with Sqlite?
It is probably possible with Sqlite since this page states:
Or, it [spellfix] could be used with FTS4 to do full-text search using potentially misspelled words.
Linked question: String similarity with Python + Sqlite (Levenshtein distance / edit distance)
from Sqlite with real "Full Text Search" and spelling mistakes (FTS+spellfix together)
No comments:
Post a Comment