Tuesday, 1 September 2020

Parsing a boolean expression into a MySQL query in PHP - part 2

I am coding an app for a friend, and SQL is not my strong suit. I thought that I had laid this matter to rest with my previous question, which received an excellent answer.

However, my friend has moved the goal posts yet again (and swears that it is final this time).

Given these tables

mysql> describe skill_names;
+------------+----------+------+-----+---------+----------------+
| Field      | Type     | Null | Key | Default | Extra          |
+------------+----------+------+-----+---------+----------------+
| skill_id   | int(11)  | NO   | PRI | NULL    | auto_increment |
| skill_name | char(32) | NO   | MUL | NULL    |                |
+------------+----------+------+-----+---------+----------------+

mysql> describe skill_usage;
+----------+---------+------+-----+---------+-------+
| Field    | Type    | Null | Key | Default | Extra |
+----------+---------+------+-----+---------+-------+
| skill_id | int(11) | NO   | MUL | NULL    |       |
| job_id   | int(11) | NO   | MUL | NULL    |       |
+----------+---------+------+-----+---------+-------+

mysql> describe jobs;
+--------------+---------+------+-----+---------+----------------+
| Field        | Type    | Null | Key | Default | Extra          |
+--------------+---------+------+-----+---------+----------------+
| job_id       | int(11) | NO   | PRI | NULL    | auto_increment |
| candidate_id | int(11) | NO   | MUL | NULL    |                |
| company_id   | int(11) | NO   | MUL | NULL    |                |
| start_date   | date    | NO   | MUL | NULL    |                |
| end_date     | date    | NO   | MUL | NULL    |                |
+--------------+---------+------+-----+---------+----------------+
5 rows in set (0.00 sec)

The user inputs a text for skill search.

Some examples might be

  • C
  • C or C++
  • C++ AND UML
  • (C AND kernel) OR (C++ AND UML)

And I want to parse that in PHP and generate an appropriate SQL query. Something along the lines of the answer to my previous question, but it will need to be tweaked.

Previously, I had believed that if the user input two skills and ANDed them e.g C++ and UML, then I should return details of any job where BOTH skills were used.

Now, he says that he just wants candidates who have used both of those skills (hence the AND), but not necessarily on the same job. To clarify, my PHP API will return an array of candidates, each entry of which will have an array of jobs.

I want to tweak the code in the answer to my previous question to achieve this. I wondered if I would probably end up doing the whole thing in PHP, with a bunch of nested FOR loops, rather than offloading it onto the SQL engine.

Update

I can't find a good fiddle site that allows both PHP and MySQL.

If anyone wants to test their own code on localhost, the main parts of the test database look like this:

Test search 3.1.5 (Python AND UML) OR (C++ OR UML)

Candidate name  Company Job year    Skills     Overall match
One             Thales      2015    C           No
One             BAe         2016    Python      No
One             Google      2017    C++         No
Two             BAe         2015    C++         Yes
Two             Google      2020    Python      Yes
Two             Google      2011    C++, UML    Yes
Three           Thales      2019    Python, UML Yes

and here is an SQL script to create the test database:

CREATE TABLE `candidates` (
  `candidate_id` INT(11) NOT NULL AUTO_INCREMENT,
  `candidate_name` CHAR(50),
  `candidate_city` CHAR(50),
  `latitude` DECIMAL(11,8),
  `longitude` DECIMAL(11,8),
     PRIMARY KEY (candidate_id));

CREATE TABLE `companies` (
  `company_id` INT(11) NOT NULL AUTO_INCREMENT,
  `company_name` CHAR(50) NOT NULL,
  `company_city` CHAR(50) NOT NULL,
  `company_post_code` CHAR(50) NOT NULL,
  `latitude` DECIMAL(11,8) NOT NULL,
  `longitude` DECIMAL(11,8) NOT NULL,
     PRIMARY KEY (company_id));

CREATE TABLE `jobs` (
  `job_id` INT(11) NOT NULL AUTO_INCREMENT,
  `candidate_id` INT(11) NOT NULL,
  `company_id` INT(11) NOT NULL,
  `start_date` DATE NOT NULL,
  `end_date` DATE NOT NULL,
     PRIMARY KEY (job_id));


CREATE TABLE `skill_names` (
  `skill_id` INT(11) NOT NULL AUTO_INCREMENT,
  `skill_name` CHAR(32) NOT NULL,
     PRIMARY KEY (skill_id));


CREATE TABLE `skill_usage` (
  `skill_id` INT(11) NOT NULL,
  `job_id` INT(11) NOT NULL);

INSERT INTO `skill_names` (skill_name) VALUES("C");
INSERT INTO `skill_names` (skill_name) VALUES("Python");
INSERT INTO `skill_names` (skill_name) VALUES("C++");
INSERT INTO `skill_names` (skill_name) VALUES("UML");

INSERT INTO `candidates` (candidate_name, candidate_city, latitude, longitude ) VALUES("One",   "Hastings",   50.8543, 0.5735);
INSERT INTO `candidates` (candidate_name, candidate_city, latitude, longitude ) VALUES("Two",   "Slough",  51.5105, 0.5950);
INSERT INTO `candidates` (candidate_name, candidate_city, latitude, longitude ) VALUES("Three", "Stonehenge", 51.1789, -1.8262);

INSERT INTO `companies` (company_name, company_city, company_post_code, latitude, longitude ) VALUES("Thales", "Crawley",   "AB1 1CD", 51.1091, -0.1872);
INSERT INTO `companies` (company_name, company_city, company_post_code, latitude, longitude ) VALUES("BAe",    "Rochester", "EF1 2GH", 51.3880, 0.5067);
INSERT INTO `companies` (company_name, company_city, company_post_code, latitude, longitude ) VALUES("Google", "East Ham",  "E6  0XX", 51.5334, 0.0499);

INSERT INTO `jobs` (candidate_id, company_id, start_date, end_date ) VALUES(1, 1, "2015-01-010", "2015-12-31");
INSERT INTO `jobs` (candidate_id, company_id, start_date, end_date ) VALUES(1, 2, "2016-01-010", "2016-12-31");
INSERT INTO `jobs` (candidate_id, company_id, start_date, end_date ) VALUES(1, 3, "2017-01-010", "2017-12-31");
INSERT INTO `jobs` (candidate_id, company_id, start_date, end_date ) VALUES(2, 2, "2015-01-010", "2015-12-31");
INSERT INTO `jobs` (candidate_id, company_id, start_date, end_date ) VALUES(2, 3, "2020-01-010", "2020-12-31");
INSERT INTO `jobs` (candidate_id, company_id, start_date, end_date ) VALUES(2, 3, "2011-01-010", "2011-12-31");
INSERT INTO `jobs` (candidate_id, company_id, start_date, end_date ) VALUES(3, 1, "2019-01-010", "2019-12-31");

INSERT INTO `skill_usage` (job_id, skill_id) VALUES(1, 1);
INSERT INTO `skill_usage` (job_id, skill_id) VALUES(2, 2);
INSERT INTO `skill_usage` (job_id, skill_id) VALUES(3, 3);
INSERT INTO `skill_usage` (job_id, skill_id) VALUES(4, 3);
INSERT INTO `skill_usage` (job_id, skill_id) VALUES(5, 2);
INSERT INTO `skill_usage` (job_id, skill_id) VALUES(6, 3);
INSERT INTO `skill_usage` (job_id, skill_id) VALUES(6, 4);
INSERT INTO `skill_usage` (job_id, skill_id) VALUES(7, 2);
INSERT INTO `skill_usage` (job_id, skill_id) VALUES (7, 4);


from Parsing a boolean expression into a MySQL query in PHP - part 2

No comments:

Post a Comment