Saturday, 18 June 2016

Using fuzzywuzzy to create a column of matched results in the data frame

I'm having trouble using the FuzzyWuzzy library to store all of my match results in a data frame column (I'm guessing it requires a loop?). I've been scratching my head over this all day, so I'd like to see whether anyone can help me find a solution. It would be super helpful!


As an example of what I'm trying to do, here are two data frame tables:

Master Table

+----+-----------------+
| ID |      ITEM       |
+----+-----------------+
| 1  | Pepperoni Pizza |
| 2  | Cheese Pizza    |
| 3  | Chicken Salad   |
| 4  | Plain Salad     |
+----+-----------------+

Lookup Table

+--------------+---+
| LOOKUP VALUE | - |
+--------------+---+
| Cheese       | - |
| Salad        | - |
+--------------+---+

Essentially, I'm trying to match each value in the lookup table against every item in the Master table and store the results in a third table.

Here's how I want the final output to look...

+--------------+----------------------------+-------------------+
| LOOKUP VALUE |       MATCHED VALUES       | MATCHED VALUE IDS |
+--------------+----------------------------+-------------------+
| Cheese       | Cheese Pizza               | 2                 |
| Salad        | Chicken Salad, Plain Salad | 3,4               |
+--------------+----------------------------+-------------------+

I only know the very basics of FuzzyWuzzy. Here's how I started:

from fuzzywuzzy import fuzz
from fuzzywuzzy import process

choices = ["Pepperoni Pizza","Cheese Pizza","Chicken Salad", "Plain Salad"]
process.extract("salad",choices,limit=2)

Output: [('Chicken Salad', 90), ('Plain Salad', 90)]

That works for a single value, but how do I do it systematically, running every lookup value against every item in the master table and collecting the matches and their IDs?

Thanks a ton for hearing me out!
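
For illustration, here is a minimal sketch of the kind of loop the question is after. It is one possible approach, not a definitive answer. To stay runnable without extra packages, it uses the standard library's difflib as a stand-in scorer instead of FuzzyWuzzy, and plain lists instead of data frames. The names partial_score and match_lookups and the threshold of 80 are assumptions made up for this example; they are not part of FuzzyWuzzy or pandas.

```python
from difflib import SequenceMatcher


def partial_score(query, choice):
    """Hypothetical stand-in scorer: best 0-100 similarity between the
    shorter string and any same-length window of the longer string,
    compared case-insensitively."""
    short, long_ = sorted((query.lower(), choice.lower()), key=len)
    if not short:
        return 0
    windows = (long_[i:i + len(short)]
               for i in range(len(long_) - len(short) + 1))
    return round(100 * max(SequenceMatcher(None, short, w).ratio()
                           for w in windows))


def match_lookups(master, lookups, scorer=partial_score, threshold=80):
    """For each lookup value, collect every (id, item) pair in `master`
    whose score reaches `threshold`, and build one output row per lookup."""
    rows = []
    for lookup in lookups:
        hits = [(item_id, item) for item_id, item in master
                if scorer(lookup, item) >= threshold]
        rows.append({
            "LOOKUP VALUE": lookup,
            "MATCHED VALUES": ", ".join(item for _, item in hits),
            "MATCHED VALUE IDS": ",".join(str(item_id) for item_id, _ in hits),
        })
    return rows


# The example tables from the question, as plain Python data.
master = [(1, "Pepperoni Pizza"), (2, "Cheese Pizza"),
          (3, "Chicken Salad"), (4, "Plain Salad")]
lookups = ["Cheese", "Salad"]

result = match_lookups(master, lookups)
for row in result:
    print(row)
```

If FuzzyWuzzy is installed, a scorer such as fuzz.partial_ratio could presumably be passed as the scorer argument instead, though its scores and case handling may differ from this stand-in, so the threshold would need checking. The list of row dictionaries can then be turned into a data frame with pd.DataFrame(result), assuming pandas is imported as pd.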
