Melusine Regexes
Let's say you are a very busy fairy handling the organisation of a lot of fairy stuff. You might want to apply a special kind of magic named "regex" to your emails so that all the fairying goes smoothly.
There are some things you want sent directly to the trash:
- That annoying Voldemort dude keeps sending his Avada Kedavra thingy, which tingles every time you open it. Let's delete it on sight.
- Gandalf sends all the memes about him and his stupid catchphrase. You don't really care about them and want them deleted as well.
Preparing our custom MelusineRegex
So you heat up your cauldron and start by cooking the following MelusineRegex:
from typing import Dict, List, Union

from melusine.base import MelusineRegex


class AnnoyingEmailsRegex(MelusineRegex):
    @property
    def positive(self) -> Union[str, Dict[str, str]]:
        # Named patterns: a hit on any of them flags the email for deletion.
        return dict(
            VOLDY_BEING_VOLDY="Avada Kedavra",
            GANDALF_BEING_GANDALF="You shall not pass",
        )

    @property
    def match_list(self) -> List[str]:
        # Examples that should be detected.
        return [
            "Avada Kedavra is a spell used by Lord Voldemort",
            "And then, you know me, I was not gonna let it pass so I told them : You shall not pass and obviously everyone clapped",
        ]

    @property
    def no_match_list(self) -> List[str]:
        # Examples that should not be detected.
        return ["Abracadabra, here I am", "I told them not to pass"]
Let's use it on this incoming email.
to_delete_regex = AnnoyingEmailsRegex()
to_delete_detection = to_delete_regex(
    "You shall not pass through the magical portal of the lake from Monday to Thursday as it is currently under repair."
)
print(to_delete_detection[MelusineRegex.MATCH_RESULT])  # overall match result
print(to_delete_detection[MelusineRegex.POSITIVE_MATCH_FIELD])  # details of the positive matches
The thing is, our email actually came from the Magical Portal Society and was pretty important. We should exclude any email mentioning the magical portal from detection to make sure we still get the important information.
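To see why the portal email gets flagged, here is a minimal sketch that uses only Python's standard `re` module rather than Melusine itself. The helper name `find_positive_matches` is hypothetical, and the sketch makes no assumption about the flags or normalisation Melusine may apply internally; it only mimics the idea of looking up named positive patterns.

```python
import re
from typing import Dict, List

# Same named patterns as in the AnnoyingEmailsRegex example.
POSITIVE: Dict[str, str] = {
    "VOLDY_BEING_VOLDY": "Avada Kedavra",
    "GANDALF_BEING_GANDALF": "You shall not pass",
}


def find_positive_matches(text: str) -> Dict[str, List[str]]:
    """Return the matched text for every named pattern found in `text`.

    Hypothetical helper, not part of Melusine.
    """
    found: Dict[str, List[str]] = {}
    for name, pattern in POSITIVE.items():
        matches = [m.group(0) for m in re.finditer(pattern, text)]
        if matches:
            found[name] = matches
    return found


email = (
    "You shall not pass through the magical portal of the lake "
    "from Monday to Thursday as it is currently under repair."
)
hits = find_positive_matches(email)
print(hits)
```

The catchphrase appears verbatim at the start of the email, so a purely positive pattern cannot tell a Gandalf meme from a maintenance notice.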
Using both negative and positive matches
from typing import Dict, List, Union

from melusine.base import MelusineRegex


class AnnoyingEmailsRegex(MelusineRegex):
    @property
    def positive(self) -> Union[str, Dict[str, str]]:
        # A hit on any of these patterns flags the email for deletion...
        return dict(
            VOLDY_BEING_VOLDY="Avada Kedavra",
            GANDALF_BEING_GANDALF="You shall not pass",
        )

    @property
    def negative(self) -> Union[str, Dict[str, str]]:
        # ...unless one of these patterns is also found in the email.
        return dict(
            PORTAL_MENTIONNED="magical portal of the lake",
        )

    @property
    def match_list(self) -> List[str]:
        # Examples that should be detected.
        return [
            "Avada Kedavra is a spell used by Lord Voldemort",
            "And then, you know me, I was not gonna let it pass so I told them : You shall not pass and obviously everyone clapped",
        ]

    @property
    def no_match_list(self) -> List[str]:
        # Examples that should not be detected.
        return ["Abracadabra, here I am", "I told them not to pass"]
And now the regex works as follows:
to_delete_regex = AnnoyingEmailsRegex()
email = "You shall not pass through the magical portal of the lake from Monday to Thursday as it is currently under repair."
to_delete_detection = to_delete_regex(email)
print(to_delete_detection[MelusineRegex.MATCH_RESULT])
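The way a negative match overrides a positive one can be illustrated with a small standalone sketch built on the standard `re` module. The function names `any_match` and `should_delete` are hypothetical, and this is a simplified picture of the decision rule described in this tutorial, not Melusine's actual implementation.

```python
import re
from typing import Dict

POSITIVE: Dict[str, str] = {
    "VOLDY_BEING_VOLDY": "Avada Kedavra",
    "GANDALF_BEING_GANDALF": "You shall not pass",
}
NEGATIVE: Dict[str, str] = {
    "PORTAL_MENTIONNED": "magical portal of the lake",
}


def any_match(patterns: Dict[str, str], text: str) -> bool:
    """True if at least one of the named patterns is found in `text`."""
    return any(re.search(pattern, text) for pattern in patterns.values())


def should_delete(text: str) -> bool:
    """Assumed rule: a positive hit counts only when no negative pattern hits."""
    return any_match(POSITIVE, text) and not any_match(NEGATIVE, text)


portal_email = (
    "You shall not pass through the magical portal of the lake "
    "from Monday to Thursday as it is currently under repair."
)
print(should_delete(portal_email))
print(should_delete("Avada Kedavra is a spell used by Lord Voldemort"))
```

Keeping the exclusions in their own dictionary means the positive patterns stay simple and each exception stays named and visible.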
Preprocessing
Now, you know that old geezer Gandalf: he tends to misspell words often.
So you want to make sure you detect any of his hobbit-feast-greasy-fingers-written emails too.
With a pinch of preprocessing, it's easy to do!
class PreMatchHookAnnoyingEmailsRegex(AnnoyingEmailsRegex):
    def pre_match_hook(self, text: str) -> str:
        # Fix Gandalf's usual misspelling before the patterns are applied.
        text = text.replace("sholl not pass", "shall not pass")
        return text


preprocessed_to_delete_regex = PreMatchHookAnnoyingEmailsRegex()
spell_result = preprocessed_to_delete_regex.get_match_result(
    "Andthen,, I told Morgana 'You sholl not pass!' as she wanted topass... Im stil wonddering why she did not find it fundny..."
)
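The idea of cleaning the text before matching can also be sketched without Melusine. The correction table `CORRECTIONS` and the helpers `normalise` and `matches_catchphrase` below are hypothetical names used for illustration only; the sketch simply applies a few substitutions and then searches for the catchphrase.

```python
import re
from typing import Dict

# Hypothetical correction table: misspelling pattern -> replacement.
CORRECTIONS: Dict[str, str] = {
    r"\bsholl\b": "shall",
    r"\bstil\b": "still",
}


def normalise(text: str) -> str:
    """Apply every correction in turn and return the cleaned text."""
    for pattern, replacement in CORRECTIONS.items():
        text = re.sub(pattern, replacement, text)
    return text


def matches_catchphrase(text: str) -> bool:
    """Search for the catchphrase in the normalised text."""
    return re.search("You shall not pass", normalise(text)) is not None


raw = "Andthen,, I told Morgana 'You sholl not pass!' as she wanted topass..."
print(re.search("You shall not pass", raw) is not None)  # search on the raw text
print(matches_catchphrase(raw))  # search after normalisation
```

Keeping the corrections in one table makes it easy to add a new misspelling without touching the patterns themselves.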
Using the regex result
In the magical portal example above, the negative match cancels the positive one, so the email was not detected.
The complete to_delete_detection object looks like this:
{
    "match_result": False,
    "neutral_match_data": {},
    "negative_match_data": {
        "PORTAL_MENTIONNED": [
            {
                "start": 31,
                "stop": 57,
                "match_text": "magical portal of the lake"
            }
        ]
    },
    "positive_match_data": {
        "GANDALF_BEING_GANDALF": [
            {
                "start": 0,
                "stop": 18,
                "match_text": "You shall not pass"
            }
        ]
    }
}
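Since the result is a plain dictionary, it can be inspected with ordinary Python. The sketch below copies the structure shown above into a literal and checks that each `start`/`stop` pair slices the email back to its `match_text`. The helper `spans_are_consistent` is a hypothetical name, and the half-open `[start, stop)` convention is inferred from the values above rather than taken from Melusine's documentation.

```python
from typing import Any, Dict, List

email = (
    "You shall not pass through the magical portal of the lake "
    "from Monday to Thursday as it is currently under repair."
)

# Literal copy of the detection result shown above.
detection: Dict[str, Any] = {
    "match_result": False,
    "neutral_match_data": {},
    "negative_match_data": {
        "PORTAL_MENTIONNED": [
            {"start": 31, "stop": 57, "match_text": "magical portal of the lake"}
        ]
    },
    "positive_match_data": {
        "GANDALF_BEING_GANDALF": [
            {"start": 0, "stop": 18, "match_text": "You shall not pass"}
        ]
    },
}


def spans_are_consistent(text: str, match_data: Dict[str, List[Dict[str, Any]]]) -> bool:
    """Check that text[start:stop] equals match_text for every recorded match."""
    return all(
        text[match["start"]:match["stop"]] == match["match_text"]
        for matches in match_data.values()
        for match in matches
    )


print(spans_are_consistent(email, detection["positive_match_data"]))
print(spans_are_consistent(email, detection["negative_match_data"]))
print(sorted(detection["negative_match_data"]))  # names of the negative patterns that fired
```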
For a more straightforward approach, if you only need the regex result, you can call the get_match_result method, as in the preprocessing example above.
Some of the older fairies might need a more detailed explanation of what triggered the deletion of an email.
Fortunately, the describe method serves this exact purpose.
Called on the magical portal email above, it prints:
The MelusineRegex match result is : NEGATIVE
The following text matched negatively: (PORTAL_MENTIONNED) magical portal of the lake
The following text matched positively: (GANDALF_BEING_GANDALF) You shall not pass
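A report of this kind can be rebuilt from the detection dictionary alone. The sketch below is a rough approximation written for illustration: `describe_detection` is a hypothetical function, not Melusine's `describe` method, and its wording only loosely follows the output above.

```python
from typing import Any, Dict, List

# Trimmed copy of the detection result for the magical portal email.
detection: Dict[str, Any] = {
    "match_result": False,
    "negative_match_data": {
        "PORTAL_MENTIONNED": [{"match_text": "magical portal of the lake"}]
    },
    "positive_match_data": {
        "GANDALF_BEING_GANDALF": [{"match_text": "You shall not pass"}]
    },
}


def describe_detection(result: Dict[str, Any]) -> List[str]:
    """Build a human-readable report, one line per matched pattern."""
    verdict = "POSITIVE" if result["match_result"] else "NEGATIVE"
    lines = [f"The match result is: {verdict}"]
    sections = (
        ("negatively", "negative_match_data"),
        ("positively", "positive_match_data"),
    )
    for label, key in sections:
        for name, matches in result.get(key, {}).items():
            for match in matches:
                lines.append(
                    f"The following text matched {label}: ({name}) {match['match_text']}"
                )
    return lines


report = describe_detection(detection)
print("\n".join(report))
```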
Examples list
A fairy's social life can be hectic, resulting in a variety of emails from all kinds of creatures.
Like most magical artefacts, regexes can be quite obscure and hard to decipher.
This is where the match_list and no_match_list properties come in handy:
- Examples in the match_list should activate the MelusineRegex.
- Examples in the no_match_list should not activate the MelusineRegex.
The test method is run at instantiation to check that the regex is working as intended.
from typing import Dict, List, Union

from melusine.base import MelusineRegex


class AnnoyingEmailsRegex(MelusineRegex):
    @property
    def positive(self) -> Union[str, Dict[str, str]]:
        return dict(
            VOLDY_BEING_VOLDY="Avada Kedavra",
            GANDALF_BEING_GANDALF="You shall not pass",
        )

    @property
    def match_list(self) -> List[str]:
        return [
            "Avada Kedavra is a spell used by Lord Voldemort",
            # Deliberately wrong: no positive pattern matches this text.
            "Erroneous example: This will not match!",
        ]

    @property
    def no_match_list(self) -> List[str]:
        return ["Abracadabra, here I am", "I told them not to pass"]


# Instantiation runs the test method, which fails on the erroneous match_list example.
regex = AnnoyingEmailsRegex()
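The self-check at instantiation can be pictured with a small standalone class. `SelfCheckingPattern` is a hypothetical stand-in written for this sketch, and the choice of `ValueError` is an assumption; the exception type and message Melusine actually uses may differ.

```python
import re
from typing import List, Optional


class SelfCheckingPattern:
    """Hypothetical stand-in: validates its own examples when created."""

    def __init__(self, pattern: str, match_list: List[str], no_match_list: List[str]) -> None:
        self.regex = re.compile(pattern)
        self.match_list = match_list
        self.no_match_list = no_match_list
        self.test()  # fail fast if an example contradicts the pattern

    def test(self) -> None:
        for text in self.match_list:
            if not self.regex.search(text):
                raise ValueError(f"Expected a match for: {text!r}")
        for text in self.no_match_list:
            if self.regex.search(text):
                raise ValueError(f"Expected no match for: {text!r}")


error_message: Optional[str] = None
try:
    SelfCheckingPattern(
        pattern="Avada Kedavra|You shall not pass",
        match_list=[
            "Avada Kedavra is a spell used by Lord Voldemort",
            "Erroneous example: This will not match!",
        ],
        no_match_list=["Abracadabra, here I am", "I told them not to pass"],
    )
except ValueError as err:
    error_message = str(err)

print(error_message)
```

Running the examples in the constructor means a broken pattern or a stale example is caught as soon as the object is created, not when an email slips through.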
Even more advanced use case with "neutral"
As a fairy preserving the balance of the world and all, another case you would like to handle is telling your colleague Ifrit's dangerous emails apart from the harmless ones. Unlike him, you actually like the not-yet-totally-burning state of the world and would rather keep it that way.
But you cannot afford to go all-in every time he jokingly sends a false-alarm email.
The good thing is that Ifrit is a bit of a dummy:
- Whenever he wants to burn the world for real and needs to be stopped, he sends you an email stating his intentions.
- If he is actually joking, the email uses contractions, which makes his intentions super easy to guess (he is not just "a bit" of a dummy, he can be plain stupid sometimes).
That is where a neutral regex can be of use. Whenever a neutral regex matches, the matched text is neutralized: all of it is "blurred" and won't match anything later.
from typing import Dict, List, Union

from melusine.base import MelusineRegex


class IfritAlertRegex(MelusineRegex):
    @property
    def positive(self) -> Union[str, Dict[str, str]]:
        # Serious threats: these should raise the alert.
        return dict(
            WORLD_MIGHT_BURN_1=r"see (the world|everything) (burn|in flames)",
            WORLD_MIGHT_BURN_2=r"make (the world|everything) (burn|in flames)",
        )

    @property
    def neutral(self) -> Union[str, Dict[str, str]]:
        # Jokes (note the contractions): matched text is blurred and ignored.
        return dict(
            WORLD_WONT_BURN_1=r"I wanna see (the world|everything) (burn|in flames)",
            WORLD_WONT_BURN_2=r"Imma make (the world|everything) (burn|in flames)",
            WORLD_WONT_BURN_3=r"I wanna make (the world|everything) (burn|in flames)",
        )

    @property
    def match_list(self) -> List[str]:
        return [
            "I want to see the world burn",
            "Let us make the world burn, shall we",
            "I wanna make the world burn and see everything in flames",
        ]

    @property
    def no_match_list(self) -> List[str]:
        return ["I wanna see the world burn", "Imma make everything in flames"]
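The blurring step can be illustrated with the standard `re` module. In this sketch, neutral matches are overwritten with `#` characters before the positive patterns are tried; the placeholder character and the helper names `blur_neutral` and `is_real_alert` are assumptions made for illustration, since the way Melusine blurs text internally is not described here.

```python
import re
from typing import List

NEUTRAL: List[str] = [
    r"I wanna see (the world|everything) (burn|in flames)",
    r"Imma make (the world|everything) (burn|in flames)",
    r"I wanna make (the world|everything) (burn|in flames)",
]
POSITIVE: List[str] = [
    r"see (the world|everything) (burn|in flames)",
    r"make (the world|everything) (burn|in flames)",
]


def blur_neutral(text: str) -> str:
    """Overwrite every neutral match with '#' so it cannot match later."""
    for pattern in NEUTRAL:
        text = re.sub(pattern, lambda m: "#" * len(m.group(0)), text)
    return text


def is_real_alert(text: str) -> bool:
    """Apply the positive patterns to the blurred text only."""
    blurred = blur_neutral(text)
    return any(re.search(pattern, blurred) for pattern in POSITIVE)


mixed = "I wanna make the world burn and see everything in flames"
print(blur_neutral(mixed))
print(is_real_alert(mixed))
print(is_real_alert("I wanna see the world burn"))
```

Replacing each match with a placeholder of the same length keeps the character offsets of the remaining text unchanged, which is convenient if match positions are reported afterwards.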
Conclusion
- The MelusineRegex class is a convenient tool to keep regexes clean, documented and easy to use.
- Advanced features like "pre" and "post" match hooks bring flexibility to accommodate exotic use cases.
- The match_list and no_match_list help onboard newcomers on what the regex does.
- The test method is a great way to ensure the regex is working as intended.
Now you can go back to your fairy duties and let the regex do the heavy lifting for you.