-- Mail.app can't run a "Contains" filter with a regex,
-- so you can't filter on HTML content. Until now.
using terms from application "Mail"
    on perform mail action with messages theMessages for rule theRule
        try
            repeat with theMessage in theMessages
                -- Getting the content as string converts all HTML tags to '?' and just leaves text content
                set theBody to quoted form of (theMessage's content as string)
                -- It's awkwardly hard to get sed to work w/ mult. lines, so collapse newlines
                set theCommandString to "echo " & theBody & " | tr '\\n' ' ' | sed \"s/brett Author/*MATCHED*(&)/\"" as string
                set sedResult to do shell script theCommandString
                if sedResult contains "*MATCHED*" then
                    tell application "Mail"
                        move the theMessage to mailbox "My special mailbox"
                    end tell
                end if
            end repeat
        on error errorText number errorNumber
            tell application "Mail" to display alert ("Error: " & errorNumber) message errorText
            return
        end try
    end perform mail action with messages
end using terms from
I managed to update and rewrite Brett Bukowski's original AppleScript above to get it working on macOS High Sierra. Brett's original script no longer works on newer versions of macOS. I believe this is because Mail no longer assigns message IDs to incoming messages that have not yet been saved in a mailbox, and without an ID you cannot reference or access an email message from within AppleScript.
My updated AppleScript below gets around this issue by letting incoming email messages be saved in the INBOX first, at which point they are assigned an ID. Once they are in the INBOX, my script checks to see if these messages are spam. To do this, it searches the raw source of the email (which includes the email's HTML code) for strings that you specify in the AppleScript (by setting searchStringOne, searchStringTwo, etc. within the script). The search can be a standard string match on the email's raw source, or a regex match on the raw source (uncomment the regex code in my AppleScript below if you want to use a regex).
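The difference between the two kinds of search can be tried out in Terminal before touching the AppleScript. This is only a minimal sketch, not part of the script itself: it uses grep rather than the script's sed pipeline, and contains_string, matches_regex and the sample message are made-up names and data for illustration. Note that grep works line by line, so a pattern here cannot span a line break.

```shell
#!/bin/sh
# Hypothetical helpers; both read raw message source on stdin.

# Plain string match: $1 is taken literally (no regex metacharacters).
contains_string() {
    grep -F -q -- "$1"
}

# Regex match: $1 is an extended regular expression, so | means OR.
matches_regex() {
    grep -E -q -- "$1"
}

# Made-up raw source, for illustration only.
sample_source='Subject: You won
Content-Type: text/html

<a href="http://spam.example/win?id=42">Claim prize</a>'

printf '%s\n' "$sample_source" | contains_string 'win?id=' && echo "string match"
printf '%s\n' "$sample_source" | matches_regex 'win\?id=[0-9]+|lose\?id=' && echo "regex match"
```

In the string match the ? is an ordinary character, while in the regex it has to be escaped as \? to be matched literally.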
To get my AppleScript to work in Apple Mail, as with Brett's script, save the code given below in ~/Library/Application Scripts/com.apple.mail/ with a filename such as "Regex Spam Filter.scpt". Then in Apple Mail > Preferences > Rules, add a new rule that applies to "Every Message" and runs the AppleScript "Regex Spam Filter.scpt" (instead of "Every Message", you can also set the rule to run only on the email accounts that you want to check for spam). It does not matter where this rule is placed in Mail's list of rules: when a new email message comes in, the AppleScript rule always runs after all the other rules have executed, even if it is placed first.
Then set up a new mailbox in Apple Mail called "Regex Trash". This can be done in the menu: Mailbox > New Mailbox... This mailbox is where the spam email messages found by this AppleScript will be moved to.
In my AppleScript below you will need to change: [email protected], [email protected] to your own email accounts that you want to check for spam, and you will need to change: searchStringOne, searchStringTwo, etc to text strings which will identify the spam emails.
Very often in spam emails, you may find some distinctive HTML code which you can use to identify the spam. For example, sometimes the URLs spammers use have a distinctive feature or string of text that you can use to identify the email as spam. Because this AppleScript checks the email raw source, you can target any HTML code in the email. This is something you cannot do with regular Apple Mail rules, because these only check the email message text content, and do not allow you to search the message raw source.
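One way to look for such a distinctive string is to list the URLs that appear in the raw source of a spam message you have saved to a file or copied as text. The following is only a sketch under assumptions: list_urls and the sample data are made up for illustration, and it assumes the HTML part is not base64-encoded, since in that case the HTML does not appear as plain text in the raw source.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct URL found in the raw
# message source read from stdin, one per line.
list_urls() {
    # -o prints only the matching part of each line;
    # a URL is taken to end at a quote, angle bracket or space.
    grep -E -o 'https?://[^"<> ]+' | sort -u
}

# Made-up raw source, for illustration only.
sample_source='<a href="http://track.example/c/abc123">Offer</a>
<img src="http://track.example/px.gif">
<a href="http://track.example/c/abc123">Again</a>'

printf '%s\n' "$sample_source" | list_urls
```

Any part of a URL that keeps showing up across spam messages (a host name, a path prefix) is a candidate for searchStringOne and friends.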
Here is my AppleScript code:
-- STRING MATCH SEARCH / REGEX MATCH SEARCH SPAM FILTER FOR APPLE MAIL
-- This reads the message at the top of both the INBOX and Trash mailboxes of the specified accounts, and searches for a string match or a regex match in the raw source of those messages. If a match is found in a message, this script moves that message to a mailbox called "Regex Trash" (or, if you uncomment the appropriate line of code, you can just delete the spam message instantly, without moving it to the Trash).
-- Note that when set up as a Mail rule, this AppleScript seems to be executed only after all the other rules are executed (even if this AppleScript is set up as the first rule in the list). This is why in this AppleScript we need to check both the INBOX and Trash, in case any other rules have moved the incoming message to the Trash.
try
    tell application "Mail"
        set accountList to {"[email protected]", "[email protected]"} -- Email accounts to perform spam search on
        set mailBoxList to {"INBOX", "Trash"} -- "INBOX" must be in capitals; "Trash" must be in lowercase, except for the capital T
        repeat with theAccount in accountList
            repeat with theMailBox in mailBoxList -- We look in both the INBOX and Trash mailboxes of the email account
                set N to 0
                repeat with theMessageID in (every message) of (mailbox theMailBox of account theAccount)
                    set N to N + 1
                    if N is greater than 1 then exit repeat -- Only look at the first message in the mailbox
                    try
                        set theMessageID to message 1 of mailbox theMailBox of account theAccount
                        set messageSender to sender of theMessageID -- Sender of the email
                        set messageSenderName to extract name from sender of theMessageID -- Name of sender of the email
                        set messageSenderEmailAddress to extract address from sender of theMessageID -- Address of sender of the email
                        set messageSubject to subject of theMessageID -- Subject of the email
                        -- set messageContent to content of theMessageID -- Text content only of the body of the email (without html) - TAKES A LONG TIME TO PROCESS, SO COMMENTED OUT
                        set messageRawSource to source of theMessageID -- Raw source of the email (includes html code)
                        -- display notification "Account: " & theAccount & return & "Mailbox: " & theMailBox & return & "Sender: " & messageSenderName & return & "Subject: " & messageSubject
                        set theBody to quoted form of (messageRawSource as string)
                        -- STANDARD STRING MATCH SEARCH OF RAW SOURCE EMAIL:
                        if theBody contains "searchStringOne" or theBody contains "searchStringTwo" or theBody contains "searchStringThree" then
                            move theMessageID to mailbox "Regex Trash"
                            -- delete theMessageID -- Instead of moving to the Trash, the message can be deleted instantly
                        end if
                        (*
                        -- REGEX SEARCH OF RAW SOURCE EMAIL: (uncomment this section to use regex)
                        -- Note that a regex match takes more time than a standard string match search
                        -- Replace regexSearchStringOne, etc. with your regex patterns. To set up multiple patterns, use the pipe | operator as shown below (pipe is the regex Boolean OR). The pipe only acts as OR in extended regex syntax, which is why sed is called with -E below
                        set regexString to "regexSearchStringOne|regexSearchStringTwo|regexSearchStringThree"
                        -- Note: to use the regex backslash \ in regexString, you need to insert a double backslash \\, because \ is the escape character in AppleScript strings. To match a forward slash, write \\/ because / delimits the sed expression
                        -- Perform a regex match on theBody (hard to get sed to work with newlines, so replace newlines with Q)
                        set theCommandString to "echo " & theBody & " | tr '\\n' 'Q' | sed -E \"s/" & regexString & "/M*A*T*C*H*E*D(&)/\"" as string
                        set sedResult to do shell script theCommandString
                        if sedResult contains "M*A*T*C*H*E*D" then
                            move theMessageID to mailbox "Regex Trash"
                            -- delete theMessageID -- Instead of moving to the Trash, the message can be deleted instantly
                        end if
                        *)
                        -- Sed extended regular expression syntax (sed -E, the same as grep -E regex syntax):
                        -- ^ (Caret) = match expression at the start of a line, as in ^A.
                        -- $ (Dollar) = match expression at the end of a line, as in A$.
                        -- \ (Back Slash) = turn off the special meaning of the next character, as in \^.
                        -- [ ] (Brackets) = match any one of the enclosed characters, as in [aeiou].
                        -- Use Hyphen "-" for a range, as in [0-9].
                        -- [^ ] = match any one character except those enclosed in [ ], as in [^0-9].
                        -- . (Period) = match a single character of any value, except end of line.
                        -- * (Asterisk) = match zero or more of the preceding character or expression.
                        -- | (Pipe) = match either the expression before it or the expression after it, as in cat|dog.
                        -- {x,y} = match x to y occurrences of the preceding.
                        -- {x} = match exactly x occurrences of the preceding.
                        -- {x,} = match x or more occurrences of the preceding.
                        -- Source of regex code: https://gist.github.com/BrettBukowski/5347463
                    on error errorText number errorNumber
                    end try
                end repeat
            end repeat
        end repeat
    end tell
on error errorText number errorNumber
    tell application "Mail" to display alert ("Error in Mail Regex Spam Filter: " & errorNumber) message errorText
    return
end try
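Before uncommenting the regex section, it can help to try a pattern in Terminal with the same tr and sed marker technique the script uses. This is a rough sketch rather than the script's exact command: check_pattern is a made-up name, newlines are collapsed to spaces instead of Q, and it assumes sed is run with -E (extended regex) so that | works as OR.

```shell
#!/bin/sh
# Hypothetical helper: print "spam" if the text on stdin matches the
# extended regex in $1, otherwise print "ok".
# The pattern must not contain an unescaped / (the sed delimiter).
check_pattern() {
    # Collapse newlines, then let sed wrap the first match in a marker.
    marked=$(tr '\n' ' ' | sed -E "s/$1/M*A*T*C*H*E*D(&)/")
    case $marked in
        *'M*A*T*C*H*E*D'*) echo "spam" ;;
        *) echo "ok" ;;
    esac
}

# Made-up message text, for illustration only.
printf 'Dear friend,\nclick <a href="http://bad.example/offer">here</a>\n' |
    check_pattern 'bad\.example|worse\.example'
```

Typed directly in Terminal the backslash is written once, as above; inside an AppleScript string it has to be doubled.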
@simonron Did you get it working?
Is there any other solution for matching the existence of a header, rather than having to match its contents?