An abusive tweet storm, in which a Twitter user with a large number of followers uses that influence to intimidate or verbally abuse another user, bears striking similarities to a DDoS attack. If you frame the problem as a DDoS attack, many of the mitigation techniques used against DDoS attacks can be applied to these Twitter attacks as well.
By opting in, users can submit their replies for analysis by the system. Using sentiment analysis, abusive tweets can be identified programmatically and then confirmed by a Mechanical Turk-style system. To broaden the scope of the analysis, the abusive user's other replies are analyzed and catalogued, and flagged along the way to protect other users. Their victims' replies are also analyzed so that other abusive users can be identified, and an abuser's following list can point to still more potential abusers. In this way the pool of people whose tweets are analyzed grows organically, and other bad actors can be identified proactively.
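As a rough sketch of this pipeline, the snippet below routes high-negativity replies to a human-verification queue. The lexicon scorer, word list, and threshold are illustrative placeholders, not a real sentiment model; a production system would use a trained classifier.

```python
# Placeholder lexicon; a real system would use a trained sentiment model.
ABUSIVE_WORDS = {"idiot", "loser", "trash"}

def score_sentiment(text):
    """Return a crude negativity score in [0, 1] (toy stand-in model)."""
    words = text.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if w.strip(".,!?") in ABUSIVE_WORDS)
    return hits / len(words)

def triage(tweets, threshold=0.2):
    """Split replies: high-scoring ones go to the human-verification queue,
    the rest pass through unflagged."""
    passed, to_verify = [], []
    for t in tweets:
        (to_verify if score_sentiment(t) >= threshold else passed).append(t)
    return passed, to_verify
```

Replies confirmed abusive by verifiers would then feed the shared flag catalog described below; replies cleared by verifiers would be released back into the feed.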
In phase one of a mitigation solution, abusive tweets can be catalogued on the home page of the system's public-facing web site, allowing volunteers to click through and report each tweet as abusive. This step is necessary because Twitter exposes no public API for reporting abusive tweets. The front page will also list the ten worst abusers in the system, with links to their profiles so that visitors can report those users for abuse themselves.
The most effective mitigation against this kind of abuse is to filter replies on the client. I propose four modes in which the client can operate to protect the user, with escalation between modes triggered either by the volume of abusive tweets or by the user:
In the first, reactive mode, only tweets that the system has already flagged are removed from the replies feed. For most users, who experience only occasional abusive replies, this mode may be sufficient.
In the second mode, sentiment analysis runs on the client, and any reply with a potentially negative message is hidden from the replies feed until Mechanical Turk-style verification can take place. This mode can be triggered by a certain number of abusive tweets within a specified period of time.
In the third mode, only replies from users who have been whitelisted, either by the user or by a consistently positive sentiment score, get through. This is akin to blocking the CIDR blocks where attack traffic tends to cluster during a DDoS attack. As Mechanical Turk verification flags tweets as safe, replies can begin to trickle in again.
In the fourth and most restrictive mode, replies from any account the user does not follow are filtered and never shown. This mode is for extreme cases where even reply whitelisting lets abusive tweets through, or where the volume of abusive tweets exceeds a particular threshold.
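The four modes and the volume-based escalation between them can be sketched as follows. The mode names, thresholds, and client-side state sets (`flagged`, `negative`, `whitelist`, `following`) are all hypothetical names for illustration, under the assumption that the client keeps local copies of the system's flags.

```python
from enum import Enum

class Mode(Enum):
    REACTIVE = 1        # hide only tweets the system has already flagged
    PROACTIVE = 2       # also hide negative-scoring replies pending verification
    WHITELIST = 3       # show only whitelisted senders
    FOLLOWERS_ONLY = 4  # show only accounts the user follows

def visible(reply, mode, *, flagged, negative, whitelist, following):
    """Decide whether a (sender, text) reply is shown under the given mode.
    The four sets are illustrative client-side state."""
    sender, text = reply
    if mode is Mode.REACTIVE:
        return text not in flagged
    if mode is Mode.PROACTIVE:
        return text not in flagged and text not in negative
    if mode is Mode.WHITELIST:
        return sender in whitelist
    return sender in following  # Mode.FOLLOWERS_ONLY

def escalate(mode, abusive_count, thresholds=(3, 10, 25)):
    """Step up one mode per abuse-volume threshold crossed.
    Thresholds are placeholders; the user can also escalate manually."""
    level = sum(abusive_count >= t for t in thresholds)
    return Mode(max(mode.value, level + 1))
```

For example, a client in reactive mode that sees 12 confirmed abusive replies in the window would escalate past proactive filtering directly to whitelist mode, mirroring how DDoS mitigations tighten as attack volume grows.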