Hashtags play a central role in online communication by increasing discoverability, aiding search and providing additional semantics. However, inferring the semantics of hashtags is non-trivial: many hashtags contain multiple tokens joined together, which leads to multiple potential interpretations. Hashtags also tend to contain rare tokens such as named entities, event names, abbreviations and spelling variations, which makes them harder to understand.

We developed a tool called HashtagMaster, which segments a given hashtag into a sequence of meaningful words. It uses a novel neural approach that helps with rare tokens and multiple interpretations. We also released a new dataset for hashtag segmentation consisting of 12,594 hashtags along with their manual segmentations. For more details, please check our paper.
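
To make the "multiple interpretations" problem concrete, the minimal Python sketch below enumerates every way to split a hashtag into known words and then picks one with a toy rule. It is not the neural ranking approach used by HashtagMaster: the vocabulary `TOY_VOCAB`, the function names and the "fewest words wins" scoring are all assumptions made purely for illustration.

```python
from functools import lru_cache

# Hypothetical toy vocabulary (an assumption for this sketch); a real system
# would rely on a large lexicon or a language model instead.
TOY_VOCAB = {"brunch", "rules", "rule", "s", "i", "hate", "hat", "e",
             "rotoscoping", "4eva"}


def candidate_segmentations(hashtag, vocab=TOY_VOCAB):
    """Return every split of the hashtag into words found in `vocab`."""
    text = hashtag.lstrip("#").lower()

    @lru_cache(maxsize=None)
    def split_from(start):
        # All word sequences that exactly cover text[start:].
        if start == len(text):
            return [[]]
        results = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in vocab:
                for rest in split_from(end):
                    results.append([word] + rest)
        return results

    return [" ".join(words) for words in split_from(0)]


def best_segmentation(hashtag, vocab=TOY_VOCAB):
    """Toy ranking rule (an assumption): prefer the split with the fewest words."""
    candidates = candidate_segmentations(hashtag, vocab)
    if not candidates:
        # No split covers the hashtag, e.g. because it contains a rare token.
        return hashtag.lstrip("#").lower()
    return min(candidates, key=lambda c: (len(c.split()), c))


if __name__ == "__main__":
    print(candidate_segmentations("#brunchrules"))
    print(best_segmentation("#ihaterotoscoping4eva"))
```

With this toy vocabulary, `#brunchrules` already has two candidate splits ("brunch rules" and "brunch rule s"), and a hashtag with a token outside the vocabulary has none. These are the two difficulties, multiple interpretations and rare tokens, described above.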

Yoda Twitter Avatar by Adam Koford

Examples

Hashtag with Tweet | Segmentation
want to go to the Valley Jazz Fest tonight but I need to do more work #ihaterotoscoping4eva. | I hate rotoscoping 4eva
#netprophet Mxit's Herman Heunis next at netprophet... I'm ready. | Netprophet
Glad to know how the #twedding while being out of the country - congratluations again to @melvinyuan and ruth. | Twedding
#wheniruletheworld Im gonna #fixreplies #GetPembsDaveAJob on #creditcrunchtv watch #startrek and support #maternalhealth thats #whyitweet. | Get Pembs Dave A Job
#blogher09 Please contact me if you have a Blogher 09 ticket you would like to get rid of!! | Blogher 09
@AmberCadabra I make a really tasty fruit smoothie, crepes, and banana fosters french toast. Can I be included #brunchrules | brunch rules
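
Because segmentation only inserts word boundaries into a hashtag, a proposed segmentation can be sanity-checked by removing its spaces and comparing the result with the hashtag. A small Python sketch of that check follows; the helper name `is_valid_segmentation` and the case-insensitive comparison are assumptions for illustration, not part of the released code.

```python
def is_valid_segmentation(hashtag, segmentation):
    """Check that `segmentation` is `hashtag` with only spaces inserted."""
    original = hashtag.lstrip("#").lower()
    # Drop all whitespace from the segmentation and compare case-insensitively.
    rejoined = "".join(segmentation.split()).lower()
    return original == rejoined


if __name__ == "__main__":
    print(is_valid_segmentation("#brunchrules", "brunch rules"))
    print(is_valid_segmentation("#blogher09", "Blogher 09"))
```

A check like this catches annotation slips such as dropped or substituted letters, but it says nothing about whether the chosen word boundaries are the intended ones.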

Code

The code for our approach and the data are publicly available on GitHub.

Citation

Please cite our paper if you use any of our resources.


            @InProceedings{ACL-2019-Maddela,
              author = "Maddela, Mounica and Xu, Wei and Preoţiuc-Pietro, Daniel",
              title = "Multi-task Pairwise Neural Ranking for Hashtag Segmentation",
              booktitle = "Proceedings of the Association for Computational Linguistics (ACL)",
              year = "2019"
            }
        

Contact

If you have any issues with the data, please contact maddela.4[at]osu.edu. For any questions regarding the code, please open an issue on GitHub.