Wednesday, 13 July 2016

Python regex to extract Twitter @user names while excluding email domains and special characters

I have a tweet text as follows:

str = "RT@aquage_7: 田@tianke おっ(´・ω・`) @_@, @__田科, my email is tian@gmail.com, his@kate, I like @lucyさん, and her email is kate@163.cn"

The regex pattern is:

p_name3 = re.compile(r'[@@]([a-zA-Z0-9_]{1,15})')

But the result is:

['aquage_7', 'tianke', '_', '__', 'gmail', 'kate', 'lucy', '163']

The result I want is:

['aquage_7', 'tianke', '__', 'kate', 'lucy']

That is, I want to exclude email domain names (please don't focus only on these two domains) and special character sequences such as @_@ and @____@. Note that a Twitter user name consists of the characters a-zA-Z0-9_ and is between 1 and 15 characters long. Please help me solve this issue; it has troubled me for several days. Thanks in advance.
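One possible approach, offered as a minimal sketch and not a definitive answer, is to add a negative lookahead after the captured name. It rests on assumptions that are not stated in the question: an email domain is recognized by a dot followed by an ASCII letter right after the name (as in @gmail.com or @163.cn), and an emoticon such as @_@ is recognized by another @ right after the name. The helper name extract_mentions and the pattern name MENTION are hypothetical. The sketch matches only the ASCII @; if the second character in the original class [@@] was meant to be a different symbol, it would need to be added back.

```python
import re

# Sketch under stated assumptions, not an official Twitter rule:
#   @                      the mention marker (ASCII only in this sketch)
#   ([A-Za-z0-9_]{1,15})   the user name, 1 to 15 allowed characters
#   (?! ... )              reject the match if what follows is
#       [A-Za-z0-9_@]      another name character (name too long, or the
#                          engine backtracked) or another @ (e.g. @_@)
#       \.[A-Za-z]         a dot plus a letter, which looks like a domain
MENTION = re.compile(r'@([A-Za-z0-9_]{1,15})(?![A-Za-z0-9_@]|\.[A-Za-z])')


def extract_mentions(text):
    """Return the user names captured by MENTION, in order of appearance."""
    return MENTION.findall(text)


tweet = ("RT@aquage_7: 田@tianke おっ(´・ω・`) @_@, @__田科, "
         "my email is tian@gmail.com, his@kate, I like @lucyさん, "
         "and her email is kate@163.cn")

print(extract_mentions(tweet))
# ['aquage_7', 'tianke', '__', 'kate', 'lucy']
```

The lookahead also lists the name characters themselves so that the engine cannot backtrack to a shorter name (for example matching only "gmai" in @gmail.com). A known limitation of the domain assumption: a mention immediately followed by a dot and a letter with no space, such as "@user.Next", would be skipped as well.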
