Sunday, 19 June 2016

Regular expressions to match Python multiline comments and string literals

I am trying to better understand regular expressions, and two specific cases in Python are giving me a hard time:

multiline strings/comments with triple single quotes, and

string literals (as seen in the docs).

Example input text:

'''Import modules'''
''' Import 
tmodules '''

"""
Given a value, this class gets data from http://www.data.com.

It provides:
- Number
- Mean depth
- Mean magnitude

"""
r'abcdevt' # or any other string prefix combination

Specifically, for the triple-quoted strings/comments, I cannot work out how to match newlines, tabs and anything else that sits between the opening and closing triple quotes. For the string literals, I find it hard to match all the combinations of stringprefix, shortstring etc. listed in the docs.
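For illustration, here is a minimal sketch of how the regex route could look. The names `PREFIX`, `TRIPLE`, `SHORT`, `STRING_RE` and `find_strings` are hypothetical, and the prefix pattern only loosely follows the stringprefix rule from the docs. The key points are that `re.DOTALL` lets `.` match newlines, that the lazy `*?` stops at the first closing delimiter, and that a backreference forces the closing quotes to be the same as the opening ones. It knows nothing about comments, so a quote inside a `#` comment would confuse it.

```python
import re

# Optional prefix such as r, u, b, f, rb, br (any case). This is an
# approximation of the "stringprefix" grammar rule, not an exact copy.
PREFIX = r"(?:[rR][bBfF]?|[bBfF][rR]?|[uU])?"

# Triple-quoted body: escaped pairs or any non-backslash character
# (newlines included), lazily, up to the same triple quote that opened it.
TRIPLE = r"(?P<tq>'''|\"\"\")(?:\\.|[^\\])*?(?P=tq)"

# Single-quoted body: same idea, but a bare newline is not allowed.
SHORT = r"(?P<q>'|\")(?:\\.|[^\\\n])*?(?P=q)"

# Triple quotes must be tried first, otherwise ''' would be read as an
# empty '' string followed by a stray quote.
STRING_RE = re.compile(PREFIX + "(?:" + TRIPLE + "|" + SHORT + ")", re.DOTALL)


def find_strings(source):
    """Return the text of every string-like match found in source."""
    return [m.group(0) for m in STRING_RE.finditer(source)]


sample = (
    "'''Import modules'''\n"
    "''' Import \n\tmodules '''\n"
    '"""\nIt provides:\n- Number\n"""\n'
    "r'abcdevt' # or any other string prefix combination\n"
)

for text in find_strings(sample):
    print(repr(text))
```

Because the regex has no notion of Python's surrounding syntax, it is best treated as a learning aid rather than a reliable parser.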

I have searched the docs and the source of modules like tokenize, hoping to find working examples of such regular expressions in Python, with no success.

I hope that matching multiline comments and string literals with regular expressions turns out to be relatively easy and short. Any example code for these cases (one regular expression per case), along with a short explanation to aid my understanding, would be much appreciated.

If regular expressions are not the appropriate solution, a suggestion for a different route would be great; I have looked into the **tokenize** and **ast** modules but could not find any useful examples to achieve my goal.
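As a point of comparison, the **tokenize** route might be sketched as follows. The helper name `string_tokens` is hypothetical; `tokenize.generate_tokens` and the `tokenize.STRING` token type are part of the standard library. The tokenizer already understands prefixes, escapes and comments, so a quote inside a `#` comment is not reported as a string. One caveat: on Python 3.12 and later, f-strings are split into several tokens and would need extra handling.

```python
import io
import tokenize


def string_tokens(source):
    """Return (start_row, text) for each string literal token in source."""
    readline = io.StringIO(source).readline
    found = []
    for tok in tokenize.generate_tokens(readline):
        # STRING covers single-, double- and triple-quoted literals,
        # with or without a prefix such as r or b.
        if tok.type == tokenize.STRING:
            found.append((tok.start[0], tok.string))
    return found


sample = (
    "'''Import modules'''\n"
    "x = r'abc'  # 'not a string'\n"
    '"""\nDoc\n"""\n'
)

for row, text in string_tokens(sample):
    print(row, repr(text))
```

The input has to be tokenizable Python source; badly malformed input makes the tokenizer raise an error instead of returning partial matches.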
