Hi,
Last time I mentioned that regular expressions might be useful for breaking a tag into elements. My regular expression is:
([^\s\"=]+\s*=\s*([^\s\"=]+|\"[^\s^\"]+\"))|([^\s\"=]+)
[Edit: Updating for single quotes and other improvements: ([^\s\"\'=]+\s*=\s*([^\s\"\'=]+|[\"\'][^\s\"\']+[\"\']))|([^\s\"\'=]+)]
Notice that it is awesome. I tested it on the input data:
div class=hijkjh width=100 height = 6 src= "hello.png" why=bec6=ause src="hello.png" src="hello.png?=yo?" src="hello.png" src="hello.png" src="hello.png" sup
. . . which is deliberately malformed in places. You can test the regex at http://regexpal.com/. Notice that the regex matches the valid elements and skips the stray characters: in why=bec6=ause, for example, the second = is left unmatched. So all I have to do to determine whether a tag is well-formed is check whether anything other than whitespace is left unmatched.
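For anyone who wants to try the whitespace check outside regexpal, here is a minimal Python sketch of the idea. It uses the updated regex from the edit note as-is; the names TAG_ELEMENT, tag_elements and is_well_formed are made up for this illustration and are not the converter's actual code.

```python
import re

# The updated regex from the edit note above, copied verbatim.
# Alternative 1 matches name=value (value bare or quoted); alternative 2 matches a bare word.
TAG_ELEMENT = re.compile(
    r'([^\s\"\'=]+\s*=\s*([^\s\"\'=]+|[\"\'][^\s\"\']+[\"\']))|([^\s\"\'=]+)'
)


def tag_elements(tag):
    """Return every element the regex matches, in order (hypothetical helper)."""
    return [m.group(0) for m in TAG_ELEMENT.finditer(tag)]


def is_well_formed(tag):
    """A tag is well-formed if only whitespace remains once all matches are removed."""
    leftover = TAG_ELEMENT.sub("", tag)
    return leftover.strip() == ""


if __name__ == "__main__":
    print(tag_elements('div class=hijkjh height = 6 src= "hello.png"'))
    print(is_well_formed('div class=hijkjh height = 6 src= "hello.png"'))
    print(is_well_formed("div why=bec6=ause"))
```

Removing the matches and then stripping whitespace is just one way to do the check; comparing the match spans against the string length would work as well.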
I have merged these changes into the converter itself, so the new tag system should work much better.
Ian