Raw strings
Six months ago, while I was reading the C source of the Emacs reader, I tried to implement raw strings in Emacs. This post was supposed to be published earlier, but I had a lot of work in between, I'm still not very comfortable writing in English, and I had a hosting problem. Anyway, here it is.
A raw string is just a special syntax for a string literal where the content is interpreted literally (especially the \ character), i.e. nothing can be escaped or interpolated. Several programming languages support them, e.g.:
Python: r"aa\naa"  r"""aa\n"aa"""
Perl: 'aa\naa'  q{aa\n'aa}
C++11: R"(aa\naa)"  R"foo(aa\n)aa)foo"
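In Python, for instance, the difference is easy to observe: the raw literal keeps the backslash and the n as two separate characters instead of turning them into a single newline. A small sketch:

```python
# A regular literal processes escapes: "\n" becomes one newline character.
cooked = "aa\naa"
# A raw literal keeps the backslash and the "n" as two separate characters.
raw = r"aa\naa"

print(len(cooked), len(raw))  # 5 6
print(raw == "aa\\naa")       # True: the raw form equals the doubly escaped one
# A triple-quoted raw string may also contain an unescaped double quote.
print(r"""aa\n"aa""")         # aa\n"aa
```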
Raw strings are very useful for regexes: every time you need to match a character that also happens to be a meta-character (like + or \), you have to escape it. And since the regex is written in a string literal, which also uses \ as its escape character, you have to escape the escape character too. This process can be painful and error-prone; search for "backslash hell" or "backslashitis" for some examples.
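A small Python illustration of that doubling, assuming the goal is to match a literal backslash followed by n (two characters, not a newline). The regex itself is \\n, so a regular literal has to spell it "\\\\n", while a raw literal spells it exactly like the regex, r"\\n":

```python
import re

# Regex \\n: an escaped backslash followed by "n".
# In a regular literal, every backslash of the regex must be escaped again.
escaped = "\\\\n"
# In a raw literal, the backslashes are taken as-is.
raw = r"\\n"

print(escaped == raw)  # True: both are the 3-character string \\n
text = r"line1\nline2"  # contains a literal backslash and "n", no newline
print(re.findall(raw, text))  # ['\\n'] (one match: backslash + "n")
```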
Back to Emacs. I actually wrote a working proof of concept in the form of two patches to the reader function.
The code is not very clean and may be buggy, since most of it was adapted from the regular string syntax code, but it works:
    # Python
    $ ./emacs -Q -batch --eval '(message #r"""ha"\nha""")'
    ha"\nha
    # Perl
    $ ./emacs -Q -batch --eval '(message #r,ha"\nha,)'
    ha"\nha
    $ ./emacs -Q -batch --eval '(message #r~ha"\nha~)'
    ha"\nha
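To make the syntax in these examples concrete, here is a minimal Python sketch of a reader for it. It is not the C patch: the function name read_raw_string is hypothetical, and the rules are only inferred from the three commands above (after #r, either a triple double quote that ends at the next triple double quote, or any single delimiter character that ends at its next occurrence). No escape processing happens inside the literal.

```python
from typing import Tuple


def read_raw_string(src: str, pos: int) -> Tuple[str, int]:
    # Hypothetical reader for the #r syntax shown above. Assumed rules,
    # inferred from the examples rather than taken from the actual patch:
    #   #r"""..."""  ends at the next triple double quote (Python style)
    #   #rX...X      X is any single delimiter character (Perl style)
    # Returns the literal's content and the index just past the literal.
    if not src.startswith("#r", pos):
        raise ValueError("raw string must start with #r")
    pos += 2
    if src.startswith('"""', pos):
        terminator = '"""'
    elif pos < len(src):
        terminator = src[pos]
    else:
        raise ValueError("missing delimiter after #r")
    start = pos + len(terminator)
    end = src.find(terminator, start)
    if end == -1:
        raise ValueError("unterminated raw string")
    # No escape processing at all: the content is copied verbatim,
    # so a backslash can never "protect" the delimiter.
    return src[start:end], end + len(terminator)


print(read_raw_string(r'#r"""ha"\nha""" rest', 0)[0])  # ha"\nha
print(read_raw_string(r'#r,ha"\nha,', 0)[0])          # ha"\nha
print(read_raw_string(r'#r~ha"\nha~', 0)[0])          # ha"\nha
```

Because nothing is escaped, the content can never contain the chosen delimiter; letting the user pick the delimiter (as with Perl's q{} or C++11's R"foo(...)foo") is what works around that limitation.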
Although the reader works, some minor parts of Emacs are broken in the presence of raw strings (sexp navigation, font-locking, C-x C-e, …). These other parts of the environment need to be aware of the new syntax and shouldn't be too hard to fix.
At this point I posted my result to the emacs-devel mailing list, which led to an interesting discussion. There was no clear consensus, but I think most people realized that raw strings are not a satisfying solution to the regex problem. Some would rather have a way to write custom syntax readers in Lisp, which is nice but hard to implement. Others said you're better off using rx.
rx is a macro that lets you write readable regexes in the form of s-expressions:
(rx (+ "abc") "foo" (group (or "zob" "foo"))) => "\\(?:abc\\)+foo\\(\\(?:foo\\|zob\\)\\)"
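The idea behind rx (compiling a structured description into a regex string and escaping literals automatically) can also be sketched in Python. This is a hypothetical toy, not Emacs's rx: forms are nested tuples instead of s-expressions, only seq, +, or and group are supported, and the output uses Python's re syntax rather than Emacs regex syntax.

```python
import re
from typing import Union

# Hypothetical mini "rx" for Python regexes: a form is either a literal
# string or a tuple (operator, *arguments).
Form = Union[str, tuple]


def rx(*forms: Form) -> str:
    # Top-level forms are concatenated, like (rx a b c).
    return "".join(_compile(f) for f in forms)


def _compile(form: Form) -> str:
    if isinstance(form, str):
        # Literals are escaped automatically, so "+" or "\\" never
        # need hand-written backslashes.
        return re.escape(form)
    op, *args = form
    if op == "seq":
        return rx(*args)
    if op == "+":
        return "(?:" + rx(*args) + ")+"
    if op == "or":
        return "(?:" + "|".join(_compile(a) for a in args) + ")"
    if op == "group":
        return "(" + rx(*args) + ")"
    raise ValueError(f"unknown rx operator: {op!r}")


pattern = rx(("+", "abc"), "foo", ("group", ("or", "zob", "foo")))
print(pattern)  # (?:abc)+foo((?:zob|foo))
```

Since literal strings go through re.escape, something like rx("a+b") yields a\+b without a single hand-written backslash, which is precisely the problem raw strings were meant to ease.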
I personally think raw strings have their uses outside of regexes and would be a nice addition to the Emacs Lisp language. As for regexes, I now write mine with rx all the time. I just wish there were a built-in way to use rx in interactive search/replace functions. I will work on this eventually if no one has done it already.
That's all for today.