Bit by preg_replace() Backreferences

Damn. I spent a long time tracking this one down.  I migrated the platform from php 5 to php 7, and that meant the ereg_replace() methods were gone.  So I replaced them with preg_replace() and fixed up the regexes.

The title already tells you what’s up, so, you’re going to feel really smart, and I’m going to look really stupid.  It was a lot harder than it sounds.

Generally, things went well, but there was this weirdass bug with the following characteristics:

  • Dollar amounts got wiped out.  So $100,000 would show up as 0,000.
  • But it was only the first two digits and the dollar sign.
  • I played around with it a while, but it was hard to figure out what was happening, so I needed to refactor things to trace where it was happening.

Then I went down a deep rabbit hole, fixing up the SFActive code to make it more OO, slightly MVC, and testable.  It wasn’t bad, but what a pain.  It’s still not done, but it’s been a learning experience.

What I learned is, sometimes, you need to refactor to a point, and then use the knowledge to just write a new application.  A lot of the work is comprehending the code and tearing it apart.

Anyway, I digress.  I was writing some article on the site, and got bit by the bug. It was quite upsetting, so, I started to poke around in the code, and tried some different things.  I discovered:

  • $0 gets replaced with the name of a key TPL_LOCAL_ARTICLE.
    • That’s the name of a key that corresponds to the field in the template.

I did a once over on the docs for preg_replace, but it was around 4pm, and kind of hot, so I rushed through the docs a little too fast.

I kept altering the code – the running code – and started copying functions and creating new code paths where I could add debugging info.  With each iteration of testing, it just seemed to take me closer to the templating class.

That thing had some annoying code.  It had this:

$this->$var

Yeah, instead of using an associative array, template values were stored as properties on the object.

I know it’s valid code. It’s just bad form. You get this level of indirection, and it is almost invisible in the code.   So I added an array, and coded it up like this:

$this->context[$var]

Ultimately, it’s not that different… but the thing is, if there was some weird code path I didn’t notice which added a property to the object, it wouldn’t affect this array.  The context is a namespace for values that go into the template.  It’s not the object’s namespace.

That didn’t fix it, but I could add more code to narrow down the problem zone.

I would just dump the value of variables, like this:

file_put_contents('/tmp/y', $defaults);

One goes before, and one goes after. So you can see the inputs and the output.

Those just get moved up and down the code, to probe the values.

Yeah, I should use a debugger.  I just don’t have one set up for the live server.

As I went through, I’d clean up code. Indentation was fixed. Just the usual, so visual inspection was easier.

I also discovered why template variables started with TPL_.  It’s because the original template fields were surrounded by brackets: {LOCAL_ARTICLE}.  At some point, they hit a weird bug, kind of like this:

ereg_replace("{$val}", $replace, $subject);

When PHP is interpolating a string, and sees {} around a scalar variable, it considers the { and } to be delimiters around an expression, and replaces the brackets and expression with the value in the brackets.  So “$val” and “{$val}” result in the same string!

PHP was eating up the brackets, and messing up the template behavior.

To work around it, the programmers just went with that behavior, and prefixed values with TPL_, which was an uncommon string.

Back to the story.

Eventually, it narrowed down to one function.

Then it was preg_replace.

At that point, it clicked.  It was some kind of backreference in the string.

I replaced preg_replace() with str_replace() and adjusted accordingly.

It worked.

Then I verified in the docs: strings of the form $0 and \\0 are considered backreferences to the patterns captured in the regular expression.

What was happening was that TPL_LOCAL_ARTICLE was being matched, and replaced references to $0.

The other 98 values, $1 to $99, didn’t have any values, so they got replaced with nothing.

The Antipattern

If you replace a component with a more feature-full, better component, you may induce a bug based around a new feature that didn’t exist before.

Security

I don’t know what kind of security implications this may have, but here are some possible risks.

If you use preg_replace to strip out malicious code, then adding $0 after that code could expose it.  The following example shows how, if you have an administrator who inserts a $0 into the replacement text, then any attempt to remove a script tag with preg_replace is subverted.

<?php
$userinput = "alert();";
$admininput = "foo $0 bar";
echo preg_replace("/<script>.+<\/script>/",$admininput, $userinput);

While that seems like a remote possibility, there are some possible strings where you could hide $0, like in a username: $0$0RRY $1NG$0NG

My story above corresponded to this:

<?php
$field = "{field}";
$template = "text {field} text";
$userinput = "foo $0 bar";
echo preg_replace($field, $userinput, $template);

So, in my situation, the user input was being put into the replacement string… which shouldn’t be user input, or if it is, sanitize it first, with preg_quote().

See Also

https://bitquark.co.uk/blog/2013/07/23/the_unexpected_dangers_of_preg_replace

http://php.net/manual/en/reference.pcre.pattern.modifiers.php

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s