Saturday, December 25, 2010

RTL Character: Weird Discovery On HTML ISO Characters

I was looking for the HTML entity for ► (obviously, I have found it already).

For those who don't know, an HTML entity is a code that stands for a character. It is used for characters that are not on the keyboard, or to prevent a character such as the less-than sign (<) from being interpreted as HTML markup.

I used a PHP for loop to find it. Along the way, however, I discovered a weird behavior: I was confused because the numbers in the output were all jumbled.

The PHP Code was:

<?php for($x=1;$x<=10000;$x++) { echo "$x: &#$x; <br />"; } ?>

It is coded to show the number on the left side, followed by the character corresponding to that number (a Unicode code point, which covers the ASCII and ISO ranges).
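If the character itself can be pasted into the source, the number can also be looked up directly instead of scanning a long list. A minimal sketch, assuming PHP 7.2 or later with the mbstring extension enabled and a source file saved as UTF-8; the helper name entity_for() is made up for illustration:

```php
<?php
// Hypothetical helper: build the numeric HTML entity for one UTF-8 character.
function entity_for(string $char): string
{
    // mb_ord() returns the Unicode code point of the first character.
    return '&#' . mb_ord($char, 'UTF-8') . ';';
}

echo entity_for('►'), "\n"; // prints &#9658;
```

The same helper applied to the right-to-left override character gives &#8238;, the number that caused the jumbled output.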

This discovery means that adding &#8238; (the Unicode right-to-left override character, U+202E) can actually mess up a website, making the text after it look like unreadable gibberish. I think I should watch out for this kind of vulnerability.

<div> This text will stay as left to right <div>Hi, I will put RTL character after <b>this</b><span>&#8238; it will become</span> <span>right to left</span> and unreadable/gibberish </div> </div>
This code results in:
This text will stay as left to right
Hi, I will put RTL character after this‮ it will become right to left and unreadable/gibberish

The override affects the text that follows it, including text in child and sibling elements, as long as it is all contained in the same <div>. Text outside that <div> is not affected, which gives another reason to use <div>.

Security Solution

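Use htmlspecialchars() or htmlentities() on user input when printing it in your HTML. This shows a typed &#8238; as literal text instead of letting the browser interpret it. Note that htmlspecialchars() does not alter a raw U+202E character pasted directly into the input. You can also wrap the output in <div>s, but that is just not reliable.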
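To see what escaping does and does not cover, here is a small sketch. It assumes PHP 7.0 or later (for the "\u{...}" string syntax); the variable names and sample strings are made up for illustration:

```php
<?php
// Entity typed by a user as plain text: the ampersand gets escaped,
// so the browser shows the characters "&#8238;" instead of applying them.
$typed = 'abc &#8238; def';
echo htmlspecialchars($typed, ENT_QUOTES, 'UTF-8'), "\n"; // prints abc &amp;#8238; def

// Raw U+202E character pasted by a user: htmlspecialchars() only touches
// &, <, > and quotes, so the override character passes through unchanged.
$pasted = "abc \u{202E} def";
$escaped = htmlspecialchars($pasted, ENT_QUOTES, 'UTF-8');
var_dump($escaped === $pasted); // prints bool(true)
```

So escaping alone handles the entity form, while the raw character needs to be filtered separately.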

You can also target this character with str_replace() or preg_replace(), but who knows what other characters can mess up your website.
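One possible way to do that with preg_replace() is sketched below. It assumes UTF-8 input and PHP 7.0 or later. The helper name strip_bidi_controls() is hypothetical, and the list of characters is my own selection of Unicode directional controls, not a complete list of everything that could disturb a page:

```php
<?php
// Hypothetical helper: remove Unicode bidirectional control characters.
function strip_bidi_controls(string $text): string
{
    // U+200E-U+200F: left-to-right mark, right-to-left mark
    // U+202A-U+202E: embeddings, pop directional formatting, overrides (incl. U+202E)
    // U+2066-U+2069: directional isolates and pop directional isolate
    // The /u flag makes PCRE treat the pattern and the subject as UTF-8.
    $clean = preg_replace(
        '/[\x{200E}\x{200F}\x{202A}-\x{202E}\x{2066}-\x{2069}]/u',
        '',
        $text
    );
    // preg_replace() returns null on error (for example invalid UTF-8);
    // this sketch then falls back to an empty string.
    return $clean ?? '';
}

echo strip_bidi_controls("this\u{202E} it will become"), "\n"; // prints this it will become
```

Keep in mind that some of these characters are legitimate in real right-to-left text such as Arabic or Hebrew, so removing all of them may be too aggressive for a multilingual site.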

Or you can just ignore it. After all, it will appear rarely, it affects your website only temporarily, and it takes real intention to damage your website even temporarily. But one day you may be forced to choose a defense against it.
