Thursday, December 29, 2011

Sunday, December 25, 2011

Custom Facebook Emoticons

People want emoticons. And whenever a chat system does not let them customize them, they start to complain.
Think of Skype: whenever it adds two or three new emoticons, people start ranting again.

Another place where I have seen people asking for custom emoticons is Facebook. Many hoaxes have gone around, like "spam this page to everyone in the world and you'll receive custom smiles!", but few know that a custom emoticon system is already there.

Further, we all have a custom emoticon of ourselves!

Try this. Say I want a trollface emoticon to add to a chat message. I search for a page and find this one:
https://www.facebook.com/pages/Trollface/313306795360291

Now, I note that there is a numerical code in it: 313306795360291

If I type [[313306795360291]] in the chat box, the Trollface emoticon shows up.

If you try with a personal page, you might find a link like:
http://www.facebook.com/profile.php?id=NUMBER

Again, use [[NUMBER]] and you will get that person's emoticon!
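
The two URL forms above can be turned into a chat code mechanically. Here is a minimal sketch in shell, assuming the numeric ID is either the id= parameter or the last path component; fb_emoticon_code is a hypothetical helper name, and 12345 is just a placeholder ID:

```shell
#!/bin/sh
# fb_emoticon_code URL: print the [[NUMBER]] chat code for a Facebook URL.
# Hypothetical helper, not part of any Facebook tool.
fb_emoticon_code() {
        # Profile form: ...profile.php?id=NUMBER
        id=$(printf '%s\n' "$1" | sed -n 's/.*[?&]id=\([0-9][0-9]*\).*/\1/p')
        # Page form: .../pages/Name/NUMBER (trailing run of digits)
        if [ -z "$id" ]
        then
                id=$(printf '%s\n' "$1" | sed -n 's/.*\/\([0-9][0-9]*\)\/*$/\1/p')
        fi
        # No numeric ID found: report failure and print nothing.
        [ -n "$id" ] || return 1
        printf '[[%s]]\n' "$id"
}

fb_emoticon_code "https://www.facebook.com/pages/Trollface/313306795360291"
# 12345 is a placeholder ID used only for illustration.
fb_emoticon_code "http://www.facebook.com/profile.php?id=12345"
```

The first call prints [[313306795360291]], which is exactly the code typed in the chat above.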

Sunday, December 18, 2011

Code Obfuscation (Example 1)

Continuing the series of posts about code obfuscation and unmaintainable code, I want to show a C code snippet here:


#define x(y,z) z , y
#define y(z,x) x z
#define a(b,c) c##b
#define x167 }
#define x879 x167;
#define x798 '\0'x879
#define x98 {
#define _ a(ain,m)
#define __ ()
#define b(c,d) a(c,d)
#define w a(ar,ch)
w ____[] = x98 x(0154, x(0145, 0110)), 108, 111, 30+02, 87, 0x6f, 0162, 0x6c, 0x64, x798
w ___[] = x98 '%', 's', x(x798, '\n')
_ __ x98 y((___, ____),a(ntf,pri)); x167


This is indeed a complete, working program, and the techniques used here to mangle the code are simple enough. What's the actual purpose of this program?

Compiling and running it (or running it through gcc -E) is a good way to look at its internals.

No further explanation is being given here, as I consider this to be easy enough!

Thursday, December 15, 2011

Drive Safely

A recent scientific result tried to clarify everything about driving:



The study set out to show that, unless you're in a situation where three to five minutes will make a significant difference in an outcome, there's no point in speeding.

But it should be noted that this graph actually suggests that you reap a pretty significant benefit from speeding if the speed limit is less than 30 mph.

So the moral of the story is: speed in school zones and residential areas, not on the highways.
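
The arithmetic behind this is easy to check. Here is a rough sketch, assuming a 10-mile trip driven 10 mph over the limit; both figures are my own assumptions, not numbers taken from the study, and minutes_saved is a hypothetical helper:

```shell
#!/bin/sh
# minutes_saved LIMIT OVER MILES: minutes saved by driving OVER mph above
# a LIMIT mph speed limit for MILES miles. Hypothetical helper.
minutes_saved() {
        awk -v limit="$1" -v over="$2" -v miles="$3" \
            'BEGIN { printf "%.1f\n", 60 * miles * (1 / limit - 1 / (limit + over)) }'
}

# Assumed scenario: 10 miles, 10 mph over each limit.
for limit in 20 30 55 70
do
        printf '%s mph limit: %s minutes saved\n' "$limit" "$(minutes_saved "$limit" 10 10)"
done
```

Under these assumptions the saving is 10 minutes at a 20 mph limit and 5 minutes at 30 mph, but only about a minute at 70 mph.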

Monday, December 12, 2011

Converting PDF into Plain Text

At work I often receive PDF documents with no OCR information in them, and in order to modify them or extract parts of them I need to convert them to plain text.

As a Unix user, I love the command line, and I have been searching for something like:

./pdf2text file.pdf

So this is what I have written to get this useful command line tool. It is based on tesseract, which is now under development by Google.
Remember that in order to interpret non-English documents, the data files for the language must be installed separately, as most distributions ship only the English one with the main package.
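
Once the extra data files are installed, the language is chosen with tesseract's -l option. Here is a small sketch; ocr_lang and the TESSERACT override variable are hypothetical names of mine, and "ita" (Italian) is just an example language code:

```shell
#!/bin/sh
# ocr_lang IMAGE OUTPUTBASE LANG: OCR one image with an explicit language.
# Hypothetical wrapper; TESSERACT may point to a different binary.
ocr_lang() {
        # -l selects the language data; "$3" must match an installed data file.
        ${TESSERACT:-tesseract} "$1" "$2" -l "$3"
}

# Example (writes page-01.txt), assuming the Italian data files are installed:
# ocr_lang page-01.tif page-01 ita
```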

As tesseract is able to work with TIFF images, the first tool we need is pdf2tif. If you don't already have one on your machine, you can use the following script, which is based on ghostscript:

pdf2tif:

#!/bin/sh
# Derived from pdf2ps.
# Convert PDF to TIFF file.

OPTIONS=""
while true
do
        case "$1" in
                -?*) OPTIONS="$OPTIONS $1" ;;
                *) break ;;
        esac
        shift
done

if [ $# -eq 2 ]
then
        outfile=$2
elif [ $# -eq 1 ]
then
        outfile=`basename "$1" .pdf`-%02d.tif
else
        echo "Usage: `basename $0` [gs options] input.pdf [output.tif]" 1>&2
        exit 1
fi

# Doing an initial 'save' helps keep fonts from being flushed between pages.
# We have to include the options twice because -I only takes effect if it
# appears before other options.
gs $OPTIONS -q -dNOPAUSE -dBATCH -dSAFER -r300x300 -sDEVICE=tiffg3 "-sOutputFile=$outfile" $OPTIONS -c save pop -f "$1"


This script will later be invoked by our pdf2text, which takes a PDF file as input, creates a temporary folder, converts every PDF page into a separate TIFF image, and then feeds them to tesseract.

pdf2text:

#!/bin/sh

# Takes one parameter: the name of a PDF file in the current directory.
# Uses the custom script 'pdf2tif' (which must be in the PATH) to generate
# one TIFF file per page at 300x300 dpi, then runs tesseract on each of them.

mkdir "$1-dir"
cp "$1" "$1-dir"
cd "$1-dir" || exit 1

pdf2tif "$1"

for j in *.tif
do
        x=`basename "$j" .tif`
        tesseract "$j" "$x"
        # Some tesseract versions leave .raw and .map files behind;
        # -f keeps rm quiet when they do not exist.
        rm -f "$x.raw" "$x.map"
        # Un-comment the next line if you want to remove the .tif files when done.
        #rm "$j"
done

# Concatenate the per-page text files in page order.
cat *.txt > "$1.txt"
mv "$1.txt" ..

cd ..
rm -rf "$1-dir"

After it runs, a .txt file named after the PDF (for example, file.pdf.txt for file.pdf) is created next to it, containing the OCR'd text!
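
Since pdf2text relies on gs, tesseract and pdf2tif all being reachable, a quick check before the first run can save some head-scratching. Here is a minimal sketch; missing_tools is a hypothetical helper name:

```shell
#!/bin/sh
# missing_tools NAME...: print each named command that is not in the PATH.
# Hypothetical helper for a pre-flight check.
missing_tools() {
        for tool in "$@"
        do
                command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
        done
}

# The three commands the scripts above depend on.
missing=$(missing_tools gs tesseract pdf2tif)
if [ -n "$missing" ]
then
        echo "Missing tools:" $missing 1>&2
fi
```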

Saturday, December 3, 2011

Latex in HTML (revisited)

Some time ago I published a post presenting an easy tool that can be integrated into web pages to render LaTeX equations simply by enclosing the code in $ signs.

I never noticed how much I used the dollar sign until I developed this script!! It texified so much text that wasn't meant to be that I decided to change the approach a bit. Now, instead of parsing the whole document, I'm just parsing the divs whose class is "latex".

So now, depending on the class of the actual container of the text I can have this behaviour:

$3 + 2 = 5$

or this one:

$3 + 2 = 5$

As easy as changing the div's class. The code in the previous post has been updated accordingly!