Fixed a bug in language detection

The feature was freshly introduced and I already found a bug. When retrieving the summary of a blog post from the RSS feed, I get HTML. I was already stripping the tags, but I wasn't decoding HTML entities. In posts with Chinese characters, pretty much every character is an escaped HTML entity, so the text was recognized as English. I have now added HTML entity decoding, too. Language detection works better now!
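For illustration, here is a minimal sketch of the cleanup step in Python. This is not the actual code of the project. The function name `clean_summary` and the regex-based tag stripping are assumptions made for the example; only `html.unescape` and `re` from the standard library are used.

```python
import html
import re

# Naive tag matcher, good enough for a sketch (assumption: summaries are
# simple HTML fragments without scripts or comments).
TAG_RE = re.compile(r"<[^>]+>")


def clean_summary(raw_summary: str) -> str:
    """Return plain text suitable for language detection (hypothetical helper)."""
    # Step 1: strip the tags, replacing each with a space so words don't merge.
    without_tags = TAG_RE.sub(" ", raw_summary)
    # Step 2: decode entities such as &#20320; or &amp; into real characters.
    decoded = html.unescape(without_tags)
    # Step 3: collapse the leftover whitespace.
    return " ".join(decoded.split())


print(clean_summary("<p>&#20320;&#22909;</p>"))
```

Without step 2, the detector would only see ASCII sequences like `&#20320;` instead of the Chinese characters they stand for.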