Google’s Failing Grade – Chrome-based Chinese-to-English Translator

Being listed on Alexa Top 500 means something – or so I figure every time I hear it being used again in an article as a benchmark.

It’s been months, and maybe even a year now, since my last visit to Alexa. The list has certainly evolved and shuffled since my last time seeing it:

There would be no point in my mentioning Alexa, had it not led to my discovery of two precarious websites resting amongst US-based powerhouse tech companies.

The two sites I noticed that stuck out in the top 20 were:
#17: Sina.com    (Direct url: http://www.sina.com.cn/)

What does a Top-ranked Chinese Site look like?

So naturally, I had to explore a bit. I visited Taobao first and noticed (what appeared to me as) a completely jumbled page covered in Chinese writing. Using Google’s Always-Most-Recent Chrome Browser, I accepted when prompted for a Chinese-to-English Page Translator:

During this interaction, I thought to myself, what a wonderful service and technology this has become – And one my children will likely almost-completely take for granted.

Before any translations are accepted (on the Chrome toolbar-dropdown) it looks like this:

After proceeding with the Translate option the page looks like this:

It was hard on the eyes. And for a website globally ranked in the top 20, I questioned why Google’s Chrome-to-Translate technology wasn’t able to translate a site many consider “the Amazon of China”.

After leaving Taobao, I returned to the top rankings list. I found Sina, which is currently ranked at #17. This is when something not-at-all-pretty was uncovered (after again accepting Translate):

I assure you I was set to 100% (default) zoom in my browser settings. Here is a closer look:

Just to be sure, I translated the Japan-based version of Yahoo (aka Yahoo! Japan):

And it looks dramatically better.

The Challenge Exists Especially with Images

I am aware the Chinese language is one of the toughest for any algo to consistently crack. And it may also have a lot to do with the image-based media displayed throughout such China-based pages.

Examples of main slideshow (middle of homepage; taken from Taobao.com):

I just wanted to bring this up. It is a pretty big problem when you think about the underlying context (limited access to shared information; business politics), and maybe the guys at Google can think further into it.

…or maybe they already have and they “know what they’re doing.”

Note: I left out QQ.com, which is #11 on the Alexa Global List, so that there would be one high-ranked gem left over for readers to discover.


The Monty Hall Problem

The Monty Hall problem is one that you may be familiar with already (and you probably do not even know it).

Some of you may remember this scene in a recent movie:

The problem has been debated by statisticians for decades.


My take on the Monty Hall problem:

You are on a game show and the host asks you to pick from three doors (one door has a car behind it). You pick door #1. The host opens door #3 (which uncovers a goat behind it) and offers you the option to change your choice from door #1 to door #2.

At the beginning you had a 33% chance of guessing the door that hides the car. After the host opened door #3 (and showed you the goat) only two possibilities remain: door #1 and door #2. That’s a 50% chance that either door has a car behind it. When the host asks if you would like to change your choice to door #2, a new game has officially begun.

Bayes’ theorem, as it applies to the Monty Hall problem, is still applicable. Except now, the step where the denominator is computed using the law of total probability as the marginal probability, as seen here:

is inapplicable.

Why? Because the marginal probability cannot be measured yet in this new game (and is unnecessary because the solution will be presented by the host after you confirm your choice of door #1 or door #2).

…in other words, switching your choice to door #2 will give you the same odds (50%) as keeping your original choice.

In the first game there were three doors. In this new game there are only two doors.

If you consider this game as a new one, then switching your original choice from door #1 to door #2 (as encouraged by Marilyn vos Savant) now seems arbitrary.
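One way to test the 50% figure is to simulate the game many times. The sketch below is only that, a sketch: the function names (`play`, `win_rate`) and the `host_knows` flag are made up for illustration, and the post never states which rule the host follows, so both rules are assumptions. Under the textbook rule (the host knows where the car is and always opens a goat door), switching should win about two thirds of the time. The 50/50 split appears in the other variant, where the host opens one of the remaining doors at random and simply happens to reveal a goat.

```python
import random


def play(switch, host_knows, rng):
    """Play one round.

    Returns True for a win, False for a loss, or None when the round is
    discarded because an uninformed host accidentally revealed the car.
    """
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    others = [d for d in doors if d != pick]

    if host_knows:
        # Assumed textbook rule: the host only ever opens a goat door.
        opened = rng.choice([d for d in others if d != car])
    else:
        # Assumed variant: the host opens one of the other doors at random.
        opened = rng.choice(others)
        if opened == car:
            return None  # the car was revealed, so this round does not count

    if switch:
        # Move to the one door that is neither the first pick nor the opened door.
        pick = next(d for d in doors if d not in (pick, opened))
    return pick == car


def win_rate(switch, host_knows, rounds=100_000, seed=0):
    """Fraction of counted rounds won with the given strategy and host rule."""
    rng = random.Random(seed)
    results = [play(switch, host_knows, rng) for _ in range(rounds)]
    counted = [r for r in results if r is not None]
    return sum(counted) / len(counted)


if __name__ == "__main__":
    for host_knows in (True, False):
        for switch in (False, True):
            rate = win_rate(switch, host_knows)
            print(f"host_knows={host_knows} switch={switch}: {rate:.3f}")
```

So whether the second stage really is a "new game" depends on how the host chose door #3, which is worth pinning down before settling on a number.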

Conceptual Vision to Limit the US Government’s Invasion of Citizens’ Privacy (Part 1)

Popular sites like Facebook, LinkedIn, YouTube, and others are compiling tremendous amounts of personal data on their users.

Imagine the depth of information the United States Government is capable of retrieving on a person. The FBI, CIA, IRS, and other intelligence agencies (all funded by taxpayers) undoubtedly host an even more complex consortium of information.

Such agencies’ right to compile this data is an issue in itself. But the reason for this blog is to address the due process procedures that need to be re-written (and adhered to by the [once-citizens/now-]agents of all our government branches).

What can you tell about this person?

Two questions I have for each US Citizen:

Do you feel our government already knows more than you would like them to about you?

Do you feel it is reasonable to set new laws that require government agencies to follow clearly defined procedures before prying into personal aspects about your life?

 

I will try to make this point with a more pathos-infused real-world situation:

In a municipal court case resting on the key testimony of Officer JOE, who arrested Citizen TOM for negligent driving, the case has been pushed to trial. A jury of Citizen TOM’s peers will decide his fate – whether or not TOM will receive a charge of negligent driving on his permanent record (and any applicable fines, community service, and/or jail time).

Before giving testimony in front of TOM’s peers, Officer JOE conducts an investigation in coordination with his friends in the Prosecuting Attorney’s Office. Fortunately for JOE, the reach of his investigation goes much further than the filing cabinets in the PA’s office. This is because the Data Records System used by the PA’s office is shared with many other prominent government agencies.

In this particular case, the PA’s office is able to uncover a small treasure trove of private images (displaying TOM with his friends around their fast cars), two YouTube Videos titled, “How to Drift on Interstate-5 Without Getting Busted” and “My car can reach 190mph, can yours?”, as well as several personal emails between TOM and his friends regarding past racing meetups.

When Officer JOE takes the stand he is questioned about the public service he has provided to his community as an officer, his past awards and recommendations, the time he saved the mayor’s daughter from a kidnapping, and oh, briefly he is asked about his observation of TOM’s negligent driving. Once Officer JOE steps down from the stand, the PA presents the ‘uncovered’ incriminating evidence to the jury.

When TOM takes the stand he is still in shock about the evidence that has just been presented against him. He tells the jury that the racing videos from YouTube were taken when he was 16, deleted shortly after, and are anecdotal pieces of evidence that are now almost 8 years old. A similar explanation of the emails and photos is given by TOM to the jury as well.

All of the evidence, TOM claims, was ‘permanently deleted’ many years ago when TOM began seeking employment after graduating from college. Lastly, TOM tells the jurors that he has never received any sort of traffic or parking ticket before the one cited by Officer JOE and that he was making the improper lane changes (that constitute his charge of negligent driving) on the day that Officer JOE cited him.

The jury hands down a verdict of ‘guilty’ after only four hours of deliberation.

What happened in this hypothetical court case?

Can you see where the evidence was held against TOM unfairly?