Sunday, April 27, 2008

On Google News Quotes

I've been pondering how Google News attributes quotes to people. Matt Hurst points out that only some people have quotes listed in Google News, and that some of the people who are not listed are important world leaders.

At first I thought they were scanning the text around each quote for a matching named entity, using named entity tagging as part of their quotes feature. But then I noticed this well-known quote:
"I remember landing under sniper fire," she said. "There was supposed to be some kind of a greeting ceremony at the airport, but instead we just ran with our heads down to get into the vehicles to get to our base."
It was sourced to Some Article, but nothing in the quote itself says who "she" refers to. Inside the article, the nearest named entity to the quote is Sarajevo, which is a place (but still a named entity). Hillary Clinton is mentioned by name further up and is the only woman mentioned in the article. This is evidence that Google News is doing some sophisticated processing to get these quotes. I wonder whether Google disambiguates between multiple people of the same gender.
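
To make the difference between the two guesses concrete, here is a toy sketch in Python. It is not Google's method, which I can only speculate about. The Entity class, both function names, and the token positions are made up for illustration; only the names Sarajevo and Hillary Clinton come from the article. The first function is the "nearest named entity" idea I started with, and the second matches the pronoun's gender against people mentioned earlier.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entity:
    name: str
    kind: str               # "PERSON" or "PLACE"
    gender: Optional[str]   # "f", "m", or None when unknown / not a person
    position: int           # token offset in the article (invented for this sketch)


# Hypothetical pronoun table; a real system would need far more than this.
PRONOUN_GENDER = {"she": "f", "he": "m"}


def nearest_entity(entities: List[Entity], quote_position: int) -> Entity:
    """Naive guess: whichever named entity sits closest to the quote."""
    return min(entities, key=lambda e: abs(e.position - quote_position))


def resolve_speaker(entities: List[Entity], quote_position: int,
                    pronoun: str) -> Optional[Entity]:
    """Gender-aware guess: the most recent person mentioned before the
    quote whose gender matches the pronoun, or None if nobody matches."""
    gender = PRONOUN_GENDER.get(pronoun.lower())
    if gender is None:
        return None
    candidates = [
        e for e in entities
        if e.kind == "PERSON" and e.gender == gender
        and e.position < quote_position
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda e: e.position)


# Assumed layout: Clinton named early, Sarajevo right before the quote.
entities = [
    Entity("Hillary Clinton", "PERSON", "f", 12),
    Entity("Sarajevo", "PLACE", None, 140),
]
quote_position = 150

print(nearest_entity(entities, quote_position).name)
print(resolve_speaker(entities, quote_position, "she").name)
```

With this layout the naive version picks the place and the gender-aware version picks the person. Note that with two women in the article, this sketch would simply take the one mentioned most recently before the quote, which is exactly the kind of guess that could go wrong.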

And then, just when I'm starting to get impressed with Google all over again, I check out this quote:

"Let me tell you something," she told labor leaders firmly in Philadelphia. "When it comes to finishing the fight, Rocky and I have a lot in common. I never quit. I never give up."

They source it to The New York Times, but when I follow the link the quote's not there! I'm getting flashbacks to past Google mess-ups. I know it was only an estimate... but these little things are starting to pile up. Anyone else noticing that Google is getting a little bit sloppy?

