ISSN: 1391 - 0531
Sunday, June 17, 2007
Vol. 42 - No 03
Mirror

Organising data

Dear TPH,
Last week you referred to "content discovery technology that is capable of interpreting the meaning of different media elements such as text, video and audio." Can you please explain what you meant by that?
KT

Dear KT,
I am touched that you take such a keen interest in what I write, asking smart questions – unlike one of my friends who asked "so you actually made a TRIP to go see your pal's new 233MHz machine?!?!"

There are plenty of ways to organise data in databases and to use that data to generate useful information. Data, by definition, is made up of fragmented bits and pieces of information and is therefore relatively easy to summarise.
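To see why structured data is the easy case, here is a minimal sketch in Python. The records, field names and figures are made up purely for illustration; the point is only that when every record has the same named fields, a summary takes a few lines.

```python
from collections import defaultdict

# Hypothetical structured records: every entry has the same named fields.
sales = [
    {"region": "Colombo", "amount": 1200},
    {"region": "Kandy", "amount": 800},
    {"region": "Colombo", "amount": 300},
]

def total_by_region(records):
    """Add up the 'amount' field for each distinct 'region'."""
    totals = defaultdict(int)
    for record in records:
        totals[record["region"]] += record["amount"]
    return dict(totals)

print(total_by_region(sales))  # {'Colombo': 1500, 'Kandy': 800}
```

The computer never has to "understand" anything here. It only needs to know which field to group by and which field to add up.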

On the other hand, it is much more difficult for a computer to "read" an essay and "understand" what it is about. Content discovery is about finding ways to "teach" computers to interpret information from unstructured data: to understand the "meaning" of a sentence and to use that to interpret the theme of an essay or a letter, for example. As you can see, content discovery is largely about deducing patterns based on linguistics and semantics.

This is useful for many reasons. Most of the data we generate naturally, such as letters, essays, memos and speech, is unstructured. Computers cannot process unstructured chunks of information in a meaningful way, so such information is difficult to search and is almost never used for report generation. For example, you can't type "find all blog entries about content discovery" into Google and expect the search engine to "understand" what you said and return only relevant blog entries.
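The problem can be illustrated with a toy keyword matcher in Python. This is a deliberately naive sketch and not how Google or any real search engine works; the function names and the three sample documents are invented for the example.

```python
import re

def words(text):
    """Split text into a set of lowercase words, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))

def keyword_search(query, documents):
    """Return every document that shares at least one word with the query."""
    query_words = words(query)
    return [doc for doc in documents if query_words & words(doc)]

# Hypothetical documents, made up for this illustration.
documents = [
    "A blog entry about content discovery and what it means for search.",
    "How to find all the hidden entries in your address book.",
    "Gardening tips for the rainy season.",
]

results = keyword_search("find all blog entries about content discovery", documents)
for doc in results:
    print(doc)
```

The matcher returns the first two documents. The second one has nothing to do with content discovery; it simply contains "find", "all" and "entries". The words were treated as search terms, not as an instruction to be understood.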

Content discovery is therefore mainly about finding out how words in sentences convey meaning, and deducing rules that would enable computers to "understand" the "meaning" of a sentence. In theory, these rules can be applied to text directly, and to audio once speech recognition software has turned it into text.

I made a comment about content discovery being the next best thing in the absence of Artificial Intelligence.

That is because they are not the same. Content discovery is about interpreting words and sentences literally, and it may lack the intelligence to 'read between the lines' or to interpret subtle variations in tone to know that someone is being sarcastic. Content discovery has many applications in the business world as well as in surveillance and military intelligence. Its most widespread use, and most of its innovation, will however be driven by ordinary web search engines, which for once will be able to really give you the information that YOU want!

Techno Page Helpline (TPH) is our help desk, dedicated to solving your technical and not-so-technical, silicon-based and carbon-based problems and ethical dilemmas. If you can withstand 'high-voltage sarcasm,' 'low-frequency cynicism' and new-age computer wisdom, outsource your questions and comments to us at technopage@gmail.com and share a few bytes of humour. When you write in, don't forget to add 'TPH' in the subject line!

 

 


Copyright 2007 Wijeya Newspapers Ltd., Colombo, Sri Lanka.