Did you know that the analysis and recognition of sound waves has a history spanning decades? MIT started controlling logic with sound files in the 1960s. Hardware advances in the 1990s made sound recognition practical, even letting people with disabilities control a computer with only their voice. So sound is not the toughest type of data for AI; it may even be the easiest, or at least the least difficult.
Picture recognition is pattern recognition. Every modern computer, phone and TV has a high-resolution LCD screen, and the cameras that capture images are high resolution too. More resolution means more pixels, more pixels mean more data, and more data means more accuracy. Admittedly, that is an oversimplification. So why is it only recently that millions of pictures can be identified easily? Because it took years to build a large enough training set. As those years of training crept on, the hardware needed to process the images got faster and cheaper. Today a single video card has enough processing power to handle thousands of images a minute. We can now process images faster and label them faster, building models that can be shared and that reach recognition accuracy in the high 90s more often than before. So picture recognition comes in second most difficult.
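To make the "more resolution, more pixels, more data" chain concrete, here is a small Python sketch that computes the pixel count and raw size of a single frame at a few common resolutions. The 3-bytes-per-pixel figure is an assumption (uncompressed 8-bit RGB); real photos and video are compressed, so treat these numbers as an illustration of scale rather than actual file sizes.

```python
# Illustration only: how raw image data grows with resolution.
# Assumes uncompressed 8-bit RGB, i.e. 3 bytes per pixel.
RESOLUTIONS = {
    "VGA (640x480)": (640, 480),
    "Full HD (1920x1080)": (1920, 1080),
    "4K UHD (3840x2160)": (3840, 2160),
}
BYTES_PER_PIXEL = 3


def raw_frame_size(width, height, bytes_per_pixel=BYTES_PER_PIXEL):
    """Return (pixel count, uncompressed size in bytes) for one frame."""
    pixels = width * height
    return pixels, pixels * bytes_per_pixel


if __name__ == "__main__":
    for name, (w, h) in RESOLUTIONS.items():
        pixels, size = raw_frame_size(w, h)
        print(f"{name}: {pixels:,} pixels, {size / 1_000_000:.1f} MB per raw frame")
```

Going from Full HD to 4K quadruples the pixel count, and with it the raw data a recognition model has to examine in every frame.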
That leaves text. The written word is the hardest to interpret correctly every time. Even from a purely academic perspective, words have multiple meanings, which our minds weigh almost simultaneously. If I say, "watch out for that glass," did I mean that the watch on my wrist is out for a glass? Or do I want someone to keep an eye on a glass I worry they might knock over? Or, more specifically, to avoid broken glass? When we understand speech, we sometimes hear something that helps, like inflection and environmental cues, and sometimes we get a visual cue, like seeing what type of glass it is, whole or broken. So our brains play a huge role in understanding the meaning of words. Teaching a computer to understand and master all of these ideas is what makes text the #1 most difficult type of data for AI to process.
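As a toy illustration of why ambiguity makes text hard, the Python sketch below counts the candidate readings of "watch out for that glass" when each ambiguous word can take several senses. The sense inventory is hypothetical and hand-written for this example (real systems draw senses from lexical resources such as WordNet), and the sketch only enumerates the possibilities; it does not use any of the context cues a human would rely on to pick one.

```python
from itertools import product

# Hypothetical, hand-written sense inventory for illustration only.
# Words not listed here are treated as having a single meaning.
SENSES = {
    "watch": ["observe / be careful (verb)", "wristwatch (noun)"],
    "glass": ["drinking glass", "broken glass (shards)", "the material"],
}


def readings(sentence):
    """Return every combination of word senses for the sentence."""
    words = sentence.lower().split()
    options = [SENSES.get(word, [word]) for word in words]
    return list(product(*options))


all_readings = readings("watch out for that glass")
print(len(all_readings))  # 2 senses of "watch" x 3 senses of "glass" = 6
for reading in all_readings:
    print(" | ".join(reading))
```

Each additional ambiguous word multiplies the number of candidate readings, which is why a program needs extra signals, such as the inflection or the sight of broken glass mentioned above, to choose the intended one.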