Have you ever imagined engaging in a counselling session with an AI system? A group of researchers at universities and companies based in Japan, Canada, and Italy have, but they found that such a system needs 5G and 6G capabilities to work at full scale.
In an article published in late March in the journal Complex & Intelligent Systems, the experts explained how the AI system, called Visual Counseling Agent (VICA), underperforms on 4G networks. They also devised a proof-of-concept test to assess whether 5G and 6G would improve the agent’s operation.
Besides VICA, the system includes a text-based context dialogue database called CRECA. “[VICA] helps mentally distressed or anxious persons by conversation or interchanges. The components are connected through the Internet and are basically cloud-native, meaning that they can be used from anywhere, by anyone, and with any type or model of terminals,” the article said.
According to the experts, the problem is that 4G networks, combined with cloud services reached over the internet, cannot give users a smooth experience. Response delays of up to 10 seconds and a recognition error every two sentences are common. In some cases, users are even disconnected from VICA during the counselling session.
“Our counselling system uses ‘sensitive (or active) listening’ skills with its related methodology for showing congruence, empathy, and unconditional positive regards,” the authors explained. “Users become irritated and stop the conversation due to such response problems or low quality of speech recognition. It happens even despite using the world’s (as of today) highest level of cloud-native speech recognition servers such as Google speech-to-text cloud.”
However, not every human exchange happens through speaking. A great deal of our understanding comes from nonverbal communication like gestures and facial expressions. That’s why VICA’s next step aims to incorporate sensing and image data. But that will only be possible using more advanced communication technologies.
The 5G/6G Proof-of-Concept
While high-speed 5G is not yet widely available, the researchers developed a proof-of-concept test to assess whether the promised capabilities of 5G and 6G would meet the requirements for a fluent conversation between humans and machines.
The basic requirements are no disconnections, fewer than one erroneous word per ten or more sentences, and a total response time of less than 10 seconds.
The researchers categorised these demands across four levels:
- Ordinary quality: Transport over the Internet, aiming at a response time of 500 ms and an error rate of one misrecognised phrase block per sentence.
- Moderate quality: Employing 5G (if available) and an ordinary leased (dedicated) line or bandwidth-guaranteed VPN, recognition error recovery included, aiming at a response time of 50 ms and one erroneous word per ten sentences.
- High quality: Employing 5G (if available) and multiplex fast dedicated line, multiple redundant intercarrier facility for improved speech recognition error recovery, aiming at a response time of 10 ms on average and one erroneous word per ten sentences.
- Very high quality: Allocating 5G (if available) and MEC Cloud, multiple (redundant) intercarrier facility for elaborated speech recognition error recovery, aiming at a response time of 5 ms on average and one erroneous word per 30 sentences.
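To make the four levels concrete, here is a minimal Python sketch that maps a measured session onto the strictest level it satisfies. It is an illustration, not the researchers' method: the names `QualityTier`, `TIERS`, and `classify` are hypothetical, the network setup attached to each level (5G, dedicated lines, MEC) is ignored, and the "phrase block" error of the ordinary level is treated like a word error for simplicity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QualityTier:
    name: str
    max_response_ms: float    # response-time target quoted in the article
    sentences_per_error: int  # at most one error per this many sentences


# Ordered from strictest to most lenient, using the article's targets.
TIERS = [
    QualityTier("very high", 5, 30),
    QualityTier("high", 10, 10),
    QualityTier("moderate", 50, 10),
    QualityTier("ordinary", 500, 1),
]


def classify(response_ms: float, errors: int, sentences: int) -> Optional[str]:
    """Return the strictest tier whose targets are met, or None if none are."""
    if sentences <= 0:
        raise ValueError("sentences must be positive")
    for tier in TIERS:
        within_latency = response_ms <= tier.max_response_ms
        # errors / sentences <= 1 / sentences_per_error, without floats
        within_errors = errors * tier.sentences_per_error <= sentences
        if within_latency and within_errors:
            return tier.name
    return None


if __name__ == "__main__":
    # Hypothetical measurement: 8 ms average response, 1 wrong word in 10 sentences.
    print(classify(8, 1, 10))
```

A session with an 8 ms response and one wrong word in ten sentences misses the 5 ms target of the top level, so it lands in "high". A session that exceeds 500 ms falls outside all four levels and returns `None`.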
To simulate the effect of a 5G/6G network operation on an enhanced version of VICA, the experts replaced the cloud server with a local server. Also, the AI server was set in a European/American communication provider’s edge and core network to simulate the 5G slice effect of lower latency at the edge compared to lower latency from intercarrier interference.
The test had four Japanese participants: a 30-year-old female IT worker, a 50-year-old male IT chief engineer, a 50-year-old male retiree, and a 70-year-old IT scholar. They were given a paper containing Japanese sentences to speak to VICA for counselling and a paper to record the test results.
According to the article, speech recognition errors in the cloud setup were more than double those at the mobile edge. Using the cloud, the researchers counted 25 dropped or missing sentences, against 12 at the edge. The cloud setup also misrecognised more words: 38 versus 10.
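The size of the gap can be checked with a few lines of Python. This is a rough sketch using only the four counts reported here. The article does not give the total number of sentences spoken, so these are ratios of raw counts, not error rates, and the comparison assumes both setups were tested on the same script. The names `cloud`, `edge`, and `cloud_to_edge_ratios` are illustrative.

```python
# Error counts as reported in the article.
cloud = {"dropped_sentences": 25, "wrong_words": 38}
edge = {"dropped_sentences": 12, "wrong_words": 10}


def cloud_to_edge_ratios(cloud_counts, edge_counts):
    """Return cloud/edge count ratios per error type, plus a combined ratio."""
    ratios = {key: cloud_counts[key] / edge_counts[key] for key in cloud_counts}
    ratios["combined"] = sum(cloud_counts.values()) / sum(edge_counts.values())
    return ratios


results = cloud_to_edge_ratios(cloud, edge)
for error_type, ratio in results.items():
    print(f"{error_type}: cloud/edge = {ratio:.2f}")
# dropped_sentences: cloud/edge = 2.08
# wrong_words: cloud/edge = 3.80
# combined: cloud/edge = 2.86
```

On these counts, the cloud setup dropped roughly twice as many sentences and misrecognised nearly four times as many words as the edge setup.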
“Such results show that quality assurance using 5G/6G is necessary and has increased efficiency when used with the VICA Counseling (ro)bot. Image recognition AI or sensor fusion AI could be crucial for more accurate recognition of patients’ feelings exhibited by body language such as facial expressions,” the document reads.
Although research in this area is still developing, the authors said the next step for VICA and 5G/6G is to expand the range of recognition tools. “In the near future, experiments for video data transportation will be conducted. These experiments will aim to analyse and unveil the effects/problems of 5G for quality counselling bot/robot, as well as to improve 5G towards future network standards such as 6G,” they concluded.
Featured image by Ketut Subiyanto/Pexels