It’s no secret that training machine learning models requires substantial amounts of data. The same holds true for building and training models for speech recognition engines. Providers of speech recognition are constantly looking for data that can be used to train language models. However, that data should always be gathered through the lens of data security.
Gathering that data is a requirement for creating or improving language models. Generally (although not always), more data increases the accuracy of a speech-to-text system. The way in which providers go about gathering it is vital to the value it can provide. A Netflix documentary called ‘The Great Hack’ (summed up nicely by TechCrunch) shows that organisations are going to great lengths to capture data from their customers. They are potentially exploiting the users of their technology to get a jump on their competition. Those actions are likely a gamble, though, and not a recommended one.
An article from the Independent revealed that Google contractors “listen to private audio recordings from Google Home smart speakers”. However, it’s not just Google. This year the Independent reported that Amazon admitted that employees listen to customer recordings from Echo and other smart speakers. It is a worry that end-users’ voices and conversations are being recorded without their knowledge. Further, these large tech organisations have access to personal moments of our lives. Recordings include arguments, intimate moments and sensitive personal data such as bank details and other private information.
Not only is your Google Home device listening to you, but Google has admitted employing third-party organisations to listen as well. They are also apparently reviewing the content of these recordings. This means that an actual person is listening to your personal conversations. Google claims that “listening to recordings is ‘critical’ to improving its AI voice assistant.” It also claims to transcribe and use only “about 0.2%”. On the surface that doesn’t sound like much. However, it still amounts to up to 150,000 hours of audio clips per year. That is an unimaginable amount of personal voice data.
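To put those two figures side by side, here is a minimal back-of-the-envelope sketch. It assumes, which the reports quoted above do not confirm, that the “about 0.2%” and the 150,000 hours describe the same annual pool of recordings. The function name is purely illustrative.

```python
def implied_total_hours(sample_hours: float, sample_fraction: float) -> float:
    """Return the total audio implied if sample_hours is sample_fraction of the whole."""
    if not 0 < sample_fraction <= 1:
        raise ValueError("sample_fraction must be greater than 0 and at most 1")
    return sample_hours / sample_fraction


# Figures quoted in the article; treating them as one consistent pair is an assumption.
reviewed_hours = 150_000      # hours of audio reviewed per year (upper bound)
reviewed_fraction = 0.002     # "about 0.2%"

total = implied_total_hours(reviewed_hours, reviewed_fraction)
print(f"Implied total: {total:,.0f} hours of audio per year")
```

Under that assumption, the reviewed sample would sit on top of roughly 75 million hours of recorded audio a year, which is why a small percentage offers little reassurance.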
Data is used in two ways within a speech-to-text service – training and usage. When using cloud speech-to-text providers, the end-user must trust the provider not to use their data for anything that hasn’t been agreed upon. An on-premises deployment option mitigates this problem. It ensures the technology used for training is kept separate from the customer. This means all data that interacts with the deployed solution within the customer’s environment stays in that environment. No interaction is necessary with the speech-to-text provider or their training. That creates privacy and security that benefit the customer. It also gives the customer great confidence that the provider will never harvest data unnecessarily.
People’s data security shouldn’t be taken for granted. It is the duty of every organisation in the workflow to be responsible for the data that passes through their solutions. Fulfilling that duty is necessary to build trust and establish a good working relationship between the end-user and the solution provider.
As a Product Marketing Manager at Speechmatics, I get lots of visibility over how we communicate with our customers. Because we communicate openly with customers, we know about any tools and processes that may impact their data security. As a result, Speechmatics’ technology can be deployed in public or private clouds and on-premises to create more security. On-premises deployment offers best-in-class data security and flexibility. It ensures that customers can deploy our speech recognition technology in a way that aligns with their business case. Deploying in-house gives a business a greater degree of internal security for its critical data.
On-premises deployment options are accessible through our Virtual Appliances and Containers. On-premises deployment optimises real-time use cases, reducing latency and improving the speed of live transcription. It also ensures that customer data remains within a secure environment, entirely separate from Speechmatics. It delivers the highest level of data security and data compliance, even in dark site deployments with no connection to the internet. Meanwhile, it still takes advantage of Speechmatics’ world-leading speech-to-text services.
Transparency should be a priority for all organisations that handle people’s data. It should be a foundational element in the ongoing build of the system. Data should be ethically sourced and screened before it is accepted. Doing so helps organisations comply with GDPR and other requirements for handling personal data, including personally identifiable information (PII). Ultimately, the bottom line is that data should always be handled with care. That care should be recognised as a key element of the process going forward.
While data is important, volume alone is not a silver bullet for delivering the highest levels of accuracy. Lots of ASR providers use huge amounts of voice data gathered from smart speakers and other similar applications. However, Speechmatics constantly ‘filters out the rubbish data’ gathered from multiple sources. This approach, combined with our world-leading speech team, enables us to lead the way in speech-to-text markets.
Alex Fleming, Speechmatics
Want to find out more about how we handle our customers’ data? Or perhaps you’d like to find out how speech recognition could work for your business? Get in touch.