How Microsoft Teams will use AI to filter out typing, barking, and other noise from video calls – VentureBeat
Posted: April 11, 2020 at 12:49 am
Last month, Microsoft announced that Teams, its competitor to Slack, Facebook's Workplace, and Google's Hangouts Chat, had passed 44 million daily active users. The milestone overshadowed its unveiling of a few new features coming later this year. Most were straightforward: a hand-raising feature to indicate you have something to say, offline and low-bandwidth support to read chat messages and write responses even if you have a poor or no internet connection, and an option to pop chats out into a separate window. But one feature, real-time noise suppression, stood out: Microsoft demoed how the AI minimized distracting background noise during a call.
We've all been there. How many times have you asked someone to mute themselves or to relocate from a noisy area? Real-time noise suppression will filter out someone typing on their keyboard while in a meeting, the rustling of a bag of chips, and a vacuum cleaner running in the background. AI will remove the background noise in real time so you hear only speech on the call. But how exactly does it work? We talked to Robert Aichner, Microsoft Teams group program manager, to find out.
The use of collaboration and video conferencing tools is exploding as the coronavirus crisis forces millions to learn and work from home. Microsoft is pushing Teams as the solution for businesses and consumers as part of its Microsoft 365 subscription suite. The company is leaning on its machine learning expertise to make AI features one of its big differentiators. When it finally arrives, real-time background noise suppression will be a boon for businesses and households full of distracting noises. How Microsoft built the feature is also instructive for other companies tapping machine learning.
Of course, noise suppression has existed in the Microsoft Teams, Skype, and Skype for Business apps for years. Other communication tools and video conferencing apps have some form of noise suppression as well. But that noise suppression only handles stationary noise, such as a computer fan or air conditioner running in the background. The traditional method is to look for pauses in speech, estimate the baseline noise during those pauses, assume that the continuous background noise doesn't change over time, and filter it out.
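The traditional approach described above is usually implemented as some form of spectral subtraction. The sketch below is illustrative only and is not Microsoft's code: it assumes each audio frame has already been converted to a list of per-frequency magnitudes, treats the quietest frames as "speech pauses," averages them into one fixed noise estimate, and subtracts that estimate from every frame. The names `estimate_noise_floor`, `spectral_subtract`, and the `pause_ratio` parameter are hypothetical.

```python
from typing import List

Spectrum = List[float]  # magnitude per frequency bin for one frame (assumed input format)


def frame_energy(frame: Spectrum) -> float:
    """Sum of squared magnitudes: a crude loudness measure for one frame."""
    return sum(m * m for m in frame)


def estimate_noise_floor(frames: List[Spectrum], pause_ratio: float = 0.1) -> Spectrum:
    """Average the quietest frames, which this sketch treats as speech pauses."""
    ranked = sorted(frames, key=frame_energy)
    n_pause = max(1, int(len(ranked) * pause_ratio))
    pauses = ranked[:n_pause]
    n_bins = len(frames[0])
    return [sum(f[k] for f in pauses) / n_pause for k in range(n_bins)]


def spectral_subtract(frames: List[Spectrum], noise: Spectrum,
                      floor: float = 0.0) -> List[Spectrum]:
    """Subtract the same fixed noise estimate from every frame, clamping at `floor`.

    The estimate never changes, which is exactly the "stationary noise"
    assumption: a one-off door slam is never part of it.
    """
    return [[max(m - n, floor) for m, n in zip(frame, noise)] for frame in frames]
```

Because the noise estimate is frozen, a bark or slammed door that appears in a single frame passes straight through, which is the limitation Aichner describes next.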
Going forward, Microsoft Teams will also suppress non-stationary noises, like a dog barking or somebody shutting a door. "That is not stationary," Aichner explained. "You cannot estimate that in speech pauses. What machine learning now allows you to do is to create this big training set, with a lot of representative noises."
In fact, Microsoft open-sourced its training set on GitHub earlier this year to advance research in the field. While the first version is publicly available, Microsoft is actively working on extending the data sets. A company spokesperson confirmed that, as part of the real-time noise suppression feature, certain categories of noises in the data sets will not be filtered out on calls, including musical instruments, laughter, and singing. (More on that in ProBeat: Microsoft Teams video calls and the ethics of invisible AI.)
Microsoft can't simply isolate the sound of human voices, because other noises also occur at the same frequencies. On a spectrogram of a speech signal, unwanted noise appears both in the gaps between speech and overlapping with the speech. It's thus next to impossible to filter out the noise by frequency alone: where speech and noise overlap, you can't distinguish the two. Instead, you need to train a neural network beforehand on what noise looks like and what speech looks like.
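One common way researchers frame this problem (the article does not say whether Teams uses it) is as a per-bin gain, or "mask," applied to the spectrogram. The hypothetical helper below computes the ideal ratio mask, which needs the clean speech and the noise separately. That is only possible on synthetic training mixtures, which is why a network has to learn to estimate it from the noisy signal alone.

```python
from typing import List


def ideal_ratio_mask(speech: List[float], noise: List[float],
                     eps: float = 1e-12) -> List[float]:
    """Per-bin gain in [0, 1]: the fraction of each bin's power that is speech.

    `speech` and `noise` are magnitudes for the same frame and frequency bins.
    `eps` only guards against division by zero in silent bins.
    """
    return [s * s / (s * s + n * n + eps) for s, n in zip(speech, noise)]
```

For speech magnitudes `[1.0, 0.0, 3.0]` and noise magnitudes `[0.0, 2.0, 3.0]`, the mask is roughly `[1.0, 0.0, 0.5]`: pure speech bins are kept, pure noise bins are zeroed, and the third bin, where speech and noise overlap at equal power, can only be half-kept. No fixed frequency filter can do better in that bin.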
To get his point across, Aichner compared machine learning models for noise suppression to machine learning models for speech recognition. For speech recognition, you need to record a large corpus of users talking into the microphone and then have humans label that speech data by writing down what was said. Instead of mapping microphone input to written words, in noise suppression you're trying to get from noisy speech to clean speech.
"We train a model to understand the difference between noise and speech, and then the model is trying to just keep the speech," Aichner said. "We have training data sets. We took thousands of diverse speakers and more than 100 noise types. And then what we do is we mix the clean speech without noise with the noise. So we simulate a microphone signal. And then you also give the model the clean speech as the ground truth. So you're asking the model, 'From this noisy data, please extract this clean signal, and this is how it should look like.' That's how you train neural networks [in] supervised learning, where you basically have some ground truth."
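The supervised setup Aichner describes can be shown in miniature. In this sketch, which makes several simplifying assumptions, a single learned gain per frequency bin stands in for the neural network, and noisy magnitudes are formed by simply adding noise magnitudes to clean ones. Each training example is a (noisy input, clean ground truth) pair, and the gains are fit by gradient descent on mean squared error. `make_pair` and `train_gains` are hypothetical names.

```python
from typing import List, Tuple

Pair = Tuple[List[float], List[float]]  # (noisy magnitudes, clean magnitudes)


def make_pair(clean: List[float], noise: List[float]) -> Pair:
    """Simulate a microphone signal by adding noise; keep the clean input as ground truth."""
    return [c + n for c, n in zip(clean, noise)], clean


def train_gains(pairs: List[Pair], lr: float = 0.05, epochs: int = 500) -> List[float]:
    """Fit one gain per frequency bin by gradient descent on mean squared error.

    Loss per bin: mean over pairs of (gain * noisy - clean) ** 2.
    This toy model stands in for a neural network; it is not Microsoft's model.
    """
    n_bins = len(pairs[0][0])
    gains = [1.0] * n_bins  # start by passing everything through unchanged
    for _ in range(epochs):
        grads = [0.0] * n_bins
        for noisy, clean in pairs:
            for k in range(n_bins):
                err = gains[k] * noisy[k] - clean[k]
                grads[k] += 2.0 * err * noisy[k] / len(pairs)
        gains = [g - lr * d for g, d in zip(gains, grads)]
    return gains
```

Training on pairs where bin 0 always carries speech and bin 1 always carries noise drives the gains toward roughly `[1.0, 0.0]`: the model learns to keep the speech bin and silence the noise bin purely from examples of noisy input and clean target.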
For speech recognition, the ground truth is what was said into the microphone. For real-time noise suppression, the ground truth is the speech without noise. By feeding it a large enough data set (in this case, hundreds of hours of data), Microsoft can effectively train its model. "It's able to generalize and reduce the noise with my voice even though my voice wasn't part of the training data," Aichner said. "In real time, when I speak, there is noise that the model would be able to extract the clean speech [from] and just send that to the remote person."
Comparing the functionality to speech recognition makes noise suppression sound much more achievable, even though it's happening in real time. So why has it not been done before? Can Microsoft's competitors quickly recreate it? Aichner listed the challenges of building real-time noise suppression: finding representative data sets, building and shrinking the model, and leveraging machine learning expertise.
We already touched on the first challenge: representative data sets. The team spent a lot of time figuring out how to produce sound files that exemplify what happens on a typical call.
They used audiobooks to represent male and female voices, since speech characteristics differ between the two. They used YouTube data sets with labels specifying that a recording includes, say, typing and music. Aichner's team then combined the speech data and noise data with a synthesizer script at different signal-to-noise ratios. By amplifying the noise, they could imitate different realistic situations that can happen on a call.
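Mixing at a chosen signal-to-noise ratio (SNR) comes down to scaling the noise before adding it. The sketch below is a hypothetical stand-in for the team's synthesizer script, which the article does not show. It uses the standard definition SNR (dB) = 10 * log10(P_speech / P_noise), where P is mean power, and operates on plain lists of audio samples.

```python
import math
from typing import List


def power(x: List[float]) -> float:
    """Mean power of a signal: the average of its squared samples."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech: List[float], noise: List[float], target_snr_db: float) -> List[float]:
    """Scale `noise` so speech power / noise power equals `target_snr_db`, then add it."""
    target_noise_power = power(speech) / (10 ** (target_snr_db / 10))
    scale = math.sqrt(target_noise_power / power(noise))
    return [s + scale * n for s, n in zip(speech, noise)]


def snr_db(speech: List[float], noise: List[float]) -> float:
    """Signal-to-noise ratio in decibels."""
    return 10 * math.log10(power(speech) / power(noise))
```

Sweeping `target_snr_db` over a range, for example 0, 10, and 20 dB, turns one clean clip and one noise clip into several training examples. Lower values correspond to amplifying the noise relative to the speech.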
But audiobooks are drastically different from conference calls. Wouldn't that affect the model, and thus the noise suppression?
"That is a good point," Aichner conceded. "Our team did make some recordings as well to make sure that we are not just training on synthetic data we generate ourselves, but that it also works on actual data. But it's definitely harder to get those real recordings."
Aichner's team is not allowed to look at any customer data. Additionally, Microsoft has strict privacy guidelines internally. "I can't just simply say, 'Now I record every meeting.'"
So the team couldn't use Microsoft Teams calls. Even if they could (say, if some Microsoft employees opted in to have their meetings recorded), someone would still have to mark down exactly when distracting noises occurred.
"And so that's why we right now have some smaller-scale effort of making sure that we collect some of these real recordings with a variety of devices and speakers and so on," said Aichner. "What we then do is we make that part of the test set. So we have a test set which we believe is even more representative of real meetings. And then, we see if we use a certain training set, how well does that do on the test set? So ideally yes, I would love to have a training set, which is all Teams recordings and have all types of noises people are listening to. It's just that I can't easily get the same number of the same volume of data that I can by grabbing some other open source data set."
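The workflow Aichner describes amounts to training several candidate models and scoring each one on the same fixed, held-out test set of real recordings. Here is a minimal sketch of that comparison. The article does not say which quality metric Microsoft uses, so the `score` function is a placeholder. For real recordings without a clean reference, it would have to be a reference-free measure or human ratings. All names are hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Clip = List[float]
Model = Callable[[Clip], Clip]
Scorer = Callable[[Clip, Clip], float]  # (input clip, model output) -> quality score, higher is better


def evaluate(model: Model, test_set: Sequence[Clip], score: Scorer) -> float:
    """Average score of one model over a fixed, held-out test set."""
    return mean(score(clip, model(clip)) for clip in test_set)


def rank_training_sets(models: Dict[str, Model], test_set: Sequence[Clip],
                       score: Scorer) -> List[Tuple[str, float]]:
    """Rank models (one per candidate training set) on the same test set, best first."""
    results = [(name, evaluate(m, test_set, score)) for name, m in models.items()]
    return sorted(results, key=lambda r: r[1], reverse=True)
```

Keeping the test set fixed while swapping the training set is the design choice that matters. It means a score change can be attributed to the training data rather than to a different yardstick.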
I pushed the point once more: How would an opt-in program to record Microsoft employees using Teams impact the feature?
"You could argue that it gets better," Aichner said. "If you have more representative data, it could get even better. So I think that's a good idea to potentially in the future see if we can improve even further. But I think what we are seeing so far is even with just taking public data, it works really well."
The next challenge is to figure out how to build the neural network, decide what the model architecture should be, and iterate. The machine learning model went through a lot of tuning, which required a lot of compute. Aichner's team relied on Azure, using many GPUs. Even with all that compute, training a large model with a large data set could take multiple days.
"A lot of the machine learning happens in the cloud," Aichner said. "So, for speech recognition for example, you speak into the microphone, that's sent to the cloud. The cloud has huge compute, and then you run these large models to recognize your speech. For us, since it's real-time communication, I need to process every frame. Let's say it's 10 or 20 millisecond frames. I need to now process that within that time, so that I can send that immediately to you. I can't send it to the cloud, wait for some noise suppression, and send it back."
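The real-time constraint can be expressed directly: each frame must be processed in less time than the frame lasts. In the sketch below, the sample rates and the `denoise` callback are assumptions for illustration, since the article gives only the 10 to 20 millisecond frame length. The code converts a frame length into a sample count and counts how many frames blow their time budget.

```python
import time
from typing import Callable, List, Sequence, Tuple

Frame = List[float]


def frame_size(sample_rate: int, frame_ms: int) -> int:
    """Number of samples in one frame, e.g. 320 samples for 20 ms at 16 kHz."""
    return sample_rate * frame_ms // 1000


def process_stream(frames: Sequence[Frame], denoise: Callable[[Frame], Frame],
                   frame_ms: int = 20) -> Tuple[List[Frame], int]:
    """Denoise frames one at a time and count frames that exceeded the time budget.

    A frame that takes longer than its own duration to process means the
    outgoing audio falls behind the live call.
    """
    budget_s = frame_ms / 1000
    output: List[Frame] = []
    overruns = 0
    for frame in frames:
        start = time.perf_counter()
        output.append(denoise(frame))
        if time.perf_counter() - start > budget_s:
            overruns += 1
    return output, overruns
```

A cloud round trip alone can easily exceed a 20 ms budget, before any model runs. That is why, per Aichner, this loop has to execute on the user's device.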
For speech recognition, leveraging the cloud may make sense. For real-time noise suppression, it's a nonstarter. Once you have the machine learning model, you then have to shrink it to fit on the client. It needs to run on a typical phone or computer; a machine learning model only for people with high-end machines is useless.
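The article doesn't say how Microsoft shrinks its model. One widely used technique is quantization: storing weights as 8-bit integers instead of 32-bit floats. The sketch below shows symmetric per-tensor int8 quantization on a plain list of weights. It illustrates the general idea and is not a description of the Teams model.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: map floats onto integers in [-127, 127].

    Returns the integer weights plus the single float `scale` needed to undo it.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights; each is within scale / 2 of the original."""
    return [v * scale for v in q]
```

Each float32 weight takes 4 bytes, while an int8 weight takes 1, so this cuts weight storage roughly fourfold. The trade-off is a small rounding error per weight, bounded by half the scale.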
There's another reason the machine learning model should live on the edge rather than in the cloud: Microsoft wants to limit server use. Sometimes there isn't even a server in the equation to begin with. For one-to-one calls in Microsoft Teams, the call setup goes through a server, but the actual audio and video signal packets are sent directly between the two participants. For group calls or scheduled meetings, there is a server in the picture, but Microsoft minimizes the load on that server. Doing a lot of server processing for each call increases costs, and every additional network hop adds latency. It's more efficient from a cost and latency perspective to do the processing on the edge.
"You want to make sure that you push as much of the compute to the endpoint of the user because there isn't really any cost involved in that. You already have your laptop or your PC or your mobile phone, so now let's do some additional processing. As long as you're not overloading the CPU, that should be fine," Aichner said.
I pointed out there is a cost, especially on devices that aren't plugged in: battery life. "Yeah, battery life, we are obviously paying attention to that too," he said. "We don't want you now to have much lower battery life just because we added some noise suppression. That's definitely another requirement we have when we are shipping. We need to make sure that we are not regressing there."
It's not just regression that the team has to consider, but progression in the future as well. Because we're talking about a machine learning model, the work never ends.
"We are trying to build something which is flexible in the future because we are not going to stop investing in noise suppression after we release the first feature," Aichner said. "We want to make it better and better. Maybe for some noise tests we are not doing as good as we should. We definitely want to have the ability to improve that. The Teams client will be able to download new models and improve the quality over time whenever we think we have something better."
The model itself will clock in at a few megabytes, but it won't add to the size of the client download. "That's also another requirement we have," he said. "When users download the app on the phone or on the desktop or laptop, you want to minimize the download size. You want to help the people get going as fast as possible."
"Adding megabytes to that download just for some model isn't going to fly," Aichner said. "After you install Microsoft Teams, later in the background it will download that model. That's what also allows us to be flexible in the future that we could do even more, have different models."
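A background model update typically comes down to two checks: is a newer model available, and did the downloaded file arrive intact? The sketch below is entirely hypothetical. Microsoft hasn't described its update mechanism, so the `ModelManifest` format, with an integer version and a SHA-256 digest, is an assumption made for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelManifest:
    """Hypothetical metadata the client might fetch before downloading a model."""
    version: int
    sha256: str  # expected hex digest of the model file


def needs_update(installed_version: Optional[int], manifest: ModelManifest) -> bool:
    """Download if no model is installed yet or the published one is newer."""
    return installed_version is None or manifest.version > installed_version


def verify_download(blob: bytes, manifest: ModelManifest) -> bool:
    """Accept the downloaded model only if its SHA-256 digest matches the manifest."""
    return hashlib.sha256(blob).hexdigest() == manifest.sha256
```

Keeping the model out of the installer and fetching it this way is what lets the client stay small at install time and still pick up improved models later, as Aichner describes.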
All the above requires one final component: talent.
"You also need to have the machine learning expertise to know what you want to do with that data," Aichner said. "That's why we created this machine learning team in this intelligent communications group. You need experts to know what they should do with that data. What are the right models? Deep learning has a very broad meaning. There are many different types of models you can create. We have several centers around the world in Microsoft Research, and we have a lot of audio experts there too. We are working very closely with them because they have a lot of expertise in this deep learning space."
The data set is open source and can be improved upon. A lot of compute is required, but any company can simply leverage a public cloud, including market leaders Amazon Web Services, Microsoft Azure, and Google Cloud. So if another company with a video chat tool had the right machine learning experts, could it pull this off?
"The answer is probably yes, similar to how several companies are getting speech recognition," Aichner said. "They have a speech recognizer where there's also lots of data involved. There's also lots of expertise needed to build a model. So the large companies are doing that."
Aichner believes Microsoft still has a heavy advantage because of its scale. "I think that the value is the data," he said. "What we want to do in the future is like what you said, have a program where Microsoft employees can give us more than enough real Teams calls so that we have an even better analysis of what our customers are really doing, what problems they are facing, and customize it more towards that."