How to build a hit smart speaker like the Amazon Echo?

As a leading example of the smart speaker category, the Amazon Echo has been highly favored by the market since its release in late 2014. Amazon reportedly sold 4 million Echo units in 2015 and 8 million in 2016, and set a sales target of 10 million units for 2017. Figures like these have made many other smart speaker manufacturers envious.
The Echo's core technology is its integrated voice assistant, Alexa. On June 25, 2015, Amazon announced that it would open the voice technology behind Alexa, the AI assistant built into the Echo, to third-party developers free of charge. Amazon subsequently released a version for third-party developers, so that users can talk to Alexa through other electronic devices without going through an Echo. This gave many third-party manufacturers the opportunity to use Alexa to build smart speakers similar to the Amazon Echo.
The following describes how to develop a smart speaker that resembles the Echo.
I have focused on Wi-Fi audio for five years and know the technology well, so product managers from many large companies in Beijing and Shenzhen have recently come to consult me. I found that most of them have not studied how the Echo was developed, and still think of the Echo as little more than a Bluetooth speaker. So today I am sharing what I have accumulated on technical development, resource integration, market conditions, and future trends, in the hope of clearing up some of these doubts.
The history of Amazon Echo
First, a brief introduction to the Echo itself: Amazon reportedly put an R&D team of 2,000 people on it and spent four years building the product.
Before going further, I have to mention the Echo's predecessor, SONOS. SONOS is a US company founded in 2002 that spent six years developing its product. Its core engineers came from Qualcomm, the US communications technology company. Over those six years it worked out high-bandwidth Wi-Fi transmission, lossless high-fidelity decoding, and multi-device interconnection (using Wi-Fi's wide bandwidth to join several speakers on the same LAN and play the same song in sync, creating wireless surround sound). At the same time, through the technical and business partnerships behind its music cloud service, the SONOS app integrated mainstream music content providers in the United States, China, and Europe, such as Pandora, Spotify, QQ Music, Xiami, and Duomi, giving it rich content resources. This brought the company a strong reputation and rich returns. On seeing a SONOS speaker in 2005, Steve Jobs reportedly remarked: "This is the future of sound." Long-term R&D investment (said to be 200 million US dollars) paid off: in 2011 alone, SONOS sales reached nearly 200 million US dollars.
SONOS has five classic products, including three Wi-Fi speakers, a soundbar, a Wi-Fi bridge, and a Wi-Fi controller. But the shortcomings are also obvious: the products are expensive and bulky and suit only high-end consumers, and the core multiroom feature only really shows its value in a large villa.
SONOS entered China in November 2010, so from 2011 China had its first batch of Wi-Fi audio developers hoping to imitate SONOS's success. The earliest of them started with the MTK5350 or Qualcomm's QCA9331, but they were clearly still on the learning curve. Then Amazon stepped in and eventually succeeded.
Amazon's Alexa project probably started in 2011, inspired by Siri, the voice assistant on the iPhone 4s (and by the iFlytek voice input method; it must be said that Jobs really was a great product manager). The Echo line (Echo Tap, Echo, Echo Dot) is the most successful product to which Alexa has actually been applied.
Alexa is a cloud speech recognition platform in which Amazon has invested heavily. It works like an app store with voice search. Its main features include intelligent Q&A; music services such as Amazon Music, Spotify, and iHeartRadio; news from NBC; weather forecasts; taxi service from Uber; and smart home control of lights and other devices through the IFTTT protocol. So far more than 5,000 services have been connected to Amazon's Alexa cloud platform.
Like Apple's phones, the Echo is not just hardware: it also includes an independently developed operating system based on Linux (comparable to Mac OS).
The Echo eliminated SONOS's shortcomings, such as bulk and high price. Most importantly, it has Alexa doing speech recognition in the background, which gives the product a real selling point and makes it sell particularly well. In fact, the Echo is more than a Wi-Fi speaker; it has become a smart home hub. By continuously collecting user habits and user questions to train Alexa, it makes Alexa smarter and smarter. Alexa also integrates more and more services, so consumers like the product more and more.
Amazon is an Internet company. What it is pursuing is not only the billions of dollars in sales from the Echo's rapid growth, but also the huge artificial intelligence market that stands behind the Echo. That is why Amazon has opened Alexa's API to developers around the world. Developers can use Alexa to build many kinds of hardware products, such as Alexa speakers, Alexa car devices, and Alexa headphones.
How to develop an Echo audio product?
1. Hardware
(1) Networking: a Wi-Fi chip or module (Broadcom, Realtek, and MTK parts are all usable).
(2) Local voice-processing unit: its main jobs are noise reduction (background noise suppression and echo cancellation), wake-word detection, and identifying the direction of the speaker. It requires a microphone (MIC) array + ADC (analog-to-digital converter chip) + DSP (voice digital signal processing, usually emulated on the CPU).
The MIC array collects the voice. In general, the more microphones in the array, the better the pickup, but the more complex the algorithm becomes and the higher the CPU clock frequency it requires.
The ADC converts the analog voice signal from the microphones into a digital signal and physically filters out part of the external noise, leaving only the 20 Hz-20 kHz band.
The DSP stage performs the digital voice signal processing. There are two approaches: run the noise-reduction algorithms in software on a powerful CPU, or use a dedicated hardware chip. Either way, the goal is a clean human voice signal.
After processing, the clean voice is encoded and uploaded over Wi-Fi to the cloud server for further processing (speech recognition, semantic understanding, service calls).
For the microphones, the key parameters are sensitivity and consistency across the units in the array; for the ADC, the key parameter is the sampling rate. Amazon's voice codec generally requires a 44.1 kHz, 16-bit sampling format, whereas the high-fidelity IIS audio output runs at 192 kHz, 24-bit. Because the two specifications differ, making them compatible is difficult, especially in designs that use an external MP3 decoder chip.
(3) The central unit of the complete audio system: CPU (processor) + DDR (memory) + Flash (storage)
The CPU's main function is to run the entire audio operating system. At the bottom is the Linux system with the hardware drivers, which play a role similar to a BIOS: the CPU SDK, Wi-Fi or Bluetooth driver, audio DSP driver (usually ALSA), USB driver, IIS driver, GPIO calls, and so on. On top of Linux sit the applications, comparable to the Android or Windows layer: network provisioning protocols (SmartConfig, AP-Station); network streaming protocols (HTTP, DLNA, AirPlay, QPlay); service API integrations (Spotify, Alexa, QQ Music, Ximalaya, and others); the system's logic control unit; and a software-decoding audio player.
In addition, the voice algorithms mentioned above also run on the CPU. Typically the CPU connects to the ADC chips over IIS, and the ADC chips connect to analog microphones. One IIS interface can serve two ADCs, and one ADC can serve four microphones. Some chips, however, natively support a digital microphone interface. This removes the cost of the ADC chip, but an algorithm is needed to handle the direct digital microphone input.
(4) Audio decoding unit: usually the CPU runs a software codec; the alternative is hardware decoding with an external MP3 chip that supports the IIS interface.
(5) Sound unit: audio DSP + amplifier + speaker
The audio DSP needs tuning, and EQ tuning in particular is difficult. Once tuning is done, the resulting parameters are handed to the music player running on the CPU. Note that the MP3-chip hardware decoding approach generally cannot be combined with an audio DSP.
Power amplifiers are either digital or analog. If the DSP has a built-in DAC, it can drive an analog amplifier; otherwise it can only be connected to a digital amplifier. Digital amplifiers are now the mainstream choice. Because Wi-Fi consumes more power than Bluetooth and transmits at higher power, its effect on audio quality can also be greater. The circuit design and PCB layout need signal shielding to keep the RF circuit from interfering with the audio circuit. Any interference that gets in is amplified along with the audio signal and produces a high noise floor at the speaker, degrading sound quality.
(6) Power management unit: there is usually a power management IC that distributes the current and voltage supplied by the AC adapter or lithium battery. The Echo has no built-in battery, so it needs no battery charge/discharge management circuit. A simple power supply circuit built from DC-DC converters also works.
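To make the sample-format mismatch described for the voice-processing unit concrete, here is a minimal Python sketch that compares the raw PCM data rates of the two formats mentioned (44.1 kHz/16-bit and 192 kHz/24-bit) and counts how many analog microphones the IIS -> ADC -> MIC chain can serve. The function names are illustrative only, and single-channel (mono) capture is an assumption.

```python
def pcm_bitrate(sample_rate_hz, bits_per_sample, channels=1):
    """Return the raw (uncompressed) PCM data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def max_analog_mics(iis_interfaces, adcs_per_iis=2, mics_per_adc=4):
    """Microphones reachable through the IIS -> ADC -> MIC chain from the text."""
    return iis_interfaces * adcs_per_iis * mics_per_adc


voice_bps = pcm_bitrate(44_100, 16)    # voice capture format named in the text
hifi_bps = pcm_bitrate(192_000, 24)    # high-fidelity IIS output format
ratio = hifi_bps / voice_bps

print(f"voice capture: {voice_bps} bit/s")
print(f"hi-fi output:  {hifi_bps} bit/s")
print(f"hi-fi stream is about {ratio:.1f}x the voice stream")
print(f"one IIS interface serves up to {max_analog_mics(1)} analog MICs")
```

The gap of roughly 6.5x per channel is one way to see why a design has to resample or keep separate clocking for the capture and playback paths.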
2. Software
The software here means the complete system software on the speaker side.
Some Wi-Fi speakers are built on Android, while the Echo is built on Linux. Android is less stable than Linux for this purpose. It carries many redundant functions that have to be trimmed out at the low level, and a trimming mistake can itself make the system unstable. Even trimmed as far as possible, the Android firmware is still about 150 MB and needs at least 256 MB of DDR and 512 MB of NAND flash. It also boots 10-15 s slower than a Linux system, and the longer it is used, the more clutter accumulates and the slower it runs. So in terms of device cost, system stability, and user experience, Android is not well suited to Wi-Fi audio. It is no wonder that Amazon chose Linux for the Echo, even though development took more than two years. A Linux-based speaker needs only about 4 MB of firmware, 16 MB of NOR flash, and 64 MB of DDR; it can boot within 10 s, does not accumulate redundant files, and delivers a consistent user experience. That is why both SONOS and the Echo use Linux platforms.
The operating system includes:
a. Drivers for each peripheral (DSP, buttons, AUX, USB, IIS, IIC), network provisioning, the voice algorithms, the music player, content integration, network transport protocols (HTTP, DLNA, AirPlay), service API integration, and so on.
b. Speech recognition algorithms. These fall into two parts: the local part (noise reduction, background noise suppression, and wake-word detection) and the cloud part (speech recognition and semantic understanding). Start with the local part on the device. The device has to suppress background noise and obtain clean voice content. There are two ways to do this. The first uses a Conexant chip with a built-in DSP: the noise-reduction algorithm is burned into the chip, which processes the digital voice signal coming from the ADC and passes it to the CPU over the IIS interface for upload to the cloud. One IIS interface can connect two ADC chips, and each ADC chip can connect four microphones. The wake-word algorithm, by contrast, is local software integrated on the CPU. The wake word is the device's name, such as "Alexa", and it has to be trained: at least 100 different voices pronouncing it must be collected, so a custom wake word incurs an extra fee. The second way runs both the noise-reduction and wake-word algorithms on the CPU. This requires digital microphones that feed the digital voice signal directly to the CPU. At present not many chips can connect directly to digital microphones. The G102, for example, can connect 8 digital microphones directly, but it needs an algorithm to switch among them.
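As a rough illustration of what "running the front-end algorithm on the CPU" can mean, here is a minimal Python sketch of an energy gate: it estimates a noise floor from the first few frames and flags later frames that rise well above it. This is a much simpler stand-in, not the Conexant or Amazon noise-reduction or wake-word algorithm; the frame length, threshold ratio, and 16 kHz sample rate in the demo are all assumptions.

```python
import math


def frame_rms(frame):
    """Root-mean-square amplitude of one frame of PCM samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech_frames(samples, frame_len=160, noise_frames=5, threshold_ratio=3.0):
    """Flag frames whose energy rises well above the estimated noise floor.

    The noise floor is the mean RMS of the first `noise_frames` frames
    (assumed >= 1), which are assumed to contain background noise only.
    """
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    if not frames:
        return []
    energies = [frame_rms(f) for f in frames]
    head = energies[:noise_frames]
    noise_floor = max(sum(head) / len(head), 1e-9)  # avoid a zero threshold
    return [e > threshold_ratio * noise_floor for e in energies]


# Synthetic input: 10 quiet frames, then 5 frames of a 440 Hz tone
# (assumed 16 kHz sample rate, so one 160-sample frame is 10 ms).
quiet = [10 if i % 2 == 0 else -10 for i in range(160 * 10)]
tone = [int(1000 * math.sin(2 * math.pi * 440 * i / 16_000)) for i in range(160 * 5)]
flags = detect_speech_frames(quiet + tone)
print(flags)
```

Only the flagged frames would then be passed on to wake-word detection or uploaded, which is the reason for doing this step locally rather than streaming everything to the cloud.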
3. Cloud services
Cloud services are the heart of the Echo. They consist of artificial intelligence plus a range of services, with the AI responsible for semantic understanding, content search, and answering. For example, if you ask Alexa what the weather is like today, the device first processes your speech locally and then uploads the resulting audio to the cloud server. The server converts the speech into text, extracts the keywords, and, using models trained on big data, works out the abstract meaning of the text and looks for the corresponding answer. Finding the answer means querying a weather information database (which must itself support voice search), and the result is finally sent back to the speaker, which announces: "The weather is turning sunny today, so you do not need to bring an umbrella." This whole chain of actions is the artificial intelligence; the weather database is the cloud content. Amazon has integrated at least 5,000 pieces of cloud content, covering weather, music, taxis, ticket booking, takeaway food, control of appliances that support the IFTTT protocol, and more, and the number grows every day. The more people use it, the more accurate Alexa's recognition becomes, and the richer the content, the more capable the Echo is and the more people come to depend on it.
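The chain described above (speech to text, keyword extraction, intent, service call, spoken reply) can be sketched as a toy pipeline in Python. Everything here is hypothetical: the "audio" is just UTF-8 text, the keyword table stands in for trained models, and the services return canned answers instead of querying real databases. It is not Alexa's API.

```python
def speech_to_text(audio):
    """Stand-in for cloud speech recognition: the 'audio' is UTF-8 text bytes."""
    return audio.decode("utf-8").lower()


# Hypothetical keyword -> intent table (a real system uses trained models).
INTENT_KEYWORDS = {"weather": "get_weather", "alarm": "set_alarm"}

# Hypothetical services with canned answers in place of real databases.
SERVICES = {
    "get_weather": lambda: "The weather is turning sunny today.",
    "set_alarm": lambda: "Your alarm is set.",
}


def find_intent(text):
    """Return the intent for the first known keyword in the text, or None."""
    words = "".join(c if c.isalnum() else " " for c in text).split()
    for word in words:
        if word in INTENT_KEYWORDS:
            return INTENT_KEYWORDS[word]
    return None


def handle_request(audio):
    """Speech -> text -> intent -> service call -> reply (as text)."""
    text = speech_to_text(audio)
    intent = find_intent(text)
    if intent is None:
        return "Sorry, I do not know how to help with that."
    return SERVICES[intent]()


reply = handle_request(b"Alexa, what is the weather like today?")
print(reply)
```

Adding a new piece of "cloud content" in this sketch means adding one keyword entry and one service function, which mirrors how the platform grows as more services are connected.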

4. App
The Echo also has a companion app installed on the phone. It connects the Echo to the router's network (the Echo has no screen, so the Wi-Fi password cannot be entered on it directly), shows the user's usage history, and teaches users how to use more of the Echo's features.
5. Production
Amazon does not manufacture the Echo itself. It designs the product and hands it to OEMs; Amazon then handles quality acceptance and sales. The main sales channel is Amazon.com, and offline channels have also started selling it.
At present, every company's Alexa product must be certified by Amazon before it can go on sale. Besides passing more than 100 Amazon test items and signing NDAs, the company also has to answer questions about its sales channels, sales forecasts, and so on.
In addition, there is MIC tuning, Wi-Fi testing, and software stability testing. Of these three, stability testing is the most complicated. MIC tuning mainly determines the effective voice-control distance; the test equipment is expensive and the work requires a professional microphone factory. Wi-Fi also has to be tuned well, and achieving a high provisioning success rate takes great effort. For example, our company's Wi-Fi function can guarantee a 100% connection rate. In 2015 I asked the WeChat team about their micro-connection feature, and they said its network provisioning success rate was below 60%.
6. Cost
The major costs are: R&D staff, hardware, license fees for the voice recognition algorithms, cloud server rental (Alibaba Cloud can be used in China and Amazon AWS abroad), and content integration licensing fees (Baidu Music, for example, charges 5 cents per use).
7. Ecosystem
Before building an ecosystem, everyone needs to be clear on the difference between Alexa and the Echo. The Echo is just one successful product built on Alexa; in the future there will be many Alexa-enabled products, such as gateways, OTT boxes, in-car units, and mobile phones. Any device that supports Alexa is connected to Alexa's cloud service.
I believe Alexa could move into the markets that are hot right now: smart homes, smart cars, and smartphones, with smart wearables (headphones, watches, and so on) to follow. Alexa would then become the brain of the Internet of Things, with all kinds of devices as its tentacles. Through those tentacles Alexa continuously gathers information and grows more powerful; by integrating more and more algorithms and content, it becomes a service manager for home, car, and work.
So Alexa has two important attributes: artificial intelligence and the Internet of Things. Using deep learning to bring in more Internet services is the artificial intelligence side; controlling household appliances (such as lights and doors) is the household IoT side.
Now a few words on the Internet of Things. The Echo supports Wi-Fi, and in the United States a preliminary standard for communication between IoT devices has emerged: IFTTT. As long as a device (a lamp, a door lock, and so on) supports the IFTTT protocol and has passed Amazon certification, you can use your voice to turn off the living room light, open the curtains, and so on. The Echo's speech is parsed into a command in the cloud and sent back to the home router, which then broadcasts the control command. Each lamp has an IP address tied to its own MAC address. When a lamp receives the command, it compares the target with its own address. The living room lamp finds a match, executes the instruction, and turns off; the bedroom lamp finds no match and ignores it.
The cost of all this is installing an IoT Wi-Fi module in every lamp. The IoT Wi-Fi chip in the module includes an MCU that runs the Wi-Fi protocol stack and can receive a Wi-Fi signal with 1 MB of bandwidth. The smart home is therefore no longer controlled from a mobile phone; you talk directly to the speaker. You can also add sensor chips and timers, so that Alexa in the cloud learns each consumer's habits and acts on its own: the lights come on when you get home and the background music plays what you have been listening to recently; in the morning the curtains open automatically, music plays, and your schedule and the weather for the day are read out. Typical IoT modules include the MT7681, RTL8711AF, RTL8711AM, ESP8266, and XR871.
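The broadcast-and-match behaviour described above can be sketched in a few lines of Python. The command format, the device IDs, and the `Lamp` class are hypothetical and only illustrate the filtering logic; they are not the actual wire format used by the Echo, the router, or IFTTT.

```python
from dataclasses import dataclass


@dataclass
class Lamp:
    device_id: str       # stands in for the lamp's own network address
    is_on: bool = True

    def on_broadcast(self, command):
        """Execute the command only if it is addressed to this lamp."""
        if command["target"] != self.device_id:
            return False                      # not for us: ignore it
        self.is_on = command["action"] == "on"
        return True


def broadcast(devices, command):
    """Deliver one command to every device; return the IDs that acted on it."""
    return [d.device_id for d in devices if d.on_broadcast(command)]


living_room = Lamp("living-room-lamp")
bedroom = Lamp("bedroom-lamp")

# Command as it might arrive from the cloud after the speech is parsed.
handled = broadcast([living_room, bedroom],
                    {"target": "living-room-lamp", "action": "off"})
print(handled, living_room.is_on, bedroom.is_on)
```

Every device sees every command, so the filtering cost sits in each lamp's MCU rather than in the router.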
What are the main considerations for developing Echo products?
1. Mature hardware and software: because development is done on Linux, each company has to build its own system, which makes development very difficult. It generally takes at least one to two years, and the stability of the resulting system is also a concern.
2. Cost: Amazon's product retails for only 179 US dollars, so your ex-factory price cannot exceed about 60 US dollars; otherwise, after passing through the sales channels, the retail price will exceed 179 US dollars and the product will have no competitive advantage.
3. Amazon's certification: many products, including ones from big brands, are queued up for Amazon certification. Without a good relationship, a clear sales plan, and supporting data, it is hard to persuade Amazon's test supervisor to test your product first. Certification can take as long as 3 months, and 1-2 months at the short end.
4. Sales market: Amazon currently supports only English, with German next, so products can only be sold in the United States and the United Kingdom. Amazon's cloud services for other countries are still being built and many factors remain uncertain, so the product cannot be sold everywhere in the world the way Bluetooth speakers can.
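The cost consideration in item 2 implies a channel markup of roughly 3x between ex-factory and retail price. The short Python sketch below works through that arithmetic; the 179 and 60 US dollar figures come from the text, while the function names and the idea of a single flat multiplier are simplifying assumptions.

```python
def implied_channel_multiplier(retail_price, ex_factory_price):
    """Ratio between the shelf price and the factory price."""
    return retail_price / ex_factory_price


def max_ex_factory_price(target_retail_price, channel_multiplier):
    """Highest factory price that still reaches the target retail price."""
    return target_retail_price / channel_multiplier


multiplier = implied_channel_multiplier(179, 60)
print(f"implied channel multiplier: {multiplier:.2f}x")

# With a flat 3x markup, the factory price ceiling for a 179 USD retail price:
ceiling = max_ex_factory_price(179, 3.0)
print(f"max ex-factory price at 3x: {ceiling:.2f} USD")
```

Running the same calculation against your own bill of materials shows quickly whether a design can compete at the Echo's price point.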

How to develop a domestic Echo-style speaker?
Voice recognition solution providers to choose from: companies with voice recognition algorithms include iFlytek, AISpeech, Unisound, and Beijing SoundAI Technology.
The app has to be developed in-house, and the cloud content has to be integrated in-house. Domestic content sources include QQ Music, Kugou, Kuwo, Ximalaya radio, and Lanren Tingshu for audio; NetEase News and Toutiao for news; a public weather platform for forecasts; Baidu for the question-answering knowledge base (for better answers you can set up a team that keeps compiling answers to new questions); Didi for taxis; and so on. The server side also has to be developed in-house. For the deep learning algorithms you can either find your own or use iFlytek's or AISpeech's directly, but those have to be paid for.
So much content has to be integrated that doing it all becomes nearly impossible; no single company can build the complete set of cloud services, deep-learning AI, smart hardware, and app. And once the product is out, there is still branding and sales.
Amazon put more than 2,000 people on the Echo and still took four years to get today's results. iFlytek spent 150 million yuan building the DingDong speaker, which covers only part of this work by the time it reaches the consumer. Its user experience is also mediocre: it only plays music and handles simple Q&A, with no other features.
For a hardware company, developing a Chinese Echo is extremely costly: it needs R&D investment, voice recognition fees, content integration, and server bandwidth. At best, only Internet companies such as Tencent and Alibaba, which have the data, content, and money, can do it. Alibaba's Ali Xiaozhi, for example, is a smart home cloud platform that Alibaba has built into one big app compatible with devices from many hardware manufacturers; Wi-Fi audio is just one of its categories. This is actually not a good thing. It hurts most smart speaker brands and factories, because a 2,000 RMB product and a 200 RMB product look the same in the app and have the same functions, which does not help the market develop healthily. Alibaba also interferes too much in the market. Its first certification requirement is a Tmall flagship store, and it then predicts a product's sales volume purely from its cost and allocates support accordingly. It cares only about how many users your product can bring and puts less weight on user experience. This is an opportunity for other Internet companies: build a cloud platform for one product category first, allow differentiation among products in that category, and eventually achieve a kind of implicit interoperability across product types.
Echo's future development
According to Amazon's big data analysis of Echo users, the most-used Echo functions are currently setting an alarm by voice and asking about the weather. The function consumers want most is the ability to hold a conversation, so Amazon is now racing to add conversation support to the Echo.
Meanwhile, smart home IoT is an area where the Echo's user experience is currently poor: if the home Wi-Fi loses its Internet connection, the Echo is completely useless. Amazon is therefore using Intel chips to develop a combined local and network speech recognition function, so that hundreds of simple home control commands can be handled by local speech recognition over the LAN with no external network needed.
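The local-plus-network idea can be sketched as a simple "local first, cloud second" dispatcher in Python. The command table, the device names, and the `fake_cloud` handler are hypothetical and only illustrate the fallback order; this is not Amazon's or Intel's implementation.

```python
# Hypothetical table of commands the device can resolve without the Internet.
LOCAL_COMMANDS = {
    "turn off the living room light": ("living-room-light", "off"),
    "turn on the living room light": ("living-room-light", "on"),
    "open the curtains": ("curtains", "open"),
}


def handle_utterance(text, network_up, cloud_handler):
    """Try the local command table first; fall back to the cloud if online."""
    key = text.strip().lower()
    if key in LOCAL_COMMANDS:
        return ("local", LOCAL_COMMANDS[key])
    if network_up:
        return ("cloud", cloud_handler(key))
    return ("unavailable", None)


def fake_cloud(text):
    """Stand-in for the cloud speech service."""
    return "cloud:" + text


# Home control still works with the Internet connection down.
print(handle_utterance("Turn off the living room light", False, fake_cloud))
# Open-ended questions need the cloud.
print(handle_utterance("what is the weather", True, fake_cloud))
print(handle_utterance("what is the weather", False, fake_cloud))
```

Checking the local table first also cuts latency for the most common home control commands, since they never leave the LAN.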