How to Create an Audio Recognition System Using ACRCloud C# SDK
with yubzy | 4 years experience
Intro - How to Create an Audio Recognition System Using ACRCloud C# SDK
Jul 4, 2019 | 2:08 PM | 2:40
Session 1: About Automatic Content Recognition (ACR)
Jul 4, 2019 | 2:09 PM | 16:2
Session 2: Creating an Audio Recognition System Using ACRCloud
Jul 4, 2019 | 2:10 PM | 16:16
Session 3: Working with the Audio Recognition C# SDK
Jul 4, 2019 | 2:12 PM | 32:24
Session 4: Conclusion
Jul 4, 2019 | 2:13 PM | 8:43
- Project length: 1h 16m
Through this series, I will cover the details of interacting with the ACRCloud C# SDK to create an audio recognition system.
This project covers the details necessary to create an audio recognition system using the ACRCloud C# SDK. It is for expert programmers who want to create intelligent applications based on media input instead of text-based search.
What are the requirements?
- Expert knowledge of C#
- Expert knowledge of Visual Studio
- Expert knowledge of algorithms and cryptography
What is the target audience?
- C# expert developers ready to create audio recognition systems
The project outline explains what you will learn in each session
Here we discuss: 1. What is ACR 2. How ACR works 3. ACR algorithms 4. ACR applications 5. ACRCloud
Preparing and Setting up your AcrCloud dashboard
Detecting audio content by sound
A recap of what we have learned so far.
00:00:00-00:05:00 Hi! Welcome to session 1 of this video series on creating an audio recognition system using the ACRCloud C# SDK. My name, one more time, is Ayuba. In this session we're going to talk about automatic content recognition, which is ACR, and discuss what ACR means before we start developing with the SDK.
So, first of all, what is ACR? As I wrote, automatic content recognition (ACR) is an identification technology to recognize content... detailed information about the content they have just experienced without any text input or search effort. Think of an app where you can just hum a song, and from what you've just hummed into the app, it searches through its database and brings up the details of the song you just hummed. That's just an example of how ACR technology works. Basically, you search with something like the first 10 seconds of the media content, and the information about that media content is displayed to you. That's what automatic content recognition is all about.
ACR can help users deal with multimedia more effectively and make applications more intelligent. We're heading to a stage where we don't need to enter text to get information about a media file; we can get that information from a sample of the media content itself.
So, now, how does ACR work? You can see this image depicts how ACR technology works... files, and then fingerprints are extracted from these music files using a certain kind of fingerprint algorithm to create a very large database of fingerprints.
So, for each music or audio file, a fingerprint is generated via a kind of fingerprint algorithm, and then the fingerprint is created into... ...you can even check it out on the Google Play Store or on iOS. You can see they use ACR technology, so you can check it.
For example, let's look here: a user records X seconds of the current music, what was just recorded is picked up, and then, through the Shazam front-end servers, for instance, you ask for the music associated with the fingerprints. The Shazam front-end server now checks whether the fingerprints exist in this very large database, which holds fingerprints for millions and millions of music tracks. If the fingerprints exist, it returns the associated music. That's how ACR technology works: instead of inputting text, you record a few seconds of the current music to find the music file.
So, the fingerprint algorithm... Fingerprints are very discriminative, so the system can use them to identify the audio they belong to. Fingerprints are also robust, which means they can resist environmental noise. If you're using an app like Shazam, for instance, it resists environmental noise, and this makes it possible to identify audio recorded in a rather noisy environment. So, it doesn't... noise, and if the system finds matched fingerprints for a query snippet, just as I explained here, it can determine the most similar audio in the database and give the position of the snippet in the source audio. The mission of ACR is complete when the system finds the match. So, basically, this is how ACR technology works.
00:05:00-00:10:00 So, now that we've seen how ACR technology works, we are going to look at the ACR algorithm itself.
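The lookup described above (fingerprint the recording, check the fingerprint database, return the associated music and the position of the snippet) can be sketched in a few lines. This is only a toy illustration in Python, not the ACRCloud C# SDK or Shazam's actual algorithm: the tracks are made-up integer sequences standing in for quantized audio features, and `WINDOW`, `fingerprints`, `build_index`, and `identify` are hypothetical names.

```python
from collections import Counter, defaultdict

WINDOW = 3  # hypothetical: number of consecutive feature values per fingerprint


def fingerprints(features):
    """Return (fingerprint, position) pairs for every window of features."""
    return [
        (tuple(features[i:i + WINDOW]), i)
        for i in range(len(features) - WINDOW + 1)
    ]


def build_index(tracks):
    """Map each fingerprint to the (title, position) pairs where it occurs."""
    index = defaultdict(list)
    for title, features in tracks.items():
        for key, position in fingerprints(features):
            index[key].append((title, position))
    return index


def identify(index, snippet):
    """Vote for (title, offset) pairs; return the best one, or None."""
    votes = Counter()
    for key, query_position in fingerprints(snippet):
        for title, track_position in index.get(key, []):
            # Matching fingerprints from the same recording agree on the offset.
            votes[(title, track_position - query_position)] += 1
    if not votes:
        return None
    (title, offset), _ = votes.most_common(1)[0]
    return title, offset


# Made-up "tracks": integers stand in for quantized audio features.
tracks = {
    "Song A": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8],
    "Song B": [2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5],
}
index = build_index(tracks)

# A short recording taken from the middle of Song A; the last value is "noise".
snippet = [5, 9, 2, 6, 7]
result = identify(index, snippet)
print(result)
```

Voting on the offset rather than on single matches is one common way to tolerate a few corrupted fingerprints: the noisy last window finds no match, but the two clean windows still agree on the same track and position.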
Now, there are several kinds of ACR (automatic content recognition) algorithms out there, but the most significant of them, the most important of them, is audio fingerprinting. We have audio fingerprinting, we have audio watermarking, we have image watermarking, but since we are creating a music and audio recognition system using the ACRCloud C# SDK, we're going to talk about audio fingerprinting, which is the most effective as of today; it has been tested and is the most effective.
So, audio fingerprinting: what is an audio fingerprint? Let me just read it out here. An audio fingerprint is a condensed digital summary, deterministically generated from an audio signal, that can be used to identify an audio sample or quickly locate similar items in an audio database. So it's just like the database of fingerprints I mentioned, where the fingerprints are put. It's a condensed digital summary: the audio fingerprinting algorithm produces a digital summary. If you look at this picture, each of these black... this is what the algorithm generates: a kind of fingerprint that will be stored in that database. So... the effective and widely applicable algorithm of ACR.
Now, for watermarking, the limitation is that it requires inserting inaudible or invisible digital tags, containing information about the content, into the content... to broadcast channel, program ID, and timestamp, and this is not realistic or flexible for most applications. So audio fingerprinting is actually the most realistic and flexible option for most applications. Like I said, the black dots are like the audio fingerprints, if you look at this intuitive picture representation of it. And how the audio fingerprints... The workflow of audio fingerprinting is something like this.
For example, you can see this is an unknown audio: a fingerprint is extracted from it and then searched for in the database. Using the algorithm that's included with the C# SDK, we're going to search through the database, and if we find matching fingerprints, we display the metadata, which is going to be in JSON format. So you're able to search through this database and print out the information in JSON format using what we're going to learn in this video tutorial, and you can use it to create amazing applications. You can create mobile applications like Shazam. You can create video recognition applications yourself, maybe mobile video commercial applications, where you just have to play a small part of the content, maybe just a few seconds of the video, and the video information will be completely displayed.
So, that's the workflow of audio fingerprinting. And if two files sound alike to the human ear, their audio fingerprints should match, even if their binary representations are quite different. Audio fingerprints are not bitwise fingerprints, which must be sensitive to any small change in the data. Just like I said, the audio fingerprint is not sensitive to noise, and it's not like a bitwise fingerprint, where any small change will change the entire data. The audio fingerprint is more like a human fingerprint, let me put it that way; it's not really like a bitwise finger... and identifying, as I said, songs, melodies, and tunes. You can even use it in copyright compliance, licensing, and other monetization schemes. So you can think of the kinds of applications you can build using this audio fingerprinting technology.
00:10:00-00:13:46 And if I can go even further, you can mix this technology with a blockchain, where you can store all...
long text, which is just going to be like text, not the audio file itself. You can store the text of that condensed fingerprint in the blockchain, and it can be global. So you can think of an app like that, but for now, we're just going to use the C# SDK to look at how we can get audio information from unknown audio that is played.
So, that is the workflow of audio fingerprinting, and then the... information about the content they watch. For example, nowadays you see smart TVs where, if you watch just the first few seconds of a particular TV program, you get to see the information about that program, or you play the first few seconds of a piece of music or a sound and you get to see the information about that music: the artist, the album, and all that. So it's applicable in content identification, and of course it's also applicable in broadcast monitoring, where broadcasters can monitor information about content played on TV and radio, like time of play, duration, frequency, etc. It's also applicable in content enhancement. Additional information can be presented to the user on a second screen; ACR can enable a variety of interactive features such as polls, coupons, lotteries, or purchase of goods based on time... ...when you apply ACR technology to smart TVs and mobile devices, it is easy to quantify audience consumption. These are just examples of applications of ACR technology.
So, now that we know what ACR technology is, and we've seen its applications and the algorithms that are used for it, let's talk about ACRCloud, which is the company. ACRCloud is the first open cloud platform of automatic content recognition services based on audio fingerprinting technology. ...interactions, copyright detection, etc., and the workflow of ACRCloud is like this.
You have content, and this is media content like music, videos, and sounds, and then you ingest this content into buckets; a bucket is like... ...what console to do this. So we're going to create projects from these buckets and then interact with the SDKs and... It's actually an awesome technology, an awesome company, and we're going to look at how we can use the SDK to receive audio input, query a database, and bring out information about the audio that we have played. So, that is it about ACR technology, automatic content recognition, and the ACRCloud company itself. Next, we're going to look at creating an audio recognition system using ACRCloud. So, thank you very much!
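The contrast drawn in this session between bitwise fingerprints and audio fingerprints can be made concrete with a small experiment. The sketch below is a toy in Python, not the algorithm ACRCloud or its C# SDK uses: the "recording" is a synthetic tone, and `FRAME`, `bitwise_fingerprint`, and `toy_audio_fingerprint` are hypothetical names chosen for illustration. It hashes the raw samples with SHA-256 as the bitwise fingerprint and uses the rise or fall of energy between adjacent frames as a very crude audio fingerprint.

```python
import hashlib
import math
import random

FRAME = 50  # hypothetical frame size, in samples


def bitwise_fingerprint(samples):
    """Exact hash of the sample values: any tiny change alters it."""
    data = ",".join(f"{s:.6f}" for s in samples).encode()
    return hashlib.sha256(data).hexdigest()


def toy_audio_fingerprint(samples):
    """One bit per pair of adjacent frames: 1 if the energy rises, else 0."""
    energies = [
        sum(s * s for s in samples[i:i + FRAME])
        for i in range(0, len(samples) - FRAME + 1, FRAME)
    ]
    return [
        1 if later > earlier else 0
        for earlier, later in zip(energies, energies[1:])
    ]


# A synthetic "recording": a tone whose loudness changes from frame to frame.
loudness = [1.0, 0.2, 0.8, 0.3, 0.9, 0.1, 0.7, 0.4]
clean = [
    amp * math.sin(2 * math.pi * k / 10)
    for amp in loudness
    for k in range(FRAME)
]

# The same recording with a little background noise added.
rng = random.Random(0)
noisy = [s + rng.uniform(-0.01, 0.01) for s in clean]

same_hash = bitwise_fingerprint(clean) == bitwise_fingerprint(noisy)
same_audio_fp = toy_audio_fingerprint(clean) == toy_audio_fingerprint(noisy)
print(same_hash, same_audio_fp)
```

The small noise changes the exact sample values, so the SHA-256 digests no longer match, while the coarse energy pattern, and therefore the toy audio fingerprint, is unchanged. Real systems use far richer features, but the design idea is the same: summarize what the audio sounds like rather than its exact bytes.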