Youtube Algorithm Linking Performances & Exposés
Part of a series on what I’m calling The War On Magic.
This all started with a click.
Someone sent me a link to a Youtube video. I clicked it and watched a television interview that included the performance of a magic trick. I watched the entire video, and as it was ending Youtube showed a preview of the next video on auto-play with a countdown:
I see playing cards in the thumbnail…
Must be another magic video…
No, wait, it’s a magic exposé video…
It’s the same trick that I just watched performed…
How is this happening? I must know…
I immediately started looking into the video itself.
The title was innocent enough. No mention of specific magic tricks or trick creators.
The description was innocent enough. Just a little description about the performer and that he performs magic.
The keywords were innocent enough. I right-clicked the page and clicked “View Page Source”, which shows all of the page’s hidden code, including its keywords and metadata. The keywords mentioned some famous magicians who had nothing to do with the video (David Blaine, David Copperfield, etc.), no doubt a method to try to increase traffic, but there was still no mention of the name of the piece of magic or of its creator.
The comments were innocent enough. No magic exposure in the comments or mention of techniques or names of effects.
The linking of the performance videos to their exposé videos was happening inside of Youtube.
I took all my knowledge of algorithms, artificial intelligence, and machine learning and began investigating Youtube’s constantly evolving internal systems to try to understand how all of this was happening.
These are my findings.
Marketing companies spend money to run advertisements on Youtube.
Youtube makes money from both views of the ads & clicks of the ads.
The more videos a person watches the more advertisements they are exposed to.
Youtube calls the time a person spends on their platform a “session”.
The longer a user’s session is, the more money Youtube makes from the marketing companies.
Youtube develops a constantly evolving algorithm that is designed to prolong your session.
THE GOAL IS TO KEEP YOU ON YOUTUBE
This algorithm is making decisions, based on many factors, to suggest videos that you, as a user, might be interested in.
These videos will appear in the sidebar as suggested videos as well as on the homepage.
These suggestions are based on what the system has recognized as having potential for you to click on.
Youtube is able to make these determinations based on metadata, internal data collected by the platform, and data collected by Google through its other products (Google Mail, Chrome, Maps, etc.).
Youtube used to get metadata only from the title of a video, the uploader, the written description of the video, and the comments.
Times have changed, though, and the technology is evolving.
Youtube now automatically runs voice recognition software to create a transcript from the video. The system is able to recognize speech and the transcript is openly used in their subtitles/closed captioning system.
You can see the whole transcript by clicking on ••• at the bottom right of a video and then clicking on “Open Transcript”.
Internally, the system is able to create more metadata based on sounds outside of speech such as recognizing the sound of rain or a car honking.
This information is also used to find content that the machine predicts you will interact with.
For example, let’s take my performance of Oil & Water on Penn & Teller’s Fool Us. The transcript contains about 1300 words. The words are ranked internally based on how common they are and their order. Certain words and phrases like “interlace”, “cards”, “red”, “black”, and “sleight of hand” stand out because they are less common than “hello”, “welcome”, “take a seat”, and “thank you”. They also become important to the machine in terms of how they group together, because those groupings or similar groupings appear in other videos.
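To make the idea of rare words standing out concrete, here is a minimal sketch of one common way to weight words by rarity (a smoothed inverse document frequency). This is an illustration only and not Youtube's actual system: the function names, the tiny four-transcript corpus, and the scoring formula are all assumptions made for the example.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and split on whitespace; a deliberately naive tokenizer.
    return text.lower().split()

def rare_word_scores(transcript, corpus):
    """Score each word in `transcript`: count in the transcript times its rarity across `corpus`."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        # Count each word once per transcript it appears in.
        doc_freq.update(set(tokenize(doc)))
    scores = {}
    for word, count in Counter(tokenize(transcript)).items():
        # Smoothed inverse document frequency: the fewer transcripts contain the word, the higher the weight.
        idf = math.log((1 + n_docs) / (1 + doc_freq[word])) + 1
        scores[word] = count * idf
    return scores

# Hypothetical mini-corpus standing in for many video transcripts.
corpus = [
    "hello and welcome please take a seat thank you",
    "welcome to the show thank you for coming",
    "hello thank you and welcome back",
    "hello welcome the red and black cards interlace thank you",
]
transcript = corpus[3]
scores = rare_word_scores(transcript, corpus)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy corpus, “interlace” and “cards” score higher than “hello” or “welcome” because they appear in only one transcript, which is the effect described above.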
Youtube will recognize, in other videos, certain keywords and language patterns from my speech. These other videos are often people performing the same effect or exposing the effect. There are people sitting at home in front of their cameras exposing real magic techniques using language similar to what I use in my performance.
The algorithms understand these patterns and suggest videos revealing secrets to the magic trick you are watching in a performance video.
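As a rough illustration of how shared language patterns could tie two transcripts together, the sketch below compares transcripts by the word pairs they have in common, using cosine similarity. It is an assumption-laden toy, not a description of Youtube's internals: the three sample transcripts and the choice of word pairs as the "groupings" are made up for the example.

```python
import math
from collections import Counter

def bigrams(text):
    # Count adjacent word pairs, a simple stand-in for "groupings" of words.
    words = text.lower().split()
    return Counter(zip(words, words[1:]))

def cosine_similarity(a, b):
    """Cosine similarity between two count vectors stored as Counters."""
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical transcripts: a performance, an exposé of the same effect, and an unrelated video.
performance = "the red cards and the black cards interlace one at a time"
expose = "to do this trick the red cards and the black cards interlace like this"
cooking = "chop the onions and fry them in a hot pan"

sim_expose = cosine_similarity(bigrams(performance), bigrams(expose))
sim_cooking = cosine_similarity(bigrams(performance), bigrams(cooking))
```

Here the performance and the exposé share most of their word pairs and score well above the unrelated cooking transcript, which shares none.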
This metadata is also translated into every language and interfaces internally with every other piece of metadata in the system.
Which means you could have someone performing magic in Spanish and the algorithm could connect it with an explanation or exposé in English.
Google’s use of Inception is also the tip of the spear in terms of image-recognition neural networks. The technology can identify the contents of an image and is improving rapidly. Youtube is able not only to decipher the metadata in a video’s thumbnail but also to process every single frame of a video as an image and cross-reference it in their internal systems. This technology has allowed them to keep their platform friendly to advertisers by automatically detecting nudity, extreme violence, and other types of content that are prohibited by their Terms of Service.
As the technology advances, the machine will not only be able to recognize that a person in a video is performing a piece of card magic but will also look at the layout of the cards, such as the classic T formation used in many Ace Assemblies, to further identify the content of the video.
This information will be used to suggest videos which expose the secrets of the magic presented in the performance video.
One of the most common questions magicians hear is,
“How did you do that?”
The audience doesn’t really want to know. This is just an instinctual reaction to witnessing something unknown.
With Youtube, people ask the machine the same question in a different way. They may not know the name of the magic effect, but they can describe it in the “Search”.
Someone could see me on Penn & Teller and then type in “Card magic trick three red three black revealed tutorial” and many videos will pop up of people exposing various methods of the classic Oil & Water plot.
Let’s say 1000 people watch my performance video. Then 50 of them search for the secret of the magic. They click on what they think is an explanation of the magic effect. Youtube sees this as a pattern of behavior which prolongs users’ sessions. It will start linking those two videos together.
With this technology someone could potentially have a completely silent act, with no audio metadata, and have their performance videos linked to explanation videos because of the pattern of behavior the users exhibit on the platform.
This machine learning would take place even though there is no relevant metadata. Let’s say you ran an experiment.
You have 1000 people view a specific 5-minute video of a tree. You then have them immediately search for and watch a specific 2-minute video of the moon. Those are totally unrelated videos with unrelated metadata. Despite that, the algorithm would understand that they are somehow related and that suggesting one after the other extends a user’s Youtube session, so it would internally link those two videos together.
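The behavior-only linking described here can be sketched as simple co-watch counting: tally which video viewers watch right after another, then suggest the most frequent follow-ups. This is a minimal sketch under stated assumptions, not Youtube's real recommender. The session data, the `min_count` threshold, and the function names are hypothetical, and the numbers borrow the earlier 50-out-of-1000 figure rather than having everyone watch both videos.

```python
from collections import Counter

def co_watch_counts(sessions):
    """Count how often one video is watched immediately after another."""
    pair_counts = Counter()
    for session in sessions:
        for current, following in zip(session, session[1:]):
            pair_counts[(current, following)] += 1
    return pair_counts

def suggest_next(video, pair_counts, min_count=2):
    """Return follow-up videos for `video`, most frequent first, ignoring rare pairs."""
    candidates = {
        nxt: n
        for (cur, nxt), n in pair_counts.items()
        if cur == video and n >= min_count
    }
    return sorted(candidates, key=candidates.get, reverse=True)

# Hypothetical sessions: 1000 viewers of the tree video, 50 of whom go on to the moon video.
sessions = (
    [["tree", "moon"]] * 50
    + [["tree", "cats"]] * 3
    + [["tree"]] * 947
)
pair_counts = co_watch_counts(sessions)
suggestions = suggest_next("tree", pair_counts)
```

Note that no titles, transcripts, or keywords are used at all. The link between the two videos comes purely from what viewers did next, and once it exists the moon video would be suggested to every viewer of the tree video, not just the 50 who went looking for it.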
With magic, even though only a small percentage will search for the secret of a magic trick on Youtube after watching a performance, it will affect the viewing experience of all users, as exposés will be suggested to them.