But the same holds for regression, which you seem to favour. So why do you feel that regression is so much better than classification (which is, when combined with a confidence score, basically regression)?
No matter how you extract the code, this will be hacky. The problem with this approach is that you are entirely dependent on the YouTube backend. They will not notify you when they change their code/API. They will not comment their code.
In the past, this has led to a considerable development investment for projects like NewPipe, which have to fix something every few months because of a backend change.
I am still thinking about your problem, but I am unsure whether the approach of extracting from JS will keep working in the mid term.
Why not do the steps you outlined above as a macro on your keyboard? This eliminates the need for JS. To extract the video URL, you could use a RegEx to do it automatically, or just Ctrl+F. Just some thoughts. I am still invested in this weird request :)
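To make the RegEx idea a bit more concrete, here is a rough sketch. It assumes the page text (or whatever the macro copies) is available as a string and that the link has the usual `watch?v=` or `youtu.be/` form with an 11-character ID. The name `extract_video_urls` and the pattern itself are just made up for illustration, so adjust them to whatever the URLs actually look like for you.

```python
import re

# Assumed pattern: standard watch URLs and youtu.be short links,
# each followed by an 11-character video ID (an assumption, not a guarantee).
VIDEO_URL_RE = re.compile(
    r"https?://(?:www\.)?(?:youtube\.com/watch\?v=|youtu\.be/)([A-Za-z0-9_-]{11})"
)


def extract_video_urls(text: str) -> list[str]:
    """Return a normalized watch URL for every video ID found in text."""
    return [
        f"https://www.youtube.com/watch?v={match.group(1)}"
        for match in VIDEO_URL_RE.finditer(text)
    ]
```

You would call it on the copied text, e.g. `extract_video_urls(clipboard_text)`, and take the first entry. It only looks at plain text, so it does not depend on the JS at all, but it will still break if the URL format changes.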
To add on to that: Obsidian is the only program that currently has this (to my knowledge) and it is a huge game changer. It just feels so much more usable than anything with separate source and preview views. [I'm not demanding anything, this is the stuff you do in your free time, but you might want to go down that road.]
I have the same setup with SE working. Why does it fail to run on your system? Did you install protontricks like it said in the MO2 installer readme?