Need help finding the right AI/Machine Learning Computer Vision library for my task

Hi, I want to make a program that uses streamlink to pipe frames from a stream into an AI and generate statistics from them. I’m starting with Mario Kart 8 Deluxe and my favorite streamer, who does a weekly game where he plays Mario Kart 8 and the mods make random bets.

I need an AI that can

  • Read text on screen and write it to a spreadsheet/txt file whenever it updates, along with the time of the update
  • Read the various items you can collect, and write those as well

I don’t know what to use for this though. All the screen elements are static and not moving…

Hopefully I explained this well.

The way AI and the various other pre-existing tools around it work is to do some pre-processing and then some software magic with stochastic algorithms and fancy math.

Pre-processing is the key to what you want to achieve, because it reduces the load per picture that your system has to handle.
You mention that all elements are static and not moving, which gives you a perfect opportunity to:

  • have streamlink cut it into images (example github issue talking about this)
  • (maybe) do some color → greyscale conversion (e.g. ImageMagick: convert <img_in> -set colorspace Gray -separate -average <img_out>)
  • identify where/when the text elements appear on an image (software 1 - a small code snippet in a language of your choice that checks when a certain pixel cluster of the text color appears and/or changes on screen - you might need to write this yourself to fit your needs)
  • cut the picture into pieces (software 2 - maybe before the previous point, depending on how static the text is)
  • process the text (actual AI or software using statistics)
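
To make the crop / greyscale / "did it change?" steps a bit more concrete, here is a rough Python sketch. It is only an illustration under assumptions: frames are plain nested lists of (r, g, b) tuples so it runs without any extra library, the box coordinates in COIN_BOX are made up, and the thresholds are guesses you would have to tune. In a real pipeline you would most likely load the images with an image library instead.

```python
# Sketch only: frames are nested lists of (r, g, b) tuples, one inner list per row.
# The box position and the thresholds are hypothetical values for illustration.


def crop(frame, left, top, width, height):
    """Return the sub-image covering the given rectangle."""
    return [row[left:left + width] for row in frame[top:top + height]]


def to_grey(region):
    """Average the three colour channels of every pixel (simple greyscale)."""
    return [[(r + g + b) // 3 for (r, g, b) in row] for row in region]


def changed(prev, curr, pixel_delta=30, min_fraction=0.05):
    """True if enough pixels differ noticeably between two greyscale regions."""
    if prev is None:
        # Nothing to compare against yet, so treat the first region as new.
        return True
    total = sum(len(row) for row in curr)
    if total == 0:
        return False
    differing = sum(
        1
        for prev_row, curr_row in zip(prev, curr)
        for p, c in zip(prev_row, curr_row)
        if abs(p - c) > pixel_delta
    )
    return differing / total >= min_fraction


# Hypothetical position of the coin counter: x=2, y=1, 3 px wide, 2 px high.
COIN_BOX = (2, 1, 3, 2)


def regions_to_process(frames, box=COIN_BOX):
    """Yield (frame_index, grey_region) only for frames where the box changed."""
    previous = None
    for index, frame in enumerate(frames):
        grey = to_grey(crop(frame, *box))
        if changed(previous, grey):
            yield index, grey
        previous = grey
```

The point of the design is that only the small regions that actually changed get handed to the expensive text-processing step, which is where the static layout pays off.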

Have a look at (query the search engine of your choice for): OCR
OCR stands for “optical character recognition”, i.e. the kind of code you are looking for to convert text in an image into computer-readable characters.
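
Whichever OCR tool you end up with (Tesseract is one commonly used open-source engine), the "write it down when it updates, with the time" part can look roughly like the Python sketch below. The names log_text_changes, recognise and clock are my own placeholders, not part of any library: recognise stands in for the real OCR call and just has to take a cropped region and return a string.

```python
import csv
import io
from datetime import datetime, timezone


def log_text_changes(regions, recognise, out_file, clock=None):
    """Write a CSV row (timestamp, text) each time the recognised text changes.

    `recognise` is a placeholder for a real OCR call: it receives one region
    and returns the text it found. `clock` is only there so the timestamp
    source can be swapped out; by default the current UTC time is used.
    """
    clock = clock or (lambda: datetime.now(timezone.utc))
    writer = csv.writer(out_file)
    writer.writerow(["timestamp", "text"])
    last_text = None
    rows_written = 0
    for region in regions:
        text = recognise(region).strip()
        # Skip empty reads and repeats of the value that was logged last.
        if text and text != last_text:
            writer.writerow([clock().isoformat(timespec="seconds"), text])
            last_text = text
            rows_written += 1
    return rows_written


if __name__ == "__main__":
    # Stand-in data: each "region" is already the text a real OCR would return.
    fake_regions = ["3", "3", "4", "", "4", "5"]
    buffer = io.StringIO()
    log_text_changes(fake_regions, recognise=lambda region: region, out_file=buffer)
    print(buffer.getvalue())
```

For a real run you would pass an opened file instead of the StringIO buffer and replace the lambda with your actual OCR function.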

Good luck puzzling the pieces together into one piece of software. I recommend doing one step at a time and combining them once you feel the individual commands do what you want them to.

Yeah, that was the plan. The coin count and place marker are the easiest because they’re just text; then another AI would do the items, because those are complex images but are still static.

It would make my life a LOT easier if I could find something that listens for an audio cue and then triggers different things. Does anything like that exist?