Screenshots to Code

An experiment to (re)learn Machine Learning

What am I trying to accomplish?

This blog post has a handful of goals:

  • Get up to speed on Machine Learning
  • Help others do the same
  • Get writing again
  • (stretch) Provide a tool that helps developers move code forward

While this last bullet feels ambitious, it will act as the framing to make sure we explore and learn things of real use (not just theoretical).  The project we’ll tackle is to create a tool that takes a screenshot of a UI and generates the code to create that UI.  This blog post will serve as the directory of the explorations that go into tackling this project.  It’s broken up into a few sections:

  • Project Structure – A description of the various problems and sub-projects that need to be completed.  This will be updated as progress is made, as well as when there are setbacks and there are lessons to be learned.
  • Resources / Topics
    • Topics – As I explore certain topics that feel gnarly to those of us who aren’t data scientists, I will write up a guide as well as links to resources to get started.
    • Resources – For topics that are already well covered and manageable, I will simply include a brief description and a set of links.
    • Upcoming dives – I fully expect to hit a number of topics that require exploration.  This will be a stack of topics that are on the backlog to explore.

Project Structure

Having not written any AI/ML algorithms in over a decade, my first stab at project structure is likely naïve.  Current thinking is that it is composed of the following problems/sub-projects:

(Neural Network) Screenshot to GUI Code

  • Description: Input a screenshot and output a semantically correct snippet of GUI code that creates it.
  • Input: A set of screenshots
  • Components:
    • Primary Recurrent Neural Network – this will take the input screenshot and output code.
    • (sub-project) GUI Code to Bitmap – takes the output of the RNN and creates a bitmap.
    • (sub-project) Bitmap Comparer – takes two bitmaps and returns a float for how similar/different they are.
  • Training loop: 
    • Run the RNN on a sample, feed its output through GUI Code to Bitmap, then score the result with the Bitmap Comparer.
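To make the loop concrete, here is a minimal sketch of that training step.  The three components are plain Python callables standing in for the real sub-projects (the RNN, the GUI-code renderer, and the bitmap comparer); everything here is a toy placeholder, not an implementation of any of them.

```python
# Sketch of one training step: screenshot -> generated code -> rendered
# bitmap -> similarity score.  All three components are toy stand-ins.

def training_step(model, renderer, comparer, screenshot):
    """One pass: generate code, render it, score the render vs. the input."""
    generated_code = model(screenshot)     # RNN: screenshot -> GUI code
    rendered = renderer(generated_code)    # GUI Code to Bitmap
    loss = comparer(screenshot, rendered)  # Bitmap Comparer (0.0 = identical)
    return generated_code, loss

# Toy stand-ins so the loop is runnable end to end:
identity_model = lambda shot: shot      # "code" is just the screenshot itself
identity_renderer = lambda code: code   # rendering is a no-op
mse_comparer = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

code, loss = training_step(identity_model, identity_renderer, mse_comparer,
                           [0.1, 0.5, 0.9])
# loss is 0.0 because the stand-ins reproduce the input exactly
```

The real version would of course backpropagate the score into the RNN; the sketch only shows the data flow between the three components.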


GUI Code to Bitmap

  • Description: Create a component that can take semantically correct GUI code and generate the expected visual result.
  • Components:
    • Data Set: (sub-project) Collect Code Samples via GitHub Crawler
      • Description: Crawl GitHub (taking license into account), collect UI/XAML files, and generate UI snapshots.
      • Goal: Create the training set.
    • Code to Bitmap: Current thinking is to use Windows.UI.Xaml.Markup.XamlReader to parse the file and RenderTargetBitmap to create the image.  Note: for performance reasons, my guess is that we will eventually need a generative NN to replace these two calls.
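Since the crawler has to take license into account, one piece we can sketch now is the license filter.  The dictionaries below mimic the shape of GitHub API repository objects (which expose a "license" object with a "key" field), but the exact field names and the set of acceptable licenses are assumptions for illustration.

```python
# Hedged sketch of a license filter for the GitHub crawler.  Which licenses
# actually permit building a training set is a legal question, not settled
# here; this set is an assumption.
PERMISSIVE_LICENSES = {"mit", "apache-2.0", "bsd-2-clause", "bsd-3-clause"}

def usable_for_training(repo):
    """Keep only repositories whose license key is in the allowed set."""
    license_info = repo.get("license") or {}   # "license" may be null
    return license_info.get("key") in PERMISSIVE_LICENSES

repos = [
    {"name": "sample-ui", "license": {"key": "mit"}},
    {"name": "no-license", "license": None},
]
kept = [r for r in repos if usable_for_training(r)]  # only "sample-ui" survives
```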

Bitmap Comparer

  • Description: Take in two bitmaps and calculate their similarity.
  • Challenge: I have no idea yet how to create a good cost function.  Adding it to the list of coming-soon topics…
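Even before settling on a good cost function, a naïve per-pixel mean squared error gives us a baseline to measure better ideas against.  A minimal sketch in plain Python (no image libraries assumed), treating a bitmap as a list of rows of grayscale pixel values:

```python
def bitmap_mse(a, b):
    """Mean squared error between two equal-sized grayscale bitmaps.

    Bitmaps are lists of rows of pixel values; 0.0 means identical.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("bitmaps must have the same dimensions")
    total = sum((pa - pb) ** 2
                for ra, rb in zip(a, b)
                for pa, pb in zip(ra, rb))
    return total / (len(a) * len(a[0]))

identical = bitmap_mse([[0, 0], [255, 255]], [[0, 0], [255, 255]])  # 0.0
off_by_two = bitmap_mse([[0]], [[2]])                               # 4.0
```

MSE is famously blind to structure (shifting an image by one pixel can produce a huge error), which is exactly why this belongs on the topics-to-explore list, but it works as a first scoring function for the training loop.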

Resources / Topics

If you have any suggestions on content that you hope I will explore, pointers to great resources or material, or just comments and questions, please don’t hesitate to leave a note in the comments below.

Subjects to explore:

  • How to create a good cost function for comparing two images?


<Coming Soon(ish)/>