Machine teaching with Microsoft’s Project Bonsai

By Kenneth C. Cutter Last updated Feb 19, 2022

With machine learning (ML) at the heart of much of modern computing, the interesting question is: how do machines learn? There’s a lot of deep computer science in machine learning. Some models use feedback techniques to improve, while others are trained on massive data sets and use statistical techniques to infer results. But what happens when you don’t have the data to build a model using these techniques? Or when you don’t have the data science skills available?

Not everything we want to manage with machine learning generates large amounts of data or has the labeling necessary to make that data useful. In many cases, we may not have the necessary historical data sets. Perhaps we are automating a business process that has never been instrumented, or working in a field where human intervention is essential. In other cases, we might be trying to defend a machine learning system against adversarial attacks by finding ways to work around poisoned data. This is where machine teaching comes in: experts guide machine learning algorithms toward a target.

Introducing Project Bonsai

Microsoft has been at the forefront of AI research for some time, and the resulting Cognitive Services APIs are integrated into the Azure platform. Microsoft now offers tools to develop and train your own models using big data stored in Azure. However, these traditional machine learning platforms and tools are not Microsoft’s only offering: its Project Bonsai low-code development tool provides an easy way to use machine teaching to drive ML development for industrial AI.

Delivered as part of the Microsoft Autonomous Systems suite, Project Bonsai is a tool for creating and training machine learning models. It uses a simulator with a human in the loop so that experts can build models without needing programming or machine learning experience. It also serves as a tool for delivering explainable AI, as the machine teaching phase of the process shows how the underlying ML system arrived at a decision.

Building machine teaching with simulators

At the heart of Project Bonsai is the concept of the training simulation. A training simulation models a real-world system that you want to control with your machine learning application. It can be built using familiar engineering simulation software, such as MATLAB’s Simulink, or as custom code running in a container. If you already use simulators as part of a control system development environment or as a training tool, these can be repurposed for use with Project Bonsai.

Training simulators with a user interface are a useful tool here, as they can capture user input as part of the training process. Simulators should make it very clear when an operation failed, why it failed, and how the failure occurred. This information can be used as input to the training tool, helping to teach the model where errors may occur and allowing it to spot the signs of an impending error. For example, a simulator used to train a Project Bonsai model to control an airport baggage system might show how running conveyors too fast causes baggage to fall off, while running them too slowly causes bottlenecks. The system then learns to find the optimum speed for maximum throughput of bags.

There is a close connection between Project Bonsai and control systems, especially those that take advantage of modern control theory to manage systems within a set of limits. To work well with ML models, a simulator must give a good picture of how the simulated object or service responds to inputs, and it must produce appropriate outputs. You must be able to set a specific start state, so that the simulator and the ML model can be exercised under a range of conditions. Inputs should be quantized so that your ML system can make discrete changes to the simulator, for example, speeding up our simulated baggage system by 1 m/s.
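
As a rough illustration of those requirements, here is a minimal Python sketch of a toy baggage-belt simulator. Everything in it is an assumption made for illustration: the class name BaggageSim, the speed limit, the arrival rate, and the queue limit are invented, and it does not use the Project Bonsai simulator API. It only shows the three properties described above: a settable start state, quantized inputs, and failures that report why they happened.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    speed: float     # conveyor speed in m/s after the step
    queue: int       # bags still waiting to get onto the belt
    delivered: int   # bags delivered during this step
    failed: bool     # True when the step ended in a failure
    reason: str      # why it failed (empty string otherwise)


class BaggageSim:
    """Toy conveyor model; every constant here is a hypothetical value."""

    MAX_SAFE_SPEED = 4.0         # above this, bags fall off the belt
    MAX_QUEUE = 20               # above this, the check-in area is blocked
    ARRIVALS_PER_STEP = 3        # bags arriving each step
    ACTIONS = (-1.0, 0.0, 1.0)   # quantized inputs: change speed by 1 m/s

    def __init__(self):
        self.reset()

    def reset(self, speed=2.0, queue=0):
        """Put the simulator into a specific start state."""
        self.speed = float(speed)
        self.queue = int(queue)
        return StepResult(self.speed, self.queue, 0, False, "")

    def step(self, action_index):
        """Apply one discrete action and advance the model by one step."""
        self.speed = max(0.0, self.speed + self.ACTIONS[action_index])
        if self.speed > self.MAX_SAFE_SPEED:
            return StepResult(self.speed, self.queue, 0, True,
                              "bags dropped: belt too fast")
        self.queue += self.ARRIVALS_PER_STEP
        delivered = min(self.queue, int(self.speed))  # crude throughput model
        self.queue -= delivered
        if self.queue > self.MAX_QUEUE:
            return StepResult(self.speed, self.queue, delivered, True,
                              "bottleneck: queue overflow")
        return StepResult(self.speed, self.queue, delivered, False, "")
```

A training run would call reset() with a chosen start state and then call step() repeatedly with action indexes 0, 1, or 2, reading the failed and reason fields to learn which speeds lead to dropped bags or bottlenecks.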

Getting the simulator right is probably the most difficult aspect of working with Project Bonsai. You may not need data science skills, but you definitely need simulation skills. It’s a good idea to work with subject matter experts as well as simulation experts to build your simulator and make it as accurate as possible. A simulation that deviates from the real system you intend to control with ML will result in a poorly trained model.

Train a model in Project Bonsai

Once you have a simulation, you can start teaching your Project Bonsai ML model in the training engine. Microsoft calls these models “brains” because they are based on neural networks. The training engine has four modules: an architect, an instructor, a learner, and a predictor. The architect uses the training curriculum to choose and optimize a learning algorithm (currently one of three options: Distributed Deep Q Network, Proximal Policy Optimization, or Soft Actor Critic).

Once the architect has selected a learning algorithm, the instructor walks through the training plan, interactively driving the simulator and responding to the learner’s outputs. You can think of the instructor and the learner as a pair: the learner is where the ML model is trained, using the chosen algorithm, the data from the simulator, and the inputs from the instructor. Once learning is complete, the system provides a predictor, a trained algorithm with an API endpoint that is used for inference rather than training. Predictor outputs can be compared with learner outputs to test whether changes improve the model.
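
The division of labor between instructor, learner, and predictor can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it swaps in simple tabular Q-learning for the deep reinforcement learning algorithms named above, the toy simulate function and all constants are hypothetical, and none of it reflects how the Project Bonsai training engine is actually implemented.

```python
import random

ACTIONS = (-1, 0, 1)   # slow down, hold, speed up (in m/s)
TARGET_SPEED = 3       # hypothetical optimum speed for the toy belt
MAX_SPEED = 5          # reaching this speed counts as a failure


def simulate(speed, action):
    """Toy simulator step: returns (next_speed, reward, done)."""
    speed = min(MAX_SPEED, max(0, speed + action))
    if speed == MAX_SPEED:               # too fast: bags drop, episode ends
        return speed, -10.0, True
    return speed, -abs(speed - TARGET_SPEED), False


def train(episodes=500, steps=10, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Instructor loop driving the simulator while the learner updates Q."""
    rng = random.Random(seed)
    q = {(s, a): 0.0
         for s in range(MAX_SPEED + 1) for a in range(len(ACTIONS))}
    for _ in range(episodes):
        speed = rng.randrange(MAX_SPEED)   # instructor sets a start state
        for _ in range(steps):
            # Learner chooses an action (epsilon-greedy exploration).
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[(speed, i)])
            nxt, reward, done = simulate(speed, ACTIONS[a])
            best_next = 0.0 if done else max(
                q[(nxt, i)] for i in range(len(ACTIONS)))
            # Standard Q-learning update toward the observed return.
            q[(speed, a)] += alpha * (reward + gamma * best_next - q[(speed, a)])
            speed = nxt
            if done:
                break
    return q


def predictor(q):
    """Freeze the learned table into an inference-only function."""
    def predict(speed):
        best = max(range(len(ACTIONS)), key=lambda i: q[(speed, i)])
        return ACTIONS[best]
    return predict
```

Calling predictor(train()) returns a function that maps a belt speed to a speed change without doing any further learning, which is the role the predictor plays once training is complete.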

Machine teaching, at least in Project Bonsai, focuses on achieving specific goals. You can think of them as the boundary conditions of a control model. The available goals are relatively simple, such as specifying something to avoid or a target to reach as quickly as possible. Other goals include setting maximum or minimum values and keeping the system close to a specific target value. The training engine will work to satisfy as many goals as you have defined in your training curriculum. Goals like these greatly simplify machine learning. There is no need to create complex training algorithms; all you need to do is define the targets your ML model has to hit, and Project Bonsai takes care of the rest.
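
To make the goal types above more concrete, here is a small Python sketch that scores a recorded run of simulator states against three of them. The function names avoid, reach, and drive, and the state fields speed and queue, are hypothetical names chosen for this example. This is not Inkling syntax, and it is not how Project Bonsai evaluates goals internally.

```python
def avoid(trajectory, in_region):
    """Goal is met if no state in the run ever enters the region to avoid."""
    return not any(in_region(state) for state in trajectory)


def reach(trajectory, in_region):
    """Return the first step index at which the target is reached, or None."""
    for index, state in enumerate(trajectory):
        if in_region(state):
            return index
    return None


def drive(trajectory, value, target, tolerance):
    """Return the fraction of steps spent within tolerance of the target."""
    if not trajectory:
        return 0.0
    hits = sum(1 for state in trajectory
               if abs(value(state) - target) <= tolerance)
    return hits / len(trajectory)


# A made-up run of the baggage belt: one dict per simulator step.
run = [
    {"speed": 2.0, "queue": 1},
    {"speed": 3.0, "queue": 1},
    {"speed": 3.0, "queue": 1},
    {"speed": 4.0, "queue": 0},
]

never_too_fast = avoid(run, lambda s: s["speed"] > 4.0)      # True
steps_to_empty = reach(run, lambda s: s["queue"] == 0)       # 3
near_target = drive(run, lambda s: s["speed"], 3.0, 0.5)     # 0.5
```

Each function only describes what a good run looks like; finding the actions that produce such a run is left to the training engine.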

The output of Project Bonsai is a machine learning model with the endpoints needed to work with your code. The model can be updated over time, adding new goals and refining the training as needed by comparing predicted results with actual operations.

Inkling: a teaching language for machine learning

The training curriculum is written in a language called Inkling. This is a domain-specific language that takes named objects from a simulator and connects them to sensors and actuators. Inkling uses sensors to obtain states and actuators to drive actions, with what it calls “concept nodes” to describe goals. Inkling is not difficult to learn, and most subject matter experts should be able to write a simple training curriculum very quickly. More complex patterns can be created by adding more functions to an Inkling application. Microsoft provides a comprehensive Inkling language reference, which should help you get started writing Project Bonsai training curricula.

Project Bonsai runs on Azure, and you will need to budget for its operations. Models and simulators are stored in Azure Container Registry, with containers used to run simulations. Logs are managed using Azure Monitor, and Azure Storage holds archived simulators. The costs shouldn’t be too high, but it’s worth keeping an eye on them and removing unwanted resource groups once you’ve trained your models.

Machine teaching offers an alternative approach to ML development that works well with control problems, such as working with industrial equipment. It avoids the need for large amounts of data, and because it uses goals to teach a model, the model can be trained by anyone with an understanding of the problem and basic programming skills. It’s not quite a no-code system, as the training needs to be written in Inkling, and you need expert input to write and instrument a simulator to run in Project Bonsai’s training environment. With a well-designed training curriculum and an accurate simulation, you should be able to create what would once have been very complex ML models with surprising speed, taking machine learning from prediction to control.