Meet SOTAI: Making Machine Learning Transparent And Accessible

At SOTAI, we're excited to introduce our Interpretable Machine Learning SDK designed to make machine learning transparent, interpretable, and accessible for everyone, from data scientists to product managers. Our mission is to enable professionals across every industry to harness the power of AI without being held back by the complexity and opacity of traditional black-box models, like Deep Neural Networks (DNNs). With SOTAI, you can directly embed domain knowledge into your models to gain greater control and instill a level of trust in your models that black-box techniques can't provide.

During my time at Google AI, I researched state-of-the-art interpretable machine learning systems (calibrated modeling) and helped product teams implement this research in their products. Many teams wanted to use more advanced machine learning techniques but were unwilling to adopt black-box models due to their opacity. They were interested in calibrated models, which offer the predictive power of black-box models without the black box; however, the process of implementing these models was inefficient, involving constant back and forth between the product team (who understood the data and use case) and our research team (who understood calibrated modeling). SOTAI was founded to make calibrated modeling more accessible and easier to use through our platform, so that individuals and product teams can directly capitalize on their domain expertise.

In this blog post, we'll first discuss the pitfalls of black-box modeling and how calibrated modeling solves them. Then we'll look at the industries where SOTAI can deliver value, with deeper real-world examples to come in future posts.

Why Should You Care About Interpretable Machine Learning?

A basic DNN with three hidden layers, each with three hidden nodes. Note that they are called "hidden layers" for a reason...

Black-box modeling techniques present several challenges that can hinder the effective adoption and implementation of AI solutions in any industry. First and foremost, the lack of transparency in black-box models makes it difficult to understand the reasoning behind their predictions, which can lead to mistrust and reluctance to deploy these models in critical decision-making processes.

Consider predicting the price of a house using square footage as one of the input features. You use a black-box model to make this prediction, but soon after you realize the value for square footage is incorrect. You increase it by 200 sqft to the correct value, but now the model predicts a lower price. This makes no sense: increasing the size of the property should have increased the price. Now you're worried about the impact of changing any feature, particularly those of high importance. You've lost trust in your model.

What can you do? Nothing really. Unexpected behavior is endemic to projects with imperfect data, many variables, and black-box models such as DNNs. Your best option is to collect more training data and hope your model learns the behavior you expect.

Black-box models can also inadvertently perpetuate biases present in the training data, potentially resulting in unfair or even harmful outcomes. Without a clear understanding of how the model makes its decisions, it is challenging to identify and address these biases. Additionally, black-box models are often sensitive to changes in input data and may behave in unexpected or erratic ways, which can be especially concerning in high-stakes or regulated environments. Lastly, the inability to incorporate domain knowledge and business constraints into black-box models can limit their applicability and prevent businesses from fully leveraging their expertise to optimize AI-driven outcomes.

What Is Calibrated Modeling?

An input first goes through the calibration layer, then through the linear, lattice, or lattice ensemble layer to produce the final output.

Calibrated modeling is a machine learning technique that primarily solves two problems: understanding why a model makes the predictions it does, and consistently predicting how it will behave on unseen examples. The solution to both problems lies in calibration analysis and shape constraints.

Calibration analysis provides insights that go beyond feature-importance rankings and partial dependence plots. Every feature first passes through a calibration layer, which we can visualize to see exactly how the model interprets that feature. Black-box models offer no equivalent; it is the explicit structure of calibrated models that makes this granular level of analysis possible, allowing users to gain a deeper understanding of their data and of the relationships between variables.
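To make this concrete, here is a minimal sketch of a single-feature calibrator written with TensorFlow Lattice, one of the open-source calibrated modeling libraries we mention below. The feature name and keypoints are made up for illustration, and this is not SOTAI's SDK; it simply shows what a calibration layer is and how you might visualize it by evaluating it on a grid of inputs.

    # A per-feature piecewise-linear calibration layer (TensorFlow Lattice).
    # The feature name and keypoint range are illustrative assumptions.
    import numpy as np
    import tensorflow_lattice as tfl

    # Keypoints partition the input range; the model learns an output value at
    # each keypoint and interpolates linearly between them.
    sqft_calibrator = tfl.layers.PWLCalibration(
        input_keypoints=np.linspace(500.0, 5000.0, num=10),
        output_min=0.0,
        output_max=1.0,
    )

    # After training, visualizing the calibrator is as simple as evaluating it on
    # a grid of inputs and plotting the resulting curve.
    grid = np.linspace(500.0, 5000.0, num=100).reshape(-1, 1)
    calibrated = sqft_calibrator(grid)  # plot grid vs. calibrated to see the learned shape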

Shape constraints play a crucial role in making calibrated models interpretable by allowing users to impose specific behavioral rules on their machine learning models. These constraints help to reduce – or even eliminate – the impact of noise and inherent biases contained in the data.

Monotonicity constraints ensure that the model's prediction only ever increases (or only ever decreases) as an input feature increases. Let's consider our house price prediction task once more. A monotonicity constraint on the square footage feature would guarantee that increasing the size of the property increases the predicted price. This makes sense.
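Here is a rough sketch of how that constraint might be declared, again using TensorFlow Lattice as a stand-in (the two features, keypoint ranges, and training setup are hypothetical). The key idea is that the monotonicity is stated on both the calibrator and the lattice, so the guarantee holds end to end.

    # A two-feature calibrated lattice model for house prices, with square footage
    # constrained to be monotonically increasing (TensorFlow Lattice; feature names
    # and keypoint ranges are illustrative assumptions).
    import numpy as np
    import tensorflow as tf
    import tensorflow_lattice as tfl

    sqft_input = tf.keras.Input(shape=(1,), name="square_footage")
    age_input = tf.keras.Input(shape=(1,), name="home_age")

    # Calibrate each feature into [0, 1]; the square footage calibrator may only increase.
    sqft_calibrated = tfl.layers.PWLCalibration(
        input_keypoints=np.linspace(500.0, 5000.0, num=10),
        output_min=0.0, output_max=1.0,
        monotonicity="increasing",
    )(sqft_input)
    age_calibrated = tfl.layers.PWLCalibration(
        input_keypoints=np.linspace(0.0, 100.0, num=10),
        output_min=0.0, output_max=1.0,
    )(age_input)

    # The lattice fuses the calibrated features; marking its first dimension as
    # monotonic makes the full model's prediction monotonic in square footage.
    price = tfl.layers.Lattice(
        lattice_sizes=[2, 2],
        monotonicities=["increasing", "none"],
    )(tf.keras.layers.concatenate([sqft_calibrated, age_calibrated]))

    model = tf.keras.Model(inputs=[sqft_input, age_input], outputs=price)
    model.compile(optimizer="adam", loss="mse")

With this constraint in place, no amount of noise in the training data can produce a model where adding square footage lowers the predicted price.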

Unimodality constraints create a single peak in the model's output, ensuring that there is only one optimal value for a given input feature. For example, a feature for price used when predicting sales volume may be unimodal since lower prices generally lead to higher sales, but prices that are too low may indicate low quality.

Trust constraints define the relative importance of input features depending on other features. For instance, a trust constraint can ensure that a model predicting product sales relies more on the star rating (1-5) when the number of reviews is higher, which forces the model's predictions to better align with real-world expectations and rules.
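As an illustration of what this could look like in code, TensorFlow Lattice lets you attach trust options to its lattice layer. The sketch below uses hypothetical feature names and keypoints and is meant only to show the shape of such a model, not SOTAI's API; treat the exact trust arguments as our reading of that library rather than a definitive recipe.

    # A sketch of a trust constraint: a sales model that leans on the star rating
    # more heavily when the review count is higher. Feature names and keypoints
    # are illustrative.
    import numpy as np
    import tensorflow as tf
    import tensorflow_lattice as tfl

    rating_input = tf.keras.Input(shape=(1,), name="star_rating")
    reviews_input = tf.keras.Input(shape=(1,), name="num_reviews")

    rating_calibrated = tfl.layers.PWLCalibration(
        input_keypoints=np.linspace(1.0, 5.0, num=5),
        output_min=0.0, output_max=1.0,
        monotonicity="increasing",  # a higher rating never lowers predicted sales
    )(rating_input)
    reviews_calibrated = tfl.layers.PWLCalibration(
        input_keypoints=np.linspace(0.0, 1000.0, num=10),
        output_min=0.0, output_max=1.0,
    )(reviews_input)

    # edgeworth_trusts (as we understand the TF Lattice API) takes tuples of
    # (main_dim, conditional_dim, direction): here, dimension 0 (star rating)
    # is trusted more as dimension 1 (review count) grows.
    sales = tfl.layers.Lattice(
        lattice_sizes=[2, 2],
        monotonicities=["increasing", "none"],
        edgeworth_trusts=[(0, 1, "positive")],
    )(tf.keras.layers.concatenate([rating_calibrated, reviews_calibrated]))

    model = tf.keras.Model(inputs=[rating_input, reviews_input], outputs=sales)
    model.compile(optimizer="adam", loss="mse")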

Together, these shape constraints help create machine learning models that are both interpretable and trustworthy. The only problem? Libraries for calibrated modeling, such as TensorFlow Lattice and PyTorch Calibrated, have a steep learning curve, can be difficult to use, and don't have analysis tooling to truly capitalize on the benefits of calibrated modeling.

Harness the Power of Calibrated Modeling In Any Industry With SOTAI

Our SDK and supporting analysis tooling are designed for the entire calibrated modeling development cycle. You can prepare your data for machine learning, configure and train your model, and even launch hyperparameter optimization training jobs on our infrastructure to find the best model. You can also gain illuminating insights through purpose-built analysis tooling, which can help guide the iterative process of refining models and incorporating domain knowledge, ultimately leading to more accurate and trustworthy predictions. Users can create models tailored to their specific needs, delivering valuable insights and ensuring compliance where necessary. With SOTAI, businesses in any industry can reap the benefits of machine learning without sacrificing transparency and trust in their decision-making process.

Ready to experience the benefits of SOTAI for yourself? Register for our free Personal plan today and unlock the power of interpretable machine learning for your business!

Real-World Examples

Stay tuned for future posts that dive deeper into specific industries!

Industry: Sales

Industry: E-commerce

Industry: Retail

Industry: Healthcare

Industry: Real Estate

Industry: Finance

Industry: Recruiting

Industry: Hospitality

Industry: Human Resources

Industry: Logistics