DotData extracts key data features to make machine learning useful

Many artificial intelligence experts say that running the AI algorithm is only part of the job. Preparing and cleaning the data is a start, but the real challenge is to figure out what to study and where to look for the answer. Is it hidden in the transaction ledger? Or maybe in the color pattern? Finding the right features for the AI algorithm to examine often requires deep knowledge of the business itself, so that the algorithm can be guided to look in the right place.

DotData wants to automate that work. The company wants to help enterprises flag the best features for AI processing and find the best place to look for such features. It has launched DotData Py Lite, a containerized version of its machine learning toolkit that allows users to quickly build proofs of concept (POCs). Data owners in search of answers can either download the toolkit and run it locally or run it in DotData’s cloud service.

VentureBeat sat down with DotData founder and CEO Ryohei Fujimaki to discuss the new product and its role in the company’s broader approach to simplifying AI workloads for anyone with more data than time.

VentureBeat: Do you think of your tool more as a database or an AI engine?

Ryohei Fujimaki: Our tool is more of an AI engine but it is [tightly integrated with] the data. There are three major data stages in many companies. First, there’s the data lake, which is mainly raw data. Then there’s the data warehouse stage, which is somewhat cleansed and architected. It’s in good shape, but it’s not yet easily consumable. Then there’s the data mart, which is a purpose-oriented, purpose-specific set of data tables. It’s easily consumed by a business intelligence or machine learning algorithm.

We start working with data in between the data lake and the data warehouse stage. [Then we prepare it] for machine learning algorithms. Our core competence, our core capability, is to automate this process.

VentureBeat: The process of finding the right bits of data in a vast sea?

Fujimaki: We think of it as “feature engineering,” which is starting from the raw data, somewhere between the data lake and data warehouse stage, doing a lot of data cleansing and feeding a machine learning algorithm.

VentureBeat: Machine learning helps find the important features?

Fujimaki: Yes. Feature engineering is basically tuning a machine learning problem based on domain expertise.

VentureBeat: How well does it work?

Fujimaki: One of our best customer case studies comes from a subscription management business. The company uses its platform to manage customers. The problem is that there are a lot of declined or delayed transactions. It is almost a $300 million problem for them.

Before DotData, they manually crafted 112 queries to build a feature set based on the 14 original columns from one table. Their accuracy was about 75%. But we took seven tables from their data set and discovered 122,000 feature patterns. The accuracy jumped to over 90%.

VentureBeat: So, the manually discovered features were good, but your machine learning found a thousand times more features and the accuracy jumped?

Fujimaki: Yes. This accuracy is just a technical improvement. In the end they could avoid almost 35% of bad transactions. That’s almost $100 million.

We went from 14 different columns in one table to searching almost 300 columns in seven tables. Our platform identifies which feature patterns are more promising and more significant, and by using the important features it found, they could improve accuracy very substantially.
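To make the idea of searching many candidate feature patterns concrete, here is a minimal sketch in Python. It is not DotData's method, which the interview does not describe. It assumes a hypothetical per-customer transaction table, a hypothetical `declined` label, three simple aggregations, and absolute Pearson correlation as a stand-in for whatever significance measure a real platform would use.

```python
from itertools import product
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


# Hypothetical data: transaction rows grouped by customer, plus a label
# saying whether that customer's next payment was declined.
transactions = {
    "c1": {"amount": [10, 12, 11], "retries": [0, 0, 1]},
    "c2": {"amount": [50, 5, 90], "retries": [2, 3, 2]},
    "c3": {"amount": [20, 22], "retries": [0, 1]},
    "c4": {"amount": [70, 3, 80, 4], "retries": [3, 2, 3, 3]},
}
declined = {"c1": 0, "c2": 1, "c3": 0, "c4": 1}

# Candidate aggregations; a real search space would be far larger.
AGGREGATIONS = {
    "mean": lambda values: sum(values) / len(values),
    "max": max,
    "count": len,
}


def rank_features(transactions, target, columns=("amount", "retries")):
    """Build every (aggregation, column) feature and rank it by |correlation| with the target."""
    customers = sorted(transactions)
    y = [target[c] for c in customers]
    scored = []
    for column, (agg_name, agg) in product(columns, AGGREGATIONS.items()):
        x = [agg(transactions[c][column]) for c in customers]
        scored.append((abs(pearson(x, y)), f"{agg_name}({column})"))
    return sorted(scored, reverse=True)


ranking = rank_features(transactions, declined)
for score, name in ranking:
    print(f"{name}: {score:.3f}")
```

With two columns and three aggregations this produces only six candidates. Adding more tables, columns, aggregations, and time windows multiplies the count quickly, which is how a search can reach the scale Fujimaki describes.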

VentureBeat: So what sort of features does it discover?

Fujimaki: Let’s look at another case study of product demand forecasting. The features discovered are very, very simple. Machine learning is using temporal aggregation from transaction tables, such as sales, over the last 14 days. Obviously, this is something that could affect the next week’s product demand. For sales of household items, the machine learning algorithm found that a 28-day window was the best predictor.
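The temporal aggregation Fujimaki mentions can be sketched as a trailing-window sum. The example below is only an illustration under stated assumptions: the `trailing_sum` helper, the flat 10-units-per-day sales history, and the 7-, 14-, and 28-day windows are all hypothetical, and choosing the best window would in practice mean comparing model accuracy across them.

```python
from datetime import date, timedelta


def trailing_sum(daily_sales, as_of, window_days):
    """Sum sales over the `window_days` days that end the day before `as_of`."""
    total = 0.0
    for offset in range(1, window_days + 1):
        total += daily_sales.get(as_of - timedelta(days=offset), 0.0)
    return total


# Hypothetical history: 10 units sold every day for 60 days from 2021-01-01.
start = date(2021, 1, 1)
daily_sales = {start + timedelta(days=i): 10.0 for i in range(60)}

# One candidate feature per window length, computed as of the forecast date.
as_of = date(2021, 3, 1)
features = {
    f"sales_sum_{w}d": trailing_sum(daily_sales, as_of, w) for w in (7, 14, 28)
}
print(features)
```

The window ends the day before `as_of` so that the feature uses only information available when the forecast is made.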

VentureBeat: Is it just a single window?

Fujimaki: Our engine can automatically detect specific sales trend patterns for a household item. This is called a partial or annual periodic pattern. The algorithm will detect annual periodic patterns that are particularly important for a seasonal event effect like Christmas or Thanksgiving. In this use case, there is a lot of payment history, a very appealing history.
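One simple way to expose an annual periodic pattern to a model is a "same period last year" feature. The sketch below assumes a hypothetical daily sales history with a holiday spike. The `same_period_last_year` helper is illustrative and is not taken from DotData's product.

```python
from datetime import date, timedelta


def same_period_last_year(daily_sales, as_of, window_days=7):
    """Sum sales over the `window_days` days starting one year before `as_of`."""
    try:
        anchor = as_of.replace(year=as_of.year - 1)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year; fall back to Feb 28.
        anchor = as_of.replace(year=as_of.year - 1, day=28)
    return sum(
        daily_sales.get(anchor + timedelta(days=i), 0.0) for i in range(window_days)
    )


# Hypothetical 2020 history: 5 units per day, 50 per day from Dec 19 to Dec 25.
start = date(2020, 1, 1)
spike_start, spike_end = date(2020, 12, 19), date(2020, 12, 25)
daily_sales = {}
for i in range(366):  # 2020 is a leap year
    day = start + timedelta(days=i)
    daily_sales[day] = 50.0 if spike_start <= day <= spike_end else 5.0

holiday_feature = same_period_last_year(daily_sales, date(2021, 12, 19))
ordinary_feature = same_period_last_year(daily_sales, date(2021, 6, 1))
print(holiday_feature, ordinary_feature)
```

A forecast for the week before Christmas 2021 sees last year's spike through this feature, while a forecast for early June sees only the baseline.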

VentureBeat: Is it hard to find good data?

Fujimaki: There’s often plenty of it, but it’s not always good. Some manufacturing customers are studying their supply chains. I like this case study from a manufacturing company. They are analyzing sensor data using DotData, and there’s a lot of it. They want to detect some failure patterns, or try to maximize the yield from the manufacturing process. We are supporting them by deploying our stream prediction engine to the [internet of things] sensors in the factory.

VentureBeat: Your tool saves the human from searching and trying to imagine all of these combinations. It must make it easier to do data science.

Fujimaki: Traditionally, this type of feature engineering required a lot of data engineering skill, because the data is very large and there are so many combinations.

Most of our users are not data scientists today. There are a couple of profiles. One is like a [business intelligence] type of user. Like a visualization expert who is building a dashboard for descriptive analysis and wants to step up to doing predictive analysis.

Another one is a data engineer or system engineer who is familiar with this kind of data model concept. System engineers can easily understand and use our tool to do machine learning and AI. There’s some increasing interest from data scientists themselves, but our main product is most useful for those types of people.

VentureBeat: You’re automating the process of discovery?

Fujimaki: Basically, our customers are very, very surprised when we show that we are automating this feature extraction. This is the most complex, lengthy part. Usually people have said that this is impossible to automate because it requires a lot of domain knowledge. But we can automate this part. We can automate the process before machine learning to manipulate the data.

VentureBeat: So it’s not just the stage of finding the best features, but the work that comes before that. The work of identifying the features themselves.

Fujimaki: Yes! We’re using AI to generate the AI input. There are a lot of players who can automate the final machine learning. Most of our customers chose DotData because we can automate the part of finding the features first. This part is kind of our secret sauce, and we are very proud of it.
