A technologist “goes astray” and creates a new species: the vector database

[The creator says]

Created in the name of FOSS.


Zilliz, a vector database company, recently announced the completion of a $60 million Series B+ round, bringing its total funding to $113 million. Among open source startups, this is undoubtedly a standout result.

Six years ago, the situation was very different. When Zilliz’s founder Xingjue (Star-Lord), who came from Oracle, went looking for investors and said he wanted to create a new species rather than follow tradition, most investors were puzzled:

“How did you go astray?”

“With your Oracle background, building a traditional database company would be a piece of cake. I would definitely give you money for that.”

These voices made Star-Lord start to question himself: was this really the right thing to do? Did he need to improve the way he communicated with investors? But one thing was very clear: he wanted to be first in the world and to build something that did not yet exist, even if it would not be understood in the short term.

Fortunately, in 2017, Star-Lord and his team secured an angel round and began research on the “new species”. Looking back on the company’s development, Star-Lord counts the 2017 angel round as the first milestone and the 2019 open sourcing of Milvus, the first real product, as the second. The third came between the second half of 2020 and the first half of 2021, when Zilliz set its next “five-year plan” and made globalization a core development goal.

Over the past 6 years of running the company, Star-Lord’s focus has always been technological innovation alone: finding new directions in user needs, building them, and pushing them to the extreme. Commercialization and recognition from the capital markets have not yet become his focus, but precisely because of this pursuit of technology, Zilliz is becoming the next rising star in the infrastructure software market.

“In our early days, we felt like we were building a product, but looking back now, what we were doing was technology.”

Zilliz was founded out of Star-Lord’s interest in new technologies and his own plan for a career change.

As a graduate student at Huazhong University of Science and Technology in Wuhan, China, Star-Lord followed his advisor into research on “grid computing”, a technology later regarded as the predecessor of cloud computing. By chance, his group got in touch with Globus, a grid computing project led by a professor at the University of Chicago. Star-Lord joined the Globus project, studied its code, and worked with Globus to eventually build the China Education and Research Grid, one of the largest grid computing projects in China.

This was the first time Star-Lord took a deep part in building open source software through the open source collaboration model. “If it weren’t for open source, it would be hard for me to imagine that one end in Wuhan, China, and the other at the University of Chicago in the USA, so far apart, could connect with each other’s laboratory researchers. That was very new to me, and it also opened the door to learning about the latest technology around the world.”

After graduating with a master’s degree, Star-Lord went to the University of Wisconsin-Madison, home to one of the top database programs in the United States. In 2009, he graduated with a Ph.D. and joined Oracle in Silicon Valley, where he took part in the research and development of Oracle’s cloud database. In 2013, Oracle 12c was released as a database redesigned for cloud computing, and Star-Lord was one of its core developers. By 2015, 12c had become stable, and Star-Lord began thinking about returning to China to start a business.

On the one hand, by this time Star-Lord had technical experience in data processing and distributed systems and had seen how large-scale database software should be built, and he hoped to extend his abilities in other dimensions. Influenced by the entrepreneurial culture of Silicon Valley, he also wanted to make the leap from engineer to technology entrepreneur.

On the other hand, in Star-Lord’s view, data analysis had reached a crossroads at that time, and research on AI models and algorithms was in full swing. Building on AI research, work on unstructured data of all kinds, including images, video, and natural language, had also reached a new level and produced some good results. Image classification in computer vision had exceeded average human performance for the first time. Watching AI research and unstructured data research develop, Star-Lord had a faint sense that new technologies and new opportunities were being born in infrastructure software for data processing.

With this enthusiasm for new technology, Star-Lord set out as an entrepreneur. “We chose the data processing track in AI applications. In the past 5-10 years, China’s AI applications and unstructured data processing have been at the forefront of the world, leading in both total data volume and usage scenarios.” Following the principle of staying close to the source of demand, Zilliz’s story began in China. In 2017, Zilliz received an angel round of financing and officially set off.

After its founding, Zilliz’s initial positioning was to move the database onto the GPU and build a new generation of OLAP database system based on GPU hardware acceleration, in the hope of improving performance by 100 times. Star-Lord believed that with new technologies such as GPUs and heterogeneous computing emerging in the AI era, it was worth trying to combine distributed computing with heterogeneous computing and data processing. To realize this idea, Zilliz first built an engine that accelerates various kinds of data processing through efficient parallel algorithms on the GPU.

“We hadn’t figured out where to use this engine. We just thought it was very good, innovative and cool! So we had to build it first.”

Next, the team took this engine to users, communicating with them constantly and collecting feedback. Ultimately, Zilliz found that the technology could help users accelerate the analysis and processing of vector data in AI applications, and that this kind of user need was widespread and growing rapidly. Exploring the technology as it went, Zilliz gradually made the vector database its core product direction and has stuck with it to this day.
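To make “analysis and processing of vector data” concrete, here is a minimal sketch of the core query a vector database answers: given a query vector, return the stored items whose vectors are most similar. This is an illustration only. It uses brute-force cosine similarity in plain Python, not Milvus’s API or its indexing methods, and the function names, file names, and three-dimensional vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Return the keys of the k stored vectors most similar to the query.

    This scans every stored vector (brute force); a real vector database
    would normally use an index to avoid a full scan.
    """
    scored = [(cosine_similarity(query, vec), key) for key, vec in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _, key in scored[:k]]


# Hypothetical embeddings: in practice these come from an AI model.
embeddings = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.8, 0.3, 0.1],
    "invoice.pdf": [0.0, 0.2, 0.9],
}

print(top_k([1.0, 0.2, 0.0], embeddings, k=2))  # → ['cat.jpg', 'dog.jpg']
```

The full scan grows linearly with the amount of stored data, which is the kind of workload the text describes accelerating.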

After 2 years of building “cool” technology, Zilliz reached a key inflection point in 2019: turning technology into a product.

While working on the data analysis and processing engine, Zilliz kept receiving feedback from users and saw strong demand for vector data analysis and processing. In response, Zilliz began developing a new project in the second half of 2018 and open sourced the result in 2019. That result is the vector database Milvus.

Milvus Architecture diagram

“For us, building on what we had accumulated in large-scale data processing, distributed computing, heterogeneous computing and other areas, we crossed the gap from technology to product and found a clear product direction: the vector database.” But at that time Star-Lord had no idea how Milvus would develop: “We had seen user needs and a market opportunity, but we didn’t know whether the product we offered could really meet the market’s needs.”

Because of this “pain point”, and because of the team’s open engineering culture, Milvus has been open source under the Apache license since the day it was released. “Open source and openness have been basic principles of our company for a long time. It is a very simple idea from a group of engineers. We hope that good technology can spread faster and help people in the industry succeed, and that by opening up our technology through open source we can win wider support.”

There was only one criterion for the success of this open source product: “whether it can achieve good early user growth.” That result directly reflects whether the product really solves users’ pain points and whether it really creates value for other developers. In the end, Milvus had about 60 enterprise users within half a year of being open sourced. “I remember that from the first user on, there was good news basically every week: one user in the first week, then three or four in the second week, and seven or eight in the third.”

Milvus User graph

In fact, the market that Milvus targets was essentially a blank space in the past.

In Star-Lord’s view, the database industry has gone through a huge “differentiation” over the past 20 years: relational databases, distributed databases, graph databases, document databases, time series databases, and more. In the AI era, still more database types will appear. “For example, now that people talk about quantum computing, it is very likely that a quantum computing-oriented database will appear next.” On the whole, the industry keeps subdividing and specializing, much like the automotive industry, which has developed dozens or even hundreds of sub-categories over the past 100 years. The same is true of databases, and two things remain unchanged:

First, human needs keep growing. Second, society keeps becoming more digitized, so demand for data analysis and processing keeps rising as well. This gives rise to more data application scenarios, and each emerging scenario brings more specialized database products with a clearer division of labor.

Seen in this light, Milvus does not need to be compared with other types of databases. It focuses on processing unstructured data for AI applications, such as fraud analysis in financial applications, to meet the emerging need for unstructured data processing.

This applies to more than Milvus: for every project, the only thing Zilliz weighs is user “needs”.

In September 2021, Zilliz released Towhee, Milvus’ upstream software.

“We only consider one question when launching a new project, and that is how to meet user needs.” Zilliz works out from user needs what kind of product it should build. In Towhee’s case, although Milvus lets users solve data analysis problems well, extracting vector data from all kinds of unstructured data is itself a heavy investment of resources for some small and medium-sized companies. To meet this need, Zilliz launched Towhee, an open source embedding framework that contains a rich set of data processing algorithms and neural network models and helps users convert raw data into vectors.
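The step Towhee addresses, turning raw data into vectors, can be illustrated with a deliberately simple stand-in. The sketch below does not use Towhee’s API or any neural network model. It maps text to a fixed-length vector with a hashing trick, only to show the shape of the conversion; `embed_text` and the dimension of 8 are hypothetical choices made for this example.

```python
import hashlib
import math


def embed_text(text, dim=8):
    """Toy embedding: hash each word into one of `dim` buckets and count.

    A real embedding framework would run a trained model here; this
    stand-in only shows that raw input becomes a fixed-length vector.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0  # first hash byte selects the bucket
    norm = math.sqrt(sum(x * x for x in vec))
    # Normalize to unit length so vectors are comparable by cosine similarity.
    return [x / norm for x in vec] if norm else vec


vector = embed_text("vector database for AI applications")
print(len(vector))  # → 8
```

Vectors produced this way carry no semantic meaning, which is exactly the gap that model-based embeddings are meant to fill.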

When it comes to evaluating needs, Zilliz’s method is very “plain”. It actively collects or receives requests from the user community, then summarizes, classifies, and ranks the feedback, filters out the high-frequency needs, and identifies where they converge. The results are then brought to the R&D community for further discussion, and product design and iteration schedules are planned based on that discussion.

On the question of how to make money from technology, Star-Lord and his team settled on an answer long ago: make money through services on the public cloud.

But when it came to actually converting that into commercial value, the approach was very much “go with the flow”.

After Milvus was open sourced in 2019, the team was for a long time absorbed in a single goal: “how to build a globally technology-leading product”. Recalling his thinking at the time, Star-Lord said: “If this technology and product can solve users’ pain points and is also world-leading, then it will definitely create value for the company. So in the past few years we didn’t think much about the business model and focused on making the product well.”

For the Milvus team, the product began to mature at the end of 2020. Milvus already had more than 500 enterprise users and was becoming stable. The team then found a new core user need: users wanted to use it in the cloud. For users, a cloud service removes the installation and deployment steps and can be called directly through an API, which also lowers development and maintenance costs. That is when Zilliz started developing its public cloud product.

A few days ago, the beta version of Zilliz Cloud was officially launched. Zilliz Cloud is positioned as a fully managed database-as-a-service based on public cloud, and aims to provide a one-stop solution for vector data processing, unstructured data analysis and enterprise AI application development.

Zilliz Cloud Architecture diagram

In essence, Zilliz Cloud is database as a service: it offers the capabilities of the Zilliz vector database to users in a fully managed way on the cloud, so users no longer need to deploy and operate it themselves. It also helps users address data security on the cloud, including data compliance, high availability, and disaster recovery, and it greatly simplifies unstructured data management for enterprises developing AI applications.

Before the cloud product, users in the Milvus open source community who wanted to buy services from Zilliz were turned down. “We decided a long time ago that cloud is the form our commercial product takes. Other than that, we will not charge users.”

In fact, many open source infrastructure software projects, like Milvus, have followed the open source and free path for a long time. In Star-Lord’s view, the first characteristic of infrastructure software is its high technical threshold, so R&D is slow, meticulous work that requires long-term investment, continually distilling user needs into a general-purpose product. The second is that once a world-leading technology and product exists, a “winner takes all” pattern follows. This is one of the important reasons Zilliz has long insisted on being technology-led and open source.

At present, Zilliz is following the strategy set more than a year ago and taking the road of globalization. As for the cloud service business that has just started, the North American market will be the focus of attention in the next year or two.

“If we look at estimates of the global market, the US accounts for about 30%-40% of the global infrastructure software market, and it is the world’s largest single market.” Therefore, with globalization as its goal, Zilliz has made North America the main battlefield of its internationalization. Zilliz Cloud has supported AWS since July and will expand further into a multi-cloud solution, adding support for Google Cloud Platform (GCP) and Microsoft Azure, to cover the three major public clouds in North America.

Zilliz’s relationship with the large public cloud vendors is one of both competition and cooperation. On the one hand, Zilliz Cloud relies on public cloud platforms; on the other, it competes with them. “Our confidence comes from the fact that we are a start-up, which can iterate on products more quickly and achieve technological innovation. So when competing with public cloud vendors, we will insist on open source and openness. Only by keeping our products and technology competitive at the core can we finally dance with the elephants.”

When it comes to the future of Zilliz Cloud, Star-Lord did not make many predictions: “I don’t make predictions, and our company seldom does. Predictions easily come back to slap you in the face.” Commercialization and revenue are still not Star-Lord’s focus at the moment. As for the company’s success in the capital markets, Star-Lord attributes it to a team that is technology-oriented, dares to create new technology that is first in the world, and pushes it to perfection. “Everyone knows that scarcity is very important, but in hands-on practice you often find that the pursuit of scarcity is a luxury. Scarcity is lonely, quite uncertain, and high-stakes.”

At the same time, in Star-Lord’s view, commercialization becomes easier once the product and technology have been pushed to the extreme. “We must first create an excellent product. If the product does not lead the world, we don’t think it should be brought to market, let alone sold.”


[The creator says]

[The Creator Says] is News Fast Delivery’s new interview column on open source startups.

The open source community needs creators, who can be individuals or companies made up of individuals. Open source software has been developing for more than 20 years, contributors from companies have become its backbone, and a number of companies have been founded around open source software. This column focuses on open source startups and their founders, discusses the current state of open source, shares open source business stories, and contributes to the open source community.

The [The Creator Says] column is open to all open source startups. Please fill out the following questionnaire to recommend creative companies to us: https://www.wjx.cn/vj/P2FFev2.aspx

Past reviews:

Tributary Technology: The community is a lever, let paying users take the initiative to find it

Zhang Liang: He founded SphereEx for the development of ShardingSphere, optimistic about the subscription system on the cloud

Two years after its establishment, without talking about revenue, what is this company thinking?

An old colleague pulled me to start a business and become an open source storage company

Jina AI: Defining Open Source Neural Search, 200 Million Series A Financing
