New Opportunities in Search Engine Technology
Anna Patterson is President and Founder of the search engine Cuil. Her focus is on scaling architecture, tackling one of the major problems in search: the exponential growth of the Internet. Anna was the architect of Google's large search index, TeraGoogle, which launched in early 2006. While at Google, Anna was the technical lead of one of the two Web ranking groups, in charge of GoogleBase, and the manager for the core piece of Google's ad-matching technology. She joined Google in 2004 after designing, writing and selling Recall, the largest search engine in existence at the time at 12 billion pages. Anna has a PhD in Computer Science from the University of Illinois at Urbana-Champaign, and was a Research Scientist at Stanford University.
Author: Anna Patterson
Aggregator: Sujatha
Lectures
I went to a non-profit as a volunteer because I actually needed to be a stay-at-home mom for a while. I was having a difficult pregnancy with the second of the four kids, and my doctor said I needed to be on bed rest, and I said, well, can I program? It's very similar to being very stationary, for anyone who does program here. So, as some people do, I got into the project and I really couldn't let go, so I kept programming up a search engine at recall.archive.org. When I launched it, it was 12 billion pages, and it was by far the biggest search engine in the world. Yahoo was at 2 billion and Google was at 3 billion. Recall had that interesting property of indexing pages over time, because the Internet Archive is a non-profit that tries to archive all the Internet content that's gone on. They take crawls as donations, so their primary donation source is Alexa, and now Cuil is donating to the Archive as well. So everything we crawl will get saved for posterity at the Internet Archive. That's why the 12-billion-page search engine was bigger than the number of pages that were on the Web at that time: it took snapshots of the Web.
The non-profit also taught me amazing things about management, because you're trying to organize people who aren't getting paid. So what you have to do, after all, is inspire the employees, and that's really important in a startup as well, because at a startup you're often underpaying with respect to the broad market, so you really need to have a team that really believes in your vision. At the Internet Archive, or any non-profit that you get involved in and volunteer at, that's one of the founding organizing principles: the people believe in the vision.
Another lesson that the industry has taught you over and over again is that small teams have made a big difference. And when you sit at any non-profit, you know they're giving books out to people who never had books before, and they're doing all these very worthy causes, so it's worth showing up.
The Web has grown super-exponentially, because it still looks like an exponential even when plotted on a log scale, and you can see when each of these properties came on the market. When AltaVista came on the market, it indexed over 50% of the Web. And when Google launched, it indexed about 50% of the Web. As the Web has grown, people need to buy a certain amount of hardware in order to index a certain amount of information, and obviously you can't just invest exponentially in hardware. So search engines really haven't kept up with the exponential growth of the Web. We thought that provided an opportunity for Cuil, because rather than buying more machines, Tom, our co-founder, actually did some mathematics in order to come up with a mathematical model for how to build a search engine differently.
A standard search engine is built as a scatter-gather architecture. I don't know how many people know that terminology, but basically it would be like asking every single one of you the same question and you guys emailing me back the answer. So you scatter the query and you gather up the answer. He came up with a representation so that you didn't have to scatter the query anywhere, so that over 95% of queries could just go to one machine. And with that, it means that you can have an architecture that's entirely on disk instead of in memory. If I have to ask every single one of you a query and you're all computers, then I'm going to have to use memory, because it's going to take too long, with the latency of the network, to get the question out to you, have you compute the answer, and get it back to me. The latency is high, so the computation on each of the machines needs to be low. So he said, well, let's spend the latency instead on going to disk and doing lots of computation. It does lots of computation on the mathematical model in order to figure out the answer. That's why on 140 machines we can serve 125 billion pages. And trust me, Yahoo! and Google use more.
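Editor's illustration: the contrast between scattering a query to every machine and sending it to a single machine can be sketched in a few lines of Python. The talk does not describe Cuil's actual representation, so the term-partitioned index below is only a stand-in, and every name in it (`DOCS`, `NUM_SHARDS`, `scatter_gather`, `routed_lookup`, and so on) is hypothetical. It handles single-term queries only and counts how many shards each approach has to contact.

```python
from collections import defaultdict

# Toy corpus and shard count, both made up for this illustration.
DOCS = {
    1: "harry potter fan page",
    2: "search engine architecture",
    3: "harry potter movie review",
    4: "internet archive search",
}
NUM_SHARDS = 3


def build_doc_partitioned(docs, num_shards):
    """Each shard holds an inverted index over its own slice of the documents."""
    shards = [defaultdict(set) for _ in range(num_shards)]
    for doc_id, text in docs.items():
        shard = shards[doc_id % num_shards]
        for term in text.split():
            shard[term].add(doc_id)
    return shards


def scatter_gather(shards, term):
    """Ask every shard the same question, then merge the answers."""
    contacted = 0
    results = set()
    for shard in shards:
        contacted += 1
        results |= shard.get(term, set())
    return sorted(results), contacted


def shard_for_term(term, num_shards):
    """Deterministic toy hash: sum of character codes modulo the shard count."""
    return sum(ord(c) for c in term) % num_shards


def build_term_partitioned(docs, num_shards):
    """Each term's full posting list lives on exactly one shard."""
    shards = [defaultdict(set) for _ in range(num_shards)]
    for doc_id, text in docs.items():
        for term in text.split():
            shards[shard_for_term(term, num_shards)][term].add(doc_id)
    return shards


def routed_lookup(shards, term):
    """Send the query only to the one shard that owns the term."""
    shard = shards[shard_for_term(term, len(shards))]
    return sorted(shard.get(term, set())), 1


doc_shards = build_doc_partitioned(DOCS, NUM_SHARDS)
term_shards = build_term_partitioned(DOCS, NUM_SHARDS)
print(scatter_gather(doc_shards, "potter"))  # ([1, 3], 3)
print(routed_lookup(term_shards, "potter"))  # ([1, 3], 1)
```

Both lookups return the same documents, but the routed version contacts one shard instead of all three. With only one network hop per query, more of the latency budget is left for work on that single machine, which is the trade-off the talk describes.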
And as far as getting acquired, I mean, you know, we're trying to focus on the product. I think a lot of companies are built to be acquired, and I think what happens there is you leave yourself in a really vulnerable spot, because you're growing and you say, hey, I won't hire that expensive VP of Whatever because, hey, man, any day now we're going to get acquired. And then your product winds up suffering. So I think you need to really want to do the company, because you don't know how long you're going to be at it. And luckily I'd been in search a long time. I know I want to stay in search. So, you know, it's fine with me whatever happens with the company, but you have to focus on building the product and making the product better, and you have to focus on building a sustainable company.
Well, one of the things that motivated us is the technology, because of what we do when we go through the Web: we come up with models of the pages and of the way that concepts and ideas on the pages interact. We realized, you know, we wanted to get away from some of the stillness you see on the Web. So Direct Hit, a lot of people don't remember it, but if you clicked on a result, it would promote the result; if people didn't click on it, it would demote the result. So you can imagine, if something ever got to the front page and got some clicks, it would essentially always stay there. I think one of the amazing beauties of the Web, when I remember the Web when it first came out, is that there was always this artful page on, whatever, Harry Potter, and what the world created was always changing, the main fan was always changing, and so the pages were always changing. So I really wanted to make a conscious effort to uncover those zealots out there about their particular topic, which changes over time. And I did a lot of mining at the Archive to see these things change over time. When we made that decision, we realized we didn't actually have to track any individual user habits. So it actually came out of the technology decision. We realized that because we'd made one technology and product decision, we could make this other decision on privacy.
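Editor's illustration: the click-feedback loop attributed to Direct Hit above can be sketched as a tiny simulation. This is not Direct Hit's actual algorithm. It assumes, for simplicity, that users always click whatever is ranked first, and the function name, page names and the `promote`/`demote` amounts are all hypothetical.

```python
def simulate_click_feedback(scores, rounds, promote=1.0, demote=0.1):
    """Rank pages by score; each round the top page is clicked and promoted,
    and every other page is demoted for not being clicked."""
    scores = dict(scores)  # work on a copy so the caller's dict is untouched
    for _ in range(rounds):
        top = max(scores, key=scores.get)
        for page in scores:
            if page == top:
                scores[page] += promote
            else:
                scores[page] -= demote
    # Return page names ordered from highest to lowest score.
    return sorted(scores, key=scores.get, reverse=True)


initial = {"old_fan_page": 1.0, "new_fan_page": 0.9}
print(simulate_click_feedback(initial, rounds=10))
# ['old_fan_page', 'new_fan_page']
```

Under these assumptions the page that starts on top only widens its lead, however close the newcomer was at the start. That is the lock-in effect the talk is trying to avoid.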
Well, one other thing about our culture... I guess, so, Tom is from Ireland, I'm an Irish citizen as well, so I would kind of joke around and call it a "pub culture". It's not that we go out drinking, it's more that, you know, conversation: people have to be able to hold a long conversation, and they have to like being together, because you're together for just so many hours. Our first interview, I think somebody blogged about it last week, but our first interview is taking the person out to lunch. Because if they can't hang with us for one hour, three people at lunch, then for the "n" hours a day we're going to have to spend with them, it's just going to be a no. So we really try to go for someone who looks good on paper, then they get to lunch, and then they're just fun to talk to. I mean, they have to be able to talk about movies and, you know, whatever. I know it sounds strange, but you always get off topic when you're working, and you want somebody who's nice to be around. And then we like people who have had other projects. We find, like, a lot of undergrads who have a small project on the side or have consulted or whatever. We find that the kind of people who have multiple balls in the air are great, and people who have been at other startups are amazing.