Abusing AWS Lambda to make an Aussie Search Engine

boyter.org
8 min read
fairly easy
Article URL: https://boyter.org/posts/abusing-aws-to-make-a-search-engine/ Comments URL: https://news.ycombinator.com/item?id=28536059 Points: 1 # Comments: 0
TL/DR I wrote an Australian search engine. You can view it at bonzamate.com.au. It's interesting because it runs its own index, only indexes Australian websites, is written by an Australian for Australian's and hosted in Australia. It's interesting technically because it runs almost entirely serverless using AWS Lambda, and uses bit slice signatures or bloom filters for the index similar to Bing. I also found out the most successful code I have ever written is PHP, despite never being a professional PHP developer.

The idea…

So I am in the middle of building a new index for searchcode from scratch. No real reason beyond I find it interesting. I mentioned this to a work colleague and he asked why I didn't use AWS as generally for work everything lands there. I mentioned something to the effect that you needed a lot of persistent storage, or RAM to keep the index around which is prohibitively expensive. He mentioned perhaps using Lambda? And I responded its lack of persistance is a problem… At that point I trailed off. Something occurred to me.

Lambda's or any other serverless function on cloud work well for certain problems. So long as you can rebuild state inside the lambda, because there is no guarantee it will still be running next time the lambda executes. The lack of persistance is an issue because modern search engines need to have some level of it. You either store the index in RAM, as most modern search engines do, or on disk.

Without persistance it makes lambda a non starter. But, there is a saying in computing.

Never do at runtime what you can do at compile time.

I decided to see how far I can take that idea, by using AWS Lambda to build a search engine. How to we get get around the lack of persistance? By baking the index into the lambda themselves. In other words, generate code which contains the index and compile that into the lambda binary. Do it at compile time.

The plan, is then to shard the index using individual lambda's. Each one holds a…
Read full article