By now you’ve probably heard (or read the viral post) that Google search is dying, and that adding @Reddit to any search query is all the rage. We are engineers at the search engine Neeva, and we agree that Reddit discussions are a great source of high-quality information.

So how do you use the zeitgeist on forums and deeply integrate those discussions to make your search results better? Is it really as simple as adding the word [reddit] to every search query? We’ve spent the last few months taking a closer look at all things Reddit and search. We’ve looked at human evals, query and click logs, and our index of the web to understand how users discover Reddit content in search, and rethink the search experience in a Reddit-forward way.

AMA on how we think Reddit content can make search (at Neeva) better, and vice versa. And while we are at it, we’d love to learn what features you want to see from your favorite Reddit-forward search engine (whichever one that may be).

Here’s our proof:

Update (6/24): Thank you all for the great questions, we had a great time answering them! We will check back in if anything new pops up and hope to do more of these soon.

Comments: 83 • Responses: 17

San_Diego_Sands62 karma

Can advertisers infiltrate this platform more than they already have?

rahilbathwal4 karma

This raises an interesting question of how and when you surface information from forums on the results page. We use forum-specific engagement signals such as upvotes and number of comments to filter out content that is not providing real value.

ieya4049 karma

Is there anything that makes you view Reddit as a particular, unique resource? Or would you view any similar sorts of sites - large userbase, with a focus on discussion or Q&A - as similarly valuable to take results from?

And do you have any idea what a wicked tease it is to talk about a new search engine that we can't play with yet? :)

rahilbathwal11 karma

Great point. We’re integrating content from a number of community forums, starting with Reddit and Hacker News (with more to come!). We believe that there is a lot of value in content from real people, especially on queries that are typically heavily SEO-optimized. There is an increasing trend of users explicitly searching for more Reddit content (e.g. adding “reddit” to your searches), which makes Reddit a good place to start exploring how we can improve search results.

In user surveys, we found that over half of users felt the best Reddit result was better than the result at rank 3 (and almost a third felt that it was as good as or better than the result at rank 1), which was not true for any other site.

As for trying us out, we are available to anyone in the US and are expanding to other markets in early Q4. Feel free to drop us a line at [[email protected]](mailto:[email protected]) and we can hopefully get you access soon!

MurphysLab7 karma

Why is your search engine behind a waitlist?

We'll get in touch as soon as Neeva is available in your region.

rahilbathwal5 karma

Neeva is currently available to anyone in the U.S., and we are working to expand to other regions (more coming soon!). The biggest reason it’s not available outside of the U.S. yet, is the importance and challenge of local results. We are one of the few search engines that has built our own index, crawling hundreds of millions of pages a day, and serving results. This effort means prioritizing regions to ensure the local results are just as good as the overall search experience. We will be in markets in the EU early Q4 and continue beyond soon after.

MurphysLab3 karma

The biggest reason it’s not available outside of the U.S. yet, is the importance and challenge of local results.

So why not offer a public beta version as a preview?

rahilbathwal6 karma

That is a great question. We typically do some form of beta in the lead-up to our next market, but you raise an interesting point. Happy to share your feedback with the team.

maximumpineapple273 karma

I’ve heard that Google Search used to deliberately shun using a lot of machine learning (although largely pre deep learning days), possibly for understandability reasons and partly because rules were more effective given the complexity of Search. How do you think that’s changed? How much of your ranking stack is a giant neural network vs other factors? How do you measure what’s more effective at producing good results?

rahilbathwal3 karma

With increasingly capable language models being developed, search is definitely moving towards using more deep learning (e.g. https://blog.google/products/search/search-language-understanding-bert/). As you mentioned, this does raise concerns around explainability of results and potential model bias that we need to be careful about as we start using these techniques more.

In my opinion, neither is necessarily more effective. We take a balanced approach in our ranking stack, mixing traditional ranking signals (topicality and textual relevance, document centrality to the query, etc.) with deep learning approaches, such as embedding queries and documents in high-dimensional spaces and computing signals including query-to-document match (e.g. cosine similarity between their embeddings) or query similarity to past user queries that led to clicks on relevant documents.
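To make the embedding-match signal concrete, here is a minimal sketch of cosine similarity between a query vector and document vectors. It is an illustration only, not Neeva's ranking code: the tiny 3-dimensional vectors, the document names, and the choice to score a zero vector as 0.0 are all assumptions made for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here: a zero vector matches nothing
    return dot / (norm_a * norm_b)

# Hypothetical embeddings; real models use hundreds of dimensions.
query_vec = [1.0, 0.0, 1.0]
doc_vecs = {
    "doc_a": [2.0, 0.0, 2.0],  # same direction as the query
    "doc_b": [0.0, 1.0, 0.0],  # orthogonal to the query
}

# Rank documents by similarity to the query, most similar first.
ranked = sorted(
    doc_vecs,
    key=lambda name: cosine_similarity(query_vec, doc_vecs[name]),
    reverse=True,
)
```

Because cosine similarity depends only on direction, `doc_a` scores as a perfect match even though its vector is twice as long as the query's. In a real stack this score would be one signal among many rather than the final ranking.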

As for measuring quality, all of these go through various stages of evaluation including human ratings for any changes introduced by adding new signals.

doespostmaloneshower3 karma

How can we trust you to be good custodians of our data? What is your company’s business model?

rahilbathwal4 karma

Our business is to completely flip the traditional ad-supported model that prioritizes advertisers over end users. That model too often exploits privacy and personal data for economic gain. Instead, we have a simple proposition: a search experience designed entirely around the user, with no ads and private (your relationship is with Neeva, akin to a doctor’s office), and in return you pay a small monthly subscription. We offer a freemium model: a basic tier, completely free, that has the same ad-free and private search functionality, with connectors to search across apps like Dropbox and email; and a premium version that includes paid versions of a password manager and VPN plus unlimited connectors, all for a low monthly cost of $5.

By removing ads, advertisers, and ad revenue, our goal is to take out the conflict of interest and compromises. We don’t profit off of your data and we don’t allow third parties to either. As for trust and being good custodians, that starts with clearly defined values and principles on privacy, which we have laid out. Trust is also earned over time, and we hope our members continue to feel and see that each day.

mianori3 karma

This is pretty ambitious - you will have access to a lot of user data - search history, site history, passwords. What measures are in place to protect this data?

rahilbathwal2 karma

For one, we have a feature called "Memory Mode" that lets you explicitly decide whether we store any of your searches or interactions. Users can choose to disable this entirely in which case your account will not be associated with any of your searches.

oakteaphone3 karma

Do you think Reddit has bad search functionality on purpose, so that people will head on over to Google and add "Reddit" to their search string to get results from Reddit, making Reddit rank better in the engine's results?

rahilbathwal6 karma

To add, search in general is a challenging problem and takes a lot of careful selection and tuning of various signals. This makes it very difficult for non-search engines to just add search as a feature.

This is also where we think Neeva can provide value with our focus on improving search. A lot of the work we have done in creating core ranking signals such as topicality and textual relevance, query centrality to the document, query-independent document popularity, etc. applies directly to ranking Reddit posts as well. Additional Reddit-specific signals such as upvotes and comments also help with ranking but are not enough on their own.

throwaway9016172 karma

Just downloaded your app.

Why on earth would I make your app the default browser for use across my entire phone?

Why would I give your unknown company access to my browsing history?

Also using a VPN that has a Chicago area address and did a search for sightseeing in Chicago and was getting hits near the top for TripAdvisor in India with Chicago travel details in rupees.

rahilbathwal1 karma

Really sorry to hear that. One possibility is that if you've granted us permission to use your device's location, it would take precedence over your IP and the results may not accurately reflect the location of your VPN. We use device location over IP (only when permission to use location is explicitly granted) because it gives us a much more accurate location and helps us present better local results (e.g. for the query [restaurants near me]). That said, the experience you had wasn't great and is definitely something that Neeva should improve on, so I'll take this back to the team immediately. Please feel free to also reach out to [[email protected]](mailto:[email protected]) for any concerns you might have.

As for your concerns regarding giving us access to your browsing history, I fully agree that you shouldn't give any company access to your data without first understanding and vetting their privacy policy. User trust is something that we must earn and the commitments laid out in our privacy policy are starting points. You could also choose to first try out the experience on desktop at neeva.com (we have features like Memory Mode that let you decide whether we store any of your search history).

mondalibnor2 karma

Why bother creating a unique index? Seems like the industry standard is to just do some magic on top of Bing and let Microsoft absorb the cost of maintaining the primary index.

rahilbathwal9 karma

In the same vein as Alan Kay's famous quote "People who are really serious about software should make their own hardware", we strongly believe that building our own index is important if we want to compete in the search space.

For one, having our own crawl and index allows us to build features like quicklinks (https://neeva.com/blog/quicklinks-adding-community-forums) by inverting links from the webpages we have crawled or building a discussion forum specific experience by restricting search to specific websites such as Reddit.
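As a rough illustration of what "inverting links" means, the sketch below takes a toy crawl (page URL mapped to its outbound links) and builds the reverse map: for each linked-to page, which forum threads point at it. This is not Neeva's implementation; the `FORUM_HOSTS` set, the function name, and the example URLs are assumptions for the example.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Assumed list of forum hosts whose links we want to invert.
FORUM_HOSTS = {"www.reddit.com", "news.ycombinator.com"}

def invert_forum_links(crawled_pages):
    """Map each linked-to URL to the forum pages that link to it.

    crawled_pages: dict of page URL -> list of outbound link URLs.
    """
    inverted = defaultdict(set)
    for source_url, outlinks in crawled_pages.items():
        # Only links that originate on a forum page are kept.
        if urlparse(source_url).netloc not in FORUM_HOSTS:
            continue
        for target in outlinks:
            inverted[target].add(source_url)
    # Sort sources so the output is deterministic.
    return {target: sorted(sources) for target, sources in inverted.items()}

# Toy crawl data with made-up URLs.
crawl = {
    "https://www.reddit.com/r/example/comments/1": ["https://example.com/review"],
    "https://news.ycombinator.com/item?id=1": ["https://example.com/review"],
    "https://example.org/blog": ["https://example.com/review"],
}
quicklinks = invert_forum_links(crawl)
```

Looking up a search result's URL in `quicklinks` then yields the forum discussions that mention it, which is only possible when the crawl data (and therefore the outbound links) is available in the first place.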

Zoetje_Zuurtje2 karma

What're your favourite programming languages to work with?

rahilbathwal4 karma

Personally, I like the simplicity of Python for quick prototyping. Here at Neeva, I spend most of my time programming in Go and Python. Basically anything but Perl :)

Zoetje_Zuurtje1 karma

Yeah, I get that. I don't know either though, do you have any good resource to learn Python?

yashpande3 karma

My personal preference is to start off with a project in mind and then try and implement it in the language. The project should be something you’re personally interested in - when I first learnt Python, I tried using it to make a pacman agent that would follow pellets and run away from ghosts. It’s reasonably easy to find tutorials online for whatever thing you choose (e.g. just search for “how to make __ in python”) in case you get stuck.

rahilbathwal5 karma

+1 to getting started with projects. It’s a great way to get practical experience with any programming language and generally helps build skills that transfer to other projects/languages as well. If you’re just starting out, I’d also recommend codecademy.com.

Zoetje_Zuurtje1 karma

If you’re just starting out, I’d also recommend codecademy.com.

Ooh, I just completed my Learn C# course there - and now I'm trying to build a small project with it. Thanks for the advice. Last question:

Object Oriented Programming or Functional Programming?

rahilbathwal2 karma

They're both very different paradigms. I personally find Functional Programming a very interesting way to think about problems. Object Oriented Programming is more widely used though, so perhaps more useful to learn.

Zoetje_Zuurtje1 karma

Thanks for the answer! Looks like I'm on the right track then. Do you happen to know any intro to AI, then?

rahilbathwal3 karma

Check out Coursera, they have a bunch of good courses on introduction to AI/ML

onemoreclick2 karma

Was the @ a typo or is that used in Google search now?

How much do bad titles affect Reddit search?

rahilbathwal5 karma

Sorry, that was a typo. I don't think Google search does @.

As for bad titles affecting search, they definitely have an impact on how highly a document ranks. Query term matches in the title are relatively more important than, say, matches in the document body. However, this is a good example of where deep learning can help. We use embedding-style models to compute the semantic similarity between the query and the title (and other parts of the document) as an additional ranking signal. So while bad titles do affect the final ranking for a document, there are methods we use to tackle these challenges, and this is also one of the ranking areas we are actively iterating on.
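A toy example of why a bad title hurts under purely lexical scoring: the sketch below weights title matches more heavily than body matches. The function name and the 3.0 / 1.0 weights are made up for illustration and are not Neeva's actual values; a semantic-similarity signal like the one described above would be added on top to partly compensate.

```python
def lexical_score(query, title, body, title_weight=3.0, body_weight=1.0):
    """Fraction of query terms found in each field, weighted by field."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    title_words = set(title.lower().split())
    body_words = set(body.lower().split())
    title_hits = sum(t in title_words for t in terms) / len(terms)
    body_hits = sum(t in body_words for t in terms) / len(terms)
    return title_weight * title_hits + body_weight * body_hits

query = "best credit card"

# Descriptive title: every query term matches in the title.
good_title = lexical_score(query, "Best credit card for travel", "I compared a few options")

# Vague title: the same terms appear only in the body.
bad_title = lexical_score(query, "Help please", "what is the best credit card for travel")
```

Both posts are about the same topic, but the vaguely titled one scores lower because its matches only count at the body weight.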

communityml2 karma

As I write this, there are 54 comments... that's not that many, but as a mere human, it's still a lot to read (also, these comments, questions and responses are really great).

From what you've learned so far, what is the relative importance of the comments versus the original post?

Do you find you have to do anything special for posts with a ton of comments?

rahilbathwal2 karma

In my opinion, a lot of the value on Reddit comes from the rich discussions between users and so a ton of comments could be indicative of value. That said, it does open up the opportunity for spammy comments (perhaps what you're referring to?) which is something we must be mindful of.

From a ranking perspective, whether the post or comments are more useful is highly contextual to the query. For example, if you're searching for "best credit card", there are posts like https://www.reddit.com/r/personalfinance/comments/b5qjvf/i_researched_cashback_credit_cards_so_you_dont/ where the value lies largely in the original post as well as posts like https://www.reddit.com/r/CreditCards/comments/qmxj0s/best_credit_card_for_each_category/ where you might find the comments more useful. Ideally a good ranking algorithm would be able to surface both. Part of the challenge here is that a post with a lot of comments might seem more relevant to the query if it contains the query terms with a higher frequency. One way we're trying to tackle this is assessing the relevance of a post at a per-comment level to boost documents that might have fewer but higher quality comments. Reddit-specific features such as the number of upvotes / downvotes for a comment are also helpful in filtering out ones that might not be very useful.
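One way to picture per-comment scoring is the sketch below: each comment is scored against the query on its own, low-voted comments are filtered out, and the post is scored by its few best comments instead of by total term frequency. It is a simplified illustration, not Neeva's algorithm; the term-overlap relevance measure, the `min_votes` threshold, and `top_k` are all assumptions.

```python
def comment_relevance(query, comment_text):
    """Fraction of distinct query terms that appear in the comment."""
    terms = set(query.lower().split())
    words = set(comment_text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0

def post_score(query, comments, min_votes=1, top_k=2):
    """Average relevance of the top_k most relevant comments that pass a vote filter.

    comments: list of (text, net_votes) tuples.
    """
    kept = [text for text, votes in comments if votes >= min_votes]
    scores = sorted((comment_relevance(query, t) for t in kept), reverse=True)[:top_k]
    return sum(scores) / len(scores) if scores else 0.0

query = "best credit card"

# A long thread: one partly relevant comment, lots of noise, one downvoted spam comment.
long_thread = [("credit card", 5)] + [("lol", 3)] * 50 + [("best credit card spam", -4)]

# A short thread with two on-topic, well-received comments.
short_thread = [("the best credit card for groceries", 10), ("best card overall", 7)]

long_score = post_score(query, long_thread)
short_score = post_score(query, short_thread)
```

Here the short thread outscores the long one even though the long one has far more comments, because only the best surviving comments contribute to the score.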

biffatheasshole2 karma

Why should I use this instead of, say, searx or yacy? They're self-hostable, and give you control, unlike Google or you guys.

rahilbathwal2 karma

To start off, I think it’s great that there are an increasing number of alternative search engines for people to try and pick from. Neeva’s search engine also tries to give you control over certain aspects of your experience. For example, you can disable “Memory Mode” if you don’t want us to associate your account with your searches or you can use our preferred providers feature to upvote/downvote specific domains across news, shopping, media, etc. to customize your search results. Control and personalization (within the boundaries of the data you decide to give us access to) are important areas that we think about from a product perspective.

In my opinion, another key differentiating factor for Neeva is that we are crawling and building our own index. This, in combination with our ad-free experience, lets us innovate on the search experience directly. For example, quicklinks is a feature we recently launched that provides you with content from community forums like Reddit that is contextual to your search results. This is only possible with our own index, since it requires extracting and inverting links from web documents.