yashpande

http://www.reddit.com/user/yashpande

Highest Rated Comments

yashpande | 13 karma | 2022-06-22 21:18:24 UTC

Broadly speaking, there are two primary reasons why grep-style queries fail. The first is that most search indices don't index documents exactly as they are. Instead, they tokenize the documents (e.g. turn foo:bar into foo : bar) and create an inverted index that maps a token (e.g. foo) to all the documents containing that token. They then respond to the query "foo and bar" by intersecting the list of documents that contain "foo" with those that contain "bar" and only scoring those documents. They have to do that because it's way too inefficient to iterate through every document in their index for every query. Because of this, grep-style queries like "foo-bar" will also be tokenized to "foo - bar" and will match documents that didn't have the exact text "foo-bar".
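To make the tokenize-then-intersect behavior concrete, here is a minimal sketch of an inverted index. The tokenizer, the function names, and the three sample documents are all assumptions for illustration, not how any particular search engine does it.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase, then split into word tokens and single punctuation tokens,
    # so "foo-bar" becomes ["foo", "-", "bar"]. (Assumed tokenizer.)
    return re.findall(r"\w+|[^\w\s]", text.lower())

def build_index(docs):
    # Map each token to the set of document ids containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    # Intersect the posting sets of every query token.
    tokens = tokenize(query)
    if not tokens:
        return set()
    postings = [index.get(token, set()) for token in tokens]
    return set.intersection(*postings)

# Hypothetical sample documents.
docs = {
    1: "use foo-bar here",
    2: "bar - then foo",
    3: "only foo",
}
index = build_index(docs)
matches = search(index, "foo-bar")
```

With these documents, `matches` contains both document 1 and document 2, even though only document 1 has the exact text "foo-bar". Document 2 simply contains the tokens "foo", "-", and "bar" somewhere.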

The second reason is that search engines often rely on term- and query-level rewrites to retrieve relevant documents that don't precisely match the query. For example, if you searched for "pytest", you might want to see results for "python test" even if it doesn't exactly match your query. Differentiating between cases like this (where rewrites are helpful) and grep-style queries where you don't want rewrites can be difficult, which is what leads to the feeling that search engines are sometimes "ignoring" your precise query.
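A term-level rewrite can be sketched roughly as below. The rewrite table, the function name, and the `allow_rewrites` flag are hypothetical; the hard part described above (deciding when rewrites should be off) is left to the caller here.

```python
# Hypothetical rewrite table; a real engine would likely learn these from data.
REWRITES = {"pytest": ["python test"]}

def expand_query(query, allow_rewrites=True):
    """Return the query variants to retrieve against, original first."""
    words = query.split()
    variants = [query]
    if not allow_rewrites:
        return variants
    for term, alternatives in REWRITES.items():
        if term in words:
            for alt in alternatives:
                # Swap the matched term for its rewrite, keep other words as-is.
                rewritten = " ".join(alt if w == term else w for w in words)
                variants.append(rewritten)
    return variants

helpful = expand_query("pytest fixtures")
grep_style = expand_query("pytest fixtures", allow_rewrites=False)
```

Here `helpful` also includes "python test fixtures", while `grep_style` keeps only the literal query.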

yashpande | 13 karma | 2022-06-22 21:40:44 UTC

It's impossible for us to distinguish fake upvotes from real ones because the only signals reddit provides are the final score, the upvote percentage, and controversiality.
For comments, we can choose to weight their importance as a function of the comment's score as well as authority signals for the comment author (e.g. the author's post and comment karma).
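As a rough illustration of weighting a comment by its score plus author authority, consider the sketch below. The log scaling, the `authority_mix` parameter, and the formula itself are assumptions made up for this example, not the actual ranking function.

```python
import math

def comment_weight(score, post_karma, comment_karma, authority_mix=0.3):
    # Hypothetical formula: log-scale both signals so that very large
    # scores or karma counts do not dominate, then blend them.
    score_part = math.log1p(max(score, 0))
    authority_part = math.log1p(max(post_karma, 0) + max(comment_karma, 0))
    return (1 - authority_mix) * score_part + authority_mix * authority_part

# Same comment score, different author karma (made-up numbers).
established = comment_weight(score=5, post_karma=1000, comment_karma=1000)
newcomer = comment_weight(score=5, post_karma=10, comment_karma=10)
```

Under these assumptions, `established` comes out higher than `newcomer`, and a higher comment score raises the weight for any fixed author.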

yashpande | 9 karma | 2022-06-22 18:14:02 UTC

Great question! Firstly, when it comes to this feature, we’re selective about which forums we index, and Stormfront does not qualify. Furthermore, an important component of our ranking is a query-independent score that we assign to pages based on their popularity (e.g. by PageRank), past utility (e.g. what proportion of Neeva users clicked on their links for other queries), and ad load. I’ve never been on Stormfront so I can’t comment on its ad load, but I imagine it gets a low query-independent score based on just the first two factors.
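One way to picture a query-independent score is as a blend of the three factors mentioned above. Everything in this sketch is an assumption: the function name, the weights, the normalization of inputs to [0, 1], and the idea that a heavier ad load lowers the score.

```python
def query_independent_score(popularity, click_rate, ad_load,
                            weights=(0.5, 0.4, 0.1)):
    # All inputs are assumed to be normalized to [0, 1].
    # Assumption: a higher ad load is worse, so it contributes (1 - ad_load).
    w_popularity, w_utility, w_ads = weights
    return (w_popularity * popularity
            + w_utility * click_rate
            + w_ads * (1.0 - ad_load))

# Made-up inputs: an unpopular, rarely clicked site with unknown-but-light ads
# versus a popular, frequently clicked site with moderate ads.
low = query_independent_score(popularity=0.05, click_rate=0.02, ad_load=0.1)
high = query_independent_score(popularity=0.9, click_rate=0.8, ad_load=0.4)
```

With weights like these, a page that does poorly on popularity and past utility stays low regardless of its ad load, which matches the reasoning about the first two factors.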

yashpande | 7 karma | 2022-06-23 01:06:44 UTC

Great question! Unfortunately, we currently don't have any information regarding a user's karma over time (only a simple karma count for the user at the time of the post) or how many posts they create per day on average. I totally agree that having this information could be useful to help determine whether a post is genuinely "user" generated vs. by a bot or marketer, and this might be something we look into doing in the future.

yashpande | 6 karma | 2022-06-22 20:34:06 UTC

This is pure speculation, but I don't think it's in reddit's interest to have people go to google instead of staying on their site. An issue lots of sites face is that it's hard to convert external traffic into daily users. If people always come to reddit through google, look at a page, then leave, this doesn't help reddit grow its userbase. They'd greatly prefer that people see reddit as the first place to go for a certain class of queries (like amazon is for shopping) so they can have more user stickiness.
