SearchYC Presents: SearchAF and SearchNM

by chengmi on 2009-05-22

Today, we are pleased to announce two new sister sites to SearchYC. The first, SearchAF, indexes and searches the Arc Forum by pg and rtm; the second, SearchNM, does the same for New Mogul by nickb.

These sites run on the same Arc codebase as Hacker News, so it was only appropriate that SearchAF and SearchNM run on the same codebase as SearchYC. The process of supporting additional sites, while trivial in nature, was humbling in that I quickly realized how many hard-coded strings existed in the underlying Rails code. Once that was fixed, however, the process of forking and skinning the site was pretty straightforward. Surprisingly, SearchYC's web crawler, YCScout, was already in a pretty good state. After updating the configuration file to target another site, write to a new database, etc., it happily went about its business of generating new indexes for AF and NM. I was also happy to find that it supported nickb's customizations for displaying embedded video clips on New Mogul without issue.
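
To illustrate the kind of cleanup described above, here is a minimal sketch of moving hard-coded strings into a per-site configuration table. Every name in it (SiteConfig, SITES, site_config, page_title, and the database names) is hypothetical and is not taken from the actual SearchYC or YCScout code.

```ruby
# Hypothetical per-site settings; none of these names come from the real code.
SiteConfig = Struct.new(:key, :site_name, :forum_name, :database)

SITES = {
  "yc" => SiteConfig.new("yc", "SearchYC", "Hacker News", "searchyc_production"),
  "af" => SiteConfig.new("af", "SearchAF", "Arc Forum", "searchaf_production"),
  "nm" => SiteConfig.new("nm", "SearchNM", "New Mogul", "searchnm_production")
}.freeze

# Look up a site by key, failing loudly on an unknown one.
def site_config(key)
  SITES.fetch(key) { raise ArgumentError, "unknown site: #{key}" }
end

# Views and the crawler read from the config instead of embedding strings.
def page_title(key, query)
  site = site_config(key)
  "#{site.site_name}: search #{site.forum_name} for \"#{query}\""
end

puts page_title("af", "macros")
```

With a table like this, supporting another site would mostly mean adding one more entry and pointing the crawler at the matching key.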

On the horizon for SearchYC is version 3.0. We are currently discussing a rewrite of the search backend, which is based heavily on acts_as_solr. This Rails/Solr adapter is no longer supported, so I'm going to either move to a fully custom solution or, more likely, shop around for a better-supported Rails plugin. Recommendations are always welcome.

In terms of new features, the roadmap currently consists of:

  • advanced search filters for points and post date
  • duplicate URL checker
  • web crawler improvements
  • data visualization (?)

SearchYC Performance Enhancements

by chengmi on 2009-02-15

It has been quite some time since we've updated this blog.

Some have speculated that we've moved on. And while that's true to some extent, we are still actively maintaining this website as a service to the YC community. Call it a personality flaw, or whatever you want, but there's just something about letting a website die that doesn't sit well with me.

On the topic of maintenance, many of you have noticed that SearchYC has been very, very slow. This was due to a number of factors, including poor database design, limited hardware, significant increases in traffic, and Google's spectacular ability to find the least efficient parts of the website (user search) and crawl them--constantly.

So what I've done is restructure and optimize the database a bit to make queries run faster. This resulted in only a slight improvement in overall performance, but it greatly reduced Googlebot's impact on response times.

I also decided to cancel our existing web hosting solution and host the website out of some spare hardware in my home. While bandwidth availability was initially a concern, the website has been quite responsive, so we'll see how things go and make adjustments as needed.

Another topic on our minds is the future of SearchYC. What features or improvements would you like to see? One idea is to compile a list of greasemonkey scripts designed for Hacker News. A (wiki?) page to track these HN-specific projects may help address the "non-discoverability" issue, and be a cool way to organize Hacker News hackers.

Ideas, comments, and suggestions can go in the feedback section.

SearchYC RSS Feeds

by chengmi on 2008-05-27

For the past few weeks, we've been working on a new algorithm for our Hacker News web crawler. The objective has been to decrease the time our crawler takes to index new threads while ensuring that the whole database is reasonably up-to-date with regard to points, edits, and deletions. While we've made significant progress towards solving this problem, we're not quite at the point where we can reliably offer the feature that we really want to build: notifications.
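
As a rough illustration of that trade-off (and not the crawler's actual algorithm, which is not described here), one common approach is to let the recrawl interval grow with a thread's age, so new threads are refreshed often and old ones only occasionally. The names CrawlItem, recrawl_interval, due?, and crawl_queue, along with the interval bounds and the divide-by-ten rule, are all assumptions made for this sketch. Timestamps are plain integers in seconds.

```ruby
# Hypothetical recrawl scheduler; bounds and names are assumptions.
CrawlItem = Struct.new(:id, :created_at, :last_crawled_at)

MIN_INTERVAL = 5 * 60          # seconds; assumed floor for brand-new threads
MAX_INTERVAL = 7 * 24 * 3600   # seconds; assumed ceiling for old threads

# Recrawl interval: one tenth of the thread's age, clamped to the bounds.
def recrawl_interval(age_seconds)
  (age_seconds / 10).clamp(MIN_INTERVAL, MAX_INTERVAL)
end

# How far past its interval an item is (1.0 means exactly due).
def overdue_ratio(item, now)
  elapsed = now - item.last_crawled_at
  elapsed.to_f / recrawl_interval(now - item.created_at)
end

# True when the item has not been crawled within its current interval.
def due?(item, now)
  overdue_ratio(item, now) >= 1.0
end

# Items due for a recrawl, most overdue first.
def crawl_queue(items, now)
  items.select { |item| due?(item, now) }
       .sort_by { |item| -overdue_ratio(item, now) }
end
```

Under these assumptions, a ten-minute-old thread would be revisited every five minutes, while a month-old thread would be revisited about every three days, which keeps points, edits, and deletions reasonably fresh without recrawling everything at the same rate.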

So in the meantime, we've built the next best thing. Today, we're launching RSS feeds for SearchYC search results. While you won't get a friendly reminder when someone replies to your comments and submissions, you now have a way to track keywords across Hacker News. We've found the feeds to be extremely useful for following discussions of SearchYC on Hacker News. I simply do a search for "SearchYC", sort by date, and then save the RSS feed using the blue icon at the top of the page. Now I know when people are talking about our project and what they are saying.

Another feature that we launched recently was a Firefox plugin for integrating SearchYC into your browser. Unfortunately, our site went down shortly after the release, so not many people were able to download it. We were reminded of this debacle when Richard Atkinson wrote a SearchYC plugin of his own, which is available on his blog.

We also loved Gabriel Weinberg's Ask YC Archive wiki. His page does a great job of organizing interesting topics and it's such a great resource that we've linked to it from our own Ask YC Archive.

SearchYC 2.0!

by chengmi on 2008-04-23

While SearchYC remains one of the best third-party search engines for Hacker News, we've been plagued with various performance problems. Many of you may have noticed that searches are extremely slow--sometimes even too slow for the site to be useful.

Well, that's what happens when you try to design and build a search engine in a week. Back in December, we decided to use the acts_as_ferret Rails plugin for its dead simple setup and configuration. This turned out to be a big mistake, as we soon found that Ferret (a Ruby port of Lucene) is surprisingly unstable in a production environment using DRb.

We stuck with Ferret for a while because we weren't sure how many people would be using our website (the project started as a simple database for finding interesting things to read). Now that we're getting pretty regular traffic (which still amazes me, since we've spread largely by word-of-mouth), we figured it was time for a change.

So what we've done is essentially rebuilt SearchYC from the ground up. What began as a change in search technology turned into a monster refactoring and update of the entire codebase. Here are a few of the changes we made:

  1. We ditched Ferret, and adopted Solr instead. Solr has the advantage of being based on Lucene, at the cost of running a Java virtual machine in the background. There are many other websites that use Solr for search.
  2. Upgraded from Rails 1.x to Rails 2.0--not a huge change, but there is a slight performance boost in the newer version.
  3. Moved some features to subdomains. For example, http://arcforum.searchyc.com/top/list is now http://top.arcforum.searchyc.com. The legacy URLs are still supported through redirection.
  4. Changed the format of URLs. http://arcforum.searchyc.com/by_date/query is now http://arcforum.searchyc.com/query?sort=by_date. The old format seems much cleaner, but the new format has allowed us to do some pretty nifty things...
  5. Nifty things such as new search options! You can now search for things like titles, domains, comments, user submissions, user comments, etc.
  6. And by far my favorite new feature: search within results.
    Want to find all the times pg mentions Arc in a comment? No problem. Do a user search for pg's comments, and then click the "Within Results" link to search for "Arc".
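
The two URL changes in items 3 and 4 can be sketched as a small mapping from legacy URLs to their new forms. This is only an illustration of the redirect rules as described in the list, not the routing code the site actually uses; redirect_target and SORT_OPTIONS are hypothetical names, and by_date is the only sort option listed because it is the only one named above.

```ruby
require "uri"

# Only the sort name shown in the list above; any others would be guesses.
SORT_OPTIONS = %w[by_date].freeze

# Map a legacy URL to its new form, or return nil if no redirect applies.
def redirect_target(url)
  uri = URI.parse(url)
  segments = uri.path.split("/").reject(&:empty?)

  # Item 3: /top/list on a site host becomes the top.<host> subdomain.
  return "http://top.#{uri.host}" if segments == %w[top list]

  # Item 4: /<sort>/<query> becomes /<query>?sort=<sort>.
  if segments.length == 2 && SORT_OPTIONS.include?(segments[0])
    sort, query = segments
    return "http://#{uri.host}/#{query}?sort=#{sort}"
  end

  nil
end

puts redirect_target("http://arcforum.searchyc.com/by_date/query")
```

Moving the sort order into the query string is what leaves room for additional parameters, such as the search options in item 5, without inventing a new path layout for each combination.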

So there you have it. We've also added a link at the top of the page for feedback (powered by Disqus), so please let us know what you think of these changes.

We'll be announcing a few more new features soon, so stay tuned...

SearchYC Presents: Ask YC Archive

by alaskamiller on 2008-02-20

One of the finest assets of an online community has always been the hodgepodge of expertise brought forth from a diverse group of people. As Hacker News grew beyond just a link bank, the community naturally turned inwards for consultation. Submissions “tagged” with Ask YC started popping up more and more, and the responses they solicited were both interesting and honest.

So to help people get the most out of past discussions, we’ve compiled a list of Hacker News community threads. The Ask YC Archive features a bar graph for visualizing discussions over time based on points. We’ve also set up an RSS feed so you can keep track of new Ask YC threads in your favorite RSS reader. We hope these tools will allow our community to continually benefit from the wisdom of past discussions.

SearchYC Presents: Best of Hacker News

by alaskamiller on 2008-02-05

Today we’re launching another feature of SearchYC: Top Lists. Using our index of Hacker News threads and comments, we’ve compiled lists of the most interesting items to date.

So what did we find? Sticking true to the Web 2.0 mentality, TechCrunch articles were the most submitted, by far. Other Silicon Valley favorites such as Valleywag, Mashable, and Wired also made it to the top.

As for top users, nickb gets the honor with more than 1900 submissions while pg has more than 2000 comments. Go ahead and take a look for yourself, and see if you can find other trends in Hacker News activity.

The YC Bump

by alaskamiller on 2008-01-10

So how did we do on launch day? 554 uniques in the first 24 hours and 231 that flowed in after the party, so maybe about 700 people in all. Definitely not a slashdotting or a digging but quite a turnout nonetheless. As the days go on, hopefully quite a few stick around and find some value in our little project.