Twitturly is giving away $1,000

So why are we giving away $1,000? Well, we need help making Twitturly even better. We want the results on Twitturly to be grouped together automatically, based on the semantics of each result. To claim the $1,000 prize, you need to be a programmer with the skills to pull this off the majority of the time. It doesn’t need to be perfect, but it should be close.

Here is the problem:

Currently, the number one and number two results are both “goosh.org - the unofficial google shell”.

Goosh @ #1 and #2

Now, you and I can tell that both of these links are going to take you to a site that shows either exactly the same thing or close to the same thing. So what I want to do is make it so that the second result appears underneath the first result as a sub-item.

To help with this programming challenge, I have exported the final array (slightly modified to remove proprietary ranking information) and serialized it for you guys to play with. It can be accessed here and put into a variable in your PHP scripts. To get the array back into its intended form, you must run PHP’s unserialize function on it. Here is the more readable and prettified version.
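As a rough illustration of the unserialize step, here is a minimal sketch. The real export is not reproduced here, so the sample data, the `title` and `description` property names, and the idea that each result is a plain object are all assumptions; only `urlid` is named in this post.

```php
<?php
// Hypothetical stand-in for the downloadable export. Only "urlid" is named
// in the post; "title" and "description" are assumed property names.
$sample = [
    (object) ['urlid' => 101, 'title' => 'goosh.org - the unofficial google shell', 'description' => ''],
    (object) ['urlid' => 102, 'title' => 'goosh.org - the unofficial google shell', 'description' => ''],
];
$serialized = serialize($sample);

// In a real script, $serialized would hold the contents of the downloaded file.
$results = unserialize($serialized);

if (!is_array($results)) {
    throw new RuntimeException('Could not unserialize the results export.');
}

foreach ($results as $index => $result) {
    // PHP array IDs are zero-based, so result #1 on the page is array ID 0.
    printf("#%d (array ID %d): %s\n", $index + 1, $index, $result->title);
}
```

The off-by-one between the position shown on the site and the PHP array ID is worth keeping in mind when checking which results end up grouped.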

What I am hoping for is a function that I can send the array to. The function will look at the title and description of each result to determine whether it is similar to any of the results before it. It only needs to check whether the item is related to items that appear before the current position in the array. When it finds a match, the function would modify that object and add the parent’s urlid to an object value of “parent” (the parent should be the first related result). When the function is done checking, just send back the entire modified array and we’ll do the rest.
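To make the expected shape of that function concrete, here is a minimal sketch. It is not a solution to the contest: it uses PHP’s built-in `similar_text`, which measures character overlap rather than semantics, so it will only catch near-duplicate wording. The function names, the 80% threshold, and the `title` and `description` property names are assumptions; `urlid` and `parent` come from the description above.

```php
<?php
// Lowercase the text and collapse anything that is not a letter or digit
// into single spaces, so punctuation differences do not affect matching.
function normalize_text(string $text): string
{
    $text = strtolower($text);
    $text = preg_replace('/[^a-z0-9]+/', ' ', $text);
    return trim((string) $text);
}

// Hypothetical grouping function: compares each result only with the
// results before it and sets ->parent to the first related result's urlid.
function group_similar_results(array $results, float $threshold = 80.0): array
{
    $seen = []; // normalized text of earlier results, keyed by array ID
    foreach ($results as $i => $result) {
        $text = normalize_text(($result->title ?? '') . ' ' . ($result->description ?? ''));
        foreach ($seen as $j => $earlierText) {
            $percent = 0.0;
            similar_text($text, $earlierText, $percent);
            if ($percent >= $threshold) {
                $earlier = $results[$j];
                // If the match is itself a child, point at its parent so
                // every item in a group shares the same first result.
                $result->parent = $earlier->parent ?? $earlier->urlid;
                break;
            }
        }
        $seen[$i] = $text;
    }
    return $results;
}

// Made-up sample data for illustration only.
$results = [
    (object) ['urlid' => 11, 'title' => 'goosh.org - the unofficial google shell', 'description' => 'A shell for Google'],
    (object) ['urlid' => 12, 'title' => 'Some unrelated cooking blog', 'description' => 'Recipes and kitchen tips'],
    (object) ['urlid' => 13, 'title' => 'goosh.org - the unofficial google shell.', 'description' => 'A shell for Google'],
];
$grouped = group_similar_results($results);
// The third result now carries parent = 11; the first two have no parent.
```

Resolving a match to its own parent keeps groups flat, so a case like #8, #16, and #38 would end up with #8 as the parent of both later results. A real entry would swap the `similar_text` comparison for something that actually looks at meaning.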

In addition to grouping Goosh, there are a few other results in the sample array above that should be grouped. For example, #4 and #6 (#3 and #5 if you are looking at the PHP array IDs) should be grouped. #8, #16, and #38 should all be grouped together as well (in this case, #8 should be the “parent” for both #16 and #38).

All we need is the PHP function. Our GUI guy can take care of making it look presentable (Tony, that’s you!).

How to solve this issue:

That’s up to you! If you want to use one of the various APIs online, it should be free or really close to it, since we don’t have the backing to pay for an expensive API. Ideally it will all be done in your PHP code; however, we understand that there may be APIs that we can tap into to help out on this, and we are willing to give them a try if they do a good job. The solution should look at the semantics of the text, because we want all similar items to be grouped together even if the words are not exactly the same.

About the prize:

After all entries have been reviewed, we will pick the one that returns the best matches the most often. Once we have selected a winner (by July 7th, 2008), they’ll get the $1,000. We understand that it’s not $100,000 like some of the challenges from companies with venture capital, but Twitturly has had no financing at all and hey, if you add the decimals at the end, $1,000.00 kind of looks like $100,000! ;-) In addition to the $1,000 that we are giving away, the programmer will also get the warm fuzzy feeling of having helped make Twitturly better for everyone.

When it ends:

This contest expires on July 1st, 2008. The winner will be chosen by July 7th, 2008.

Please post a comment

If you are interested in participating in this contest or you have any questions, please comment below. Those that say that they are going to work on a solution will be emailed instructions on how to submit their code.

Thank you and good luck!

—-

UPDATE (06/19/2008):
Since many people are asking for the URLs to be included, I have attached two more files that can be used. One is an updated serialized results file, the other is a few PHP functions that can be used to get the data in the new results file into your PHP script.

New Results File: Download
New PHP Functions: Download

13 Comments and 4 Tweets to “Twitturly is giving away $1,000”

  1. jstrellner Says:

    On Twitter jstrellner said: Twitturly is giving away $1000 - http://tinyurl.com/5jzy2n

  2. Craig Says:

    I have a solution, where can I email it to?

  3. Senko Says:

    Hi,

    you’ve presented an interesting challenge, and I’m working on a possible solution for grouping the results. One thing that’s introducing a lot of noise in the data is the fact that popular sites (e.g. Boing Boing, YouTube, etc.) mention themselves in the title, which makes the analysis more difficult.
    This problem could partially be solved if the data set included the url itself.

    Also a question, what would be (the order of the magnitude of) data set size in reality? The more data, the more reliable the results, but the longer it takes to calculate them.

  4. Joel Strellner Says:

    @Senko,

    The url can easily be used as well in the algorithm. It would be called [url] in the provided array and would be the full and final destination URL.

    The data set is 100 results. This algorithm would be run each time that the home page cache is generated, which is every 1 to 3 minutes, depending on the load of our site. You would only need to worry about grouping the 100 results, not all of the URLs that we have information on.

  5. JP Says:

    I have a couple of ideas, send me some info, thanks !

  6. juanlanteri Says:

    On Twitter juanlanteri said: To the crazy and daring programmers who want to win US$1000: http://tinyurl.com/5jzy2n

  7. Senko Says:

    @Joel: thanks for the info. As the sample data doesn’t have URL data, I couldn’t test it out, but I do have a partial solution, so please send me the details as to where to send it.

  8. DesireeSanchez Says:

    On Twitter DesireeSanchez said: @adammoro You should go for this! =) http://tinyurl.com/5jzy2n

  9. Shivanand Says:

    Hey, I know I am late to the party, but was just curious if I can still submit to this contest. Also, would you require rules-engine-like behaviour (i.e., will you need to make changes to aggregation rules as your service grows)?

    If I can still submit, please send me details on how to send the code in. (http://twitter.com/shiva)

  10. Joel Strellner Says:

    @Shivanand,

    Yes you may. The contest is still open for submissions until July 1st.

    Ideally, the solution provided will require very little work to keep it running smoothly. Ideally if you want to create a system that learns over time, it should be self learning.

    I am sure that over time we will need to make changes, but hopefully not.

    I will email you the email address to submit your code to.

  11. Shivanand Says:

    @Joel,

    If I get you right, url’s with the same domain name need to be grouped together (also group them hierarchically based on the folder structure of the url). I don’t understand your comment about it being self-learning?!

  12. jstrellner Says:

    On Twitter jstrellner said: Only about 8 hours left in the Twitturly developer contest. Any last minute submissions? http://tinyurl.com/5jzy2n

  13. Shannon Prue Says:

    …Missed this one by 8 days, too bad I bet I could have done it too.

  14. JP Says:

    Whatever happened to this, I put in a submission well before the deadline, but I haven’t heard anything back and there’s been no comments on here about what’s happening.

  15. jp Says:

    So what happened about this. The deadline was weeks ago, and there’s been no additional information. It would be nice to know that someone actually won something.

  16. Allan Says:

    Who won?

  17. Joel Strellner Says:

    The winner, Senko, was initially unreachable. He has since contacted us and declined the winnings. In the future, we may hold another contest with rules for scenarios like this. We still want to get the grouping feature released, but at this time, there are other priorities that we are working on.
