Calculating Epinions' Total Number of Reviews: A Futile Effort?
Jan 13 '07 (Updated Jan 17 '07)
The Bottom Line: Ever wonder how many reviews Epinions has?
What starts out as a simple enough and promising process keeps taking turns for the worse, occasionally looking like it can be salvaged, and other times looking quite dire. While the success of finding that magic number is debatable, the other numbers that show up in the process are perhaps more interesting and surprising.

The Process

A vital first step is to discover Epinions' counting sequence. Every review generates a page like epinions.com/content_xxxxx where the xxxxx is a very large number. As you'll notice, as time goes on, those numbers get larger. However, simple logic shows that those numbers (content IDs) do not increase by increments of 1 or something simple like that. So, how do you figure out what the actual increment is? The easiest way is to simultaneously put a few reviews into draft mode, thereby providing you with successive content IDs. By simply subtracting one content ID from another, you'll always notice a difference of 65536, or a multiple of 65536.

65536* is a very important number, as it provides you with the ability to visit every single review that has been generated (not necessarily published) since January 17th, 2001. What's significant about January 17th is that it was the first day an item was published using the current content ID system. A more convoluted system was used prior to that, and that's dealt with later on...

With this knowledge, you can set out to find the first item drafted under the new design. As with the above number, it's not some obvious number, so here you need to take a methodical trial and error approach. First, you need to find a review published on or after Jan 17th, 2001, which you can do by googling site:epinions.com "Jan 17 '01".
Going to one of those reviews, and then subtracting multiples of 65536 from the content ID, you'll eventually narrow in on one:

http://www.epinions.com/content_6458412676

The first review ever generated with the content ID system was for Whistler-Blackcomb, and has since been deleted by its author. Knowing this first content ID, and then finding the content ID of the most recent review (snatching something from the Just-In page is usually accurate enough), you can [theoretically] easily calculate the total number of pages generated since January 17th, 2001:

(Recent Content ID - First Content ID) / 65536 = # of reviews

or, more simply:

(Recent Content ID - 6458412676) / 65536 = # of reviews

Using an ID from December 30th, 2006, we get:

(299338337924 - 6458412676) / 65536 = 4 468 993

However, Member Advice and Writer's Corner articles are not included in that number, as they operate separately from reviews, and didn't show up until Feb 2, 2001. They share the 65536 increments, but start at a smaller content ID. Following the same sort of process as above, you narrow in on an advice piece in "Major Performers in Rock and Pop Music" by now-inactive Daniel_Rf:

http://www.epinions.com/content_814784644

Therefore, to calculate the number of Member Advice and Writer's Corner contributions, you do the following, using a recent content ID from a Writer's Corner or Member Advice article:

(Recent Content ID - 814784644) / 65536 = # of WC & MA articles

Using an ID from December 30th, 2006, we get:

(4924022916 - 814784644) / 65536 = 62 702

Reviews since Jan. 17, 2001: 4 468 993
Member Advice & Writer's Corner Articles since Feb. 2, 2001: 62 702
Total: 4 531 695

Calculating the number of reviews that were published using Epinions' old system is not realistic. However, when Epinions changed the system, they announced that Epinions already had "over 1 million reviews and comments." Unfortunately, that's an extremely vague number, and made worse by its inclusion of comments.
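The two subtractions worked through above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, using the content IDs quoted in the essay; the helper name pages_generated is made up for the example, and the divisibility check is an added assumption (that two IDs from the same sequence are always a whole number of 65536-sized steps apart).

```python
# Spacing between successive content IDs, as observed by the author.
STRIDE = 65536

# First content IDs quoted in the essay.
FIRST_REVIEW_ID = 6458412676   # first review under the new system (Jan 17, 2001)
FIRST_ARTICLE_ID = 814784644   # first Member Advice / Writer's Corner piece (Feb 2, 2001)


def pages_generated(recent_id: int, first_id: int, stride: int = STRIDE) -> int:
    """Count the stride-sized steps between two content IDs.

    Hypothetical helper: like the essay's formula, it counts steps and does
    not add one for the first page. It raises ValueError when the IDs are not
    a whole number of strides apart, which would suggest they come from
    different sequences.
    """
    steps, remainder = divmod(recent_id - first_id, stride)
    if remainder != 0:
        raise ValueError("IDs are not a whole number of strides apart")
    return steps


# Recent IDs the author picked up on December 30th, 2006.
reviews = pages_generated(299338337924, FIRST_REVIEW_ID)
articles = pages_generated(4924022916, FIRST_ARTICLE_ID)
total = reviews + articles

print(reviews, articles, total)  # 4468993 62702 4531695
```

Both differences divide evenly by 65536, which is at least consistent with the increment found from the draft-mode experiment.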
Optimistically guessing that half of the 1 million is reviews, that means that Epinions has had in the neighborhood of 5 million content pages generated in its lifetime... or maybe not. We're not done yet...

Vital to note is that with the above numbers, I'm including everything that has ever been drafted. Therefore, unpublished, deleted, and ticketed reviews are included. Also included are reviews that are not visible to the public. Therefore, it should come as no surprise that the number of reviews that bring visitors to Epinions is drastically less than 5 million. One way to get a rough picture of this smaller, but more significant, number is to take a random sample of content pages and record the status of each. In doing this, an unexpected problem presents itself, as you'll clearly see by looking at the results:

13 - Very Helpful
5 - Helpful
3 - Somewhat Helpful
0 - Not Helpful/Off Topic
6 - Show
3 - Don't Show
4 - Draft
10 - Deleted
5 - Ticketed
51 - Page Does Not Exist
Total: 100

Up until this point, it has been assumed that every one of the 4 468 993 steps of 65536 between the first and the most recent content ID has generated a page. However, with 51% of the random sample not existing when it theoretically should, that assumption is obviously flawed, and explaining that 51% is not easy. My initial belief was that these non-existent pages were the sole result of deleted accounts. Deleting an account completely wipes away all traces of that member, while deleting a single review leaves behind a rotting carcass, so to speak. However, as the percentage grew, it seemed quite unlikely that over 50% of members have had their account deleted. With such a large number, it's more likely that the content system randomly skips past roughly 50% of the possible content IDs. Why? Epinions doesn't want detailed information about its assets (i.e. reviews, members, etc.) made public, and a simple complicating factor like this makes it that much harder to figure it out...
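To make the sample easier to read, the tally above can be turned into shares with a short Python sketch. The counts are the ones reported in the essay; the variable names are invented for the example, and the split into "missing" and "existing" pages is simply one way of slicing the same 100 observations.

```python
# Status counts from the author's 100-page random sample.
sample = {
    "Very Helpful": 13,
    "Helpful": 5,
    "Somewhat Helpful": 3,
    "Not Helpful/Off Topic": 0,
    "Show": 6,
    "Don't Show": 3,
    "Draft": 4,
    "Deleted": 10,
    "Ticketed": 5,
    "Page Does Not Exist": 51,
}

sample_size = sum(sample.values())
missing = sample["Page Does Not Exist"]
existing = sample_size - missing

# Share of candidate content IDs that led to no page at all.
missing_share = missing / sample_size

# Share of each status among the pages that do exist.
share_of_existing = {
    status: count / existing
    for status, count in sample.items()
    if status != "Page Does Not Exist"
}

print(f"sample size: {sample_size}")
print(f"missing: {missing_share:.0%}, existing: {existing / sample_size:.0%}")
for status, share in share_of_existing.items():
    print(f"{status:<22} {share:.1%}")
```

With only 49 existing pages in the sample, each status share rests on a handful of observations, so the percentages should be read as very rough.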
In any event, the previous estimate of 5 million generated pages is now down to 2.5 million. Regardless, we could still potentially approximate the number of visible reviews on Epinions, but unfortunately the above sampling is far too small to provide remotely accurate numbers. Anybody who frequents the Just-In pages knows that Show ratings more than double Don't Show ratings. Additionally, browsing Epinions quickly tells you that Very Helpful reviews do not outnumber the combined total of non-Very Helpful reviews as the sampling suggests. Clearly, a much larger sample would be needed. Just for the hell of it, I will calculate the inaccurate number of visible reviews:

Total: (0.13 + 0.05 + 0.06) x 5000000 = 1 200 000
VH Reviews: 650 000
H Reviews: 250 000
Show Reviews: 300 000

...and the non-visible reviews:

Total: (1 - 0.51 - 0.24) x 5000000 = 1 250 000
SH: 150 000
NH/OT: 0 (Obviously not...)
Don't Show: 150 000
Draft: 200 000
Deleted: 500 000
Ticketed: 250 000

So... Without an automated means of recording the status of thousands of reviews, there's no way to get an accurate picture of the distribution of reviews on Epinions. What's also alarming is the percentage of reviews that have been deleted, ticketed, or that are stuck in draft mode. Even with the inaccurate numbers, it is clear that a large chunk of Epinions is occupied by these invisible contributions.

Although the whole process I've described above doesn't help a whole lot in uncovering a precise number, it can be used to compare review submission rates for any given period. It still has its glaring faults, but it is quite possible to compare, say, the number of reviews generated in February 2002 to the number generated in August 2004. Ultimately though, you won't learn much of anything new by doing that.
You're just as well off to browse Epinions and you'll eventually draw the same sorts of conclusions, primarily: Epinions doesn't get nearly as many submissions as it once did (especially true with regard to Member Advice articles), and the reviews it does get are subjected to more stringent rating procedures.

Another Method?

There's another way of calculating the total number of reviews, but it is filled with even more guesswork and assumptions. Knowing that 2004 had roughly 150 000 reviews published, and that it was a slow year**, it's reasonable to assume that at least 1 125 000 reviews (150 000 x 7.5) have been published on Epinions in its 7.5 years of existence. Given that the numbers of submissions were far higher in the pay-per-view days, the actual number of total published reviews likely sits closer to 2 million. Reading a press release that announced eBay's acquisition of Shopping.com*** in summer 2005, we can pretty much be assured that there are now 2 million or more published reviews:

"Shopping.com's Epinions community of more than 400,000 reviewers has produced nearly two million detailed reviews that help consumers make informed buying decisions."

We're likely sitting not far over 2 million total reviews at this point in time, and we can also deduce that there are probably around 500 000 reviewers (although exactly what constitutes a reviewer is unclear). Now, let's again try to find the number of visible reviews: Based on the site-wide Just-In page for the 2 weeks prior to January 2, 2007, the ratio of visible to invisible reviews is about 30 to 7. Supposing 30/37 of reviews are visible, then that's somewhere in the neighborhood of 1.6 million visible reviews.

Conclusion

What I found certifiably interesting was the number of unpublished, deleted, and ticketed items that Epinions houses (~40% of content).
Add to that the deleted reviews that were attached to deleted accounts (a portion of the 51% Page Does Not Exist group), which if my numbers have any truth to them, really makes you wonder what's going on. Perhaps member - and review - retention are bigger issues than we've been led to believe. Alternatively, and just as likely, there are a bunch of factors that I'm failing to consider in my calculations...

In the end, not much has been gained by the number crunching. I personally feel confident that there are around 1.5 million visible reviews on Epinions, or an average of 3 per reviewer. However, I recognize the large levels of error that come from the many assumptions made in this process, and do not profess that to be the answer. I haven't taken into account that SH reviews are visible in Online Stores & Services. I've interpreted press releases and news that could be interpreted in other ways. I also assumed that that data was accurate, and not the result of some over-zealous PR agent. I assumed that the two weeks prior to Jan 2, 2007 represent the average rating distribution for Epinions' entire existence. Let's not forget that I have no formal experience with software engineering, statistics, or any of the other fundamental training you'd need to fully grasp the complexity of the review database. These, among other things, ensure that my calculations aren't a whole lot better than grabbing a number out of the air.

Even if I could come up with a genuinely accurate number of reviews on Epinions, it's not like it would make any sort of difference to the way you or I function. It is, after all, just a number.

* 65536 isn't just some random number, and does in fact have some computational basis behind it. In programming talk, an unsigned 16-bit integer has values ranging from 0 through 65535, meaning there are 65536 possibilities. For instance, 16-bit colour means there are 65536 (2^16) possible colours.
There are some other interesting/geeky characteristics of the number 65536, including its role in Fermat primes (65536 + 1 = 65537 is one). As for its use on Epinions, it has since been discovered that the naming scheme works on a hexadecimal system, as detailed by mobiprof in his article, Opinion URL Naming Schemes.

** Based on the data available at http://www.alexa.com/data/details/traffic_details?url=epinions.com

*** http://investor.shopping.com/ReleaseDetail.cfm?ReleaseID=164975
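As a small illustration of the first footnote, the sketch below prints the content IDs quoted in this essay in hexadecimal. Because 65536 is 0x10000, adding multiples of it leaves the last four hex digits (the low 16 bits) untouched. What those low 16 bits actually mean is not stated here; splitting each ID into an "upper part" and "low 16 bits" is an assumption made for the example and is not taken from mobiprof's article.

```python
STRIDE = 65536

# 65536 is 2 to the 16th power, written 0x10000 in hexadecimal.
assert STRIDE == 2**16 == 0x10000

# Content IDs quoted in the essay.
ids = {
    "first review": 6458412676,
    "recent review": 299338337924,
    "first MA/WC article": 814784644,
    "recent MA/WC article": 4924022916,
}

for label, content_id in ids.items():
    # divmod by 65536 is the same split as (id >> 16, id & 0xFFFF).
    upper, low = divmod(content_id, STRIDE)
    print(f"{label:<22} {content_id:#x}  upper={upper}  low16={low:#06x}")

# Low 16 bits of each ID, kept for comparison.
low16 = {label: content_id & 0xFFFF for label, content_id in ids.items()}
```

In this small set, the two review IDs share one set of low 16 bits and the two Member Advice / Writer's Corner IDs share another, which fits the observation that the two kinds of content run on separate sequences with the same increment.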