When figuring out how to aggregate audiences across the web for your book, you need to understand the web’s geography (or its topology, as it is usually called in network parlance) to get an idea of what you’re up against. First, there is the size of the web. A recent post on Pandia Search Engine News summarized the recent estimates of web size as follows:
- 15 – 30 billion web pages
- 109 million web sites
- 47 – 48 million active web sites
- 1+ billion Internet users
These figures are, however, far from certain. In fact, it is not known exactly how many web pages or even web sites exist at any given point in time. In his insightful book “Linked,” on how things in life (including the worldwide web) are connected, Albert-Laszlo Barabasi talked about two important features of the web that help and hinder our use of it for searching out information or marketing our products. (His current research focuses on applying the concepts his group developed for characterizing the topology of the worldwide web to uncovering the structural properties of complex metabolic and genetic networks.)
- The web is divided into “continents.” When robots employed by the search engines crawl the web to index pages, they follow links to discover new or updated web pages. The web is a directed network, meaning that while site A may link to site B, there may not be a link back from B to A. Directed networks (of any kind) automatically fragment into four areas or, in Barabasi’s term, “continents.” The main continent is the “Core.” These sites are the highly interconnected mainstream of the web and include the major sites like the search engines, major media portals and so on. Then there is “InLand.” The sites in this region have links pointing into the Core, but not vice versa. These are typically sites new to the web. The next region is “OutLand.” In OutLand, sites are pointed to from the Core, but do not link back to the Core. Many of these sites are corporate sites. They are often dead-end destinations for link followers like robots. Lastly, there are “Islands.” These regions are not linked in either direction to the Core. They may, however, include tightly linked, though generally isolated, communities. Finding these communities can be difficult.
- The web is dominated by hubs in each of the continents. Hubs are sites with many inbound and outbound links. Maps of the web show a hierarchy of such hubs. Sites that provide useful content and links and have been around for a while have a greater probability of growing into hubs. When confronted with the immensity of the web, it is a natural human tendency to seek out hubs as a way to find information quickly.
- The web is dynamic. While the web in general continues to grow, specific sites may go dark or become inactive. This can confound searches when you’re trying to find a viable market. Search results may include many of these “dead” sites.
- Search engines still don’t know it all. Through the early 2000s, search engines typically had indexed only about 25% of the web. While results have improved over time, search engine companies have stopped announcing the number of pages they have indexed. It is doubtful that the web will ever be fully indexed. The amount of information that search engines download and index from each page has its limits as well.
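To make the “continents” and hubs more concrete, here is a minimal Python sketch that sorts a handful of sites into Core, InLand, OutLand and Island, and counts inbound and outbound links. It is an illustration only: the site names, the `classify_continents` and `link_counts` helpers, and the idea of starting from a known Core site (the `seed`) are all assumptions made for this example, not anything taken from “Linked.” It also simplifies by labeling every site that neither reaches the Core nor is reached from it as an Island.

```python
from collections import defaultdict, deque


def reachable(start, adjacency):
    """Return every site reachable from `start` by following `adjacency`."""
    seen = {start}
    queue = deque([start])
    while queue:
        site = queue.popleft()
        for nxt in adjacency.get(site, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def classify_continents(links, seed):
    """Label each site relative to the Core that contains `seed`.

    links: iterable of (source, target) pairs, one per hyperlink.
    seed:  a site assumed to sit in the Core (hypothetical choice).
    """
    forward = defaultdict(set)   # site -> sites it links to
    backward = defaultdict(set)  # site -> sites that link to it
    sites = {seed}
    for src, dst in links:
        forward[src].add(dst)
        backward[dst].add(src)
        sites.update((src, dst))

    downstream = reachable(seed, forward)   # sites the Core can reach
    upstream = reachable(seed, backward)    # sites that can reach the Core

    labels = {}
    for site in sites:
        if site in downstream and site in upstream:
            labels[site] = "Core"
        elif site in upstream:
            labels[site] = "InLand"
        elif site in downstream:
            labels[site] = "OutLand"
        else:
            labels[site] = "Island"
    return labels


def link_counts(links):
    """Return site -> [inbound, outbound] link counts; hubs score high on both."""
    counts = defaultdict(lambda: [0, 0])
    for src, dst in links:
        counts[src][1] += 1
        counts[dst][0] += 1
    return counts


# Made-up sample web: six sites, seven links.
links = [
    ("portal", "search"), ("search", "portal"),  # mutually linked mainstream
    ("newblog", "portal"),                       # links in, nothing links back
    ("portal", "corp"),                          # linked to, but a dead end
    ("club_a", "club_b"), ("club_b", "club_a"),  # isolated community
]
labels = classify_continents(links, seed="portal")
for site in sorted(labels):
    print(site, labels[site], link_counts(links)[site])
```

A crawler that starts at the Core and only follows links forward would find `search` and `corp` here, but never `newblog` or the two `club` sites, which is why new and isolated sites are so hard to discover.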
Most of this may seem to fall in the bad news column. So here’s the good news.
- Human intervention pays off. You can submit your site to the search engines rather than waiting and hoping they will find you. Social networking sites like Digg, del.icio.us, StumbleUpon and Reddit can help guide users to your information or products. These tagging sites represent the first steps toward adding a human interpretation of the raw content on sites. There are also myriad directories, forums and newsgroups that can help point you to those communities difficult to find through general search.
- Indexing algorithms are getting smarter. These algorithms now take into account freshness, relevance, age and authority. All good things for savvy and disciplined marketers. The specific weights that the ranking code uses are still opaque for competitive reasons (and to prevent gaming of the system), but the general rules are easy enough to observe in action.
- Technology isn’t standing still. The next step in the evolution of the worldwide web may be the “semantic web.” This is a web where every site is tagged in a way that lets robots easily interpret what the content is about. This opens the door to truly smart search agents that can scour the web to find sites appropriate for your marketing efforts. Also, new technology such as software that can read and interpret the chatter of blogs and other social media will soon make it easy to find and understand the demographics of your audience.
So while our knowledge of the web may be imperfect, our ability to explore and use it in new ways will ultimately make it our best marketing vehicle.