Links bring traffic.
Traffic brings sales.
This guide describes how I make new link-lists for link building, as of February 2016. Here’s a tier-two GSA-SER project I ran; the scraped target list it used is the result of a Footprint Factory analysis:
I have just over 67,000 verified links from 22,000 unique domains.
The reason this is so exciting is because the raw scrape list I’m running only had about 830,000 URLs in it.
That means just over 8% of the URLs I scraped during the making of this tutorial gave me verified backlinks. To put that in perspective, a lot of people only get about 1-2% with a raw list, so 8% is pretty amazing.
That cuts your working time down by 75%, and it also means you only have to do this process a few times to build a very large database of powerful links you can use to rank your sites quickly and easily.
This guide is almost 6,000 words long because it covers all the tiny (but important) details you won’t find on SEO blogs. Despite the word count, the fact is you can set up the link-finding process in less than 30 minutes once you understand the 5 step process.
Below are some of the results I got from using this strategy. It’s pretty much all GSA-SER links. There are some blog network links in there as well, but most of it is GSA: the tier one of 871 links, plus the 67,203 tier-two links pointing to it shown in the image above. This is a new website, and the rankings were checked 1 week after I first made the video series:
I then did not build any links for more than 2 weeks. Not a single one. And the rankings continued to improve (results vs 14 days earlier):
As you can see, the strategy does work even though the general consensus these days is that tiered link building and GSA-SER are not effective anymore. You can see that’s simply not true.
Of course you have to do tiered linking the right way, and find lots of new link targets that have not already been spammed to death! More about strategy later!
HOW THIS GUIDE IS STRUCTURED
In general, SEO and “making money online” can rarely be defined in a step-by-step manner. You often have to take diversions when you’re working in the trenches and forging your own path, as much of it is “open road” rather than linear.
However I’ve tried to describe the strategy as an iterative process. Here is a visual representation:
This guide will describe this cycle in 5 steps.
Bear with me here; it will all make sense soon if it does not already! Step 1 is the longest section of this guide because I’ve really broken it down. It’s arguably the most important step and well worth doing properly.
Understand that you will sometimes hit snags and you will have to apply yourself. I’ve explained the thought processes involved in creating huge new links-lists in the hope that you will be able to recognize and solve any future challenges you may have by yourself.
STEP 1 – GETTING STARTED AND CHOOSING YOUR TARGET CMS
The first step is to find a good platform, or Content Management System (CMS), to target. The goal is to have a starter/seed list (for just one specific CMS) to use with Footprint Factory, so you can multiply your target lists later.
The reason for choosing just one CMS is because each CMS has different posting rules and a different procedure for adding content and therefore getting your backlinks. So it’s best to focus on one CMS at a time.
As shown in the previous image, you can either:
Choose a CMS you already have a success/verified links-list for
Choose a CMS, and then find or create a starting/seed list for it.
A. Choose a CMS you already have a success/verified links-list for
For option 1, you can simply import single-CMS success lists that you already have directly into Footprint Factory. If you don’t have any links-lists you can buy them on BlackHatWorld and other forums. Some people will sell their success lists or their verified links-lists, and those are very often split into separate lists for different CMS.
If you have GSA-SER or some other link builder already, you can export your verified links from that program and you can use that as your starter/seed list for Footprint Factory.
Exporting Existing GSA-SER Links-Lists
If you’re already a GSA-SER user who saves verified links-lists, Step 1 is very easy if you know which CMS you want to target. Simply type this into Windows Explorer: %appdata%
Then open the GSA-SER folder and then open the “verified” folder. The precise name of the verified folder is specified in the GSA-SER advanced options panel, but for most people it will be “site_list-verify”. These lists are CMS specific, and are perfect for importing directly into Footprint Factory!
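If you like to script things, here’s a minimal sketch of how you could list those CMS-specific verified lists, sorted by size, so you can see which engines you already have the most links for. The AppData folder name and the one-.txt-per-engine layout are assumptions based on a standard install, so adjust the path if yours differs:

```python
import os

# Assumed default location for a standard GSA-SER install -- check
# your own Advanced Options panel if the folder name doesn't match.
verified_dir = os.path.expandvars(
    r"%APPDATA%\GSA Search Engine Ranker\site_list-verify")

# Collect (size, filename) pairs for each CMS-specific list.
files = [
    (os.path.getsize(os.path.join(verified_dir, f)), f)
    for f in os.listdir(verified_dir)
    if f.endswith(".txt")
]

# Biggest lists first: those are your strongest seed-list candidates.
for size, name in sorted(files, reverse=True):
    print(f"{size:>12,} bytes  {name}")
```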
Don’t Know Which List To Use Or Don’t Have A List At All?
But what if you can’t decide which existing success/verified links-list you want to use? Or you want to target a new CMS, or you don’t have a starting list to use with Footprint Factory? Which CMS do you choose?
B. Choose a CMS, and then find or create a starting/seed list for it
The second option is to research the CMS first, and then find/buy or create a starter/seed links-list to expand, or use an existing list if you have one for that CMS.
I’ve made a whole video on “choosing a CMS” because it’s an open-ended question, with lots to consider. Starting with a new list is tricky for people who don’t know what they’re doing. A lot of people will not even know where to start. So if you’re not experienced in mass scraping and mass link building, then you’ll find the video very useful. Remember: you have to have a starter/seed links-list to import into Footprint Factory so you can find lots of footprints and make a very targeted scraping list.
Using Footprint Factory means you will find ALL possible footprints, and gain an unfair advantage over other link-builders!
Using GSA-SER To Get CMS Ideas
I like to use GSA-SER because if I choose a CMS from it then the chances are I’ll easily be able to post to that CMS. Sometimes you may have to edit some of GSA-SER’s engine files to make it post properly (another tutorial for another time!), but 90% of the time GSA-SER will give you a fair impression of how many links you can make from a single target CMS.
With GSA-SER open, let’s go ahead and research a do-follow, contextual platform. Start by having all engines selected.
Right-click and then select “uncheck engines that use no contextual links”. After that, select “uncheck engines with no follow”. Now we can see a few remaining engines that are both do-follow and contextual:
Links that are Do-Follow and contextual are the most powerful links. Any of these remaining CMS would be excellent candidates for further investigation (as shown in the image above):
PHPFox
Ground CTRL
PHPizabi
SocialGo
SocialEngine
DokuWiki
MacOSWiki
MoinMoin
There are even more such targets in GSA; I only listed the ones in the image above.
The next step is to check to see if you have decent verified lists for any of those CMS. You only need 25 unique domains in order to move on to the next step (Step 2 with Footprint Factory). Obviously the more unique domains you have, the better. Usually I don’t move on to the next step until I have 1000 unique domains for that CMS, but it can be done with just 25.
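If you want a quick way to check that 25-domain minimum, here’s a small sketch (the file name is just an example):

```python
from urllib.parse import urlparse

# Count unique domains in a links-list to check it clears the
# 25-domain minimum ("verified_myupb.txt" is an example name).
with open("verified_myupb.txt", encoding="utf-8", errors="ignore") as f:
    domains = {urlparse(line.strip()).netloc.lower()
               for line in f if line.strip()}
domains.discard("")  # lines without a scheme parse to an empty netloc

print(f"{len(domains)} unique domains")
```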
If you don’t have existing verified lists for any of those CMS, then you can either change your requirements (we looked for Do-Follow and Contextual), or you can create your own starter list for one of those CMS. Creating your own starter list involves a small initial scrape (Step 4).
Another Approach – Brand New CMS Targets!
Go into the “program files” folder and then into the GSA-SER folder, and then the “engines” folder. Notice that the engine files are not usually stored in the same place as your verified lists, so remember that your GSA-SER files are stored in 2 different locations.
You can see when the engines were updated. So if I order the list by date modified (by clicking the “Date Modified” column header) I can see the newest ones:
It also says in the GSA-SER update/release notes that this CMS (MyUPB) was added recently.
The GSA-SER developer lists new engines in the update notes, so you can actually see the new engines as they come in…
This means most GSA-SER users haven’t started targeting them yet! There is a window of opportunity when these new targets are released.
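If you’d rather check for new engines from a script than from Explorer, here’s a rough sketch. The install path, the “Engines” folder name and the .ini extension are assumptions based on a standard setup:

```python
import os

# Assumed default install path -- adjust for your own machine.
engines_dir = r"C:\Program Files (x86)\GSA Search Engine Ranker\Engines"

# Sort engine definition files newest-first by modification time,
# mirroring the "Date Modified" column sort in Explorer.
inis = [f for f in os.listdir(engines_dir) if f.lower().endswith(".ini")]
inis.sort(key=lambda f: os.path.getmtime(os.path.join(engines_dir, f)),
          reverse=True)

for name in inis[:10]:  # the ten most recently added/updated engines
    print(name)
```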
So I’ve decided to investigate MyUPB. I don’t have any verified URLs for this CMS. Lucky you, I’ll show you how to determine if a CMS is a potential winner!
Be Efficient! Avoid Crappy CMS
Bear in mind you don’t want to spend two days scraping for a single CMS only to find that only 10,000 sites use it! So there’s a little bit of research involved.
How do we check to see if there are enough domains using this CMS?
Go back into the GSA-SER engine files (the same place as in the previous screenshot above) and open the engine file for your chosen CMS to find the default scraping footprints:
See the bottom line with “search term=…”? GSA-SER holds default scraping footprints for all the engines. And they’re not usually a comprehensive list but they’re enough for us to get started with.
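Here’s a quick sketch of pulling those default footprints out of an engine file. The file name is an example, and I’m assuming the footprints sit pipe-separated on the “search term=” line (which is also why the vertical-bar warning later in this guide matters):

```python
# Pull the default scraping footprints out of a GSA-SER engine file.
# The path and file name are examples -- point this at your own copy.
engine_file = r"C:\Program Files (x86)\GSA Search Engine Ranker\Engines\MyUPB.ini"

with open(engine_file, encoding="utf-8", errors="ignore") as f:
    for line in f:
        if line.lower().startswith("search term="):
            # Footprints are assumed pipe-separated within the line.
            for fp in line.split("=", 1)[1].split("|"):
                print(fp.strip())
```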
I’m going to copy one and go to Google and paste the search string in:
And there seem to be enough results to make this CMS worth investigating. Notice I used a typical kind of footprint: “Powered by MyUPB”. These “powered by” footprints are rarely enough to judge a CMS by on their own.
Many webmasters know to remove these obvious footprints, and also the search results will be full of pages of people talking about the footprint itself, and not just pages showing the footprint naturally on a site that actually uses that CMS.
So we’ll put another footprint together with the first one. This won’t expand your results, but it will refine them so we get less “noise”:
So there are some results (814,000). It’s not massive but there’s definitely enough here to be worth scraping for and spamming to!
Getting Up Close And Personal With A CMS
I’ve also had a look at what the sites are like, and this one looked like a normal forum:
You should always have a quick look at some real sites using the CMS you are researching, because you will see what sort of links you will get.
Will they be new blog posts? Or will they be links added to existing pages? This is an important distinction because it determines whether you can build PR links immediately, or if all your links will be on new PR0 pages.
Also, with some CMS (particularly “blog post” or “new page” ones like Social Network and Wiki type targets) you want to know whether you can spam them mercilessly, or whether you have to be a bit more careful: make readable content, only create one account per site, and/or use proxies and “real” email addresses.
There are many CMS in GSA-SER, and they behave very differently, so the master GSA-SER spammer (that’s you and me!) will treat them differently! Don’t be lazy; do a little more work up front and it will multiply your returns.
So I’m going to continue with the MyUPB CMS! Phew, we finally have a good CMS to do this process with!
Let’s create the initial starter/seed links-list so we can go on to Step 2 and analyze it for additional footprints. Creating our initial list from nothing involves scraping. I won’t cover it in full detail now because there is a comprehensive description in Step 4.
Short Version Of Step 4
Simply use the footprints from the GSA-SER engine files, and put them together to make sure you get good results that will be useful for Step 2.
With ScrapeBox, use the footprints in the keyword box. Keywords are not usually needed. Combine the footprints where it makes sense, and then make sure each separate footprint is wrapped in its own quotation marks:
I’ve typed out the full footprints below:
"Powered by MyUPB" "PHP Outburst 2002 – 2016" "You are not logged in. Please Register or Login"
"Powered by MyUPB v2.2.5" "PHP Outburst 2002 – 2016" "You are not logged in. Please Register or Login"
"Powered by MyUPB v2.2.6" "PHP Outburst 2002 – 2016" "You are not logged in. Please Register or Login"
"Powered by MyUPB v2.2.7" "PHP Outburst 2002 – 2016" "You are not logged in. Please Register or Login"
Again, you don’t need any keywords. It may be helpful to use private proxies for small scrapes like this if you can. I use free proxies for large scrapes, but for this initial list we only need 25-1000 unique domains for our footprint analysis. This initial scrape only needs to run for 10-15 minutes, at most.
Tech Note: Beware The Vertical Bar
If your footprints have vertical bars like “|” in them, cut those out because they will screw up your scraping. When we use Footprint Factory it will do this for us automatically so we won’t have this issue when we do the full scrape later. For example:
Home | Contact | Privacy (Bad footprint! Don’t use it!)
Becomes: “Home” “Contact” “Privacy”
Just trust me on this, avoid vertical bars even if those footprints seem tempting. They will just waste your time and resources!
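If you want to automate the clean-up, here’s a tiny sketch of the same pipe-stripping idea:

```python
def strip_pipes(raw: str) -> str:
    """Turn a pipe-separated footprint into separately quoted
    snippets, as described above."""
    parts = [p.strip() for p in raw.split("|") if p.strip()]
    return " ".join(f'"{p}"' for p in parts)

print(strip_pipes("Home | Contact | Privacy"))
# "Home" "Contact" "Privacy"
```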
Step 1 – Summary
So to summarize Step 1 in a nutshell you will either:
1. Use an existing verified links-list (for example a list you bought, or made with GSA), but make sure it’s a CMS-specific list.
Or if you don’t have a CMS starter links-list to work with:
2. Choose a CMS from a list (like the available GSA-SER CMS engines) for research. Find out if there are enough targets. Once you’ve chosen a CMS you can either use an existing success/verified list, or create your own starter list for that CMS for Step 2.
Here is the decision tree image again, for any visual learners:
STEP 2 – IMPORT AND ANALYZE A “STARTING” OR “SEED” LINKS-LIST WITH FOOTPRINT FACTORY
Now that we have our starting/seed URL list, we should do 2 things before importing it into Footprint Factory:
If the list is old, check the URLs are alive and still working. The ScrapeBox Alive Check plug-in is great for this. Doing this will save you a lot of time later. Screenshot below for the SB noobs!
After the alive check, save your list and remove duplicate domains, because Footprint Factory works best with one URL from each domain. That makes sense, right? If you have 100 URLs in your list and 50 of them belong to one site, you’re not going to get accurate numbers from the Footprint Factory analysis, which basically asks: “what % of 100 sites have this footprint?” A quick sketch of the dedupe step is below.
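Here’s a minimal sketch of that dedupe step, assuming one URL per line (the file names are examples):

```python
from urllib.parse import urlparse

# Keep only the first URL seen from each domain before importing
# into Footprint Factory.
seen, deduped = set(), []
with open("seed_list_alive.txt", encoding="utf-8", errors="ignore") as f:
    for line in f:
        url = line.strip()
        domain = urlparse(url).netloc.lower()
        if url and domain and domain not in seen:
            seen.add(domain)
            deduped.append(url)

with open("seed_list_one_per_domain.txt", "w", encoding="utf-8") as out:
    out.write("\n".join(deduped))
```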
Using Footprint Factory
I’ll show you some key points, but I won’t do a full tutorial on how to use Footprint Factory; it comes with a PDF that explains everything. The video series also shows this in detail, and there is an additional tutorial video series for Footprint Factory.
Overview – How Footprint Factory Will Rock The SEO World
Footprint Factory Pro has three modes, and you can use them at the same time. I’m going to use both the Text Snippets and the Process URLs modes.
The key thing is to know that Footprint Factory can analyze everything on the URLs you import. It breaks down everything on the page, and slices everything up, and then compares it to everything on all the other pages.
This means no stone is left unturned in your automated quest for footprints! Each piece of text Footprint Factory extracts is called a “snippet”.
If that snippet appears on many domains from your imported URL list, you can assume that the snippet is also a “footprint”. You can then export these footprints to find many more target URLs with the web scraper of your choice.
With the Pro version you can also extract URL patterns for filtering URL lists, and also scraping with “inurl:” search strings if you like to do that sort of thing!
General Default Options
On the right hand side you’ll see 4 options.
The reason we have the Min./Max. file size options is that sometimes you get pages that are only 1 or 2KB, which probably means it’s a 404 page that says something like “this is a 404 page, click here to go to the homepage.”
Obviously that’s no use to us. We want a page that actually has some HTML on it, so we set the Min. file size to 10KB. You can set it lower if you want, but you’ve been warned!
I set the Max. size to 200KB, which sounds large, but the file size doesn’t include any media on the page. Images and videos are not counted; the 200KB limit covers just the HTML, CSS, Javascript and text content.
Mode Settings
Open the Text Snippets mode settings; “Treat pipes as snippet separators” should be checked. I also drop the Max. snippet length to 200 characters. “Replace Numbers with *” is very useful for stacking up version-number and date/time footprints. I suggest always leaving that checked!
For the URL settings, I want to compare Path and File-name.
What that means is that the URLs are going to be compared by their file path and their file-name, for example:
https://www.example1.com/wiki/index.php
https://www.example2.com/projects/wiki/members.html
Example 1:
File path: /wiki/
File-name: index.php
Example 2:
File path: /projects/wiki/
File-name: members.html
So those are the things we’re going to compare because we can use those as a “sieve” filter later when it comes to scraping even more targets. This process refines your results so you have more results that are only for your target CMS.
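For the curious, here’s roughly how that path/file-name split works, using Example 2 from above:

```python
from urllib.parse import urlparse
import posixpath

def path_and_filename(url: str):
    """Split a URL into the two parts compared above."""
    path = urlparse(url).path
    return posixpath.dirname(path) + "/", posixpath.basename(path)

print(path_and_filename("https://www.example2.com/projects/wiki/members.html"))
# ('/projects/wiki/', 'members.html')
```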
Import your URL list (after removing duplicate domains) into Footprint Factory.
Then click “Get Footprints”.
Filtering Footprints
Soon you’ll have a very large list of extracted and compared text snippets:
56,532 is far too many snippets, and many of them will be useless anyway. We have to briefly look through this list to see what we should be removing. The quickest way to reduce the footprint count is to remove everything below a certain frequency.
So, if you imported 100 URLs from unique domains, and there are snippets that only appear on 3 of the URLs (3%), then those are probably not good footprints.
We can select all snippets under any frequency using the frequency threshold:
Remember that Footprint Factory finds and analyzes every snippet on all your files, so you will need to filter the useless snippets.
Using Freq. Threshold will greatly reduce the number of snippets because the majority of snippets will only appear once or twice, and they are not footprints at all!
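To make the idea concrete, here’s a sketch of the same threshold filter in code. The snippet counts are made-up illustration data:

```python
# Drop any snippet that appears on fewer than `threshold` percent
# of your domains. `snippet_counts` maps snippet -> number of
# domains it appeared on.
def apply_threshold(snippet_counts, total_domains, threshold=10.0):
    return {s: n for s, n in snippet_counts.items()
            if n / total_domains * 100 >= threshold}

counts = {"Powered by MyUPB": 97, "search": 95, "my holiday photos": 2}
print(apply_threshold(counts, total_domains=100))
# {'Powered by MyUPB': 97, 'search': 95}
```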
I like to end up with 30-80 of the best snippets to use as footprints. In time you will find your own sweet spot. Comb through the rest of the list by eye. If you’re using the free version of Footprint Factory then you should remove anything like:
search
contact
password
If you’re using the Pro version, then it does not matter so much, because you can merge footprints together. The above snippets are useless on their own for scraping; they won’t help us find a specific CMS. But if you put them together using Footprint Factory Pro, then you can use more snippets as footprints, because you are forcing them to work together as one footprint:
"search" "contact" "twitter" "Google" "password"
I’ll show you how to do this later in the guide.
If you edit your text snippets list, then make sure you save them by exporting the snippets to a text file. It can be imported again later.
Using URL Footprints
You don’t need to filter the URL footprints unless you are using them as “inurl:” footprints, which I don’t recommend because it burns out scraping proxies very quickly.
I think it’s best to only use the URL footprints as a “sieve” filter. So, there is no harm in having a few bad filters in there, it won’t make much difference to your results in the end.
The only URL Footprint you should not save and export is the “/” (forward slash) on its own, because that would defeat the point of the exercise: Why would you filter for / when almost every URL contains it?
That’s like sorting people by the criterion “red blood: yes/no”. It doesn’t make sense!
So save your time, don’t spend it going through the URL footprint list. Just remove the forward slash and export the URL footprints so you can use them later.
STEP 3: DECIDE ON SCRAPING STRATEGY AND EXPORT FOOTPRINT FACTORY RESULTS
Once we have our footprint list refined, the third step is deciding on the footprint strategy. What does that mean?
There are different ways you can combine footprints and also the keywords before exporting your final footprint list in the web-scraper format.
Snippet/Footprint Lists VS Web Scraping Footprint Lists
Try not to get confused: we made a list of footprints in Step 2, but the final scraping footprint list (the end product of this process) is different. So far we have a footprint list that looks like this:
footprint 1
footprint 2
footprint 3
But we want to scrape search engines, so we need something more like this:
“footprint 1”
“footprint 2”
“footprint 3”
or
“footprint 1” keyword 1
“footprint 1” keyword 2
“footprint 2” keyword 1
“footprint 2” keyword 2
“footprint 3” keyword 1
“footprint 3” keyword 2
or
“footprint 1” “footprint 2” keyword 1
“footprint 1” “footprint 2” keyword 2
“footprint 1” “footprint 2” keyword 3
or any combination or expansion of the formats above.
How To Build Your Footprint List
When we’re using CMS footprints in search-engine queries, it’s best to have them in quotation marks. The next thing we want to do is combine these footprints (a Footprint Factory Pro feature).
Many of your footprints will be common words. Those may all appear on the CMS you’re targeting but obviously if we searched for “Twitter” or “contact” on their own it’s not going to help us very much. So we have to combine these together to make sure that we’re actually getting the right results when we scrape.
At the top right of the user interface (shown below) you have an estimation of how many footprints you are currently generating with your settings (5,995 in the image). This number will grow exponentially when using footprint permutations (same thing as combinations for our purposes).
You can set the min and max values. So if your min is “2” then your generated list will only have footprint combinations (every footprint, with every other footprint). None of the footprints would be added to your scraping list on their own.
Add keywords if you want to, but it’s not always necessary if the CMS is not that common, or you have a large number of footprints. It is useful if you’re targeting a specific language though!
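To show what the generation step is doing, here’s a rough sketch assuming min and max combination sizes of 2 and an optional keyword list (the footprints and keywords are examples):

```python
from itertools import combinations

footprints = ["Powered by MyUPB",
              "You are not logged in",
              "Please Register or Login"]
keywords = ["forum", "foro"]  # leave empty for footprint-only queries

queries = []
for combo in combinations(footprints, 2):  # min=2, max=2
    base = " ".join(f'"{fp}"' for fp in combo)
    if keywords:
        queries.extend(f"{base} {kw}" for kw in keywords)
    else:
        queries.append(base)

print(len(queries))  # pair count x keyword count -- it grows fast
print(queries[0])    # "Powered by MyUPB" "You are not logged in" forum
```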
Click “Generate Footprints” to finish and make sure you export the list and save your work.
Step 3 Summary
We’ve decided on a strategy and exported our results. Your strategy is determined by the following:
how many combinations you’re using
if you’re combining the keywords
what web scraper you’re using (SB only allows 1 million URLs)
how large your starter/seed list is (this affects size of snippet list)
how many domains use the target CMS (roughly)
Use your brain. If you’re targeting a huge CMS like WordPress or VBulletin, then you can have large footprint lists because there are many, many targets you need to find.
But for smaller CMS targets like some image comment galleries, where there’s only a small number of sites, maybe 20,000 – 30,000 different domains, then you can use a much smaller scraping footprint list.
It would be overkill if there are only 10,000 sites for a CMS and you’re trying to use 20,000 footprints!
Export your generated footprint list. Then you can import that final footprint list into any scraper you choose, and search for your new backlink targets!
STEP 4 – USE YOUR FOOTPRINT LIST WITH YOUR WEB SCRAPER
I usually use HRefer (it’s not for everyone, it costs $650) but I will also show you how to use ScrapeBox ($57).
Using ScrapeBox
The first thing you need to do is get some proxies (see the resources if you need help). Second, you can put either your keywords or your footprint list in the ScrapeBox “keyword” box, but not both!
Remember your footprint strategy from earlier. If you have a huge (think 10,000+) footprint list then you may not want to use keywords at all, just use the “Import” button and select your footprint list file.
But if you have a smaller footprint list, or you have decided to use keywords then don’t import your footprint list. Instead, import your keywords:
Try to add some keywords for different languages. Use two or three words in the most popular languages, such as English, Spanish, German, Russian, French and Mandarin (Chinese).
Make sure you’re happy with your connection settings (Settings → Adjust Maximum Connections), then start harvesting URLs!
After ScrapeBox has finished scraping you can filter your results. First use Remove/Filter → Remove Duplicate URLs, and then apply that “sieve” filter I’ve mentioned a few times. If you have a URL footprint list, you can use it as a sieve filter in ScrapeBox.
Important Note:
The URL footprint list will be in .csv format because paths and filenames are stored separately. But if you’ve ever used spreadsheet software, it’s very easy to edit and then save as a text (.txt) file for the next step.
Click Remove/Filter → Remove URLs Not Containing Entries From… and select your URL footprint file (it should contain all the paths that you got from FPF, but with the single forward slash removed).
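If you’d rather script the conversion and the sieve, here’s a sketch. The CSV layout (one path or file-name per cell) and the file names are assumptions:

```python
import csv

# Build a plain-text sieve list from the URL-footprint CSV,
# dropping the lone "/" as discussed earlier.
sieve = set()
with open("url_footprints.csv", newline="", encoding="utf-8") as f:
    for row in csv.reader(f):
        for cell in row:
            cell = cell.strip()
            if cell and cell != "/":
                sieve.add(cell)

# Keep only scraped URLs containing at least one sieve entry --
# the same job as the ScrapeBox filter described above.
with open("scraped_raw.txt", encoding="utf-8", errors="ignore") as f:
    urls = [u.strip() for u in f if u.strip()]

filtered = [u for u in urls if any(s in u for s in sieve)]
with open("scraped_sieved.txt", "w", encoding="utf-8") as out:
    out.write("\n".join(filtered))
```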
You now have your raw URL list for testing with your link-building software!
Using HRefer
And for HRefer users, your text footprints are your “additive words”, and your URL footprints are your “sieve filter”.
Don’t load additive words into the program itself, use the config files. Go into your HRefer folder, then the “templates” folder.
You can see I’ve made two new files. One is named after the platform (the sieve/URL footprints), and the other, “CMS_addwords”, holds the additive words. Make sure the second file has the same name as the first, but with an underscore “_” and then “addwords” appended.
So those are the two files you need for HRefer. I’ll assume you know how to use proxies with this program. You’ll need to reload the program after changing any config files.
When HRefer is reloaded, select the engine config file from the drop-down menu on the “Search Engines options and Filter” tab.
STEP 5 – RUN YOUR SCRAPED LIST TO TEST FOR SUCCESSES
This seems like a pretty obvious last step, but it is important how you do it, because that’s how you know how successful your scrape was. If you get this last bit wrong, you may think you had poor results when in actual fact you did everything correctly until the last stage.
This tutorial has been based around GSA-SER because I think it is the most versatile link-building tool there is. You have to be a bit more clever about the strategy these days, but it definitely still works. And as I showed you before, what’s so great about it is that there are so many different CMS already programmed in. You can just import a list and then run it.
Importing Your URLs Into GSA-SER (2 options)
After doing Steps 1-4 you should have a huge URL list. Right-click the GSA-SER project you will be using, then “Import Target URLs” → “From File”, and select the file containing all the URLs you’ve scraped.
A prompt will appear asking if you want to randomize the list. I highly suggest you do that.
For reference there is another way of importing URLs by using the Advanced Options and then Tools → Import and Identify into Platforms.
But don’t do this for your new raw list. I’ve tested both importing methods side-by-side and the first one is much better. Importing target URLs on the project level and then running the project until completion is the fastest way to get your links working for you and to verify them.
Important Project Settings
There are a couple of options you should be aware of. When you do a large scrape make sure you have the “continuously try to post to a site even if it’s failed before” option checked.
Now that might sound like a bad thing because it sounds like you are trying to hammer a website over and over again. But what this is actually doing is overriding the “already parsed” message you would see in the reporting window here.
It can be quite annoying when you have a huge list and GSA-SER is stopping you from trying to post the same domain more than once. We’re usually looking for verified URLs to post to, not just domains so it’s important we use this option and test all URLs in our raw lists.
There is another option that allows posting to the same sites again (allow multiple accounts). Ramp this up as well.
When running a raw list, make sure you have plenty of email addresses loaded, and also make sure that you “check all engines”. Some of your raw list may be recognized as a different CMS, so why waste those links? Try for all of them!
Other Tips
It should go without saying that if you’re testing a raw list you should not send it to a money site directly. Not a good idea.
You may end up with tens of thousands of direct links. Also, if you’re using just one project for GSA-SER to test a raw list, then the chances are that the amount of content you have loaded into the project is not enough to make 100,000 unique submissions.
By default, most content tools like Content Foundry and Kontent Machine only give you enough content for a couple of thousand unique submissions (Content Foundry can be tweaked to do much more, though).
Never test a raw list on a money site. Even when using your raw list as a tier-two project, you want to make sure that the tier one it’s linking to is fairly large, because heavily loaded link pyramids are not as reliable as they used to be.
Know The Numbers To Gauge Your ROI
I like to go to the GSA-SER Tools menu (Advanced Options tab) and record my stats both before and after running a large raw list. It also helps to remove your duplicate URLs from GSA-SER every once in a while so you can get a true indication of how many unique verified URLs you have.
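A trivial sketch of the before/after arithmetic, with hypothetical numbers:

```python
# All numbers here are hypothetical -- plug in your own stats from
# the GSA-SER Tools menu before and after the run.
verified_before, verified_after = 10_000, 77_000   # verified link counts
raw_list_size = 830_000                            # URLs in the raw scrape

gained = verified_after - verified_before
print(f"Gained {gained:,} verified links "
      f"({gained / raw_list_size:.1%} of the raw list)")
# Gained 67,000 verified links (8.1% of the raw list)
```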
When you are happy with your other settings and you have your GSA-SER project content all loaded, run the project!
TO INFINITY AND BEYOND!
Now you know the process, you can go back into your GSA-SER verified links folder and just start picking off these different CMS one by one. You can work your way through all of them if you want!
Run them through Footprint Factory, build your footprint (text and URL) lists, export that into your scraper and repeat to build enormous links-lists.
Footprint Factory will save you hours of time and allow you to easily combine footprints. It will also reveal footprints that you would never find manually if you were checking individual sites and cutting text snippets out by hand.
This guide is quite dense, but the process is mainly automated. This means once you understand what you are doing the only part that takes more than 5 minutes is researching a CMS to target.
GSA-SER and Footprint Factory really do take all the guess-work out of it for you. Making these comprehensive footprint lists gets you much better results; it’s how you get 8% verified rates from a raw list.
Don’t wait! Feed these programs some data and set them to run so you can start ranking and banking.
Cheers!