What is OutWit Hub and when should I use it?
When you are looking for something on the Web, search engines give you lists of links to the answers. The purpose of OutWit Hub is to actually go retrieve the answers for you and save them on your disk as data files, Excel tables, lists of email addresses, collections of documents, images…
If your question has one simple answer, it will be at the top of Wikipedia or Google results and you don’t need OutWit for that. When you know, however, that it would take you 20, 50, 500 clicks to get what you want, then odds are you do need OutWit Hub:
The Hub is an all-in-one application for extracting and organizing data, images, and documents from online sources. It offers a wealth of data recognition, autonomous exploration, extraction, and export features to simplify Web research. OutWit Hub exists both as a Firefox Add-on and as a standalone application for Windows, Mac OS, and Linux.
OK, I have downloaded OutWit Hub and I am running it. Now what?
We have an open list of 1,728 first things you can do with the application but we believe the best first thing is to run the built-in tutorials from the Help menu (Help>Tutorials).
I want OutWit Hub to browse through a series of result pages but the ‘Next in Series’ and ‘Browse’ buttons are disabled. How come?
When opening a Web page, OutWit analyzes the source code and tries to understand as much as possible about the page. The first thing it does is look for navigation links (next, previous…); when it finds them, the ‘Next in Series’ arrow and ‘Browse’ double arrows become active. If they are inactive, it is because OutWit did not find any additional pages. There are several workarounds that let you do the scrape without clicking on every link manually. Depending on the case, the best alternatives are: using the Dig function (with advanced settings in the pro version); generating the URLs to explore; making a ‘self-navigating’ scraper with the #nextPage# directive; or, finally, grabbing the URLs you want to scrape, putting them in a directory of queries and using this directory for a new automatic exploration. (Note that for the latter, it is also possible to grab the links to the Catch in one macro and address the column of the Catch by the name you gave it in a second macro, by typing ‘catch/yourColumnName’ in the Start Page textbox.)
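As an illustration of the ‘generating the URLs to explore’ workaround, here is a minimal Python sketch that runs outside OutWit Hub. The URL pattern, the `page` query parameter and the function name are hypothetical; adapt them to the numbering scheme of the site you are exploring.

```python
def generate_page_urls(pattern, first, last, step=1):
    """Return one URL per page number, substituting {n} in the pattern."""
    return [pattern.format(n=n) for n in range(first, last + 1, step)]

# Hypothetical site and query parameter, for illustration only.
urls = generate_page_urls("https://www.example.com/results?page={n}", 1, 5)
for url in urls:
    print(url)
```

The printed list can be saved to a text file and opened in the Hub like any other list of links.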
Some links are not working in the Standalone version of the Hub. What should I do?
These are links for which target="_blank" was specified in the source code. OutWit Hub cannot open separate popup windows but you can open them within the Hub. For this, check the “Open popup links in the application window” preference (Tools>Preferences>General).
Auto-Explore Functions and Fast Scraping are slower in the current version than in the previous. Why is that?
They are not, in fact. The program’s exploration functions work exactly the same way. It is possible, though, that your preference settings changed during the upgrade. Delay and pause settings should actually be more precise and reliable than in previous versions. You can fine-tune all this in Tools>Preferences>Time Settings. Another recent preference which may have an impact on the exploration speed is ‘Bypass Browser Cache’ in the ‘Advanced’ panel: not using the cache does slow the browsing down, so you may want to set it to ‘Never’. If, after this, you are still experiencing performance issues, consider disabling processes you may not need by right-clicking on ‘page’ in the left side bar.
The next page button functions correctly but when trying to do a Browse to capture the information, the application runs only 2 pages then stops. Why is that?
There is a preference (Tools>Preferences>General) just for this: uncheck “Only visit pages once…”. Important: do not forget to check it again afterwards, or your next Dig will probably last forever and bring back huge amounts of redundant data.
Data: Extracting, Importing, Exporting…
I would like to extract the details of all the products/events/companies in this site/directory/list of subsidiaries… Could you please advise me on how to do that?
Unfortunately this is the purpose of the hundreds of features covered in the present Help, so it is difficult to answer in one sentence, but the general principle is this:
Go through the standard extractors (documents, lists, tables, guess…) by clicking in the left side panel. Either one of them gives you the results you want, in which case it is just a matter of exporting the data, or you need to create a scraper for that site. In the second case, first go to one of the detail pages, build a scraper for that page in the ‘scrapers’ view, and test it on a few other pages. Then go to the list of results you need to grab and have OutWit browse through all the links and apply your new scraper. This can be done in two ways: either by actually going to each page (‘browse’ or ‘dig’, or a combination of both if you have the pro version) or by ‘Fast Scraping’ them (applying your scraper to selected URLs with right-click: Auto-Explore>Fast Scrape in any datasheet, or with ‘Fast Scrape’ in a macro).
How can I import lists of links (URLs) or other strings into OutWit Hub?
There are many different ways to do this. Here are a few:
Put them into a text file (.txt or .csv), and open the file from the File menu. (Note that on some systems, the program may try to open .csv files with another application. In this case, just rename your file with the .txt extension.) You will find your URLs in the links view and the text in the text view.
Drag them directly from another application to the page or queries view of the Hub.
If they are in a local HTML file, simply open the file from the File menu and you will be able to process it with the Hub as any Web page.
Copy the links from whatever application they are in (you can also copy HTML source code or simple text containing URLs), right-click in the page view of the Hub and choose Edit>Paste Links.
Once your links are in the Hub, you simply need to select them, right-click on one of them and select ‘Send to Queries’ to create a directory of URLs that you will then be able to use in any way you like (in a macro for instance, or doing an automatic exploration directly from the right-click menu).
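If your links are buried in HTML source or free text and you prefer to prepare the text file yourself, the following Python sketch shows one way to do it outside the Hub. The regular expression is a deliberately simple assumption (it does not strip punctuation attached to a URL), and the sample text and file name are hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Simple pattern: a URL runs until whitespace, a quote or an angle bracket.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")

def extract_urls(text):
    """Return the URLs found in text, in order, without duplicates."""
    seen = []
    for url in URL_RE.findall(text):
        if url not in seen:
            seen.append(url)
    return seen

def save_urls(urls, path):
    """Write one URL per line to a plain text file."""
    Path(path).write_text("\n".join(urls) + "\n", encoding="utf-8")

# Hypothetical sample mixing HTML and plain text.
sample = (
    'See <a href="https://www.example.com/a">A</a> and '
    "https://www.example.com/b plus https://www.example.com/a again."
)
urls = extract_urls(sample)
out_path = Path(tempfile.gettempdir()) / "urls.txt"
save_urls(urls, out_path)
print(out_path)
```

The resulting .txt file can then be opened from the File menu as described in the first method.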
How can I import CSV or other tabulated data into OutWit Hub?
Simply open the file (.txt, .csv …) from the File menu. (Note that on some systems, the program may try to open .csv files with another application. In this case, just rename your file with the .txt extension.) If the original data was correctly tabulated, you should find the data well structured in the guess view. If the data was less structured, well, the Hub will do what it can.
I have made a scraper which works fine on the page I want to scrape, but when I do a browse and set the ‘scraped’ view to collect the data, it grabs the data of the first page over and over again. What is happening?
How can I convert a list of values into a String Generation Pattern?
If the values are in one of the Hub’s datasheets, just select them, right-click on one of them and select “Insert Rows…”. If they are in a file on your hard disk, simply import them into a directory of queries (see above) and do the same.
What is the maximum number of rows of data OutWit Hub can extract and export? After a certain number of rows, when exporting, I get a dialog telling me a script is unresponsive. What should I do?
In our tests, we have extracted and successfully exported up to 1.3 million rows (of two or three columns). Obviously, the limit varies a lot from system to system, depending on the platform, the RAM, etc. When exporting more than 50,000 or 100,000 rows, you may see such dialogs, even several times in a row, when you click on Continue. There is a checkbox to prevent it from coming back. (Note that Excel XML export is always much more demanding than CSV or TXT.) Don’t forget that you can move your results to the catch and save the catch itself in a file if you need to reuse the contents or just for backup purposes (File Menu). A catch file can only be read again in OutWit Hub but it is much faster to save than exporting the data.
The program doesn’t find all the email addresses in this website. Why is that?
There are several ways to have OutWit look for emails in a site. The fastest is to select Fast-Search For emails>In Current Domain, either from the Navigation menu or from the popup menu you get when you right-click on the page. This method, however, doesn’t explore all pages in the site. It only looks for the most obvious (contacts, team, about us…) pages that can be found. If you want to systematically explore all pages in a site, you will have to use the Dig function, within domain, at the depth level you wish.
Why doesn’t the program find contact information (phone, address…) for some of the email addresses?
First, of course, the info has to be present in the page. Then, if it is there, no technology allows for perfect semantic recognition. An address or a phone number can take so many different forms, depending on the country, on the way it is presented or on how words are abbreviated, that we can never expect to reach a 100% success rate.
Email address recognition is nearly exhaustive in OutWit; phone numbers are recognized rather well in general; physical addresses are more of a challenge: they are better recognized for the US, Canada, Australia and European countries than for the rest of the world. The program recognizes names in many cases. As for other fields such as the title, automatic recognition in unstructured data is too complex at this point, and results would not be reliable enough for us to include them unless they are clearly labeled. We are constantly improving our algorithms, so make sure to keep your application up to date.
I am observing the progress and I see that no new line is added for some pages when I am sure there is an email address or other info that should be found. Why is that?
This page (or one containing similar info) was probably visited before. Results are automatically deduplicated. This means that if an email address –or just a phone number or physical address– has already been found, the row containing this data will be updated (and no new row, created) when a new occurrence is found.
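The effect of deduplication can be pictured with a short Python sketch. It only illustrates the behavior described above and is not OutWit’s actual implementation; the field names and the matching rule (a row matches when it shares an email, phone or address value) are assumptions.

```python
# Fields whose values identify an already-known contact (assumed).
KEY_FIELDS = ("email", "phone", "address")

def merge_contact(rows, found):
    """Update the first row sharing a key value with `found`, else append."""
    for row in rows:
        if any(found.get(k) and row.get(k) == found.get(k) for k in KEY_FIELDS):
            # Fill in fields that were missing; keep existing values.
            for field, value in found.items():
                if value and not row.get(field):
                    row[field] = value
            return rows
    rows.append(dict(found))
    return rows

rows = []
merge_contact(rows, {"email": "jane@example.com"})
merge_contact(rows, {"email": "jane@example.com", "phone": "555-0100"})
merge_contact(rows, {"email": "bob@example.com"})
print(len(rows))  # the second occurrence updated a row instead of adding one
```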
How do I make a hidden column visible in a datasheet?
In the top right corner of every datasheet in the application is a little icon showing a table with its header: the Column Picker. If you click on this icon, a popup menu will allow you to hide or show the different columns of the datasheet. Only visible columns are moved to the Catch and exported by default (this behavior can be changed with a custom export layout).
What is the Ordinal ID?
The Ordinal column is hidden by default in all datasheets. Use the column picker (icon at the top right corner of any datasheet) to display it. The Ordinal ID is an index composed of three groups of digits separated by dots. The first number is the number of the page from which the data line was extracted (it can only be higher than 1 if the ‘empty’ checkbox is unchecked or if the datasheet is the result of a fast scrape). The second number is the position of the data block in the page (can only be more than 1 in ‘tables’, ‘lists’, ‘scraped’ and ‘news’ views). The last number is the position of the data line in the block (or in the page, if there is only one data block in the page).
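If you process exported data outside the Hub, an Ordinal ID can be split into its three parts as sketched below in Python. The sample value "3.1.12" and the names used are hypothetical, and the sketch assumes the exported value contains exactly three dot-separated groups of digits, as described above.

```python
from typing import NamedTuple

class OrdinalID(NamedTuple):
    page: int   # number of the page the line was extracted from
    block: int  # position of the data block in the page
    line: int   # position of the data line in the block

def parse_ordinal(text):
    """Split 'page.block.line' into three integers."""
    parts = text.strip().split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not an Ordinal ID: {text!r}")
    return OrdinalID(*(int(p) for p in parts))

oid = parse_ordinal("3.1.12")  # hypothetical value
print(oid.page, oid.block, oid.line)
```

Because the result is a tuple of integers, sorting a list of parsed IDs restores page, block and line order, which plain text sorting would not (for instance, "10.1.1" sorts before "2.1.1" as text).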
I do not manage to enter my serial number in the Registration Dialog of OutWit Hub. The program keeps saying the key is invalid.
Your key was sent to you by email when you purchased the application. It is a series of letters and digits similar to this: 6YT3X-IU6TR-9V45E-AFS43-89U64. It must not be confused with the login password to your account on outwit.com which was also sent to you by email (if you miss one of these email messages, please check your spam box).
If you are wondering whether the Hub you are using is a pro or a light version, you will find the answer in the window title. Up to now, we haven’t had a single case where a valid serial number would not work. You might be experiencing a very rare bug, but this seems very unlikely after several years. The key needs to be entered exactly as it appears in the mail you received. So, either you are not typing it precisely right (in which case you should simply copy and paste the email address and the key from our original mail) or you are typing something completely different (the login to your outwit.com account, for instance?). If you have changed email addresses since you purchased your license, remember that the one to use is the one with which you originally placed your order.
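To rule out a simple typing or pasting mistake, you can check that what you are entering at least has the shape of a key. The Python sketch below assumes, based only on the sample key above, that keys are five groups of five uppercase letters or digits separated by hyphens; it says nothing about whether a key is actually valid.

```python
import re

# Shape inferred from the sample key only (assumption).
KEY_SHAPE = re.compile(r"[A-Z0-9]{5}(-[A-Z0-9]{5}){4}")

def looks_like_key(text):
    """True if text, ignoring surrounding whitespace, has the key shape."""
    return KEY_SHAPE.fullmatch(text.strip()) is not None

print(looks_like_key(" 6YT3X-IU6TR-9V45E-AFS43-89U64 "))  # pasted with spaces
print(looks_like_key("my-account-password"))
```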
I have installed OutWit Hub for Firefox (or Docs or Images) then reloaded Firefox but I don’t see the OutWit icon on my toolbar. What can I do?
1) You didn’t download the Firefox add-on but the standalone application. In this case, you just need to install the software and double-click on its icon, as you would for any other application.
2) You do have the add-on and the install worked but the icon is simply missing from the toolbar. In this case, select ‘OutWit’ in the ‘Tools’ menu, then select OutWit Hub (or the appropriate outfit) in the sub-menu. If you want to add the icon to your toolbar, right-click on the toolbar and select ‘Customize’ then drag and drop the OutWit icon onto it.
3) The add-on install failed. In this case, the most probable reason is that, even though you just downloaded the program, you do not have the latest version. The one you downloaded (probably from a third party) doesn’t work with the current version of Firefox. Download the latest version from outwit.com. Of course, every now and then, it may also be a real compatibility problem. So if the above doesn’t work or doesn’t apply, please create a support ticket on outwit.com and we’ll do our best to help.
How can I revert to OutWit Hub 3.0?
If you have upgraded to version 4 by mistake or have a problem with a feature and wish to revert to version 3, make sure your version (Hub and runner) is 188.8.131.52 or higher and type outwit:downgrade in the Hub’s address bar. (Please tell us if you believe you have discovered a problem in this version.)
On OutWit Hub For Firefox, I have been experiencing new issues recently: unresponsive scripts, timeouts, strange behaviors on pages that used to work fine… what can I do to revert to factory settings?
We are not aware of incompatibilities with other add-ons, but it can always happen. Some of your Firefox preferences could also have been changed by another extension, or files may have been corrupted in your profile. You can try to create a blank profile and reinstall OutWit Hub (or other OutWit extensions) from outwit.com. This will bring you back to the initial state. Here is how to proceed on Windows:
and on other platforms:
Can I create a new profile in OutWit Hub Standalone?
With the standalone version, the principle is almost identical to the way it works in Firefox (see the paragraph above).
Windows: click “Start” > “Run”, and type:
"C:\Program Files (x86)\OutWit\OutWit Hub\outwit-hub.exe" -no-remote -ProfileManager
Macintosh: run the Terminal application and type:
/Applications/OutWit\ Hub.app/Contents/MacOS/outwit-hub -no-remote -ProfileManager
Linux: open a terminal and type:
[path to directory]/outwit-hub -no-remote -ProfileManager
If you need instructions to go further, refer to the profile manager instructions for Firefox:
Where is my profile directory?
In OutWit Hub (Standalone or Firefox Add-on), if you type about:support in the address bar, you will get a page with important information about your system and configuration. In this page, you will find a button that will lead you to your profile directory. Among the files you will see there, the ones with .owc extensions are Catch files, and files ending with .owg are User Gear files (the User Gear is the database where all your automators are stored). You can back these files up or rename them if you plan to alter your profile.
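Backing up these files can be scripted. The Python sketch below copies every .owc and .owg file from a profile directory into a backup folder; the function name is hypothetical, and the demonstration runs on a throwaway directory with dummy files rather than on a real profile.

```python
import shutil
import tempfile
from pathlib import Path

# Catch files and User Gear files, as described above.
BACKUP_SUFFIXES = {".owc", ".owg"}

def backup_outwit_files(profile_dir, backup_dir):
    """Copy Catch and User Gear files to backup_dir; return their names."""
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(profile_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in BACKUP_SUFFIXES:
            shutil.copy2(path, backup / path.name)
            copied.append(path.name)
    return copied

# Demonstration on a throwaway directory with dummy files.
with tempfile.TemporaryDirectory() as tmp:
    profile = Path(tmp) / "profile"
    profile.mkdir()
    for name in ("demo.owc", "demo.owg", "prefs.js"):
        (profile / name).write_text("placeholder")
    copied = backup_outwit_files(profile, Path(tmp) / "backup")
    print(copied)
```

To use it for real, pass the profile directory shown by about:support as the first argument, preferably while the Hub is closed.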