With these settings, you can:
- Index external websites within Aspen
- Pull external website information into your catalog search results
Enable Website Indexing
Go to Aspen Administration > System Administration > Modules > Web Indexer > check Enabled > Save.
Once the module is enabled, a Website Indexing section will appear in Aspen Administration. If it does not appear, check your permissions.
In Aspen Administration > System Administration > Permissions, select the Role you'd like to check in the Role to Edit dropdown, then scroll to the Website Indexing section to make sure the role has the following permission:
Administer Website Indexing Settings - Allows the user to administer the indexing of websites for all libraries.
Select the Website
Select a website you want to index in Aspen.
- Is it your library website?
- A trusted website you are always recommending?
- A county, city, or community website?
What types of pages from the site do you want to display?
Before you get started, is there a specific page on the website that you are interested in indexing or are you interested in indexing the entire website?
You might just want one page on the website (example: Your County Parks and Recreation web page) instead of every page of the website (example: Your entire County's website).
How you configure the settings depends on which results you want to show in Aspen.
These settings are found in Aspen Administration > Website Indexing > Settings.
Click Add New to add a new website to index.
Configure Website Indexing Settings
Name - Give your Website Index a Name. This name will show up in the search result facets under Site Name.
Search Category - Give your Website Index a Search Category name. This name will show up in the search result facets under Website Type.
Site URL - The URL you want Aspen to index. Aspen supports both standard URLs and sitemap URLs (.xml).
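If you point Aspen at a sitemap, the list of pages comes from the sitemap file rather than from following links. As a rough illustration, the Python sketch below reads the page URLs out of a standard sitemaps.org `<urlset>` file. It is not Aspen's code. It assumes a plain sitemap (not a sitemap index), and the sample URLs are illustrative only.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> URL listed in a sitemap <urlset> document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]


# Illustrative sitemap content; a real sitemap would be downloaded from the site.
example = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.pueblolibrary.org/services</loc></url>
  <url><loc>https://www.pueblolibrary.org/bookmobile</loc></url>
</urlset>"""

print(urls_from_sitemap(example))
# ['https://www.pueblolibrary.org/services', 'https://www.pueblolibrary.org/bookmobile']
```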
When you add a Site URL, Aspen will crawl the full text of that page, find links on that page that match the main site URL, and crawl those links and repeat the process on those subsequent pages.
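The crawl described above can be pictured as a breadth-first walk that only follows links back to the same site. The Python sketch below is a simplified model, not Aspen's implementation. It assumes that "matching the main site URL" means "same host name," it takes a hypothetical `fetch` function so it can run without a network, and it leaves out the crawl delay for brevity.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(site_url, fetch, max_pages=2500):
    """Breadth-first crawl that only follows links on the same host as site_url.

    fetch is a hypothetical caller-supplied function that returns a page's HTML,
    or None if the page could not be retrieved.
    """
    site_host = urlparse(site_url).netloc
    queue = deque([site_url])
    seen = {site_url}
    indexed = {}
    while queue and len(indexed) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        indexed[url] = html  # the full page text that would be indexed
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts before comparing.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == site_host and link not in seen:
                seen.add(link)
                queue.append(link)
    return list(indexed)


# A tiny in-memory "website" standing in for real HTTP requests.
pages = {
    "https://www.pueblolibrary.org/": '<a href="/services">Services</a> <a href="https://example.com/">Elsewhere</a>',
    "https://www.pueblolibrary.org/services": '<a href="/bookmobile#hours">Bookmobile</a>',
    "https://www.pueblolibrary.org/bookmobile": "<p>Bookmobile schedule</p>",
}
print(crawl("https://www.pueblolibrary.org/", pages.get))
```

The off-site link to example.com is skipped, and the fragment on `/bookmobile#hours` is dropped so the same page is not queued twice.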
In the screenshot above, we've entered the main website URL, www.pueblolibrary.org. This will index ALL pages on the website unless you set Paths to Exclude.
When indexing your library website, consider if you want to pull in ALL pages on your website or if you want to target a few specific pages. For example, instead of everything, you might only want to index important pages such as pueblolibrary.org/services, /bookmobile, /research, /teen, /children, /accessibility, etc.
This is especially important if you have a number of outdated events and blog posts on your library website that you don't want to appear in Aspen.
Regular Expression to find Page Title (ok to leave blank) - Fill this out if you want to pull specific title information from the metadata. If left blank, Aspen will crawl the site and do its best to find the title.
Example: <meta name="title" content="(.*?)"/>
Regular Expression to find Description (ok to leave blank) - Fill this out if you want to pull specific description information from the metadata. If left blank, Aspen will crawl the site and do its best to find the description.
Example: <meta name="description" content="(.*?)"/>
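Each of these patterns has one capture group, `(.*?)`, that holds the value to extract. The Python sketch below applies the two example patterns to a sample page. The fallback to the `<title>` element is an assumption used for illustration. Aspen's own "best effort" logic may differ.

```python
import re
from html import unescape

# The two example patterns from the settings above.
TITLE_PATTERN = r'<meta name="title" content="(.*?)"/>'
DESCRIPTION_PATTERN = r'<meta name="description" content="(.*?)"/>'


def extract(pattern, html, fallback=None):
    """Return capture group 1 of the first match (entities decoded), else the fallback."""
    match = re.search(pattern, html, re.IGNORECASE | re.DOTALL)
    if match:
        return unescape(match.group(1)).strip()
    return fallback


page = """<html><head>
<title>Bookmobile | Example Library</title>
<meta name="description" content="Schedule and stops for the library bookmobile."/>
</head></html>"""

# This page has no <meta name="title"> tag, so fall back to <title> (an assumed fallback).
title = extract(TITLE_PATTERN, page) or extract(r"<title>(.*?)</title>", page)
description = extract(DESCRIPTION_PATTERN, page)
print(title)        # Bookmobile | Example Library
print(description)  # Schedule and stops for the library bookmobile.
```

The patterns are matched literally. A site that writes `<meta name="description" content="..." />` (with a space before `/>`) would not match the example pattern, so check your site's actual page source before entering a pattern.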
Paths to Exclude (regular expression) - Add paths to exclude if you want to prevent Aspen from indexing certain areas of your site.
Example (exclude specific sections):
Example (exclude a specific page):
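As a rough illustration of how exclusion patterns behave, the Python sketch below checks URLs against a list of regular expressions. The paths `/events/`, `/blog/`, and `/about/old-hours` are hypothetical. This sketch also assumes that each pattern is tested against the URL's path. How Aspen actually applies the field (path or full URL, one pattern per line or another separator) is not covered here, so confirm against your own results.

```python
import re
from urllib.parse import urlparse

# Hypothetical exclusion patterns (assumed format: one regular expression per entry).
EXCLUDE_PATTERNS = [
    r"^/events/",           # a whole section
    r"^/blog/",             # another whole section
    r"^/about/old-hours$",  # one specific page, anchored at both ends
]


def is_excluded(url, patterns=EXCLUDE_PATTERNS):
    """Return True if the URL's path matches any exclusion pattern."""
    path = urlparse(url).path
    return any(re.search(p, path) for p in patterns)


for url in [
    "https://www.pueblolibrary.org/events/2019-storytime",
    "https://www.pueblolibrary.org/about/old-hours",
    "https://www.pueblolibrary.org/services",
]:
    print(url, is_excluded(url))
# The first two URLs print True; /services prints False.
```

Anchoring a pattern with `^` and `$` limits it to one exact path, while a pattern such as `^/events/` covers everything under that section.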
Maximum Pages to Index - The maximum number of pages Aspen will index from the site. We recommend setting this to no more than 2500.
Crawl Delay - The number of seconds to delay between requesting pages from the site. 10 is typically the default.
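Maximum Pages to Index and Crawl Delay together set a floor on how long a full crawl takes. The Python sketch below works through the arithmetic for 2500 pages at a 10-second delay. It assumes the delay is applied once between consecutive requests, and it ignores download and processing time, so real crawls will take longer.

```python
def estimated_crawl_seconds(max_pages, crawl_delay_seconds):
    """Minimum time for one full crawl from the delays alone.

    Assumes one delay between each pair of consecutive requests and ignores
    how long each page takes to download and process.
    """
    return max(max_pages - 1, 0) * crawl_delay_seconds


seconds = estimated_crawl_seconds(2500, 10)
hours, remainder = divmod(seconds, 3600)
minutes, secs = divmod(remainder, 60)
print(f"{seconds} s, about {hours} h {minutes} min {secs} s")
# 24990 s, about 6 h 56 min 30 s
```

At these settings a full crawl takes roughly seven hours at minimum, which is worth keeping in mind when choosing the Frequency to Fetch.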
Frequency to Fetch - How often you want Aspen to re-crawl the site for changes. Think about how often content might realistically change on your site.
You can choose from:
Last Fetched (clear to force a new fetch) - This shows the last date and time that Aspen fetched information from this website. Clearing this field forces a re-index of the site, even if the Frequency to Fetch interval you have set has not yet elapsed.
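The relationship between Last Fetched and Frequency to Fetch can be modeled as a simple "is it due?" check. The Python sketch below is a hypothetical model, not Aspen's code. The function name `needs_fetch` and the weekly interval are illustrative, and they are not necessarily among Aspen's frequency options.

```python
from datetime import datetime, timedelta
from typing import Optional


def needs_fetch(last_fetched: Optional[datetime], frequency: timedelta, now: datetime) -> bool:
    """A site is due if it has never been fetched (or Last Fetched was cleared),
    or if at least one full Frequency to Fetch interval has passed."""
    if last_fetched is None:
        return True
    return now - last_fetched >= frequency


now = datetime(2023, 1, 18, 12, 0)
weekly = timedelta(weeks=1)
print(needs_fetch(datetime(2023, 1, 15, 12, 0), weekly, now))  # False: fetched only 3 days ago
print(needs_fetch(None, weekly, now))                           # True: Last Fetched was cleared
```

Clearing Last Fetched corresponds to the `None` case, which is why it triggers a new fetch regardless of the interval.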
Choose Libraries and Locations - Like many other settings in Aspen, you'll want to specify the libraries and locations this website index corresponds to.
Updated 2023-01-18 - md bws
There is no definitive way to delete individual pages after they have been indexed, or to prevent them from being indexed.
However, you can use Paths to Exclude in Website Indexing > Settings to try to avoid specific pages. To use this, enter the directory and/or the specific URL to be excluded.
Note: In certain instances, the Aspen support team can manually delete indexed pages from the server. Please put in a support ticket if you need assistance with this.
Under Aspen Administration > Website Indexing > Website Pages you will see a log of all the website pages that Aspen has attempted to index.
The Website Pages section is a log only. The "Deleted?" box will be checked if Aspen has removed that page from the indexer. Checking or unchecking "Deleted?" will not force a deletion of that page or restore a deleted page.
Aspen Administration > Website Indexing > Indexing Log shows when Aspen has indexed each website you have set up in Website Indexing > Settings.
You can see how many total pages Aspen found, any errors, how many pages Aspen added to the index, any pages added since the last index, any pages deleted (based on the Paths to Exclude), and any pages that had updates.
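One way to picture how added, updated, and deleted counts can be derived is to compare the previous index with the latest crawl. The Python sketch below is a hypothetical model, not how Aspen computes its log. It assumes each indexed page is remembered as a content hash, that exclusion patterns are matched against URL paths, and that a previously indexed page that is now excluded or missing counts as deleted.

```python
import hashlib
import re
from urllib.parse import urlparse


def digest(text):
    """Content fingerprint used to detect whether a page changed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def summarize_run(previous, crawled, exclude_patterns):
    """previous: url -> hash from the last run; crawled: url -> page text from this run."""
    kept = {
        url: digest(text)
        for url, text in crawled.items()
        if not any(re.search(p, urlparse(url).path) for p in exclude_patterns)
    }
    added = [u for u in kept if u not in previous]
    updated = [u for u in kept if u in previous and previous[u] != kept[u]]
    deleted = [u for u in previous if u not in kept]
    return {"found": len(crawled), "added": len(added), "updated": len(updated), "deleted": len(deleted)}


previous = {
    "https://www.pueblolibrary.org/services": digest("Services v1"),
    "https://www.pueblolibrary.org/events/old": digest("Old event"),
}
crawled = {
    "https://www.pueblolibrary.org/services": "Services v2",
    "https://www.pueblolibrary.org/events/old": "Old event",
    "https://www.pueblolibrary.org/teen": "Teen page",
}
print(summarize_run(previous, crawled, [r"^/events/"]))
# {'found': 3, 'added': 1, 'updated': 1, 'deleted': 1}
```

In this example the new /teen page is added, the changed /services page is updated, and the /events page is deleted because it now matches an exclusion pattern.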
In Aspen Administration > Website Indexing > Dashboard you can see statistics related to the activity of the indexed pages within Aspen.
For each web page you have added in Website Indexing > Settings you will see stats for:
- Pages Viewed
- Pages Visited
- Active Users