Search extractors let you add custom fields to Confluence's search index so that you can find pages that meet specific criteria, for example pages created in a given year, pages with a given label, or pages last modified by a given user.

Read more about extractors in the Confluence Extractor Module documentation.

Custom Search Extractors

To start, let's create a simple search extractor that marks space home pages so that they can be found with the search term homePage. To create a custom search extractor, go to Admin → Search Extractors, click the Custom search extractor link, then Expand examples, and select Home Pages Extractor:

[Screenshot: custom extractor]

Click Add to save the search extractor.

Now click the search field at the top right, type homePage : true, and run the search.

[Screenshot: extractor search field]

The search results list the home pages of all spaces.

[Screenshot: extractor home page search result]

To make the search return all the pages in Confluence, including pages created before the extractor was added, rebuild the Confluence search index. For more information on rebuilding the index, see Content Index Administration.
Rebuilding the search index is a time-consuming, expensive operation and should not be triggered during peak hours.

Binding Variables in Extractor

Three binding variables are available in an extractor script:

  • document : The Lucene document that will be added to the search index for the object that is being saved

  • defaultSearchableText : The main body of text associated with this object in the search index

  • searchable : The object that is being saved, and passed through the extractor chain
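The bindings are ordinary Groovy script variables, so extractor logic can be tried out offline. The sketch below is an illustration under stated assumptions, not how ScriptRunner itself runs extractors: it evaluates a script with GroovyShell and a Binding that supplies the three variables, using plain Maps as hypothetical stand-ins for the Lucene document and the Confluence page. runExtractor is a hypothetical helper name.

```groovy
// Hypothetical offline harness: plain Maps stand in for the Lucene Document
// ("document") and the Confluence Page ("searchable") that ScriptRunner supplies.
String extractorScript = '''
if (searchable.type == "page" && searchable.homePage) {
    document.put("homePage", "true")
}
'''

Map runExtractor(String script, Map searchableStandIn, String text) {
    Map fakeDocument = [:]
    // Binding exposes each map entry as a variable inside the evaluated script
    Binding binding = new Binding([
        document             : fakeDocument,
        defaultSearchableText: text,
        searchable           : searchableStandIn
    ])
    new GroovyShell(binding).evaluate(script)
    return fakeDocument
}

Map home = runExtractor(extractorScript, [type: "page", homePage: true], "Welcome")
Map other = runExtractor(extractorScript, [type: "page", homePage: false], "Notes")
println home   // [homePage:true]
println other  // [:]
```

Because the stand-ins are Maps, the script calls document.put rather than document.add; a real extractor works with the Lucene Document and Confluence classes used in the examples below.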

Example Extractors

All the examples are available under Admin → Search Extractors → Custom search extractor → Expand examples.

Search Page By Year Extractor

This extractor lets you find all pages created in a given year.

import com.atlassian.confluence.pages.Page
import org.apache.lucene.document.Field
import org.apache.lucene.document.StringField

if (searchable instanceof Page) {
    Page page = searchable as Page
    Calendar calendar = Calendar.getInstance() // (1)
    calendar.setTime(page.getCreationDate()) // (2)
    String pageYear = calendar.get(Calendar.YEAR) as String // (3)
    document.add(new StringField("year", pageYear, Field.Store.YES)) // (4)
}
1 - Create an instance of Calendar.
2 - Set the time of our calendar to match the page’s creation date.
3 - Get the year the page was created.
4 - Store the year as a field in the Lucene document.
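If you prefer java.time to Calendar, the year can be computed as sketched below. This is not part of the shipped example, and yearOf is a hypothetical helper name; like Calendar.getInstance(), it uses the JVM's default time zone, so it should give the same year.

```groovy
import java.time.LocalDate
import java.time.ZoneId

// Hypothetical helper: the year a Date falls in, as a String, using java.time.
// ZoneId.systemDefault() mirrors Calendar.getInstance(), which also uses the
// JVM's default time zone.
String yearOf(Date created) {
    created.toInstant().atZone(ZoneId.systemDefault()).year as String
}

// Build a sample Date for 15 June 2017 in the default zone
Date sample = Date.from(LocalDate.of(2017, 6, 15).atStartOfDay(ZoneId.systemDefault()).toInstant())
println yearOf(sample) // 2017
```

Inside the extractor, steps (1) to (3) could then be replaced by String pageYear = yearOf(page.getCreationDate()).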

The following screenshot shows the search results for year : 2017:

[Screenshot: extractor search]

Pages With Label Extractor

This extractor lets you find all pages that have the "finance" label.

import com.atlassian.confluence.labels.Label
import com.atlassian.confluence.pages.Page
import org.apache.lucene.document.Field
import org.apache.lucene.document.StringField

if (searchable instanceof Page) {
    Page page = searchable as Page
    String labelText = "finance"
    Label myLabel = new Label(labelText) // (1)
    List<Label> labels = page.getLabels()
    if (labels.contains(myLabel)) {
        document.add(new StringField("label", labelText, Field.Store.YES)) // (2)
    }
}
1 - Create a new Label "finance".
2 - If the page has the "finance" label, store that as a field in the Lucene document.
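To index more than one label, the matching step can be pulled out into a helper. This is a sketch that assumes label names are compared as plain strings; matchingLabels is a hypothetical name, and in an extractor the names could be collected with page.getLabels()*.name.

```groovy
// Hypothetical helper: which of the wanted labels does the page carry?
// Label names are compared as plain strings; in an extractor they could be
// obtained with page.getLabels()*.name.
List<String> matchingLabels(Collection<String> pageLabelNames, Collection<String> wanted) {
    pageLabelNames.findAll { it in wanted }.unique().sort()
}

def matches = matchingLabels(["finance", "draft", "hr"], ["finance", "hr", "legal"])
println matches // [finance, hr]
```

Each returned name could then be added with document.add(new StringField("label", name, Field.Store.YES)). A Lucene document can hold several fields with the same name, so a page labelled both finance and hr should match either label : finance or label : hr.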

The search string for this extractor is label : finance.

Pages With Attachments Size Extractor

This extractor lets you find all pages whose attachments add up to more than 20 MB.

import com.atlassian.confluence.pages.Attachment
import com.atlassian.confluence.pages.Page
import org.apache.lucene.document.Field
import org.apache.lucene.document.StringField

if (searchable instanceof Page) {
    Page page = searchable as Page
    if (page.getAttachments()) {
        long twentyMeg = 20L * 1024 * 1024 // (1)
        long fileSize = page.getAttachments().sum { Attachment attachment -> attachment.getFileSize() } as long // (2)

        if (fileSize > twentyMeg) {
            document.add(new StringField("attachment", "20", Field.Store.YES)) // (3)
        }
    }
}
1 - Calculate 20 megabytes as bytes.
2 - Sum the sizes of all the page's attachments, in bytes.
3 - If the total attachment size is more than 20 MB, add an attachment field with the value 20 to the Lucene document.
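The size check can be exercised on its own as a plain function. The sketch below assumes sizes are in bytes (as Attachment.getFileSize() returns) and that 1 MB means 1024 * 1024 bytes, as in the extractor; exceedsLimit is a hypothetical helper name.

```groovy
// Hypothetical helper: true if the summed sizes (bytes) are strictly above
// limitMb megabytes, where 1 MB = 1024 * 1024 bytes as in the extractor.
boolean exceedsLimit(List<Long> sizesInBytes, long limitMb) {
    long limitBytes = limitMb * 1024L * 1024L
    // An empty list is falsy in Groovy, so a page with no attachments totals 0
    long total = sizesInBytes ? (sizesInBytes.sum() as long) : 0L
    total > limitBytes
}

long mb = 1024L * 1024L
println 20L * mb                               // 20971520
println exceedsLimit([12L * mb, 9L * mb], 20)  // true (21 MB in total)
println exceedsLimit([20L * mb], 20)           // false (exactly 20 MB is not "more than")
println exceedsLimit([], 20)                   // false
```

Keeping the comparison strict (>) matches the extractor, so a page with exactly 20 MB of attachments is not indexed.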

The search string for this extractor is attachment : 20.

Page Last Modified By Extractor

This extractor finds all the pages that were last modified by a specific user.

import com.atlassian.confluence.pages.Page
import org.apache.lucene.document.Field
import org.apache.lucene.document.StringField

if (searchable instanceof Page) {
    Page page = searchable as Page
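    // getLastModifier() may return null (for example for anonymous edits), and
    // StringField rejects a null value, so only index pages that have a modifier
    String name = page.getLastModifier()?.getName() // (1)
    if (name) {
        document.add(new StringField("modifier", name, Field.Store.YES)) // (2)
    }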
}
1 - Get the name of the last modifier.
2 - Store the modifier field with the user name as its value.

Search using the Confluence username. For example, if the username is "rfranco", the search string is modifier : rfranco.

For how-to questions, please ask on Atlassian Answers, where there is a very active community. Adaptavist staff are also likely to respond there.

Ask a question about ScriptRunner for JIRA, Bitbucket Server, or Confluence.