Searching PHP Sites With SharePoint 2010

In this article, you will learn how to search PHP sites with the SharePoint 2010 search engine. Also check out our recent articles about SharePoint search:

Working With SharePoint 2010 Search - Part 1

Working With SharePoint 2010 Search - Part 2

As of this writing (Dec 2011), PHP is the 6th most popular programming language (after Java, C, C++, C#, and Objective-C) (*1). In Dec 2010, it was in 4th position. I consider it one of the most popular web programming languages. As of Jul 2007, PHP was in use on 20,917,850 domains and 1,224,183 IP addresses (*2). Searching sites built with such a popular technology using one of the world's best search technologies makes sense. I know this may not be a practical approach, but the solution exists if someone is interested.

Here are the steps:

1. Open the SharePoint Central Administration site.

2. Click General Application Settings.

3. Click Farm Search Administration.

4. Click Search Service Application.

5. Click Content Sources under Crawling in the left side menu.

6. Click New Content Source.

7. In the Name field, enter PHP.

8. In Content Source Type, select Web Sites.

9. In Start Addresses, enter the web address of a PHP site. You can search both external and local PHP sites. For example, my local PHP server address is http://127.0.0.1:8080 and the site that I want to search is home, so I enter http://127.0.0.1:8080/home in the Start Addresses field. You can also enter a server address such as http://myserver/mysite. To search an Internet site, enter the site URL, for example, http://www.walisystemsinc.com. I will show you results from both local and Internet PHP sites at the end of this article.

10. In Crawl Settings, you have three options:

i) Only crawl within the server of each start address
ii) Only crawl the first page of each start address
iii) Custom - specify page depth and server hops. (There are other properties associated with this setting)

You can choose the option that suits your needs. The first option crawls the whole server, beginning at the start address you provided. Choose the first option.
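
To make the difference between the first two options concrete, here is a minimal Python sketch of the decision a crawler makes for each link it finds. It is only an illustration: the function names and option labels are hypothetical, and it assumes that "the server" means the same host name and port. It does not show how SharePoint implements crawling internally.

```python
from urllib.parse import urlsplit


def same_server(start_address: str, url: str) -> bool:
    """Return True when both URLs point at the same host name and port.

    Assumption for this sketch: "server" means host name plus port.
    """
    start, candidate = urlsplit(start_address), urlsplit(url)
    return (start.hostname, start.port) == (candidate.hostname, candidate.port)


def should_crawl(start_address: str, url: str, option: str) -> bool:
    """Decide whether a discovered URL falls inside the crawl.

    option is a hypothetical label for the first two Crawl Settings choices:
      "within_server"   - only crawl within the server of the start address
      "first_page_only" - only crawl the first page of the start address
    """
    if option == "within_server":
        return same_server(start_address, url)
    if option == "first_page_only":
        return url == start_address
    raise ValueError(f"unknown option: {option}")


if __name__ == "__main__":
    start = "http://127.0.0.1:8080/home"
    for link in (start,
                 "http://127.0.0.1:8080/home/about.php",
                 "http://www.walisystemsinc.com/"):
        print(link, should_crawl(start, link, "within_server"))
```

With the local start address from step 9, the first option accepts every page on http://127.0.0.1:8080 and rejects links that lead to other hosts, while the second option accepts only the start address itself.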

11. In Crawl Schedules, you can schedule a crawl. You can schedule both a Full Crawl and an Incremental Crawl. Click the Create schedule link under the Full Crawl drop-down.

Figure 1:  Manage Crawl Schedules

12. Scheduling a crawl is optional; you can also start one manually whenever you want. If you want to schedule it, select the type and time and click OK.

13. You can set the priority of this content source in Content Source Priority. By default, it is set to Normal. You can change it to High.

14. If you want to start the crawl now, check Start full crawl of this content source and click OK.

15. Click File Types under Crawling. Make sure php is listed as one of the file name extensions. If it's not there, click the New File Type link at the top and enter php in the File extension field. Click OK.

16. Next, create a new scope for PHP search. Click Scopes under the Queries and Results heading.

17. Click New Scope.

18. Enter PHP Search in Title. Enter Search PHP Sites in Description. In Target Results Page, you have two options. You can use the default Search Results Page or you can create a new page to search PHP sites. If you want to learn how to create these pages, check out the following article:

Working With SharePoint 2010 Search - Part 1

For now, select the first option and click OK.

19. After you have added the scope, you will see an Add rules link in the Update Status column next to the newly added scope. Click this link.

20. The first property you will see is Scope Rule Type. There are four options listed.

i) Web Address: This is the web address of the site that you want to search. You can specify a specific folder below. Alternatively, you can specify a hostname or domain or subdomain.
ii) Property Query: This option allows you to add restrictions. For example, you can select Author from the Add property restrictions drop down and then specify a value. The rule then matches all items by that author, and those items are included in or excluded from the scope according to the behavior you select.
iii) Content Source: This allows you to select a content source.
iv) All Content: This will include everything in this scope.

For the first three options above, you can select a behavior. There are three options available:

i) Include: Any item that matches this rule will be included.
ii) Require: Every item in the scope must match this rule.
iii) Exclude: Items matching this rule will be excluded from the scope.

In Scope Rule Type, select Content Source. In Content Source, select the content source that you created above, that is, PHP. In Behavior, select the first option, Include. Click OK.

Note that you can add more rules if you like. After you have added one rule, you can click the New rule link on the Scope Properties and Rules page to add more rules.
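
The way the three behaviors combine when a scope has several rules can be sketched in a few lines of Python. This is a simplified model, not SharePoint code: the Rule class, the in_scope function, and the item fields are all hypothetical, and the sketch assumes that a scope without any Include or Require rule matches nothing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Item = Dict[str, str]  # hypothetical representation of a crawled item


@dataclass
class Rule:
    matches: Callable[[Item], bool]  # does the item match this rule?
    behavior: str                    # "include", "require", or "exclude"


def in_scope(item: Item, rules: List[Rule]) -> bool:
    """Model of how Include, Require, and Exclude rules combine."""
    includes = [r for r in rules if r.behavior == "include"]
    requires = [r for r in rules if r.behavior == "require"]
    excludes = [r for r in rules if r.behavior == "exclude"]

    # Exclude: an item matching any exclude rule is dropped.
    if any(r.matches(item) for r in excludes):
        return False
    # Require: every item in the scope must match every require rule.
    if not all(r.matches(item) for r in requires):
        return False
    # Include: matching any one include rule is enough.
    if includes:
        return any(r.matches(item) for r in includes)
    # Assumption: with no include rules, require rules alone define the scope.
    return bool(requires)


if __name__ == "__main__":
    # The single rule added in step 20: include the PHP content source.
    php_rule = Rule(lambda i: i.get("content_source") == "PHP", "include")
    print(in_scope({"content_source": "PHP"}, [php_rule]))
    print(in_scope({"content_source": "Local SharePoint sites"}, [php_rule]))
```

In this model the rule from step 20 places every item crawled from the PHP content source in the scope and leaves everything else out.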

21. You are now ready to test the PHP search. Before that, make sure the content has been crawled and the scope has been updated; otherwise no results will appear in the search. Go back to the Search Administration page. Locate the Scopes needing update property. If it shows a number greater than 0, an update is pending. Click the Start update now link to start the update manually. Depending on the size of the site (number of pages), crawling the content may take anywhere from 3 minutes to several hours. Once the crawl and the scope update have finished, you can test the PHP search in the SharePoint site.

Figure 2: Start crawl manually

22. Open the SharePoint Search Center site. Click Site Actions and select New Page.

23. In New page name, enter phpsearch.aspx and click Create. A new page called phpsearch.aspx will be created and stored in the Pages library. After you click Create, you will be taken to the new page phpsearch.aspx.

24. Click Add New Tab.

25. Enter PHP Search in Tab Name.

26. In Page, enter phpsearch.aspx. This is the page you created above. In Tooltip, enter Search PHP Sites. Click the Save button.

27. Click the Publish tab at the top and then click the Publish button to publish your changes. Enter Comments and click Continue. As soon as you click the Continue button, a new tab will appear on the screen.

Figure 3: PHP Search Tab

28. Enter any word that you expect to find in the PHP sites on your server and click the Search (lens) button. The most common word is PHP, so for testing I suggest entering PHP. You will see results from your PHP sites.

Figure 4: PHP Search Results in SharePoint 2010 (from local PHP server)
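
You can also reach the results without typing into the search box by building the results page URL yourself. The Python sketch below assumes that the results page accepts the keyword in a k query-string parameter and the scope name in an s parameter, and it uses a hypothetical Search Center address. Check both against your own site before relying on it.

```python
from urllib.parse import quote, urlencode


def build_search_url(results_page: str, keyword: str, scope: str) -> str:
    """Build a results page URL for a keyword restricted to one scope.

    Assumptions: "k" carries the keyword and "s" carries the scope name.
    quote_via=quote encodes spaces as %20 instead of "+".
    """
    query = urlencode({"k": keyword, "s": scope}, quote_via=quote)
    return f"{results_page}?{query}"


if __name__ == "__main__":
    # Hypothetical Search Center address; replace it with your own.
    page = "http://myserver/search/Pages/results.aspx"
    print(build_search_url(page, "PHP", "PHP Search"))
```

The scope name passed here is the Title entered in step 18, PHP Search.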

As promised above, the screenshot below shows search results from an external (Internet) PHP site.

Figure 5: PHP Search Results in SharePoint 2010 (from live Internet site)

That's it. You have configured your SharePoint 2010 server to search PHP sites.

(*1) http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html
(*2) http://php.net/usage.php