doc: server config: basic usage of robots.txt/HTML robots meta-tag/crawler control...

diff --git a/doc/md/Server-configuration.md b/doc/md/Server-configuration.md
index ca82b2ec7cd63c558717cee9ffb430a8459d4578..cf44ecf5924099ee78405fee4ca4b1bf3e322c8e 100644
--- a/doc/md/Server-configuration.md
+++ b/doc/md/Server-configuration.md
@@ -29,7 +29,7 @@ Extension | Required? | Usage
  ---|:---:|---
  [`openssl`](http://php.net/manual/en/book.openssl.php) | All | OpenSSL, HTTPS
  [`php-mbstring`](http://php.net/manual/en/book.mbstring.php) | CentOS, Fedora, RHEL, Windows, some hosting providers | multibyte (Unicode) string support
-[`php-gd`](http://php.net/manual/en/book.image.php) | optional | thumbnail resizing
+[`php-gd`](http://php.net/manual/en/book.image.php) | optional | required to use thumbnails
  [`php-intl`](http://php.net/manual/en/book.intl.php) | optional | localized text sorting (e.g. `e->è->f`)
  [`php-curl`](http://php.net/manual/en/book.curl.php) | optional | using cURL for fetching webpages and thumbnails in a more robust way
  [`php-gettext`](http://php.net/manual/en/book.gettext.php) | optional | Use the translation system in gettext mode (faster)
@@ -397,6 +397,7 @@ http {
  ```
  
  ## Proxies
+
  If Shaarli is served behind a proxy (i.e. there is a proxy server between clients and the web server hosting Shaarli), please refer to the proxy server documentation for proper configuration. In particular, you have to ensure that the following server variables are properly set:
  
  - `X-Forwarded-Proto`
@@ -405,6 +406,12 @@ If Shaarli is served behind a proxy (i.e. there is a proxy server between client
  
  See also [proxy-related](https://github.com/shaarli/Shaarli/issues?utf8=%E2%9C%93&q=label%3Aproxy+) issues.
  
+## Robots and crawlers
+
+Shaarli disallows indexing and crawling of your local documentation pages by search engines, using `<meta name="robots">` HTML tags.
+Your Shaarli instance and other pages you host may still be indexed by various robots on the public Internet.
+You may want to set up a `robots.txt` file or another crawler control mechanism on your server.
+See [[1]](https://en.wikipedia.org/wiki/Robots_exclusion_standard), [[2]](https://support.google.com/webmasters/answer/6062608?hl=en) and [[3]](https://developers.google.com/search/reference/robots_meta_tag)
  
  ## See also