# Forming URLs

By following these guidelines, you can build URLs for Baidu, Yandex, or Wayfair for your web scraping tasks.

## **Baidu**

Job parameter assignment to URL:

{% code overflow="wrap" %}

```
https://<subdomain>.baidu.<domain>/s?ie=utf-8&wd=<query>&rn=<limit>&pn=<calculated_start_page>
```

{% endcode %}

When forming URLs, please follow these instructions:

1. **Encoding search terms**: Search terms must be URL-encoded. For instance, spaces should be replaced with `%20`, which represents a space character in a URL.
2. **Calculating start page**: The `start_page` parameter now corresponds to the number of search results to skip. Use the equation `limit*start_page-limit` to calculate the value.
3. **Subdomain assignment**: The subdomain value depends on the user agent type provided in the job. If the user agent type contains mobile, the subdomain value should be `m`. Otherwise, it should be `www`.
4. **Query parameter**: Depending on the subdomain value (`m` or `www`), the query parameter for the query term should be adjusted accordingly (`word` for `m` and `wd` for `www`).

#### Sample Built URLs

For mobile:

```
https://m.baidu.com/s?ie=utf-8&word=test&rn=10&pn=20
```

For desktop:

```
https://www.baidu.cn/s?ie=utf-8&wd=test%20query&rn=13
```

#### Equivalent Job Examples

Decommissioned `baidu_search` source:

```json
{
    "source": "baidu_search",
    "query": "test",
    "domain": "com",
    "limit": 5,
    "start_page": 3,
    "user_agent_type": "desktop"
}
```

Updated `universal` source:

```json
{
    "source": "universal",
    "url": "https://www.baidu.com/s?ie=utf-8&wd=test&rn=5&pn=10",
    "user_agent_type": "desktop"
}
```

## **Yandex**

Job parameter assignment to URL:

{% code overflow="wrap" %}

```
https://yandex.<domain>/search/?text=<query>&numdoc=<limit>&p=<start_page>&lr=<geo_location>
```

{% endcode %}

When forming URLs, please follow these instructions:

1. **Encoding search terms**: Search terms must be URL encoded. For instance, spaces should be replaced with `%20`, which represents a space character in a URL.
2. **Start page adjustment**: The value of the `start_page` has to be reduced by 1. For example, if the desired starting page is 3, then the value in the URL, which represents the page number, has to be `2`.
3. **Localization**: If the domain is either `ru` or `tr`, an additional query parameter `lr` is added with the `geo_location` value. For other domains, the `geo_location` value is under the query parameter `rstr`, where a `-` symbol is added before the value.
4. **Unsupported**: pages parameter is no longer supported. Jobs have to be submitted separately by changing the current page value in the URL.

#### Built URL examples

```
https://yandex.ru/search/?text=test&numdoc=5&p=0&lr=100
```

```
https://yandex.com/search/?text=test%201&numdoc=10&p=2&rstr=-100
```

#### Equivalent job example

Decommissioned `yandex_search` source:

```json
{
    "source": "yandex_search",
    "query": "test",
    "domain": "com",
    "limit": 5,
    "start_page": 3,
    "geo_location": 100,
    "results_language": "en"
}
```

Updated `universal` source:

```json
{
    "source": "universal",
    "url": "https://yandex.ru/search?text=adidas&numdoc=5&p=2&lr=100&lang=en"
}
```

## Wayfair

Job parameter assignment to URL:

{% code overflow="wrap" %}

```
https://www.wayfair.<domain>/keyword.php?keyword=<query>&itemsperpage=<limit>&curpage=<start_page>
```

{% endcode %}

When forming URLs, please follow these instructions:

1. **Encoding search terms**: search terms must be URL encoded. For instance, spaces should be replaced with `%20`, which represents a space character in a URL.
2. **Parameters**: If `limit` is equal to `48` and `start_page` is equal to `1`, then following additional parameters have to be appended to URL:
   1. `command=dosearch`
   2. `new_keyword_search=true`

#### Built URL examples

```
https://www.wayfair.com/keyword.php?keyword=test&itemsperpage=24&curpage=1
```

{% code overflow="wrap" %}

```
https://www.wayfair.fr/keyword.php?keyword=t%202&itemsperpage=48&curpage=1&command=dosearch&new_keyword_search=true
```

{% endcode %}

#### Equivalent job example

Decommissioned `wayfair_search` source:

```json
{
   "source": "wayfair_search",
   "query": "test",
   "domain": "com",
   "limit": 5,
   "start_page": 3
}
```

Updated `universal` source:

```json
{
   "source": "universal",
   "url": "https://www.wayfair.com/keyword.php?keyword=room&itemsperpage=5&curpage=10"
}
```
