---
title: Load dataset from Hugging Face | Tiger Data Docs
description: The ai.load_dataset function allows you to load datasets from Hugging Face's datasets library directly into your PostgreSQL database.
---

# Load dataset from Hugging Face

The `ai.load_dataset` function allows you to load datasets from Hugging Face’s datasets library directly into your PostgreSQL database.

## Example usage

```
SELECT ai.load_dataset('squad');

SELECT * FROM squad LIMIT 10;
```

## Parameters

| Name              | Type  | Default  | Required | Description                                                                                                                                                                        |
| ----------------- | ----- | -------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| name              | text  | -        | ✔        | The name of the dataset on Hugging Face (e.g., 'squad', 'glue')                                                                                                                    |
| config\_name      | text  | -        | ✖        | The specific configuration of the dataset to load. See the [Hugging Face documentation](https://huggingface.co/docs/datasets/v2.20.0/en/load_hub#configurations) for more information. |
| split             | text  | -        | ✖        | The split of the dataset to load (e.g., 'train', 'test', 'validation'). Defaults to all splits.                                                                                    |
| schema\_name      | text  | 'public' | ✖        | The PostgreSQL schema in which the table is created                                                                                                                                |
| table\_name       | text  | -        | ✖        | The name of the table to create. If null, the dataset name is used                                                                                                                 |
| if\_table\_exists | text  | 'error'  | ✖        | Behavior when the table already exists: 'error' (raise an error), 'append' (add rows), 'drop' (drop the table and recreate it)                                                      |
| field\_types      | jsonb | -        | ✖        | Custom PostgreSQL data types for columns, as a JSONB dictionary mapping column name to type                                                                                        |
| batch\_size       | int   | 5000     | ✖        | Number of rows to insert in each batch                                                                                                                                             |
| max\_batches      | int   | null     | ✖        | Maximum number of batches to load. Null means load all batches                                                                                                                     |
| kwargs            | jsonb | -        | ✖        | Additional arguments passed to the Hugging Face dataset loading function                                                                                                           |
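
As an illustration, `kwargs` forwards options to the Hugging Face dataset loading function. The set of accepted keys depends on your version of the `datasets` library; the `revision` key below is an assumption used for illustration, not something this page documents:

```
SELECT ai.load_dataset(
    name => 'squad',
    -- Hypothetical example: any JSON key in kwargs is passed through to the
    -- Hugging Face loading function (here, pinning a dataset revision).
    kwargs => '{"revision": "main"}'::jsonb
);
```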

## Returns

Returns the number of rows loaded into the database (bigint).
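
Because the return value is a plain bigint, you can capture or alias it like any scalar. For example, a quick smoke test that loads one small batch and reports the row count:

```
-- Load a single batch of 100 rows and read back how many rows were inserted
SELECT ai.load_dataset('squad', batch_size => 100, max_batches => 1) AS rows_loaded;
```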

## Using multiple transactions

The `ai.load_dataset` function loads all data in a single transaction. However, when loading large datasets, it is sometimes useful to split the load across multiple transactions. For this purpose, we provide the `ai.load_dataset_multi_txn` procedure. It is similar to `ai.load_dataset`, but it allows you to specify the number of batches between commits using the `commit_every_n_batches` parameter.

```
CALL ai.load_dataset_multi_txn('squad', commit_every_n_batches => 10);
```

## Examples

1. Basic usage - Load the entire `squad` dataset:

```
SELECT ai.load_dataset('squad');
```

The data is loaded into a table named `squad`.

2. Load a small subset of the `squad` dataset:

```
SELECT ai.load_dataset('squad', batch_size => 100, max_batches => 1);
```

3. Load the entire `squad` dataset using multiple transactions:

```
CALL ai.load_dataset_multi_txn('squad', commit_every_n_batches => 100);
```

4. Load specific configuration and split:

```
SELECT ai.load_dataset(
    name => 'glue',
    config_name => 'mrpc',
    split => 'train'
);
```

5. Load with custom table name and field types:

```
SELECT ai.load_dataset(
    name => 'glue',
    config_name => 'mrpc',
    table_name => 'mrpc',
    field_types => '{"sentence1": "text", "sentence2": "text"}'::jsonb
);
```

6. Pre-create the table and load data into it:

```
CREATE TABLE squad (
    id          TEXT,
    title       TEXT,
    context     TEXT,
    question    TEXT,
    answers     JSONB
);

SELECT ai.load_dataset(
    name => 'squad',
    table_name => 'squad',
    if_table_exists => 'append'
);
```

## Notes

- The function requires an active internet connection to download datasets from Hugging Face.
- Large datasets may take significant time to load depending on size and connection speed.
- The function automatically maps Hugging Face dataset types to appropriate PostgreSQL data types unless overridden by `field_types`.
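
To check which PostgreSQL types the automatic mapping produced (or to confirm that a `field_types` override took effect), a standard catalog query works. For example, after loading the `squad` dataset:

```
-- Inspect the column types of the table created by ai.load_dataset
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'public'
  AND table_name = 'squad'
ORDER BY ordinal_position;
```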
