# Datasets

Datasets are a **representation of a table, view or data entity** physically stored in some data source accessible in Biuwer through the corresponding Data Connection.

The easiest way to create datasets is to use **Reverse Engineering** on the connections that support it, because the process is fast, efficient and almost fully automatic.

However, **you can also create and manage datasets manually** in Biuwer, provided you know the details of the corresponding table, view or data entity in the data source.

Datasets are also created or updated automatically when you upload data from files in CSV or Excel format. In this case the generated datasets are called "Managed", because Biuwer manages the data.

Therefore, **there are two types of Datasets:**

* **Managed**. Biuwer manages both the metadata and the data, and stores the data for you in a **CDW (Cloud Data Warehouse)** specific to your organization. Managed datasets are available when you upload external files in CSV or Excel format, when you have connections to external applications accessible through an API, and in the cases you define for use with the Biuwer data preparation module.
* **Not Managed**. Biuwer only holds the metadata needed to run queries; your organization physically manages the data and is responsible for updating and maintaining it. This is the most common case when you work with SQL or NoSQL databases managed by your company, for example those used by ERP (Enterprise Resource Planning), CRM (Customer Relationship Management) or e-commerce systems.

### **List of Datasets**

In the Data Center, the list of datasets that your Organization has defined so far in Biuwer is available in the menu "Datasets":

![List of Datasets in Biuwer Data Center](https://1704772157-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M3wQp48Ng4g1XyuoV8l%2F-MMpbWKzXecQbUaEGsGN%2F-MMpunXGNRe04CjgK6PV%2Fbiuwer-centro-datos-conjuntos-datos.png?alt=media\&token=d51f3cd6-91bb-4708-a443-b5b45ba4feaa)

From this list you can perform the following operations:

* **Filter** datasets by **Name, Alias, Connection** and whether they are **Managed** or **Not Managed**.
* **Create a new dataset**, using the top right "**Add**" button.
* From the **context menu** of each dataset, **View the detail, Edit** the dataset or **Delete** the dataset.

### **Manual creation**

Use the "Add" button available in the dataset list to manually create a dataset.

A dataset creation dialog appears in which you must first choose which type of dataset to create, Managed or Not Managed. Depending on the choice, the parameters that are necessary in each case are activated.

![Creating a Managed Dataset](https://1704772157-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M3wQp48Ng4g1XyuoV8l%2F-MOlKAOphXbzoUjnUpuP%2F-MOlN5-BHqzvvUD79bie%2Fbiuwer-centro-datos-conjunto-datos-nuevo-gestionado.png?alt=media\&token=52399fff-3be8-44f2-a53c-16c91434bafb)

Creating a **new managed dataset** means that the Organization will have a table available in the DWH (Data Warehouse) managed by Biuwer, with a physical name matching the value of the dataset's "Name" attribute. This table is initially empty; you can insert data into it by uploading CSV or Excel files, or by using the Biuwer data preparation module.

![Creating a Not Managed Dataset](https://1704772157-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M3wQp48Ng4g1XyuoV8l%2F-MOlNWkG5iO0j6rSDNFw%2F-MOlNop49E0Jf1QEl8X8%2Fbiuwer-centro-datos-conjunto-datos-nuevo-no-gestionado.png?alt=media\&token=6b696c4b-2619-4ae8-9014-96dc40a961c0)

Creating a **new not managed dataset** means that the Organization will be able to query a table, view or data entity whose physical name matches the value of the dataset's "Name" attribute, in the SQL or NoSQL database system associated with the connection used. If no such table, view or data entity exists, queries launched against any of its data fields will return an error. It is the responsibility of the Organization to ensure that the data entity exists and contains the expected data so that it can be analyzed in Biuwer.
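
The failure mode described above can be reproduced outside Biuwer. The following is a minimal sketch that assumes an in-memory SQLite database standing in for your organization's data source; the table name `sales_orders` and the `preview` helper are hypothetical and do not reflect how Biuwer builds its queries internally.

```python
import sqlite3

# In-memory SQLite database standing in for the organization's own data source
# (an assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_orders (id INTEGER, amount REAL)")
conn.execute("INSERT INTO sales_orders VALUES (1, 120.5)")


def preview(connection, dataset_name, limit=100):
    """Query the physical entity whose name matches the dataset's "Name".

    dataset_name is treated as trusted metadata here; a real tool would
    validate or escape identifiers before building the query.
    """
    query = f'SELECT * FROM "{dataset_name}" LIMIT ?'
    return connection.execute(query, (limit,)).fetchall()


# The entity exists, so the query returns its rows.
rows = preview(conn, "sales_orders")

# The entity does not exist, so the database engine raises an error.
try:
    preview(conn, "missing_table")
    missing_error = None
except sqlite3.OperationalError as exc:
    missing_error = str(exc)
```

Checking that the entity exists in the source before creating the dataset avoids this kind of error when cards are built on top of it.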

### **Detail of the dataset**

When you access a Dataset within the Data Center, its full detail is displayed, with access to the data **fields**, a data preview with the first 100 **records**, and the configuration of the dataset's **data policies**.

Besides viewing the details described below, **you can edit the dataset and even delete it**, provided it has no active dependencies, that is, it is not used in any Data Model and therefore not used in any Card.

![Detail of the fields of a Dataset](https://1704772157-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M3wQp48Ng4g1XyuoV8l%2F-MMpbWKzXecQbUaEGsGN%2F-MMq1eGglV-y36L3Mn-R%2Fbiuwer-centro-datos-conjunto-datos-campos.png?alt=media\&token=1757e60d-8ca5-455e-8201-60d1a42c4f02)

From this list you can see and manage the existing fields in the dataset, and perform the following operations:

* **Filter** fields, by any of their attributes.
* **Edit** fields.
* **Add** a field, which can be either **Standard** or **Calculated**.
* Launch the **reverse engineering** associated specifically with the dataset being displayed, in order to add or modify fields that have been modified at the source.
* **Delete** fields.

Dataset fields **marked as hidden** are not shown to users when composing data cards, although it can still be useful to keep them for data validation purposes, for example internal identifiers.

{% hint style="info" %}
A **Standard field** is a field that physically exists in the data entity, while a **Calculated field** does not physically exist as such in the data entity, but is defined as an expression or formula that can use other fields of the dataset for its calculation.
{% endhint %}

![Preview of the first 100 records of a dataset](https://1704772157-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M3wQp48Ng4g1XyuoV8l%2F-MMpbWKzXecQbUaEGsGN%2F-MMq2rft8z2HXgvY3mK7%2Fbiuwer-centro-datos-conjunto-datos-registros.png?alt=media\&token=c18836ca-fa5d-4f99-9e18-089eed0b4edb)

With the preview the user can get an idea of the type of information available in the dataset, before modelling the information and moving on to the assembly of reports, charts, etc.

![Data policies of a dataset](https://1704772157-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M3wQp48Ng4g1XyuoV8l%2F-MMpbWKzXecQbUaEGsGN%2F-MMq34uIAZLayTm73vQb%2Fbiuwer-centro-datos-conjunto-datos-politicas.png?alt=media\&token=5880c13e-c244-48bb-a663-cdff87bab56d)

**Data policies** are a versatile tool for dynamically displaying different data from the same dataset to different users or groups of users. This reduces the number of pages and cards that need to be designed in Biuwer, because the same chart, table, map or KPI can often be reused to display the appropriate information to different user profiles. For example, a Sales Dashboard can be designed and implemented in Biuwer to display:

* Full details to the company's management team.
* Data filtered by sales areas to each area sales manager.
* Data filtered by clients to account executives, according to the clients managed by each one.
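
Conceptually, the Sales Dashboard example can be pictured as one row filter per user profile applied to the same dataset. The sketch below is only an illustration of that idea: the `User` class, the role names and the `visible_rows` function are hypothetical and are not Biuwer's data policy API.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    role: str  # hypothetical roles: "management", "area_manager", "account_executive"
    areas: set = field(default_factory=set)    # sales areas the user manages
    clients: set = field(default_factory=set)  # clients the user manages


def visible_rows(user, rows):
    """Return only the rows of the dataset that this user is allowed to see."""
    if user.role == "management":
        return list(rows)  # full detail
    if user.role == "area_manager":
        return [r for r in rows if r["area"] in user.areas]
    if user.role == "account_executive":
        return [r for r in rows if r["client"] in user.clients]
    return []  # unknown profiles see nothing


# One dataset shared by every profile.
sales = [
    {"area": "North", "client": "Acme", "amount": 100},
    {"area": "North", "client": "Globex", "amount": 250},
    {"area": "South", "client": "Initech", "amount": 75},
]

director = User("Ana", "management")
north_manager = User("Luis", "area_manager", areas={"North"})
executive = User("Eva", "account_executive", clients={"Initech"})

director_view = visible_rows(director, sales)
manager_view = visible_rows(north_manager, sales)
executive_view = visible_rows(executive, sales)
```

The same card definition serves all three users; only the rows that reach it differ.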

**Data policies** are explained in detail in the corresponding section.

### **Managing data fields**

Once datasets exist in Biuwer, you can manage their data fields as needed: adding new fields, editing existing ones or deleting them.

When a particular field needs to be modified, you can edit it using the following dialog, where you can modify:

* The **physical name** of the field in the data entity.
* The **alias of the field in Biuwer**. This alias is intended to be a business-friendly name that avoids characters found in the physical name, such as "\_" or "-", and it is the name presented to the user in cards.
* The **description of the field in Biuwer**. It does not appear in the end user interface, but it’s useful to explain the meaning of the field, how it was obtained, how it was calculated, aspects to take into account for analysis, etc.
* The **data type** of the field: Text, Number, Date or Boolean.
* The **type of field**: Dimension or Metric.
* The **default aggregation function** for metric type fields, which depends on the selected data type.
* Whether the field is **hidden** from the user.
* Whether the field is **calculated**.
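
As a rough mental model, the attributes listed above can be grouped into a single record per field. The sketch below is an assumption made for illustration: the class and attribute names are hypothetical, the aggregation value `"sum"` is only an example, and the rule that dimensions carry no default aggregation is inferred from the list rather than taken from Biuwer's implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DataType(Enum):
    TEXT = "Text"
    NUMBER = "Number"
    DATE = "Date"
    BOOLEAN = "Boolean"


class FieldType(Enum):
    DIMENSION = "Dimension"
    METRIC = "Metric"


@dataclass
class DatasetField:
    physical_name: str                         # name in the data entity
    alias: str                                 # business name shown in cards
    data_type: DataType
    field_type: FieldType
    description: str = ""                      # internal documentation only
    default_aggregation: Optional[str] = None  # hypothetical value, e.g. "sum"
    hidden: bool = False
    calculated: bool = False

    def __post_init__(self):
        # Assumed rule: a default aggregation only makes sense for metrics.
        if self.field_type is FieldType.DIMENSION and self.default_aggregation:
            raise ValueError("Dimensions do not take a default aggregation")


def suggest_alias(physical_name):
    """Derive a business-style alias by dropping "_" and "-" separators."""
    return physical_name.replace("_", " ").replace("-", " ").title()


unit_price = DatasetField(
    physical_name="unit_price",
    alias=suggest_alias("unit_price"),
    data_type=DataType.NUMBER,
    field_type=FieldType.METRIC,
    default_aggregation="sum",
)
```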

![Editing a field in a dataset](https://1704772157-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M3wQp48Ng4g1XyuoV8l%2F-MMtUgpFXgNWgEYk6NGX%2F-MMufgWpdqhM_4MQWrFL%2Fbiuwer-centro-datos-conjunto-datos-edicion-campo.png?alt=media\&token=622e7e76-8b06-41db-a4d2-10f7112d445c)

Sometimes it is necessary or recommended to create **calculated fields** in a dataset. In this case you do not point to a physical field of the data entity, but define a logical expression that includes a formula suitable for the source data engine, which can include:

* The physical **data fields** of the dataset.
* Basic arithmetic **operators** (+, -, \*, /) and, when necessary, parentheses, brackets and curly braces.
* **Functions** available in the function catalog.
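
To make the three ingredients concrete, the sketch below evaluates a calculated-field expression against an in-memory SQLite database standing in for the source data engine. The table `order_lines`, the field name `net_amount` and the expression itself are hypothetical, and the available functions depend on your actual data engine and on Biuwer's function catalog.

```python
import sqlite3

# In-memory SQLite database standing in for the source data engine
# (an assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_lines (quantity INTEGER, unit_price REAL, discount REAL)"
)
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?, ?)",
    [(2, 10.0, 0.0), (1, 40.0, 0.25)],
)

# Hypothetical calculated field combining physical fields, arithmetic
# operators, parentheses and a function (ROUND) of the source engine.
net_amount_expr = "ROUND(quantity * unit_price * (1 - discount), 2)"

# The expression is evaluated by the engine alongside the physical fields.
query = (
    f"SELECT quantity, unit_price, {net_amount_expr} AS net_amount "
    "FROM order_lines ORDER BY rowid"
)
results = conn.execute(query).fetchall()
```

Because the expression is evaluated by the data engine, it has to be valid in that engine's dialect; an expression that works in one database may need adjusting for another.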

![Creation of a calculated field in a dataset](https://1704772157-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M3wQp48Ng4g1XyuoV8l%2F-MMuftIHbUvbHuM6P9Rt%2F-MMugGJM-OPctM-vACJu%2Fbiuwer-centro-datos-conjunto-datos-nuevo-campo-calculado.png?alt=media\&token=9a54c304-019c-47ed-9027-491d3ea933e6)

You can also launch Reverse Engineering specifically on that dataset from within the Fields tab.
