Storage layer management

This page provides detailed information about the Wherobots file structure in cloud storage and its associated behaviors.

File browser

Wherobots' file browser allows you to see the file and directory structure of your data warehouse on Wherobots Cloud.

Uploading data

You can upload files to Wherobots Cloud for later use in your Jupyter notebooks and jobs. You can upload individual files directly through your browser, or use the AWS CLI (aws s3 cp) to upload files directly to Wherobots Cloud's data warehouse. The AWS CLI is recommended when you need to upload multiple files, large files, or complex folder structures such as partitioned Parquet files.

To upload a file from the browser, navigate to the desired folder, click the "Upload" button, and select the file you want to upload.

To upload data using the AWS CLI:

  • Click the "Upload" button and follow the steps to request temporary AWS ingest credentials. A short-lived set of AWS credentials (access key ID, secret access key, and session token) will be generated for you.
  • Add those credentials to your ~/.aws/credentials file. Refer to the AWS CLI documentation for more information on configuring AWS credentials for the CLI.
  • Navigate to the parent folder you want to upload to, and copy the full S3 path of the target folder by clicking the "Copy" icon on its right-hand side. The path should look like s3://wbts-wbc-XXXX/XXXX/data/customer-XXXX/.
  • From your command line, upload your files with aws s3 cp:
$ cat ~/.aws/credentials
[wherobots]
aws_access_key_id = ...
aws_secret_access_key = ...
aws_session_token = ...
$ aws --profile=wherobots s3 cp --recursive my-data/ s3://wbts-wbc-XXXX/XXXX/data/customer-XXXX/
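
Because the temporary credentials consist of three values, a misspelled or missing key in ~/.aws/credentials is an easy mistake to make. The following is a minimal sketch, not part of Wherobots or the AWS CLI, that checks a profile for the three key names the AWS CLI reads (aws_access_key_id, aws_secret_access_key, aws_session_token). The function name missing_credential_keys is hypothetical, and the profile name wherobots is taken from the example above.

```python
import configparser

# Key names the AWS CLI expects for temporary (session) credentials.
REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key", "aws_session_token")


def missing_credential_keys(credentials_path, profile="wherobots"):
    """Return the required keys that are absent or empty in the given profile.

    Hypothetical helper for illustration; it only inspects the file and
    never contacts AWS.
    """
    # Disable interpolation so that '%' characters in values are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(credentials_path)  # a missing file is silently ignored
    if not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    return [
        key
        for key in REQUIRED_KEYS
        if not parser.get(profile, key, fallback="").strip()
    ]
```

Calling missing_credential_keys with the path to your credentials file returns an empty list when all three keys are present, and otherwise lists the keys that still need to be added.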

The data directory is accessible from within the Jupyter notebook environment via predefined environment variables:

  • USER_S3_PATH - pointing to /data/customer-XXXX
  • USER_S3_SHARED_PATH - pointing to /data/shared
  • USER_WAREHOUSE_PATH - pointing to /data/customer-XXXX/warehouse
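
As an illustration of how these variables might be used from a notebook, the sketch below builds a path under one of them. It assumes the variable's value can be used as a path prefix; the exact form of the value (for example, whether it includes an s3:// scheme and bucket name) is not specified here. The helper name user_path is hypothetical.

```python
import os


def user_path(*parts, variable="USER_S3_PATH", environ=os.environ):
    """Join path segments onto the location stored in an environment variable.

    Hypothetical helper: `variable` defaults to USER_S3_PATH, but
    USER_S3_SHARED_PATH or USER_WAREHOUSE_PATH work the same way.
    """
    base = environ.get(variable)
    if not base:
        raise KeyError(f"{variable} is not set in this environment")
    # Normalize slashes so segments join with exactly one '/' between them.
    segments = [base.rstrip("/")] + [part.strip("/") for part in parts]
    return "/".join(segments)
```

Inside a notebook, user_path("trips", "2024.parquet") would return a location under your personal data folder, and user_path("lookup.csv", variable="USER_S3_SHARED_PATH") a location under the shared folder.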

Folder permissions

The following top-level folders are read-only: you cannot create folders, upload files, or delete existing folders in them:

  • / (root)
  • /data/
  • /notebooks/

You can create new folders, upload files, and delete files in the following folders:

  • /data/customer-XXXX
  • /data/shared
  • /notebooks/customer-XXXX
  • /notebooks/shared
  • /spark-logs

Note

The /spark-logs folder is readable and writable, but we recommend that you do not modify or delete the logs, as doing so may affect your experience.
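
The rules above can be summarized as a small path check. This is only a sketch of the listed rules, not how Wherobots enforces permissions: the function name is_writable is hypothetical, and it does not model per-user ownership (it treats any customer-XXXX folder as writable, whereas in practice you can only access your own).

```python
import re

# Folders listed as writable; anything nested inside them is writable too.
WRITABLE_PATTERNS = (
    r"/data/customer-[^/]+",
    r"/data/shared",
    r"/notebooks/customer-[^/]+",
    r"/notebooks/shared",
    r"/spark-logs",
)


def is_writable(path):
    """Return True if `path` is, or lies inside, one of the writable folders."""
    normalized = "/" + path.strip("/")  # "/data/" -> "/data", "/" stays "/"
    return any(
        re.fullmatch(rf"{pattern}(/.*)?", normalized)
        for pattern in WRITABLE_PATTERNS
    )
```

With this sketch, is_writable("/data/") is False because /data/ itself is read-only, while is_writable("/data/shared/reports") is True.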

User specific folders

The folders named customer-XXXX/ are unique to each user. Only that user can access the files and folders within them.

Organization-wide shared folders

Folders named shared/ are accessible to everyone in your organization. Any member of the organization can create and delete folders, and can read, write, upload, and delete files within them.

Copy a file or folder's S3 Path

You can copy the S3 path of any folder or file by clicking the "Copy" icon on the right-hand side of its entry in the file browser.

One-way auto-syncing

Changes made in the Jupyter home directory of a Notebook instance are automatically synchronized to the file structure in cloud storage every 2 minutes. Synchronization also runs when you request to destroy the Notebook instance, so that no data or progress is lost.

This synchronization is one-way: from the Notebook instance to cloud storage. If you add notebooks or data to cloud storage (for example, through the file browser) while a Notebook instance is in the RUNNING state, those changes will not appear in the Jupyter Notebook interface. To make files visible in Jupyter, add them before creating the Notebook instance. At creation, the full contents of /notebooks/customer-XXXX are replicated to the Notebook instance, which restores your file structure.

Note

Files are synchronized to the /notebooks/customer-XXXX directory.
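
To make the one-way behavior concrete, here is a toy model using two local directories. It is an illustration of the semantics described above, not Wherobots' actual synchronization mechanism; the function name sync_one_way and both directory roles are assumptions made for the example.

```python
import shutil
from pathlib import Path


def sync_one_way(source, destination):
    """Copy every file under `source` into `destination`, never the reverse.

    Toy model: `source` plays the role of the notebook home directory and
    `destination` the role of /notebooks/customer-XXXX in cloud storage.
    """
    source, destination = Path(source), Path(destination)
    copied = []
    for path in sorted(source.rglob("*")):
        if path.is_file():
            relative = path.relative_to(source)
            target = destination / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # overwrite with the notebook's version
            copied.append(str(relative))
    return copied
```

In this model, a file created in the source directory appears in the destination after the next sync, but a file added only to the destination never shows up in the source, which mirrors why files uploaded while an instance is RUNNING are not visible in Jupyter.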


Last update: January 22, 2024 20:33:05