Storage layer management
This page provides detailed information about the Wherobots file structure in cloud storage and its associated behaviors.
File browser¶
Wherobots' file browser allows you to see the file and directory structure of your data warehouse on Wherobots Cloud.
Uploading data¶
You can upload files to Wherobots Cloud for later use within your
Jupyter notebooks and jobs. You can upload individual files directly
through your browser, or use the AWS CLI and its aws s3 cp command to
upload files directly to Wherobots Cloud's data warehouse. The latter is
recommended if you need to upload multiple files, large files, or more
complex folder structures like partitioned Parquet files.
To upload a file from the browser, navigate to the desired folder, click the "Upload" button, and select the file you want to upload.
To upload data using the AWS CLI:
- Click the "Upload" button and follow the steps to request temporary AWS ingest credentials. A short-lived set of AWS credentials (access key ID, secret access key, and session token) will be generated for you.
- Configure your local environment with those credentials in your ~/.aws/credentials file. Refer to the AWS CLI documentation for more information on the configuration of AWS credentials for the CLI.
- Navigate to the parent folder you want to upload to, and copy the full S3 path of the target folder by clicking the "Copy" icon on its right-hand side. The path should look like s3://wbts-wbc-XXXX/XXXX/data/customer-XXXX/.
- From your command line, upload your files with aws s3 cp:
$ cat ~/.aws/credentials
[wherobots]
aws_access_key_id = ...
aws_secret_access_key = ...
aws_session_token = ...
$ aws --profile=wherobots s3 cp --recursive my-data/ s3://wbts-wbc-XXXX/XXXX/data/customer-XXXX/
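If you prefer to script the credentials step, the profile can also be written programmatically. The following is a minimal sketch using only Python's standard library. The function name write_aws_profile and its parameters are hypothetical and not part of any Wherobots or AWS tooling; the three key names follow the AWS shared credentials file format.

```python
import configparser
from pathlib import Path


def write_aws_profile(credentials_file, profile, access_key_id,
                      secret_access_key, session_token):
    """Add or replace one profile in an AWS shared credentials file.

    Hypothetical helper for illustration; existing profiles in the file are kept.
    """
    path = Path(credentials_file)
    # Disable interpolation so a '%' in a token is stored literally.
    config = configparser.ConfigParser(interpolation=None)
    config.read(path)  # a missing file is skipped without error
    config[profile] = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
        "aws_session_token": session_token,
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w") as handle:
        config.write(handle)
    path.chmod(0o600)  # restrict the file to the current user where supported
    return path
```

Calling write_aws_profile(Path.home() / ".aws" / "credentials", "wherobots", ...) with the three values from the "Upload" dialog would produce a profile usable with aws --profile=wherobots. Because the credentials are short-lived, expect to repeat this step once they expire.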
The data directory is accessible from within the Jupyter notebook environment via predefined environment variables:
- USER_S3_PATH: points to /data/customer-XXXX
- USER_S3_SHARED_PATH: points to /data/shared
- USER_WAREHOUSE_PATH: points to /data/customer-XXXX/warehouse
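As an illustration, these variables could be read from a notebook cell as sketched below. The helper names storage_paths and join_path are hypothetical, and the exact form of each value (for example, whether it is a full s3:// URI or ends with a slash) is an assumption, so the join helper tolerates both.

```python
import os


def storage_paths(env=os.environ):
    """Return the three predefined storage locations, or None where a variable is unset."""
    names = ("USER_S3_PATH", "USER_S3_SHARED_PATH", "USER_WAREHOUSE_PATH")
    return {name: env.get(name) for name in names}


def join_path(base, *parts):
    """Join path segments with single slashes, leaving any scheme prefix in base intact."""
    return "/".join([base.rstrip("/")] + [part.strip("/") for part in parts])


if __name__ == "__main__":
    # Print whatever the current environment defines; outside a Wherobots
    # notebook these are normally unset and show as None.
    for name, value in storage_paths().items():
        print(f"{name} = {value}")
```

Building paths from the variables rather than hard-coding the bucket name keeps a notebook portable between users, since each user's customer-XXXX folder differs.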
Folder permissions¶
The following top-level folders are read-only, meaning you cannot create new folders, upload files, or delete existing folders directly within them:
- / (root)
- /data/
- /notebooks/
In the following folders, you can create new folders, upload files, and delete any file:
- /data/customer-XXXX
- /data/shared
- /notebooks/customer-XXXX
- /notebooks/shared
- /spark-logs
Note
The /spark-logs folder allows read and write access, but we recommend that you do not modify or delete the logs, as doing so may affect your experience.
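The folder rules above can be summarized as a small lookup. This is only a sketch of the rules as listed on this page, not how Wherobots enforces permissions; the function names and the customer_id parameter are hypothetical stand-ins for the XXXX placeholder.

```python
from pathlib import PurePosixPath


def writable_roots(customer_id):
    """Folders this page lists as writable, for a hypothetical customer ID."""
    return [
        PurePosixPath(f"/data/customer-{customer_id}"),
        PurePosixPath("/data/shared"),
        PurePosixPath(f"/notebooks/customer-{customer_id}"),
        PurePosixPath("/notebooks/shared"),
        PurePosixPath("/spark-logs"),
    ]


def can_write_into(folder, customer_id):
    """True if folder is one of the writable folders or lies inside one."""
    target = PurePosixPath(folder)
    return any(target == root or root in target.parents
               for root in writable_roots(customer_id))
```

Under this model, can_write_into("/data/shared/tiles", "1234") is true, while "/", "/data", and "/notebooks" themselves are rejected because they only contain writable folders.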
User specific folders¶
Folders named customer-XXXX/ are unique to each user.
The files and folders within them can only be accessed by that user.
Organization-wide shared folders¶
Folders named shared/ are accessible to everyone within your
organization. Any member of the organization can create new folders,
delete existing ones, and read, write, upload, and delete files within
a shared folder.
Copy a file or folder's S3 Path¶
If needed, you can copy the direct S3 path of any folder or file by clicking the "Copy" icon on the right-hand side of its entry.
One-way auto-syncing¶
Changes made in the Jupyter home directory of a Notebook instance are automatically synchronized to the file structure in cloud storage every 2 minutes. Synchronization also runs when you request to destroy the Notebook instance, so that no data or progress is lost.
This synchronization is one-way, from the Notebook instance to cloud
storage. If you add notebooks or data to storage while a Notebook
instance is in the RUNNING state, those changes will not appear in the
Jupyter Notebook interface. To make files visible in Jupyter, add them
before creating the Notebook instance. On creation, the complete
contents of /notebooks/customer-XXXX are replicated to the Notebook
instance, which restores your file structure.
Note
Synchronization is performed to the directory path /notebooks/customer-XXXX.
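The one-way behavior can be pictured with a toy model in which storage and the Notebook instance are plain dictionaries mapping file names to contents. This is purely illustrative and not the actual sync implementation: the function names are hypothetical, and the model covers only added or modified files, since this page does not describe how deletions are handled.

```python
def start_instance(storage):
    """Creating an instance copies the stored notebook folder into it."""
    return dict(storage)


def sync_to_storage(instance, storage):
    """One-way sync: push the instance's files to storage, never the reverse."""
    storage.update(instance)


if __name__ == "__main__":
    storage = {"analysis.ipynb": "v1"}
    instance = start_instance(storage)

    # A file added to storage while the instance is running...
    storage["late.ipynb"] = "uploaded via file browser"
    # ...and a file created inside the running instance.
    instance["scratch.ipynb"] = "new work"

    sync_to_storage(instance, storage)
    print(sorted(storage))   # storage now holds all three files
    print(sorted(instance))  # the instance never received late.ipynb
```

In this model, late.ipynb becomes visible only to an instance created after it was added to storage, which mirrors the advice to add files before creating a Notebook instance.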