Data Management

Upload, organize, and manage research datasets within your CATO projects. CATO supports 40+ file formats and hierarchical folders.

Supported File Formats

CATO accepts 40+ file formats across research domains. The analysis libraries pre-installed in the sandbox can process every supported format.

Tabular Data

CSV / TSV - Comma or tab-separated values
Excel - Both .xlsx and legacy .xls
Parquet / Feather - Columnar formats for large datasets
JSON - Structured data
SAS / Stata - Statistical software datasets (.sas7bdat, .dta)

Genomics & Sequences

FASTA / FASTQ - DNA, RNA, and protein sequences
VCF - Variant call format
GFF / GTF - Gene feature annotations
BED - Genomic intervals
GenBank - Annotated sequences (.gb, .gbk)
GMT - Gene set definitions (for GSEA)
Newick - Phylogenetic trees (.nwk)

Single-Cell & Flow Cytometry

H5AD - AnnData format for scRNA-seq (scanpy standard)
HDF5 - Hierarchical Data Format (.h5, .hdf5)
FCS - Flow Cytometry Standard (v2.0, 3.0, 3.1)

Molecular Structures

PDB / CIF - Protein and macromolecular structures
SDF / MOL / MOL2 - Small molecule structures
SMILES - Chemical string notation (.smi)

Medical Imaging

NIfTI - Neuroimaging volumes (.nii, .nii.gz)
DICOM - Clinical imaging standard (.dcm)

Proteomics & Mass Spectrometry

mzML - Open standard for mass spec data
MGF - Mascot generic format

Documents, Networks & Other

PDF - Text and table extraction
MATLAB - .mat data files
NumPy - .npy, .npz array files
GML / GraphML - Network and graph data
Text - TXT, Markdown, SQL, code scripts
Images - PNG, JPG, GIF, SVG, WebP

Batch Upload

ZIP / TAR Archives - Upload multiple files at once (.zip, .tar, .tar.gz, .tar.bz2)
Max decompressed size: 2 GB per archive
Automatically extracts and imports each supported file
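
To check an archive against the 2 GB decompressed limit before uploading, you can sum the uncompressed member sizes locally. The following is a minimal sketch using only Python's standard library. The function names are hypothetical, and reading "2 GB" as 2 x 1024^3 bytes is an assumption, since the unit convention is not stated here.

```python
import tarfile
import zipfile

# Assumption: 2 GB is interpreted as 2 * 1024**3 bytes.
MAX_DECOMPRESSED_BYTES = 2 * 1024**3


def decompressed_size(path: str) -> int:
    """Return the total uncompressed size of the regular files in an archive."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            # file_size is the uncompressed size recorded for each member.
            return sum(info.file_size for info in zf.infolist())
    if tarfile.is_tarfile(path):
        # tarfile.open detects gzip/bzip2 compression automatically.
        with tarfile.open(path) as tf:
            return sum(m.size for m in tf.getmembers() if m.isfile())
    raise ValueError(f"not a ZIP or TAR archive: {path}")


def fits_archive_limit(path: str) -> bool:
    """True if the archive's decompressed size is within the assumed limit."""
    return decompressed_size(path) <= MAX_DECOMPRESSED_BYTES
```

The sizes come from archive headers, so this is a quick local pre-check rather than a guarantee of what the server will accept.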

File Size Limits: Vary by subscription tier (Free 25 MB, Base 100 MB, Pro 500 MB, Pay-as-you-go 1 GB, Enterprise unlimited)

Uploading Data

1. Navigate to the Database tab in your project
2. Click Upload Data or drag and drop files directly into the upload zone
3. CATO will automatically detect column types and preview the first rows
4. Review the preview and click Confirm Upload

Tip: You can upload ZIP/TAR archives containing multiple CSV, Excel, or other supported files. CATO will extract and import each file as a separate dataset.

Note: Column data types (numeric, categorical, datetime) are inferred automatically. Metadata including row count, column count, and data types are extracted and stored for quick reference.
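
To get a feel for what type inference and metadata extraction involve, here is a rough sketch you can run locally on a CSV before uploading. It is not CATO's actual algorithm, which is not described here; the functions `infer_type` and `summarize_csv` and the rules they apply (every non-empty value parses as a float, or as an ISO 8601 date, otherwise categorical) are illustrative assumptions.

```python
import csv
import io
from datetime import datetime


def infer_type(values):
    """Classify a column as 'numeric', 'datetime', or 'categorical'."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "categorical"

    def all_parse(parser):
        # True only if every non-empty value parses without error.
        try:
            for v in non_empty:
                parser(v)
            return True
        except ValueError:
            return False

    if all_parse(float):
        return "numeric"
    if all_parse(datetime.fromisoformat):
        return "datetime"
    return "categorical"


def summarize_csv(text):
    """Return row count, column count, and an inferred type per column."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    columns = {
        name: infer_type([r[i] if i < len(r) else "" for r in body])
        for i, name in enumerate(header)
    }
    return {"row_count": len(body), "column_count": len(header), "columns": columns}


sample = "id,visit_date,arm\n1,2024-01-15,placebo\n2,2024-02-01,treatment\n"
summary = summarize_csv(sample)
```

Running a check like this before upload can help you spot columns that may be inferred differently from what you expect, such as numeric IDs that you intend to treat as categories.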

Organizing with Folders

Create hierarchical folders to organize datasets within your project. Folders support unlimited nesting depth.

Creating Folders

1. In the Database tab, click New Folder
2. Enter a folder name (e.g., "Raw Data", "Processed", "Results")
3. Select a parent folder (optional, for nested organization)
4. Click Create

Moving Datasets

Drag and drop datasets between folders, or use the dataset menu to select a destination folder.

Deleting Folders

When you delete a folder, its contents (datasets and subfolders) are moved to the parent folder, not deleted.
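
The deletion behavior can be pictured with a small in-memory model. This is an illustrative sketch only: the `Folder` class and `delete_folder` function are hypothetical and say nothing about how CATO stores folders internally. It also assumes that top-level folders sit under a project root that cannot itself be deleted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare folders by identity, not by field values
class Folder:
    name: str
    parent: Optional["Folder"] = field(default=None, repr=False)
    subfolders: List["Folder"] = field(default_factory=list)
    datasets: List[str] = field(default_factory=list)

    def add_subfolder(self, name: str) -> "Folder":
        child = Folder(name, parent=self)
        self.subfolders.append(child)
        return child


def delete_folder(folder: Folder) -> None:
    """Remove a folder, moving its datasets and subfolders to its parent."""
    parent = folder.parent
    if parent is None:
        raise ValueError("cannot delete the project root")
    # Datasets are re-homed, not deleted.
    parent.datasets.extend(folder.datasets)
    # Subfolders keep their own contents and are re-attached one level up.
    for sub in folder.subfolders:
        sub.parent = parent
        parent.subfolders.append(sub)
    parent.subfolders.remove(folder)


root = Folder("root")
raw = root.add_subfolder("Raw Data")
raw.datasets.append("cohort_a.csv")
qc = raw.add_subfolder("QC")
delete_folder(raw)  # "cohort_a.csv" and "QC" now live directly under root
```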

Dataset Metadata

CATO automatically extracts and stores metadata for each uploaded dataset:

Structure

Number of rows, columns, and data types

Column Inventory

List of all column names and their types

File Info

Original filename, upload date, file size

Source Tracking

Upload method (manual or archive extraction)

You can add custom descriptions to datasets for easier identification when using @-mentions in chat.

Subscription Limits

Dataset limits vary by subscription tier:

Plan	Datasets per Project	Max File Size
Free	5	25 MB
Base	25	100 MB
Pro	100	500 MB
Pay-as-you-go	Unlimited	1 GB
Enterprise	Unlimited	Unlimited
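
The table above can be encoded as a lookup for a local pre-upload check. This is a sketch under stated assumptions: `PLAN_LIMITS` and `can_upload` are hypothetical names, not a CATO API, and MB is taken as 1024^2 bytes because the unit convention is not specified.

```python
MB = 1024 ** 2  # assumption: binary megabytes

# plan -> (max datasets per project, max file size in bytes); None = unlimited
PLAN_LIMITS = {
    "Free": (5, 25 * MB),
    "Base": (25, 100 * MB),
    "Pro": (100, 500 * MB),
    "Pay-as-you-go": (None, 1024 * MB),
    "Enterprise": (None, None),
}


def can_upload(plan: str, current_dataset_count: int, file_size_bytes: int) -> bool:
    """Check one more upload against the plan's dataset and file size limits."""
    max_datasets, max_bytes = PLAN_LIMITS[plan]
    if max_datasets is not None and current_dataset_count >= max_datasets:
        return False  # project already holds the maximum number of datasets
    if max_bytes is not None and file_size_bytes > max_bytes:
        return False  # file exceeds the per-file size limit
    return True
```

For example, `can_upload("Free", 5, 1 * MB)` returns `False` because a Free project that already holds 5 datasets has reached its limit.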

Best Practices

1. Use descriptive filenames - Include study phase, date, or cohort identifiers
2. Organize with folders - Group related datasets by analysis type or data source
3. Add custom descriptions - Makes @-mentions easier when referencing datasets in chat
4. Prefer Parquet for large datasets - Faster loading and smaller file sizes than CSV

Next Steps

CATO Chat → Learn how to analyze your datasets with AI
Best Practices → Recommended workflows for research data management