Data Management
Upload, organize, and manage research datasets within your CATO projects. CATO supports 40+ file formats and hierarchical folders.
Supported File Formats
CATO accepts 40+ file formats across research domains. Every format can be processed by the analysis libraries pre-installed in the sandbox.
Tabular Data
- CSV / TSV - Comma or tab-separated values
- Excel - Both .xlsx and legacy .xls
- Parquet / Feather - Columnar formats for large datasets
- JSON - Structured data
- SAS / Stata - Statistical software datasets (.sas7bdat, .dta)
Genomics & Sequences
- FASTA / FASTQ - DNA, RNA, and protein sequences
- VCF - Variant call format
- GFF / GTF - Gene feature annotations
- BED - Genomic intervals
- GenBank - Annotated sequences (.gb, .gbk)
- GMT - Gene set definitions (for GSEA)
- Newick - Phylogenetic trees (.nwk)
Single-Cell & Flow Cytometry
- H5AD - AnnData format for scRNA-seq (scanpy standard)
- HDF5 - Hierarchical Data Format (.h5, .hdf5)
- FCS - Flow Cytometry Standard (v2.0, 3.0, 3.1)
Molecular Structures
- PDB / CIF - Protein and macromolecular structures
- SDF / MOL / MOL2 - Small molecule structures
- SMILES - Chemical string notation (.smi)
Medical Imaging
- NIfTI - Neuroimaging volumes (.nii, .nii.gz)
- DICOM - Clinical imaging standard (.dcm)
Proteomics & Mass Spectrometry
- mzML - Open standard for mass spec data
- MGF - Mascot generic format
Documents, Networks & Other
- PDF - Text and table extraction
- MATLAB - .mat data files
- NumPy - .npy, .npz array files
- GML / GraphML - Network and graph data
- Text - TXT, Markdown, SQL, code scripts
- Images - PNG, JPG, GIF, SVG, WebP
Batch Upload
- ZIP / TAR Archives - Upload multiple files at once (.zip, .tar, .tar.gz, .tar.bz2)
- Max decompressed size: 2 GB per archive
- Automatically extracts and imports each supported file
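Because the archive limit applies to the decompressed size rather than the size of the archive file itself, it can help to check an archive locally before uploading. The following is a minimal sketch using only the Python standard library. The function names are illustrative, and the interpretation of "2 GB" as 2 × 1024³ bytes is an assumption; CATO may count the limit differently.

```python
import tarfile
import zipfile

# Assumption: "2 GB" is treated as 2 * 1024**3 bytes for this sketch.
MAX_DECOMPRESSED_BYTES = 2 * 1024**3


def decompressed_size(path: str) -> int:
    """Return the total uncompressed size, in bytes, of a ZIP or TAR archive."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            return sum(info.file_size for info in zf.infolist())
    if tarfile.is_tarfile(path):
        # The default read mode detects .tar, .tar.gz, and .tar.bz2 transparently.
        with tarfile.open(path) as tf:
            return sum(m.size for m in tf.getmembers() if m.isfile())
    raise ValueError(f"{path} is not a ZIP or TAR archive")


def fits_archive_limit(path: str, limit: int = MAX_DECOMPRESSED_BYTES) -> bool:
    """Check whether the archive's decompressed contents stay within the limit."""
    return decompressed_size(path) <= limit
```

For example, `fits_archive_limit("cohort_exports.zip")` (a hypothetical filename) returns `False` when the extracted files would exceed the limit, in which case splitting the archive is one option.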
File Size Limits: The maximum file size varies by subscription tier: 25 MB (Free), 100 MB (Base), 500 MB (Pro), 1 GB (Pay-as-you-go), and unlimited (Enterprise).
Uploading Data
1. Navigate to the Database tab in your project
2. Click Upload Data or drag and drop files directly into the upload zone
3. CATO will automatically detect column types and preview the first rows
4. Review the preview and click Confirm Upload
Tip: You can upload ZIP/TAR archives containing multiple CSV, Excel, or other supported files. CATO will extract and import each file as a separate dataset.
Note: Column data types (numeric, categorical, datetime) are inferred automatically. Metadata, including row count, column count, and data types, is extracted and stored for quick reference.
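To make the idea of type inference concrete, here is a small sketch of how columns in a CSV could be classified as numeric, datetime, or categorical, along with the row and column counts. This is not CATO's actual inference logic, which is not documented here. The functions `infer_type` and `summarize_csv` are hypothetical, and the rules (parse as float, then as ISO 8601 date, else categorical) are a simplifying assumption.

```python
import csv
import io
from datetime import datetime


def infer_type(values):
    """Classify a column of strings as 'numeric', 'datetime', or 'categorical'."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "categorical"

    def all_parse(parser):
        # True only if every non-empty value is accepted by the parser.
        try:
            for v in non_empty:
                parser(v)
            return True
        except ValueError:
            return False

    if all_parse(float):
        return "numeric"
    if all_parse(datetime.fromisoformat):  # assumes ISO 8601 dates only
        return "datetime"
    return "categorical"


def summarize_csv(text):
    """Return row count, column count, and an inferred type per column."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    columns = {
        name: infer_type([r[i] if i < len(r) else "" for r in body])
        for i, name in enumerate(header)
    }
    return {"row_count": len(body), "column_count": len(header), "columns": columns}
```

Running a check like this before uploading can reveal columns that will likely be read as categorical because of stray text values, such as "N/A" in an otherwise numeric column.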
Organizing with Folders
Create hierarchical folders to organize datasets within your project. Folders support unlimited nesting depth.
Creating Folders
1. In the Database tab, click New Folder
2. Enter a folder name (e.g., "Raw Data", "Processed", "Results")
3. Select parent folder (optional, for nested organization)
4. Click Create
Moving Datasets
Drag and drop datasets between folders, or use the dataset menu to select a destination folder.
Deleting Folders
When you delete a folder, its contents (datasets and subfolders) are moved to the parent folder, not deleted.
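The deletion behavior described above can be modeled as a tree operation: the deleted folder's datasets and subfolders are re-attached to its parent. The sketch below illustrates that behavior only; the `Folder` class and `delete_folder` function are hypothetical and do not correspond to a CATO API. It also assumes the project itself acts as the root folder.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare folders by identity, not by field values
class Folder:
    name: str
    parent: Optional["Folder"] = None
    subfolders: List["Folder"] = field(default_factory=list)
    datasets: List[str] = field(default_factory=list)

    def add_subfolder(self, name: str) -> "Folder":
        child = Folder(name, parent=self)
        self.subfolders.append(child)
        return child


def delete_folder(folder: Folder) -> None:
    """Remove a folder, moving its datasets and subfolders up to its parent."""
    parent = folder.parent
    if parent is None:
        raise ValueError("the root folder cannot be deleted")
    parent.datasets.extend(folder.datasets)
    for sub in folder.subfolders:
        sub.parent = parent
        parent.subfolders.append(sub)
    parent.subfolders.remove(folder)
    folder.parent = None


# Usage: deleting "Raw Data" keeps its dataset and its nested folder.
root = Folder("project")
raw = root.add_subfolder("Raw Data")
raw.datasets.append("cohort_a.csv")
batch = raw.add_subfolder("Batch 1")
delete_folder(raw)
print(root.datasets, [f.name for f in root.subfolders])
```

One consequence worth noting: because nothing is deleted, removing a folder with many datasets can leave the parent folder crowded, so it may be worth moving contents elsewhere first.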
Dataset Metadata
CATO automatically extracts and stores metadata for each uploaded dataset:
- Structure - Number of rows, columns, and data types
- Column Inventory - List of all column names and their types
- File Info - Original filename, upload date, file size
- Source Tracking - Upload method (manual or archive extraction)
You can add custom descriptions to datasets for easier identification when using @-mentions in chat.
Subscription Limits
Dataset limits vary by subscription tier:
| Plan | Datasets per Project | Max File Size |
|---|---|---|
| Free | 5 | 25 MB |
| Base | 25 | 100 MB |
| Pro | 100 | 500 MB |
| Pay-as-you-go | Unlimited | 1 GB |
| Enterprise | Unlimited | Unlimited |
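As an illustration, the table above can be encoded as a lookup to check whether a new upload would fit within a plan. This is a sketch, not a CATO API: `PLAN_LIMITS` and `can_upload` are hypothetical names, and treating 1 MB as 1024 × 1024 bytes is an assumption, since the documentation does not say how sizes are measured.

```python
from typing import Dict, Optional, Tuple

MB = 1024 * 1024  # assumption: binary megabytes

# plan -> (max datasets per project, max file size in bytes); None means unlimited
PLAN_LIMITS: Dict[str, Tuple[Optional[int], Optional[int]]] = {
    "Free": (5, 25 * MB),
    "Base": (25, 100 * MB),
    "Pro": (100, 500 * MB),
    "Pay-as-you-go": (None, 1024 * MB),
    "Enterprise": (None, None),
}


def can_upload(plan: str, current_dataset_count: int, file_size_bytes: int) -> bool:
    """Return True if one more dataset of the given size fits within the plan."""
    max_datasets, max_bytes = PLAN_LIMITS[plan]
    if max_datasets is not None and current_dataset_count >= max_datasets:
        return False
    if max_bytes is not None and file_size_bytes > max_bytes:
        return False
    return True
```

For instance, `can_upload("Free", 5, 10 * MB)` is `False` because the project already holds the maximum of five datasets, even though the file itself is under the size limit.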
Best Practices
1. Use descriptive filenames - Include study phase, date, or cohort identifiers
2. Organize with folders - Group related datasets by analysis type or data source
3. Add custom descriptions - Makes @-mentions easier when referencing datasets in chat
4. Prefer Parquet for large datasets - Faster loading and smaller file sizes than CSV
Next Steps
- CATO Chat → Learn how to analyze your datasets with AI
- Best Practices → Recommended workflows for research data management