CATO

Data Management

Upload, organize, and manage research datasets within your CATO projects. CATO supports 40+ file formats and hierarchical folders.


Supported File Formats

CATO accepts 40+ file formats across research domains. Every format can be processed by the analysis libraries pre-installed in the sandbox.

Tabular Data

  • CSV / TSV - Comma or tab-separated values
  • Excel - Both .xlsx and legacy .xls
  • Parquet / Feather - Columnar formats for large datasets
  • JSON - Structured data
  • SAS / Stata - Statistical software datasets (.sas7bdat, .dta)

Genomics & Sequences

  • FASTA / FASTQ - DNA, RNA, and protein sequences
  • VCF - Variant call format
  • GFF / GTF - Gene feature annotations
  • BED - Genomic intervals
  • GenBank - Annotated sequences (.gb, .gbk)
  • GMT - Gene set definitions (for GSEA)
  • Newick - Phylogenetic trees (.nwk)

Single-Cell & Flow Cytometry

  • H5AD - AnnData format for scRNA-seq (scanpy standard)
  • HDF5 - Hierarchical Data Format (.h5, .hdf5)
  • FCS - Flow Cytometry Standard (v2.0, 3.0, 3.1)

Molecular Structures

  • PDB / CIF - Protein and macromolecular structures
  • SDF / MOL / MOL2 - Small molecule structures
  • SMILES - Chemical string notation (.smi)

Medical Imaging

  • NIfTI - Neuroimaging volumes (.nii, .nii.gz)
  • DICOM - Clinical imaging standard (.dcm)

Proteomics & Mass Spectrometry

  • mzML - Open standard for mass spec data
  • MGF - Mascot generic format

Documents, Networks & Other

  • PDF - Text and table extraction
  • MATLAB - .mat data files
  • NumPy - .npy, .npz array files
  • GML / GraphML - Network and graph data
  • Text - TXT, Markdown, SQL, code scripts
  • Images - PNG, JPG, GIF, SVG, WebP

Batch Upload

  • ZIP / TAR Archives - Upload multiple files at once (.zip, .tar, .tar.gz, .tar.bz2)
  • Max decompressed size: 2 GB per archive
  • Automatically extracts and imports each supported file
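Before uploading an archive, it can help to check its decompressed size locally. The following is a minimal sketch using only the Python standard library, not part of CATO itself. The names `build_archive` and `decompressed_size` are hypothetical, and treating 2 GB as 2 * 1024**3 bytes is an assumption, since the exact unit is not stated here.

```python
from __future__ import annotations

import io
import zipfile

# Assumption: "2 GB" is read as binary gigabytes.
MAX_DECOMPRESSED_BYTES = 2 * 1024**3


def decompressed_size(archive: zipfile.ZipFile) -> int:
    """Sum the uncompressed sizes recorded in the archive's directory."""
    return sum(info.file_size for info in archive.infolist())


def build_archive(files: dict[str, bytes]) -> bytes:
    """Pack in-memory files into a ZIP and return the archive bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, payload in files.items():
            zf.writestr(name, payload)
    return buffer.getvalue()


# Two small example files standing in for real datasets.
archive_bytes = build_archive({
    "raw/cohort_a.csv": b"id,age\n1,34\n2,41\n",
    "raw/cohort_b.csv": b"id,age\n3,29\n",
})

with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
    total = decompressed_size(zf)
    fits = total <= MAX_DECOMPRESSED_BYTES
```

The check reads the sizes declared in the ZIP directory, so it costs almost nothing even for large archives. For .tar archives, the standard `tarfile` module exposes a comparable `size` attribute on each member.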

File Size Limits: The maximum file size depends on your subscription tier (25 MB Free, 100 MB Base, 500 MB Pro, 1 GB Pay-as-you-go, unlimited Enterprise).

Uploading Data

  1. Navigate to the Database tab in your project
  2. Click Upload Data or drag and drop files directly into the upload zone
  3. CATO will automatically detect column types and preview the first rows
  4. Review the preview and click Confirm Upload

Tip: You can upload ZIP/TAR archives containing multiple CSV, Excel, or other supported files. CATO will extract and import each file as a separate dataset.

Note: Column data types (numeric, categorical, datetime) are inferred automatically. Metadata, including row count, column count, and data types, is extracted and stored for quick reference.
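To preview how a file might be classified before uploading it, a rough local approximation of type inference can be useful. The sketch below is an illustration only: CATO's actual inference rules are not documented here, and the rule used (all values parse as numbers, else all parse as ISO dates, else categorical) is an assumption. The function names `infer_type` and `summarize` are hypothetical.

```python
from __future__ import annotations

import csv
import io
from datetime import datetime


def infer_type(values: list[str]) -> str:
    """Classify a column as numeric, datetime, or categorical."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "categorical"

    def all_parse(parser) -> bool:
        # True only if every non-empty value parses without error.
        try:
            for v in non_empty:
                parser(v)
        except ValueError:
            return False
        return True

    if all_parse(float):
        return "numeric"
    if all_parse(datetime.fromisoformat):  # assumption: ISO 8601 dates only
        return "datetime"
    return "categorical"


def summarize(csv_text: str) -> dict:
    """Return row count, column count, and an inferred type per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = list(reader.fieldnames or [])
    return {
        "row_count": len(rows),
        "column_count": len(columns),
        "types": {
            c: infer_type([(r[c] or "") for r in rows]) for c in columns
        },
    }


sample = "id,visit_date,arm\n1,2024-01-15,control\n2,2024-02-03,treatment\n"
summary = summarize(sample)
```

Numeric parsing is tried first because identifiers and measurements are the most common case. A column that mixes numbers and text falls through to categorical.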

Organizing with Folders

Create hierarchical folders to organize datasets within your project. Folders support unlimited nesting depth.

Creating Folders

  1. In the Database tab, click New Folder
  2. Enter a folder name (e.g., "Raw Data", "Processed", "Results")
  3. Select a parent folder (optional, for nested organization)
  4. Click Create

Moving Datasets

Drag and drop datasets between folders, or use the dataset menu to select a destination folder.

Deleting Folders

When you delete a folder, its contents (datasets and subfolders) are moved to the parent folder, not deleted.
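The deletion behavior described above can be modeled as re-parenting. This is a minimal sketch of that idea, not CATO's implementation: the `Folder` class and `delete_folder` function are hypothetical, and the project root is assumed to act as the parent of top-level folders.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # compare folders by identity, not by contents
class Folder:
    name: str
    parent: Folder | None = field(default=None, repr=False)
    subfolders: list[Folder] = field(default_factory=list)
    datasets: list[str] = field(default_factory=list)

    def add_subfolder(self, name: str) -> Folder:
        child = Folder(name, parent=self)
        self.subfolders.append(child)
        return child


def delete_folder(folder: Folder) -> None:
    """Remove a folder and move its contents up to its parent."""
    parent = folder.parent
    if parent is None:
        raise ValueError("cannot delete the project root")
    parent.subfolders.remove(folder)
    # Subfolders and datasets are re-parented rather than deleted.
    for child in folder.subfolders:
        child.parent = parent
        parent.subfolders.append(child)
    parent.datasets.extend(folder.datasets)
    folder.subfolders.clear()
    folder.datasets.clear()
    folder.parent = None


root = Folder("project")
raw = root.add_subfolder("Raw Data")
raw.datasets.append("cohort_a.csv")
batch = raw.add_subfolder("Batch 1")

delete_folder(raw)
```

After the call, "cohort_a.csv" and the "Batch 1" subfolder both sit directly under the project root.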

Dataset Metadata

CATO automatically extracts and stores metadata for each uploaded dataset:

  • Structure - Number of rows, columns, and data types
  • Column Inventory - List of all column names and their types
  • File Info - Original filename, upload date, file size
  • Source Tracking - Upload method (manual or archive extraction)

You can add custom descriptions to datasets for easier identification when using @-mentions in chat.

Subscription Limits

Dataset limits vary by subscription tier:

Plan             Datasets per Project    Max File Size
Free             5                       25 MB
Base             25                      100 MB
Pro              100                     500 MB
Pay-as-you-go    Unlimited               1 GB
Enterprise       Unlimited               Unlimited
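The limits in the table above can be encoded as a small lookup for pre-flight checks in your own upload scripts. The sketch below is illustrative only: `PlanLimits` and `upload_problems` are hypothetical names, and treating 1 MB as 1024 * 1024 bytes is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

MB = 1024 * 1024  # assumption: binary megabytes


@dataclass(frozen=True)
class PlanLimits:
    max_datasets: Optional[int]    # None means unlimited
    max_file_bytes: Optional[int]  # None means unlimited


PLAN_LIMITS = {
    "Free": PlanLimits(5, 25 * MB),
    "Base": PlanLimits(25, 100 * MB),
    "Pro": PlanLimits(100, 500 * MB),
    "Pay-as-you-go": PlanLimits(None, 1024 * MB),
    "Enterprise": PlanLimits(None, None),
}


def upload_problems(plan: str, current_datasets: int, file_bytes: int) -> list[str]:
    """List the limits that adding one more file would exceed."""
    limits = PLAN_LIMITS[plan]
    problems = []
    if limits.max_datasets is not None and current_datasets + 1 > limits.max_datasets:
        problems.append(f"{plan} allows at most {limits.max_datasets} datasets per project")
    if limits.max_file_bytes is not None and file_bytes > limits.max_file_bytes:
        problems.append(f"{plan} allows files up to {limits.max_file_bytes // MB} MB")
    return problems


full_project = upload_problems("Free", current_datasets=5, file_bytes=10 * MB)
too_large = upload_problems("Pro", current_datasets=10, file_bytes=600 * MB)
no_limits = upload_problems("Enterprise", current_datasets=10_000, file_bytes=10**12)
```

An empty list means the upload fits within the plan. Returning every violated limit at once, rather than stopping at the first, lets a script report all problems in a single pass.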

Best Practices

  1. Use descriptive filenames - Include study phase, date, or cohort identifiers
  2. Organize with folders - Group related datasets by analysis type or data source
  3. Add custom descriptions - Makes @-mentions easier when referencing datasets in chat
  4. Prefer Parquet for large datasets - Faster loading and smaller file sizes than CSV
