Best Practices

Recommended workflows for effective research data management and analysis in CATO.

Project Organization

One Project Per Study

Create a separate project for each distinct research study or analysis. This keeps datasets, chat history, and results organized and prevents cross-contamination of data contexts.

Descriptive Naming

Use clear, descriptive names for projects and datasets:

Avoid	Prefer
data.csv	patient_outcomes_2024_q4.csv
Project 1	NSCLC Phase IV Cohort Study
test analysis	Treatment Response Subgroup Analysis

Use Labels

Apply color-coded labels to categorize datasets by:

Status — Draft, Validated, Final
Type — Clinical, Demographics, Outcomes
Source — REDCap, Manual Entry, External

Data Preparation

Clean Before Upload

Perform basic data cleaning before importing into CATO:

Remove completely empty rows and columns
Ensure consistent date formats within columns
Standardize categorical values (e.g., "M"/"F" not "male"/"Male"/"m")
Save files with UTF-8 encoding, and replace any special characters that do not convert cleanly
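
The cleaning steps above can be sketched in a few lines of standard-library Python. This is only an illustration, not part of CATO: the column names sex and visit_date, the code mapping, and the accepted date formats are all assumptions to replace with your own. It also assumes a rectangular CSV in which no row has more fields than the header.

```python
import csv
import io
from datetime import datetime

# Hypothetical mapping for one categorical column; adjust to your own codes.
SEX_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}

# Formats to try, in order. Assumes slash dates are month/day/year;
# change this if your source uses day/month/year.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def normalize_date(value):
    """Return the date as YYYY-MM-DD, or the original text if no format matches."""
    text = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return value


def clean_rows(rows, sex_column="sex", date_column="visit_date"):
    """Drop empty rows and columns, then standardize one categorical and one date column."""
    # Remove rows in which every cell is blank.
    rows = [r for r in rows if any((v or "").strip() for v in r.values())]
    if not rows:
        return []
    # Keep only columns that have at least one non-blank value.
    keep = [c for c in rows[0] if any((r[c] or "").strip() for r in rows)]
    cleaned = []
    for r in rows:
        row = {c: (r[c] or "").strip() for c in keep}
        if sex_column in row:
            row[sex_column] = SEX_CODES.get(row[sex_column].lower(), row[sex_column])
        if date_column in row and row[date_column]:
            row[date_column] = normalize_date(row[date_column])
        cleaned.append(row)
    return cleaned


# Small in-memory example: one blank row and one entirely blank column (notes).
raw = "sex,visit_date,notes\nmale,03/15/2024,\n,,\nF,2024-04-01,\n"
cleaned = clean_rows(list(csv.DictReader(io.StringIO(raw))))
```

Values that match none of the listed date formats are left unchanged, so they can be reviewed by hand rather than silently altered.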

Column Headers

Follow these conventions for column names:

Use lowercase with underscores: patient_age
Avoid spaces and special characters
Be descriptive but concise
Include units when applicable: weight_kg, duration_days
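
If a source file arrives with inconsistent headers, a small helper can apply the conventions above before upload. The function below is a minimal sketch using Python's re module; to_snake_case is a hypothetical name, not a CATO feature, and the sketch assumes ASCII headers (accented letters would be replaced by underscores).

```python
import re


def to_snake_case(header):
    """Convert a raw column header to lowercase_with_underscores."""
    text = header.strip()
    # Split camelCase boundaries: "durationDays" -> "duration_Days".
    text = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", text)
    # Replace every run of characters other than ASCII letters and digits with one underscore.
    text = re.sub(r"[^A-Za-z0-9]+", "_", text)
    return text.strip("_").lower()


headers = ["Patient Age", "Weight (kg)", "durationDays", "  Treatment-Group "]
renamed = [to_snake_case(h) for h in headers]
# renamed == ["patient_age", "weight_kg", "duration_days", "treatment_group"]
```

The helper only fixes formatting. Making a name descriptive, or adding a missing unit, still requires knowing what the column contains.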

Batch Uploads

When uploading multiple related files, use a ZIP archive. CATO will extract and import each file separately while keeping them organized within your project.
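
One way to build such an archive is with Python's zipfile module. The sketch below is illustrative only: the file names are hypothetical, and storing the files flat (without subfolders) is an assumption rather than a documented CATO requirement.

```python
import tempfile
import zipfile
from pathlib import Path


def bundle_files(paths, archive_path):
    """Write the given files into one ZIP archive, stored flat by file name."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in map(Path, paths):
            # arcname drops local directories so the archive holds only file names.
            zf.write(path, arcname=path.name)
    return Path(archive_path)


# Demo with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    sources = []
    for name in ("patient_outcomes_2024_q4.csv", "patient_demographics_2024_q4.csv"):
        source = tmp_dir / name
        source.write_text("patient_id\n1\n", encoding="utf-8")
        sources.append(source)
    archive = bundle_files(sources, tmp_dir / "nsclc_cohort_upload.zip")
    with zipfile.ZipFile(archive) as zf:
        archived_names = sorted(zf.namelist())
```

Because files are stored by name only, two source files with the same name in different folders would collide, so give each file a distinct, descriptive name first.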

Tip: CATO's chat interface can help you identify data quality issues. Try asking "Are there any missing values in @my_dataset?"

Chat Sessions

Use @-Mentions for Clarity

Always use @-mentions to reference specific datasets, especially when you have multiple datasets in a project. This removes ambiguity and ensures CATO queries the right data.

Plan Before Complex Analyses

For multi-step analyses (e.g., survival analysis, subgroup comparisons, report generation), use Plan mode (Shift+Tab). This lets you review CATO's approach before it executes, saving time on revisions.

Start with Exploration

Before diving into specific analyses, ask general questions to understand your data:

"What columns are in @my_dataset?"
"Summarize the numeric columns"
"How many unique values are in the treatment_group column?"

Be Specific

The more specific your questions, the more accurate the responses:

Vague	Specific
"Show me the data"	"Show the first 20 rows of @patient_data sorted by age descending"

Use Separate Sessions

Start a new chat session when switching to a different analysis topic. This keeps context focused and makes it easier to review past work.

Leverage Biomedical Tools

When working with biomedical data, take advantage of CATO's integrated tools. Ask about genes, pathways, drug interactions, or search PubMed for relevant literature to provide context for your analyses.

Generate Reports for Sharing

When you need to share results with colleagues, ask CATO to generate a DOCX report. Reports include formatted headings, tables, embedded figures, and page numbers — ready for review or presentation.

Customize Figures Before Exporting

Use the Figure Editor (pencil icon) to customize colors, add text annotations, adjust labels, and add significance brackets before downloading or embedding into reports. This avoids round-tripping through external tools like Illustrator. CATO will automatically use your edited figures in follow-up analyses and reports.

Review Assumptions

After each analysis, CATO states its key assumptions: why it chose a method, how it handled missing data, what grouping or filtering it applied, and any thresholds it set. Review these to validate the approach before you proceed.

Share Conversations Instead of Screenshots

Use the share button in the conversation header to generate a public read-only link. Collaborators see the full conversation, including all figures, code, and file artifacts, which is much richer than a screenshot. You can revoke a link at any time once it is no longer needed.

Literature Intelligence Workflows

Start Broad, Then Narrow

Begin with a broad search to understand the landscape, then refine your query and switch search modes for targeted results. Use Novelty-First to survey the field, then Evidence-First for specific clinical questions.

Build Collections Incrementally

Save relevant papers to named collections as you find them rather than trying to collect everything in a single search. Create collections for different aspects of your research (e.g., "Methods", "Background", "Competing Studies").

Use Evidence Matrix for Comparisons

When comparing studies, run Evidence Matrix extraction on your selected papers to get structured PICO data. This is much more efficient than reading each abstract individually and helps identify patterns across studies.

Leverage the Chat Bridge

After curating papers in Literature Intelligence, send them to Chat for deeper analysis. The Chat AI can synthesize findings, identify conflicts, and generate comprehensive reports that combine literature with your own data.

Set Up Watch Alerts Early

For ongoing research projects, save your key search queries as Watch Alerts early in the project. This ensures you are notified when new relevant publications appear, keeping your literature review current.

Combine Network Analysis with Evidence

Use Citation Network analysis to identify the most influential papers (high PageRank), then run Evidence Matrix on those papers to understand why they are highly cited. This combination reveals both the structure and substance of a research field.

Tip: For the best citation network results, select at least 10 papers with DOIs, and ideally 20 or more. Papers without DOIs may not form citation edges in the graph.
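
To build intuition for what a high PageRank score means, the sketch below runs a generic power-iteration PageRank over a tiny citation graph. It is not CATO's implementation, which is not described here: the damping factor of 0.85, the iteration count, and the paper identifiers are all assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Generic power-iteration PageRank over (citing, cited) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for citing, cited in edges:
        out_links[citing].append(cited)
    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Papers that cite nothing in the set spread their rank evenly over all papers.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        new_rank = {n: (1 - damping) / count + damping * dangling / count for n in nodes}
        for n in nodes:
            for target in out_links[n]:
                # Each paper splits its rank equally among the papers it cites.
                new_rank[target] += damping * rank[n] / len(out_links[n])
        rank = new_rank
    return rank


# Hypothetical papers; each pair reads "first paper cites second paper".
edges = [
    ("paper_a", "paper_c"),
    ("paper_b", "paper_c"),
    ("paper_c", "paper_d"),
    ("paper_b", "paper_d"),
]
ranks = pagerank(edges)
top = max(ranks, key=ranks.get)  # paper_d: cited by paper_b and by the well-cited paper_c
```

A paper ranks highly when it is cited by papers that are themselves well cited, which is why papers that form no citation edges contribute little to the result.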

Version Control

While CATO tracks chat history and saves all artifacts automatically, consider these practices:

Keep original data files in a separate backup location
Document preprocessing steps in your project description
Use dataset descriptions to note which version of source data was used
Download important file artifacts (reports, exports) for offline access
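
One optional way to record which version of a source file was used is to note its checksum in the dataset description. The sketch below computes a SHA-256 digest with Python's hashlib. The file name and the wording of the note are hypothetical, and CATO does not require or check this value.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo with a throwaway file standing in for an original data export.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "patient_outcomes_2024_q4.csv"
    source.write_bytes(b"patient_id,outcome\n1,responder\n")
    checksum = file_sha256(source)
    version_note = f"Source: {source.name}, SHA-256: {checksum}"
```

If the backup copy later produces a different digest, the file has changed since the note was written.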