Atiendia
πŸ’‘ Discovery Module

Idea Generation
with Atiendia Research

More than a search engine: an engine that reasons. We analyze your document corpus to find non-obvious connections that a human would take years to detect.

How does this module work?

This is the second step of the Atiendia Research pipeline. First, our system ingests and "understands" thousands of your documents (papers, technical manuals, reports). Then, the Idea Generation engine cross-references that vast body of information to propose novel hypotheses.

Beyond human reading

The ability to "connect the dots" between disparate fragments of knowledge (claims) within the corpus is what distinguishes Atiendia Research from a simple search tool.

The system emulates the intuition of an expert researcher, but operates at a massive scale that would be impossible for a human. It analyzes thousands of documents simultaneously, identifying synergies, gaps, and opportunities that would remain hidden in a manual review.

3 Types of Generated Ideas

The system automatically identifies and categorizes three main types of creative relationships

πŸ”„

Methodological Synergies

SYNERGIZES_WITH

Detects when two distinct methodologies or techniques, possibly from different fields, have the potential to combine to produce a superior result.

πŸ’‘

Example:

"The Bayesian optimization method described in Paper A could significantly improve the efficiency of the neural search algorithm proposed in Paper B, reducing convergence time from weeks to days."

Use cases:

  • β†’ Combine preprocessing techniques from different domains
  • β†’ Merge complementary evaluation approaches
  • β†’ Integrate qualitative and quantitative methodologies
🎯

Potential Applications

POTENTIAL_APPLICATION

Identifies opportunities for "cross-pollination", suggesting that a technique, tool, or theory validated in one context (origin) can be successfully applied to an unsolved problem in another context (destination).

πŸ’‘

Example:

"The racial bias evaluation framework in facial recognition systems from Study X is directly applicable to measure gender bias in the new natural language dataset presented in Study Y, which currently lacks a robust metric."

Use cases:

  • β†’ Transfer solutions between adjacent domains (e.g., CV β†’ NLP)
  • β†’ Apply theoretical frameworks to new empirical contexts
  • β†’ Reuse datasets and benchmarks for new tasks
πŸš€

Follow-up Inspiration

INSPIRES_FOLLOWUP

Suggests the logical next step in research. Based on the limitations or conclusions of a study, it proposes future experiments or concrete extensions that expand existing knowledge.

πŸ’‘

Example:

"Given the success of the reinforcement learning model in Experiment Z in controlled simulated environments, a natural follow-up study would be to validate it in 'in-the-wild' environments with real user data, as suggested in Paper W which identified the gap between lab and production."

Use cases:

  • β†’ Identify explicit limitations requiring extensions
  • β†’ Propose validations in new contexts (languages, domains, scales)
  • β†’ Suggest ablation studies of unexplored components

πŸ“Š Real Output Example

Want to see what the real output looks like? Download an automatically generated sample report.

πŸ“š Sample corpus context:

  • β€’ Domains: RAG, AI, Biomed (Clinical NLP, Genomics/Drug ML), FinText (Reports/Calls, SEC Filings)
  • β€’ ~400+ papers processed (15% of total corpus) β†’ ~2,200 Claims extracted
  • β€’ 865 Claims processed in generation phase β†’ 4,139 idea edges generated
  • β€’ This demo: ~25% of those edges, only those categorized as EMERGING_LINK after novelty_check

πŸ”’ How to interpret the scores:

novelty_score (0.0 – 1.0)

How novel the proposed connection is. High values (>0.7) indicate links underexplored in existing literature.

knownness_score (0.0 – 1.0)

How much prior evidence exists for this relationship. Low values (<0.5) suggest less-traveled territory.

confidence (0.0 – 1.0)

The model's confidence in the validity of the proposed connection, based on the quality of the evidence found.

EMERGING_LINK

Verdict: high novelty + low knownness = potentially valuable connection worth human exploration.
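
As a rough sketch, the verdict rule can be read as a simple threshold check. The thresholds mirror the score descriptions above; the helper name and the UNCERTAIN fallback label are hypothetical, and the actual classification is performed by an LLM judge (see the note below).

```python
# Sketch of the verdict rule using the thresholds described above.
# classify_edge and the UNCERTAIN label are illustrative, not the
# actual novelty_check implementation.

def classify_edge(novelty_score: float, knownness_score: float) -> str:
    if novelty_score > 0.7 and knownness_score < 0.5:
        return "EMERGING_LINK"  # high novelty + low knownness
    if knownness_score >= 0.5:
        return "KNOWN"          # substantial prior evidence exists
    return "UNCERTAIN"          # neither clearly emerging nor clearly known

print(classify_edge(0.82, 0.31))  # EMERGING_LINK
```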

⚠️ Important note: This is a demo. The novelty "judge" is an LLM, so false positives (EMERGING that's actually KNOWN) or false negatives may exist. Each idea includes verifiable evidence to facilitate final human review.

Format: plain text with technical structure. Hash IDs (doc_id, claim_id) are internal references; DOI/arXiv links appear in the "evidence" section.

Discovery Engines

Advanced technology under the hood

πŸ”

Vector Semantic Search

Finds deep conceptual relationships beyond keyword matching.

  • β€’ High-dimensional embeddings (768-1536D)
  • β€’ Cosine similarity to measure conceptual closeness
  • β€’ Models specialized in scientific literature (SciBERT, etc.)
πŸ•ΈοΈ

Knowledge Graphs

Uses citation structure and entity relationships to understand the scientific "neighborhood".

  • β€’ Author-paper, paper-citation, paper-concept relationships
  • β€’ PageRank and centrality to identify key findings
  • β€’ Community detection to find thematic clusters
πŸ”—

Literature-Based Discovery (LBD)

Identifies indirect connections that no paper has explicitly explored.

  • β€’ If A implies B, and B implies C β†’ possible A-C relationship
  • β€’ ABC model (Swanson): discover unknown bridges
  • β€’ Filters trivial connections using negative co-occurrence
🎯

Diversification (MMR)

Ensures presented ideas are diverse and cover different angles.

  • β€’ Maximal Marginal Relevance: balance relevance vs novelty
  • β€’ Avoids redundancy in similar suggestions
  • β€’ Adjustable Ξ» parameter: exploration vs exploitation

Claim Classification

Each finding is automatically categorized for precise semantic filtering

βš™οΈ

METHOD

claim_kind

Description of a technique, algorithm, architecture or experimental procedure.

E.g.: "We used a 12-layer transformer encoder with multi-head attention"

πŸ“Š

RESULT

claim_kind

Empirical finding, performance metric or conclusion derived from data.

E.g.: "We achieved 94.2% accuracy on the MNIST dataset"

⚠️

LIMITATION

claim_kind

Known restriction, edge case or weakness of the proposed method.

E.g.: "The model fails with low-resolution images (<64px)"

πŸ“–

DEFINITION

claim_kind

Formalization of a concept, theory or domain-specific terminology.

E.g.: "We define 'adversarial robustness' as the ability to..."

πŸ“š

BACKGROUND

claim_kind

Context, related work or state of the art prior to the study.

E.g.: "Previous studies like [Smith et al., 2019] demonstrated that..."

πŸ’Ύ

DATASET

claim_kind

Mention of datasets used or created in the research.

E.g.: "We created a new dataset of 10K annotated medical images"

πŸ”–

OTHER

claim_kind

Relevant information that doesn't fit into the above categories.
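
For illustration, the taxonomy maps naturally onto an enum (a sketch; the actual schema and field names may differ):

```python
from enum import Enum

# Illustrative sketch of the claim_kind taxonomy; not the production schema.

class ClaimKind(str, Enum):
    METHOD = "METHOD"          # technique, algorithm, architecture
    RESULT = "RESULT"          # empirical finding or metric
    LIMITATION = "LIMITATION"  # known restriction or weakness
    DEFINITION = "DEFINITION"  # formalization of a concept
    BACKGROUND = "BACKGROUND"  # related work / prior state of the art
    DATASET = "DATASET"        # dataset used or created
    OTHER = "OTHER"            # anything that fits no other category

claim = {"text": "We achieved 94.2% accuracy on the MNIST dataset",
         "claim_kind": ClaimKind.RESULT}
print(claim["claim_kind"].value)  # RESULT
```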

Extracted Semantic Relationships

The system connects claims with auxiliary entities to enrich context

πŸ”§

[:USES_METHOD]

Connects a Claim with a Method entity.

Example:

Claim: "We achieved SOTA on ImageNet" [:USES_METHOD] β†’ Method: "Pre-trained ResNet-152"

πŸ“ˆ

[:SUPPORTED_BY_EVIDENCE]

Connects a Claim with an Evidence entity (tables, figures, experiments).

Example:

Claim: "The model is robust to noise" [:SUPPORTED_BY_EVIDENCE] β†’ Evidence: "Table 3, Figure 5"

🚧

[:LIMITED_BY]

Connects a Claim with a Limitation entity.

Example:

Claim: "Our classifier is accurate" [:LIMITED_BY] β†’ Limitation: "Only works with English"

❓

[:RAISES_QUESTION]

Connects a Claim with an OpenQuestion entity (explicit future work).

Example:

Claim: "We observed improvements in accuracy" [:RAISES_QUESTION] β†’ Question: "Does it work in other languages?"

🎯

[:HAS_SCOPE]

Defines the scope or applicability of the claim (specific context of validity).

Example:

Claim: "Effective technique" [:HAS_SCOPE] β†’ Scope: "on high-resolution medical images"

🏷️

Context Metadata

Each Claim is enriched with normalized contextual metadata to improve later retrieval:

context_task

Specific task (e.g., "Image Classification", "NER")

context_dataset

Dataset used (e.g., "ImageNet", "CoNLL-2003")

context_metric

Metric used (e.g., "F1-Score", "Accuracy")

context_model_family

Base architecture (e.g., "Transformer", "CNN")
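
As a sketch, the metadata could be modeled as a small record (field names follow the list above; the class itself is illustrative):

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative record for a claim's normalized context metadata.

@dataclass
class ClaimContext:
    context_task: Optional[str] = None          # e.g., "Image Classification"
    context_dataset: Optional[str] = None       # e.g., "ImageNet"
    context_metric: Optional[str] = None        # e.g., "F1-Score"
    context_model_family: Optional[str] = None  # e.g., "Transformer"

ctx = ClaimContext(context_task="NER",
                   context_dataset="CoNLL-2003",
                   context_metric="F1-Score",
                   context_model_family="Transformer")
print(ctx)
```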

System Workflow

How ideas are generated step by step

1

Claim Extraction

The system processes each document and extracts key scientific claims: main findings, methodologies used, conclusions, stated limitations.

2

Hybrid Indexing

Claims are indexed using semantic vectors (for similarity search) and knowledge graphs (for structural navigation).

3

Candidate Search

For each source claim, the system retrieves candidate claims that could form interesting relationships, combining vector search, graph navigation, and LBD.

4

LLM Reasoning

An advanced language model (GPT-4/Claude) analyzes each claim pair and determines whether a creative relationship exists (synergy, application, follow-up), assigning a confidence score.

5

Diversification and Ranking

Ideas are ranked by confidence score and diversified using MMR to present a non-redundant set of high-impact suggestions.
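
Putting the five steps together, a toy end-to-end sketch might look like this. Every function here is a runnable stand-in named after the stage it represents, not a real Atiendia API:

```python
import random

# Toy sketch of the five-step workflow; each helper is a stand-in.

def extract_claims(docs):                         # 1. claim extraction
    return [c for d in docs for c in d["claims"]]

def retrieve_candidates(claim, claims):           # 3. candidate search (stubbed;
    return [c for c in claims if c != claim]      #    real version: vectors + graph + LBD)

def llm_judge(source, candidate):                 # 4. LLM reasoning (stubbed with
    return {"pair": (source, candidate),          #    a random confidence score)
            "confidence": random.random()}

def diversify(ideas, k):                          # 5. ranking + MMR stand-in
    return sorted(ideas, key=lambda i: i["confidence"], reverse=True)[:k]

docs = [{"claims": ["claim A", "claim B"]}, {"claims": ["claim C"]}]
claims = extract_claims(docs)                     # (step 2, hybrid indexing, omitted)
ideas = [llm_judge(c, cand)
         for c in claims for cand in retrieve_candidates(c, claims)]
for idea in diversify(ideas, k=3):
    print(idea["pair"], round(idea["confidence"], 2))
```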

Ready to Discover Hidden Connections?

Let Atiendia Research analyze your corpus and generate research ideas you would never have found manually.