Legacy Modernization 2026-04-05 10 min read

Automated Business Rule Extraction from COBOL Using AST Analysis

How to automatically extract business rules, data flows, and program dependencies from COBOL source code using Abstract Syntax Tree analysis and tree-sitter queries.

By AITYTECH Engineering

One of the biggest challenges in legacy modernization is understanding what the code actually does. A 10,000-line COBOL program may contain hundreds of business rules (validation logic, calculation formulas, conditional workflows) buried in deeply nested IF-ELSE structures and PERFORM chains. Extracting these rules manually takes weeks. AST analysis can locate the statements that encode them in seconds, so analysts spend their time reviewing candidate rules rather than searching for them.

What Makes COBOL Business Rule Extraction Hard

COBOL business logic is notoriously difficult to extract because:

AST-Based Approach

Using tree-sitter, we can parse COBOL into a structured AST and then run queries to extract specific patterns. Here are the key extraction techniques:

1. Variable Discovery

Tree-sitter query to find all data items with their PIC clauses:

(data_description
  (level_number) @level
  (entry_name) @name
  (picture_clause (pic_string) @pic)?
) @item

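This returns every data item with its level number, name, and PIC string (which encodes the item's type and size), the foundation for understanding data flow. The trailing ? makes the picture clause optional, so group items that have no PIC clause are still captured.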
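The captured PIC string still has to be interpreted before it is useful for data-flow analysis. The following is a minimal Python sketch of that step, assuming the capture arrives as plain text such as S9(7)V99 or X(10). The names PicInfo, expand_pic, and decode_pic are hypothetical helpers, not part of tree-sitter or any COBOL grammar, and edited pictures (Z, commas, currency symbols) are deliberately out of scope.

```python
import re
from dataclasses import dataclass


@dataclass
class PicInfo:
    category: str  # "numeric", "alphanumeric", or "alphabetic"
    length: int    # number of character or digit positions
    scale: int     # digits after the implied decimal point (V)
    signed: bool   # True when the picture starts with S


def expand_pic(pic: str) -> str:
    """Expand repeat counts, e.g. '9(3)V99' becomes '999V99'."""
    return re.sub(
        r"(.)\((\d+)\)",
        lambda m: m.group(1) * int(m.group(2)),
        pic.upper(),
    )


def decode_pic(pic: str) -> PicInfo:
    """Classify a simple (non-edited) PIC string. Hypothetical helper."""
    expanded = expand_pic(pic.strip())
    signed = expanded.startswith("S")
    body = expanded.lstrip("S")
    if "X" in body:
        return PicInfo("alphanumeric", len(body), 0, False)
    if "A" in body and "9" not in body:
        return PicInfo("alphabetic", len(body), 0, False)
    # Numeric: V marks the implied decimal point and occupies no storage.
    integer_part, _, fraction_part = body.partition("V")
    digits = integer_part.count("9") + fraction_part.count("9")
    return PicInfo("numeric", digits, fraction_part.count("9"), signed)
```

Under these assumptions, decode_pic("S9(7)V99") describes a signed numeric item with nine digits, two of them after the implied decimal point, which is usually enough to flag monetary fields for closer review.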

2. Conditional Logic Mapping

Business rules typically live inside IF statements and EVALUATE (COBOL's CASE/SWITCH):

(if_statement
  (condition) @condition
) @rule

(evaluate_statement
  (evaluate_subject) @subject
  (evaluate_when
    (evaluate_object) @when_value
  ) @branch
) @switch
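Once the EVALUATE captures are available, each WHEN branch can be turned into one row of a decision table. The sketch below is in Python and assumes the capture texts have already been collected as strings. The Rule record and the evaluate_to_rules function are hypothetical, and whether WHEN OTHER shows up under @when_value depends on the grammar in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    subject: str      # text of the @subject capture
    when: str         # text of one @when_value capture
    description: str  # readable form for the rule catalogue


def evaluate_to_rules(subject: str, when_values: list[str]) -> list[Rule]:
    """Turn one EVALUATE statement into one rule per WHEN branch."""
    rules = []
    for value in when_values:
        if value.strip().upper() == "OTHER":
            # WHEN OTHER is the default branch of the EVALUATE.
            description = f"{subject}: any other value"
        else:
            description = f"{subject} = {value.strip()}"
        rules.append(Rule(subject, value.strip(), description))
    return rules
```

For example, calling evaluate_to_rules("WS-ACCT-TYPE", ["'S'", "'C'", "OTHER"]) yields three rows, one per branch, which reviewers can then label as validation, calculation, or routing rules.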

3. Calculation Rules

Financial calculations are often the most critical business rules:

(compute_statement) @calc
(add_statement) @calc
(subtract_statement) @calc
(multiply_statement) @calc
(divide_statement) @calc
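The @calc captures return whole statements. To connect a calculation to the data it touches, the statement text can be split into the fields it writes and the fields it reads. The following Python sketch does this for COMPUTE only and works on the statement text rather than on child nodes. The split_compute function and the keyword list are illustrative assumptions, and statements with ON SIZE ERROR clauses are not handled.

```python
import re

# COBOL data names: letters, digits, and hyphens. This simplified pattern
# assumes names start with a letter, so numeric literals are skipped.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9-]*")
KEYWORDS = {"COMPUTE", "ROUNDED", "END-COMPUTE"}


def split_compute(statement: str) -> tuple[list[str], list[str]]:
    """Return (targets, inputs) for a simple COMPUTE statement."""
    left, _, right = statement.partition("=")
    targets = [t for t in IDENTIFIER.findall(left) if t.upper() not in KEYWORDS]
    inputs = [t for t in IDENTIFIER.findall(right) if t.upper() not in KEYWORDS]
    return targets, inputs
```

For the statement COMPUTE WS-INTEREST ROUNDED = WS-PRINCIPAL * WS-RATE / 100, the sketch reports WS-INTEREST as the target and WS-PRINCIPAL and WS-RATE as inputs. Because COBOL requires spaces around arithmetic operators, a hyphen inside a token is treated as part of the name.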

4. External Dependencies

CALL statements and COPY members reveal program dependencies:

(call_statement) @external_call
(copy_statement) @copy_include
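With CALL targets collected per program, the dependencies can be arranged into a graph that answers two planning questions: which programs are affected if one changes, and in what order programs can be migrated. The Python sketch below assumes the captures have been reduced to (caller, callee) pairs of program names; the function names are hypothetical. Dynamic calls through a variable (CALL WS-PROGRAM-NAME) cannot be resolved this way and would need to be listed separately.

```python
from graphlib import CycleError, TopologicalSorter


def build_call_graph(calls: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Map each program to the set of programs it calls."""
    graph: dict[str, set[str]] = {}
    for caller, callee in calls:
        graph.setdefault(caller, set()).add(callee)
        graph.setdefault(callee, set())  # make sure leaf programs appear
    return graph


def callers_of(graph: dict[str, set[str]], target: str) -> set[str]:
    """Programs that call `target` directly."""
    return {caller for caller, callees in graph.items() if target in callees}


def leaf_first_order(graph: dict[str, set[str]]) -> list[str]:
    """Order programs so that every callee comes before its callers."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError(f"circular CALL chain: {exc.args[1]}") from exc
```

Migrating in leaf-first order means each program is converted only after everything it calls, which keeps the amount of temporary bridging code small. The same structure can be reused for COPY members by treating each copybook as a node.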

From AST to Business Rules

The raw AST gives us structure. The next step is semantic analysis:

  1. Data flow tracing — follow MOVE statements to track how values propagate through variables
  2. Condition grouping — cluster related IF/EVALUATE blocks that reference the same variables
  3. Cross-reference — link PERFORM targets to paragraph definitions to understand call chains
  4. Rule annotation — match patterns to known business rule templates (validation, calculation, routing)
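The first two steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the analysis layer itself: MOVE statements are assumed to be already reduced to (source, target) pairs, conditions to (rule id, variables referenced) pairs, and all function names are hypothetical.

```python
from collections import defaultdict, deque


def build_flow(moves: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Map each source variable to the variables it is moved into."""
    flow: dict[str, set[str]] = defaultdict(set)
    for source, target in moves:
        flow[source].add(target)
    return flow


def reachable_from(flow: dict[str, set[str]], start: str) -> set[str]:
    """All variables a value in `start` can reach through MOVE chains."""
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in flow.get(current, ()):
            if nxt not in seen:  # also guards against cyclic MOVEs
                seen.add(nxt)
                queue.append(nxt)
    return seen


def group_conditions(
    conditions: list[tuple[str, set[str]]]
) -> dict[str, list[str]]:
    """Index rule ids by every variable their condition references."""
    groups: dict[str, list[str]] = defaultdict(list)
    for rule_id, variables in conditions:
        for variable in variables:
            groups[variable].append(rule_id)
    return dict(groups)
```

Combining the two results gives a first approximation of rule scope: the conditions grouped under a variable, together with those grouped under anything reachable from it, are the candidates to review as one business rule. The sketch ignores control flow, so it may link variables that never interact at run time.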

Our parser service provides the AST foundation. The analysis layer can be built on top using the query API to extract exactly the patterns relevant to your modernization project.

Real-World Impact

In a recent analysis of a banking COBOL system (42 programs, ~15,000 lines each, roughly 630,000 lines in total), AST-based extraction identified:

What would have taken a team of 4 analysts approximately 3 months was completed in under 2 hours of automated analysis plus 2 days of human review.