Working with Data

In this chapter you will learn:

  • How to process JSON, YAML, TOML, CSV, and XML data
  • How to convert between formats on the command line
  • Patterns for querying and transforming structured data
  • How to combine multiple data sources

Format Conversion

The simplest use of eucalypt is converting between data formats. By default, output is YAML:

# JSON to YAML
eu data.json

# YAML to JSON
eu data.yaml -j

# JSON to TOML
eu data.json -x toml

Processing JSON

Pipe JSON from other tools into eucalypt:

curl -s https://api.example.com/users | eu -e 'map(.name)'

Or process a JSON file:

eu -e 'users filter(.active) map(.email)' data.json

Processing YAML

YAML files are read natively. All YAML features, including anchors, aliases, and merge keys, are supported:

# config.yaml
defaults: &defaults
  timeout: 30
  retries: 3

production:
  <<: *defaults
  debug: false

eu config.yaml -e 'production'
timeout: 30
retries: 3
debug: false
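
The merge key behaviour shown above can be modelled in a few lines of Python. This is only a sketch of standard YAML merge semantics, not of how eucalypt implements them. The helper `apply_merge_key` and the `staging` mapping are hypothetical names introduced for illustration.

```python
# Mapping that the &defaults anchor refers to
defaults = {"timeout": 30, "retries": 3}

def apply_merge_key(merged, explicit):
    """Model `<<: *anchor`: copy the merged mapping, then let explicit keys win."""
    result = dict(merged)
    result.update(explicit)
    return result

# Equivalent of the production mapping in config.yaml
production = apply_merge_key(defaults, {"debug": False})
print(production)

# Hypothetical second mapping: an explicit key overrides the merged value
staging = apply_merge_key(defaults, {"timeout": 60})
print(staging)
```

The merged mapping is copied rather than mutated, so `defaults` stays reusable by every mapping that refers to the anchor.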

YAML Timestamps

YAML timestamps are automatically converted to date-time values:

created: 2024-03-15
updated: 2024-03-15T14:30:00Z

To keep a value as a string, quote it, for example created: "2024-03-15".

Processing TOML

TOML files are read like any other input. Select a value with -e:

eu config.toml -e 'database.port'
5432

Processing CSV

CSV files are imported as a list of blocks, where each row becomes a block with column headers as keys:

eu -e 'rows filter(_.age num > 30)' rows=people.csv

CSV values are always strings. Use num to convert to numbers when needed.
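
To make the row shape concrete, here is a minimal Python sketch of the same idea. It models the data shape only and says nothing about eucalypt's internals. Each row becomes a mapping from column header to string value, and a numeric comparison needs an explicit conversion first. The sample data and the names `SAMPLE_CSV`, `load_rows`, and `rows_over_30` are hypothetical.

```python
import csv
import io

# Hypothetical sample standing in for people.csv
SAMPLE_CSV = "name,age\nAlice,34\nBob,28\nCarol,41\n"

def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(text)))

rows = load_rows(SAMPLE_CSV)

# Every value arrives as a string, so convert before comparing numerically
rows_over_30 = [row for row in rows if int(row["age"]) > 30]

print(rows[0])
print([row["name"] for row in rows_over_30])
```

Skipping the conversion would compare a string with a number, which is the mistake that num guards against in the eucalypt expression.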

Processing XML

XML is imported as a nested list structure. Each element is represented as [tag, attributes, ...children]:

eu -e 'root' root=xml@data.xml

Use list functions to navigate the structure:

{ import: "root=xml@data.xml" }

# Get the tag name (first element)
tag: root first

# Get attributes (second element)
attrs: root second

# Get child elements (everything after the first two)
children: root drop(2)
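
The [tag, attributes, ...children] layout can be reproduced in Python to see what first, second, and drop(2) are selecting. This is a sketch under assumptions: the sample document is hypothetical, and the text above does not say how text content is represented, so the sketch assumes it appears as a plain string among the children.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for data.xml
SAMPLE_XML = '<catalog version="1"><item id="a">Apple</item><item id="b"/></catalog>'

def to_nested_list(element):
    """Convert an element to [tag, attributes, *children], recursively."""
    children = []
    # Assumption: text content is kept as a plain string child
    if element.text and element.text.strip():
        children.append(element.text.strip())
    for child in element:
        children.append(to_nested_list(child))
        if child.tail and child.tail.strip():
            children.append(child.tail.strip())
    return [element.tag, dict(element.attrib), *children]

root = to_nested_list(ET.fromstring(SAMPLE_XML))

tag = root[0]        # like `root first`
attrs = root[1]      # like `root second`
children = root[2:]  # like `root drop(2)`

print(tag, attrs)
print(children)
```

Because every element has the same three-part shape, the same indexing works at any depth of the tree.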

Named Inputs

Use named inputs to make data available under a specific name:

eu users=users.json roles=roles.json -e 'users map(.name)'

Named inputs are essential for formats that produce a list rather than a block (CSV, JSON Lines, text), since a list has no top-level keys to refer to:

eu lines=text@log.txt -e 'lines filter(str.matches?("ERROR")) count'

Combining Multiple Sources

Several inputs can be given on one command line, so that a eucalypt file can draw on data from all of them:

eu users.yaml roles.yaml merge.eu

Here merge.eu contains logic that refers to names defined by both data files:

# merge.eu
summary: {
  user-count: users count
  role-count: roles count
}

Using Evaluands

The -e flag specifies an expression to evaluate against the loaded inputs:

# Select a nested value
eu config.yaml -e 'database.host'

# Transform and filter
eu data.json -e 'items filter(.price > 100) map(.name)'

# Aggregate
eu data.json -e 'items map(.price) foldl(+, 0)'

Collecting Inputs

The --collect-as (-c) flag gathers multiple files into a list:

eu -c configs *.yaml -e 'configs map(.name)'

Add --name-inputs (-N) to get a block keyed by filename:

eu -c configs -N *.yaml
configs:
  a.yaml:
    name: alpha
  b.yaml:
    name: beta
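
The difference between the two collected shapes can be sketched in Python. This models the resulting data only, under the assumption that each file has already been parsed. The `collect` function and its `name_inputs` parameter are hypothetical stand-ins for the -c and -N flags, not part of any eucalypt API.

```python
# Hypothetical parsed contents of the files matched by *.yaml
parsed_inputs = [
    ("a.yaml", {"name": "alpha"}),
    ("b.yaml", {"name": "beta"}),
]

def collect(inputs, name_inputs=False):
    """Gather parsed inputs into a list, or into a dict keyed by filename."""
    if name_inputs:
        return {filename: content for filename, content in inputs}
    return [content for _, content in inputs]

configs_list = collect(parsed_inputs)                     # like -c configs
configs_block = collect(parsed_inputs, name_inputs=True)  # like -c configs -N

print([config["name"] for config in configs_list])
print(configs_block["b.yaml"]["name"])
```

The list form suits uniform processing with map and filter, while the keyed form keeps track of which file each value came from.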

Output Formats

Control the output format:

Flag        Format
(default)   YAML
-j          JSON
-x json     JSON
-x toml     TOML
-x edn      EDN
-x text     Plain text

The format can also be inferred from the output file:

eu data.yaml -o output.json

Practical Example: Data Pipeline

Suppose you have a CSV of sales data and want to generate a JSON summary:

eu sales=sales.csv -j -e '{
  total: sales map(.amount num) foldl(+, 0)
  count: sales count
  regions: sales map(.region) unique
}'

Or as a reusable eucalypt file:

# report.eu
{ import: "sales=sales.csv" }

` :suppress
amounts: sales map(.amount num)

report: {
  total: amounts foldl(+, 0)
  count: sales count
  average: report.total / report.count
}

eu report.eu -j -e report
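
For comparison, the same report can be computed in Python. This is a sketch of the calculation only, using hypothetical sample data in place of sales.csv. It assumes the file has region and amount columns, as the expressions above suggest.

```python
import csv
import io
import json

# Hypothetical sample standing in for sales.csv
SAMPLE_SALES = "region,amount\nnorth,100\nsouth,200\nnorth,300\n"

sales = list(csv.DictReader(io.StringIO(SAMPLE_SALES)))

# Intermediate value, analogous to the suppressed `amounts` declaration
amounts = [float(row["amount"]) for row in sales]

report = {
    "total": sum(amounts),
    "count": len(sales),
}
# Unlike the eucalypt block, a dict cannot refer to its own keys while
# it is being built, so the average is added afterwards
report["average"] = report["total"] / report["count"]

print(json.dumps(report))
```

Only `report` is printed, mirroring how the suppressed `amounts` declaration stays out of the rendered output.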

Key Concepts

  • Eucalypt reads JSON, YAML, TOML, CSV, XML, EDN, JSON Lines, and plain text
  • Output defaults to YAML; use -j for JSON or -x for other formats
  • Named inputs (name=file) give data a name for reference
  • The -e flag evaluates expressions against loaded data
  • --collect-as gathers multiple files into a list or block
  • CSV values are strings; use num to convert to numbers
  • Combine multiple sources with the command line input system or imports