Preview data

This page explains how to preview data in the Cloud Data Fusion Studio so that you can debug errors before you deploy and run a pipeline.

If you encounter errors, you can fix them while your pipeline is still in Draft mode.

By default, Cloud Data Fusion uses the first 100 rows of your source dataset to generate the preview.
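Conceptually, a preview runs the pipeline logic on a bounded sample instead of the full dataset. The following Python sketch is a toy model under stated assumptions, not Cloud Data Fusion's implementation or API. The `preview` function, the `(name, function)` stage representation, and the row format are all hypothetical. The sketch reads at most `limit` rows (100 by default) from a source, passes them through each transformation, records the output of every stage, and never writes to a sink.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]
Transform = Callable[[Row], Row]


def preview(source: Iterable[Row],
            transforms: List[Tuple[str, Transform]],
            limit: int = 100) -> Dict[str, List[Row]]:
    """Toy preview: run hypothetical stages on at most `limit` source rows.

    Returns the rows captured at each stage. Nothing is written to a sink.
    """
    rows = list(islice(source, limit))  # read only the first `limit` rows
    captured = {"source": rows}
    for name, fn in transforms:
        rows = [fn(row) for row in rows]  # apply the stage to every sampled row
        captured[name] = rows
    return captured


def numbers():
    # A hypothetical source far larger than the preview sample.
    for i in range(1_000_000):
        yield {"id": i, "amount": i * 10}


stages = [("double", lambda r: {**r, "amount": r["amount"] * 2})]
result = preview(numbers(), stages)
print(len(result["source"]))  # 100
print(result["double"][1])    # {'id': 1, 'amount': 20}
```

Because the source is read lazily, only the sampled rows are ever materialized, which is why a preview stays fast even for a large dataset.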

In Preview mode, the Studio page displays the status and duration of the preview job. You can stop the preview job at any time. You can also monitor the log events as the preview job runs.

Console

  1. Check that each source, transformation, and sink has no errors. To validate them, on the Cloud Data Fusion Studio page, go to the node for each plugin and click Properties > Validate.
  2. View your pipeline on the Studio page and turn on the Preview toggle. Run, Duration, and Logs options appear at the top of the Studio page canvas.
  3. Optional: before you run the preview job, update the following settings by clicking Configure.

    1. Runtime arguments: for more information, see Set up runtime arguments.
    2. Preview config: update the number of rows to preview.
    3. Advanced options: update the pipeline and engine configurations. For more information, see Manage pipeline configurations.
  4. To start the preview job, click Run. The preview job doesn't write any data to the sink, but it lets you check that data will be read and written as expected when you deploy the pipeline.

  5. Optional: after the preview job runs, click Preview data on any node that handles data, such as a source, transformation, or sink, to see what your data looks like at that stage of the pipeline.

  6. When you're finished previewing the data, exit Preview mode by turning off the Preview toggle.
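Runtime arguments, set through Configure before a preview run, are commonly used to supply values for macros written as `${name}` in plugin properties. The following Python sketch is a simplified, hypothetical model of that substitution, intended only to show the idea. The `resolve_macros` function and the example property and argument names (`path`, `bucket`, `input.path`) are illustrative assumptions, not Cloud Data Fusion APIs. The real macro system supports more syntax than this sketch handles, such as nested macros.

```python
import re
from typing import Dict

# Matches a simple macro such as ${input.path}. Nested macros and other
# advanced macro syntax are deliberately not supported in this sketch.
MACRO = re.compile(r"\$\{([^${}]+)\}")


def resolve_macros(properties: Dict[str, str],
                   runtime_args: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of `properties` with each ${name} replaced by runtime_args[name]."""

    def substitute(match: "re.Match") -> str:
        name = match.group(1)
        if name not in runtime_args:
            # Fail loudly, as an unresolved macro would break the run.
            raise KeyError(f"no runtime argument for macro ${{{name}}}")
        return runtime_args[name]

    return {key: MACRO.sub(substitute, value) for key, value in properties.items()}


# Hypothetical plugin properties and runtime arguments.
props = {"path": "gs://${bucket}/${input.path}", "format": "csv"}
args = {"bucket": "my-bucket", "input.path": "sales/2024.csv"}
print(resolve_macros(props, args))
# {'path': 'gs://my-bucket/sales/2024.csv', 'format': 'csv'}
```

Raising an error for a missing argument, instead of leaving the macro text in place, mirrors the goal of a preview: surfacing configuration problems before you deploy the pipeline.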

What's next