Specifies the format and location of the input data for the dataset.

Properties

AugmentedManifests?: DatasetAugmentedManifestsListItem[]

A list of augmented manifest files that provide training data for your custom model. An augmented manifest file is a labeled dataset that is produced by Amazon SageMaker Ground Truth.

DataFormat?: DatasetDataFormat

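The format of the input data. Valid values:

COMPREHEND_CSV: The data format is a two-column CSV file, where the first column contains labels and the second column contains documents.

AUGMENTED_MANIFEST: The data format is an augmented manifest file, a labeled dataset produced by Amazon SageMaker Ground Truth.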
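
To make the two-column COMPREHEND_CSV layout concrete, here is a minimal sketch that splits one line into its label and document. The helper name parseComprehendCsvLine and the LabeledDocument type are hypothetical, not part of any AWS library. It assumes the label contains no commas or quotes and that a document containing commas is wrapped in double quotes.

```typescript
// Hypothetical shape for one row of a two-column CSV: label first, document second.
interface LabeledDocument {
  label: string;
  text: string;
}

// Hypothetical helper, not an AWS API. Assumes the label has no commas or quotes.
function parseComprehendCsvLine(line: string): LabeledDocument {
  const comma = line.indexOf(",");
  if (comma < 0) {
    throw new Error("expected two columns: label,document");
  }
  const label = line.slice(0, comma).trim();
  let text = line.slice(comma + 1).trim();
  // Strip surrounding double quotes and unescape doubled quotes (common CSV quoting).
  if (text.length >= 2 && text[0] === '"' && text[text.length - 1] === '"') {
    text = text.slice(1, -1).replace(/""/g, '"');
  }
  return { label, text };
}
```

For a full dataset, a real CSV parser is a safer choice than this line-based sketch, since documents may span multiple lines inside quotes.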

DocumentClassifierInputDataConfig?: DatasetDocumentClassifierInputDataConfig

The input properties for training a document classifier model.

For more information on how the input file is formatted, see Preparing training data in the Comprehend Developer Guide.

EntityRecognizerInputDataConfig?: DatasetEntityRecognizerInputDataConfig

The input properties for training an entity recognizer model.
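
The four properties above can be sketched as a local type with a small consistency check. This is an illustration under stated assumptions, not the library's own definition: the nested S3Uri fields, the DatasetInputDataConfigSketch type, and the checkInputDataConfig helper are hypothetical, and the two rules it enforces are assumptions about sensible usage rather than documented behavior.

```typescript
type DatasetDataFormat = "COMPREHEND_CSV" | "AUGMENTED_MANIFEST";

// Local stand-in for the documented properties. Nested field names are assumptions.
interface DatasetInputDataConfigSketch {
  AugmentedManifests?: Array<{ S3Uri: string }>;
  DataFormat?: DatasetDataFormat;
  DocumentClassifierInputDataConfig?: { S3Uri: string };
  EntityRecognizerInputDataConfig?: Record<string, unknown>;
}

// Hypothetical helper: returns a list of problems, empty when none are found.
function checkInputDataConfig(config: DatasetInputDataConfigSketch): string[] {
  const problems: string[] = [];
  const manifests = config.AugmentedManifests ?? [];
  // Assumption: the AUGMENTED_MANIFEST format needs at least one manifest file.
  if (config.DataFormat === "AUGMENTED_MANIFEST" && manifests.length === 0) {
    problems.push("AUGMENTED_MANIFEST requires at least one entry in AugmentedManifests");
  }
  // Assumption: a dataset targets one model type, so only one of the two configs is set.
  if (config.DocumentClassifierInputDataConfig && config.EntityRecognizerInputDataConfig) {
    problems.push("set only one of the classifier and entity recognizer configs");
  }
  return problems;
}
```

A caller could run the check before submitting the configuration and report every problem at once instead of failing on the first one.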