Generative AI

Components to leverage native Generative AI capabilities on Data Warehouses.

ML Generate Text

Description

This component calls the ML.GENERATE_TEXT function in BigQuery for each row of the input table.

To use this component, you must provide a BigQuery ML model.

You also need the BigQuery Connection User role (roles/bigquery.connectionUser) granted on your BigQuery project.
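
Conceptually, the component issues a query of this shape for the input table. This is a minimal sketch, assuming a remote text model; the project, dataset, table, and column names are placeholders, not the component's actual identifiers:

```sql
-- Minimal sketch of the call the component wraps (placeholder names).
SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `my_project.my_dataset.text_model`,
  (
    -- ML.GENERATE_TEXT expects the prompt in a column named `prompt`,
    -- so the configured prompt column is aliased accordingly.
    SELECT review_text AS prompt
    FROM `my_project.my_dataset.input_table`
  )
);
```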

Inputs

  • Model [FQN]: The fully qualified name of the model to use, in the format project_id.dataset.model

  • Prompt column: The column in the input source used to generate the prompt.

  • Max Output Tokens: an INT64 value in the range [1,1024] that sets the maximum number of tokens that the model outputs. Specify a lower value for shorter responses and a higher value for longer responses. The default is 50.

  • Temperature: a FLOAT64 value in the range [0.0,1.0] that is used for sampling during the response generation, which occurs when top_k and top_p are applied. It controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 1.0.

  • Top P: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 1.0.

  • Top K: an INT64 value in the range [1,40] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 40. How these inputs map to the underlying call is sketched after this list.

    Tokens are selected from the most (based on the top_k value) to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1 and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.
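
The inputs above correspond one-to-one to options in the settings STRUCT of ML.GENERATE_TEXT. A sketch using the default values documented above (identifiers are again placeholders):

```sql
-- Sketch: the component's tuning inputs expressed as ML.GENERATE_TEXT
-- options, set to the defaults documented above.
SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `my_project.my_dataset.text_model`,
  (SELECT review_text AS prompt FROM `my_project.my_dataset.input_table`),
  STRUCT(
    50  AS max_output_tokens,  -- Max Output Tokens: INT64 in [1,1024]
    1.0 AS temperature,        -- Temperature: FLOAT64 in [0.0,1.0]
    1.0 AS top_p,              -- Top P: FLOAT64 in [0.0,1.0]
    40  AS top_k               -- Top K: INT64 in [1,40]
  )
);
```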

Outputs

  • Result table [Table]
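
The result table carries the input columns through alongside the columns ML.GENERATE_TEXT appends. A hedged sketch of inspecting it (the exact schema depends on the model and options; the table name is a placeholder):

```sql
-- Sketch: reading the component's output (placeholder table name).
-- With TRUE AS flatten_json_output in the options STRUCT, the generated
-- text would instead appear directly in ml_generate_text_llm_result.
SELECT
  prompt,
  ml_generate_text_result,  -- raw model response as a JSON string
  ml_generate_text_status   -- empty on success, error details otherwise
FROM `my_project.my_dataset.result_table`;
```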
