Troubleshooting managed feature store - Azure Machine Learning (2023)


This article shows you how to resolve common issues you may encounter with the managed feature store in Azure Machine Learning.

Feature store creation and update errors

When you create or update a feature store, you may encounter the following issues.

  • ARM throttling error
  • RBAC permission error
  • Duplicate materialization identity ARM ID issue

ARM throttling error

Symptoms

Creating or updating the feature store fails. The error may look like this:

{ "error": { "code": "TooManyRequests", "message": "The request is being throttled as the limit has been reached for operation type - 'Write'. ...", "details": [ { "code": "TooManyRequests", "target": "Microsoft.MachineLearningServices/workspaces", "message": "..." } ] } }

Solution

Run the feature store create/update operation again at a later time. Since the deployment is multi-step, the second attempt can fail because some of the resources already exist. Delete those resources and resume the job.
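Because the throttling is transient, the retry can be automated. The following is a minimal sketch (not part of the Azure SDK) of retrying an operation with exponential backoff; the helper name and the use of RuntimeError as a stand-in for the SDK's throttling error are illustrative assumptions:

```python
import random
import time

def with_backoff(operation, max_attempts=5):
    """Retry a throttled operation with exponential backoff plus jitter.
    RuntimeError stands in for the SDK's throttling error (code TooManyRequests)."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except RuntimeError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # wait 2^attempt seconds (plus jitter), capped at 60 seconds
            time.sleep(min(2 ** attempt + random.random(), 60))
```

The `operation` argument would wrap your actual create/update call.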

RBAC permission error

To create a feature store, the user needs the Contributor and User Access Administrator roles (or a custom role that covers the same or a larger set of actions).

Symptoms

If the user doesn't have the required roles, the deployment fails. The error response may look like the following:

{ "error": { "code": "AuthorizationFailed", "message": "The client '{client_id}' with object id '{object_id}' does not have authorization to perform action '{action_name}' over scope '{scope}' or the scope is invalid. If access was recently granted, please refresh your credentials." } }

Solution

Grant the Contributor and User Access Administrator roles to the user on the resource group where the feature store is to be created, and instruct the user to run the deployment again.

For more details, see Permissions required for the feature store materialization managed identity role.

Duplicate materialization identity ARM ID issue

After the feature store is updated to enable materialization for the first time, some later updates to the feature store may have this error.

Symptoms

When updating the feature store using the SDK/CLI, the update fails with the following error message:

{ "error": { "code": "InvalidRequestContent", "message": "The request content contains duplicate JSON property names creating ambiguity in paths 'identity.userAssignedIdentities['/subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/{your-uai}']'. Please update the request content to eliminate duplicates, and then try the request again." } }

The solution

The problem lies in the ARM ID format of the materialization_identity.

In the Azure UI or SDK, the ARM ID of a user-assigned managed identity uses lowercase resourcegroups. See the following example:

  • (A): /subscriptions/{sub-id}/resourcegroups/{rg}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/{your-uai}

When the feature store uses a user-assigned managed identity as its materialization_identity, its ARM ID is normalized and stored with resourceGroups. See the following example:

  • (B): /subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/{your-uai}

The next time the user updates the feature store using the same user-assigned managed identity as the materialization identity, while passing the ARM ID in format (A), the update fails with the above error.


To fix the problem, replace the string resourcegroups with resourceGroups in the user-assigned managed identity ARM ID, and run the feature store update again.
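The fix is a plain string substitution on the ARM ID before resubmitting the update request; a small sketch (the helper name and placeholder values are illustrative):

```python
def normalize_uai_arm_id(arm_id: str) -> str:
    """Rewrite a user-assigned managed identity ARM ID from the UI/SDK format
    (A, lowercase "resourcegroups") to the stored, normalized format (B)."""
    return arm_id.replace("/resourcegroups/", "/resourceGroups/")

# Hypothetical ARM ID in format (A), as returned by the Azure UI or SDK
arm_id_a = ("/subscriptions/sub-id/resourcegroups/rg/providers"
            "/Microsoft.ManagedIdentity/userAssignedIdentities/your-uai")
arm_id_b = normalize_uai_arm_id(arm_id_a)
```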

Failed to create feature set specification

  • Invalid schema in feature set specification
  • The transformation class could not be found
  • FileNotFoundError on code folder

Invalid schema in feature set specification

Before a feature set is registered with the feature store, users first define the feature set specification locally and run to_spark_dataframe() to validate it.

Symptoms

When the user runs to_spark_dataframe(), various schema validation errors can occur if the dataframe schema of the feature set doesn't match the definition in the feature set specification.

For example:

  • Error message: azure.ai.ml.exceptions.ValidationException: Schema validation error, timestamp column: timestamp not in output dataframe
  • Error message: Exception: Schema validation error, no index column: AccountID in output dataframe
  • Error message: ValidationException: Schema validation error, feature column: transaction_7d_count has data type: ColumnType.long, expected: ColumnType.string

The solution

Check the schema validation error, and update the feature set specification definition accordingly, for both the column names and types. For example:

  • Update the source.timestamp_column.name property to correctly define the timestamp column name.
  • Update the index_columns property to correctly define the index columns.
  • Update the features property to correctly define the feature column names and types.

Then run to_spark_dataframe() again to check whether the validation passed.

If the SDK is used to define the feature set specification, the infer_schema option is also recommended, to autofill the features instead of manually typing them in. The timestamp_column and index_columns can't be autofilled.

See the Feature set spec schema document for more details.
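As an illustration, the relevant parts of a feature set specification might look like the following fragment (the column and feature names are hypothetical; see the Feature set spec schema document for the authoritative layout):

```yaml
source:
  timestamp_column:
    name: timestamp        # must match the timestamp column of the source data
index_columns:
  - name: AccountID        # must match the index column name and type
    type: string
features:
  - name: transaction_7d_count
    type: long             # must match the type produced by the transformation
```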

The transformation class could not be found

Symptoms

When the user runs to_spark_dataframe(), the following error is returned: AttributeError: module '<...>' has no attribute '<...>'

For example:

  • AttributeError: module '7780d27aa8364270b6b61fed2a43b749.transaction_transform' has no attribute 'TransactionFeatureTransformer1'

The solution

The feature transformation class is expected to be defined in a Python file at the root of the code folder (the code folder may contain other files or subfolders).

Set the feature_transformation_code.transformation_class property accordingly.

For example, if the code folder looks like this:

code/
└── my_transformation_class.py

and the MyFeatureTransformer class is defined in the my_transformation_class.py file, then

set feature_transformation_code.transformation_class to my_transformation_class.MyFeatureTransformer
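The 'module.Class' value is resolved like an ordinary Python attribute lookup, which is why a missing class surfaces as AttributeError. This standalone sketch (not the SDK's actual loader) reproduces the lookup, and can be used locally to verify that a transformation_class value resolves:

```python
import importlib

def resolve_transformation_class(spec: str):
    """Resolve a 'module_name.ClassName' string to the class object.
    Raises AttributeError when the class isn't defined in the module,
    mirroring the error shown above."""
    module_name, _, class_name = spec.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```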

FileNotFoundError on code folder

Symptoms

This error typically occurs when the feature set specification YAML is created manually rather than generated by the SDK. When the user runs to_spark_dataframe(), the following error is returned: FileNotFoundError: [Errno 2] No such file or directory: ....


Solution

Check the code folder. It should be a subfolder under the feature set specification folder.

In the feature set specification, set feature_transformation_code.path as a relative path to the feature set specification folder. For example:

feature_set_spec_folder/
├── code/
│ ├── my_transformer.py
│ └── my_other_folder
└── FeatureSetSpec.yaml

In this example, the feature_transformation_code.path property in the YAML should be ./code
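Since feature_transformation_code.path is interpreted relative to the specification folder, the effective code location can be sketched like this (the helper is illustrative, not part of the SDK):

```python
from pathlib import Path

def effective_code_path(spec_folder: str, code_path: str) -> Path:
    """feature_transformation_code.path is resolved relative to the folder
    that contains FeatureSetSpec.yaml."""
    return Path(spec_folder) / code_path
```

A path like ./code therefore only works if the code folder actually sits next to FeatureSetSpec.yaml.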

Note

When a FeatureSetSpec Python object is created using the create_feature_set_spec function in azureml-featurestore, the feature_transformation_code.path can be any local folder. When the FeatureSetSpec object is dumped to a target folder to form the feature set specification in YAML format, the code path is copied into the target folder, and the feature_transformation_code.path property in the specification YAML is updated accordingly.

Feature retrieval job and query errors

  • Feature retrieval specification resolution errors
  • The feature_retrieval_spec.yaml file isn't found when using a model as input to the feature retrieval job
  • Observation data isn't joined with feature values
  • The user or managed identity doesn't have proper RBAC permission on the feature store
  • The user or managed identity doesn't have proper RBAC permission to read from source storage or the offline store
  • The training job fails to read data generated by the built-in feature retrieval component

If a feature retrieval job fails, check the error details: go to the run details page, select the Outputs + logs tab, and check the logs/azureml/driver/stdout file.

When a user runs a get_offline_feature() query in a notebook, the error is displayed directly in the cell output.

Feature retrieval specification resolution errors

Symptoms

The feature retrieval query/job shows the following errors:

  • An invalid feature:
Code: "UserError" Message: "Feature '' was not found in this feature set."
  • An invalid feature store URI:
Message: "Resource 'Microsoft.MachineLearningServices/workspaces/' under resource group '<resource-group-name>' was not found. For more details please go to https://aka.ms/ARMResourceNotFoundFix", Code: "ResourceNotFound"
  • An invalid feature set:
Code: "UserError" Message: "Featureset with name: and version: not found."

The solution

Check the content of the feature_retrieval_spec.yaml that the job uses. Make sure all of the feature store URIs, feature set names/versions, and feature names are valid and exist in the feature store.

It's also recommended to use the utility function to select features from the feature store and generate the feature retrieval specification YAML file.

This code snippet uses the generate_feature_retrieval_spec utility function:

from azureml.featurestore import FeatureStoreClient
from azure.ai.ml.identity import AzureMLOnBehalfOfCredential

featurestore = FeatureStoreClient(
    credential = AzureMLOnBehalfOfCredential(),
    subscription_id = featurestore_subscription_id,
    resource_group_name = featurestore_resource_group_name,
    name = featurestore_name
)

transactions_featureset = featurestore.feature_sets.get(name="transactions", version="1")

features = [
    transactions_featureset.get_feature('transaction_amount_7d_sum'),
    transactions_featureset.get_feature('transaction_amount_3d_sum')
]

feature_retrieval_spec_folder = "./project/fraud_model/feature_retrieval_spec"
featurestore.generate_feature_retrieval_spec(feature_retrieval_spec_folder, features)

The feature_retrieval_spec.yaml file isn't found when using a model as input to the feature retrieval job

Symptoms

If a registered model is used as input to a feature fetch job, the job fails with the following error message:

ValueError: Failed with visit error: Failed with execution error: error in streaming from input data sources
    VisitError(ExecutionError(StreamError(NotFound)))
=> Failed with execution error: error in streaming from input data sources
    ExecutionError(StreamError(NotFound)); Not able to find path: azureml://subscriptions/{sub_id}/resourcegroups/{rg}/workspaces/{ws}/datastores/workspaceblobstore/paths/LocalUpload/{guid}/feature_retrieval_spec.yaml

Solution

When you provide a model as input to the feature retrieval step, the retrieval specification YAML file is expected to exist in the model's artifact folder. The job fails if that file isn't there.

To fix the issue, package the feature_retrieval_spec.yaml file in the root folder of the model artifact folder before registering the model.
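Packaging the file can be as simple as copying it into the artifact root before the model is registered; a standard-library sketch (the helper name and paths are illustrative):

```python
import shutil
from pathlib import Path

def pack_retrieval_spec(model_artifact_dir: str, spec_file: str) -> Path:
    """Copy feature_retrieval_spec.yaml into the root of the model artifact
    folder, so a feature retrieval job can find it there."""
    dest = Path(model_artifact_dir) / "feature_retrieval_spec.yaml"
    shutil.copyfile(spec_file, dest)
    return dest
```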

Observation data isn't joined with feature values

Symptoms

After users run a query/job to retrieve features, the output data will not contain feature values.


For example, a user runs a feature retrieval job to retrieve the transaction_amount_3d_avg and transaction_amount_7d_avg features:

| transactionID | accountID | timestamp | is_fraud | transaction_amount_3d_avg | transaction_amount_7d_avg |
|---|---|---|---|---|---|
| 83870774-7A98-43B... | A1055520444618950 | 2023-02-28 04:34:27 | 0 | null | null |
| 25144265-F68B-4FD... | A1055520444618950 | 2023-02-28 10:44:30 | 0 | null | null |
| 8899ED8C-B295-43F... | A1055520444812380 | 2023-03-06 00:36:30 | 0 | null | null |

The solution

Feature retrieval runs a point-in-time join query. If the join result is empty, try the following possible solutions:

  • Expand the temporal_join_lookback scope in the feature set specification definition, or temporarily remove it. This allows the point-in-time join to look back further (or infinitely) into the past, before the observation event timestamp, to find feature values.
  • If source.source_delay is also set in the feature set specification definition, make sure that temporal_join_lookback > source.source_delay.

If none of these solutions work, get the feature set from the feature store and run to_spark_dataframe() to manually inspect the feature index and timestamp columns. The failure can happen because:

  • The index values in the observation data don't exist in the feature set dataframe
  • No feature value exists with a timestamp earlier than the observation event timestamp.

In such cases, if the feature has offline materialization enabled, more feature data may need to be backfilled.
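The null values above come from an empty point-in-time join. This toy sketch (plain Python, not the actual Spark implementation) shows the join rule and how a too-narrow temporal_join_lookback yields no match:

```python
from datetime import datetime, timedelta

def point_in_time_lookup(obs_ts, feature_rows, lookback=None):
    """Return the latest feature value at or before obs_ts, restricted to the
    lookback window when one is set. None models a null in the join output."""
    candidates = [(ts, v) for ts, v in feature_rows
                  if ts <= obs_ts and (lookback is None or obs_ts - ts <= lookback)]
    return max(candidates)[1] if candidates else None
```

With a one-day lookback and feature rows that are several days old, every observation gets None, which surfaces as null columns in the retrieval output.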

The user or managed identity doesn't have proper RBAC permission on the feature store

Symptoms

The feature retrieval job/query fails with the following error message in logs/azureml/driver/stdout:

Traceback (last call last): file "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/ai/ml/_restclient/v2022_12_01_preview/operations/_workspaces_operations.py " , line 633, in get raise HttpResponseError(response=response, model=error, error_format=ARMErrorFormat)azure.core.Exceptions.HttpResponseError: (AuthorizationFailed) Client "XXXX" with object ID "XXXX" is not authorized to run action " Microsoft.MachineLearningServices/workspaces/read" via scope "/subscriptions/XXXX/resourceGroups/XXXX/providers/Microsoft.MachineLearningServices/workspaces/XXXX" or the scope is invalid If access was recently granted, please update your credentials. Code: Authorization failure

Solution

  1. If the job uses a managed identity for feature retrieval, assign the AzureML Data Scientist role on the feature store to the identity.
  2. If the problem occurs when a user runs code in an Azure Machine Learning Spark notebook, which uses the user's own identity to access the Azure Machine Learning service, assign the AzureML Data Scientist role on the feature store to the user's Azure Active Directory identity.

AzureML Data Scientist is a recommended role. A user can create their own custom role with the following actions:

  • Microsoft.MachineLearningServices/workspaces/datastores/listsecrets/action
  • Microsoft.MachineLearningServices/workspaces/featuresets/read
  • Microsoft.MachineLearningServices/workspaces/read
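For reference, a custom role covering these actions could be defined with an Azure role definition like the following sketch (the role name, description, and scope placeholder are assumptions to fill in for your environment):

```json
{
  "Name": "Feature store consumer (example)",
  "IsCustom": true,
  "Description": "Read feature sets and datastore secrets in a feature store workspace.",
  "Actions": [
    "Microsoft.MachineLearningServices/workspaces/datastores/listsecrets/action",
    "Microsoft.MachineLearningServices/workspaces/featuresets/read",
    "Microsoft.MachineLearningServices/workspaces/read"
  ],
  "NotActions": [],
  "AssignableScopes": [ "/subscriptions/{sub-id}" ]
}
```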

For more information about RBAC setup, see the Manage access control for managed feature store document.

The user or managed identity doesn't have proper RBAC permission to read from source storage or the offline store

Symptoms

The feature retrieval job/query fails with the following error message in logs/azureml/driver/stdout:

An error occurred while calling o1025.parquet.
: java.nio.file.AccessDeniedException: Operation failed: "This request is not authorized to perform this operation using this permission.", 403, GET, https://{storage}.dfs.core.windows.net/test?upn=false&resource=filesystem&maxResults=5000&directory=datasources&timeout=90&recursive=false, AuthorizationPermissionMismatch, "This request is not authorized to perform this operation using this permission. RequestId:63013315-e01f-005e-577b-7c63b8000000 Time:2023-05-01T22:20:51.1064935Z"
at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.checkException(AzureBlobFileSystem.java:1203)
at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.listStatus(AzureBlobFileSystem.java:408)
at org.apache.hadoop.fs.Globber.listStatus(Globber.java:128)
at org.apache.hadoop.fs.Globber.doGlob(Globber.java:291)
at org.apache.hadoop.fs.Globber.glob(Globber.java:202)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:2124)

Solution

  • If the job uses a managed identity for feature retrieval, assign the Storage Blob Data Reader role on the source storage and the offline store storage to the identity.
  • If the problem occurs when a user runs queries in an Azure Machine Learning Spark notebook, which uses the user's own identity to access the Azure Machine Learning service, assign the Storage Blob Data Reader role on the source storage and the offline store storage to the user's identity.

Storage Blob Data Reader is the minimum recommended access requirement. You can also assign roles with more privileges, like Storage Blob Data Contributor or Storage Blob Data Owner.

See the Manage access control for managed feature store document for more information about RBAC setup.

The training job fails to read data generated by the built-in feature retrieval component

Symptoms

The training job fails with an error message indicating that:

  • The training data doesn't exist:
FileNotFoundError: [Errno 2] No such file or directory
  • The format is incorrect:
ParserError:

The solution

The built-in feature retrieval component has one output: output_data. The output data is a uri_folder data asset. It always has the following folder structure:

/
├── data/
│ ├── xxxxx.parquet
│ └── xxxxx.parquet
└── feature_retrieval_spec.yaml

And the output data is always in parquet format.

Update the training script to read from the "data" subfolder, and read the data as parquet.
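In a training script, locating the retrieval output then amounts to globbing the parquet files under the data subfolder. A standard-library sketch (the actual read would use your dataframe library's parquet reader, e.g. pandas or Spark):

```python
from pathlib import Path

def training_parquet_files(output_data_path: str):
    """List the parquet files the built-in feature retrieval component writes
    under the 'data' subfolder of its output_data folder."""
    return sorted(Path(output_data_path, "data").glob("*.parquet"))
```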

Feature materialization job failures

  • Invalid offline store configuration
  • The materialization identity doesn't have proper RBAC permission on the feature store
  • The materialization identity doesn't have proper RBAC permission to read from storage
  • The materialization identity doesn't have proper RBAC permission to write data to the offline store

If a feature materialization job fails, the user can follow these steps to view the job failure details.

  1. Go to the feature store page: https://ml.azure.com/featureStore/{your-feature-store-name}.
  2. Go to the Feature sets tab, select the relevant feature set, and navigate to the Feature set details page.
  3. On the feature set details page, select the Materialization jobs tab, then select the failed job to open the job details view.
  4. On the job details view, under the Properties card, review the job status and error message.
  5. You can also go to the Outputs + logs tab, then find the stdout file from logs/azureml/driver/stdout.

After the fix is applied, the user can manually run a backfill materialization job to verify that the fix works.


Invalid offline store configuration

Symptoms

The materialization job fails with the following error message in logs/azureml/driver/stdout:

Error message:

Causes: Status Code: -1 Error Code: null Error Message: InvalidAbfsRestOperationExceptionjava.net.UnknownHostException: adlgen23.dfs.core.windows.net
java.util.concurrent.ExecutionException: Operation failed: "The specified resource name contains invalid characters.", 400, HEAD, https://{storage}.dfs.core.windows.net/{container-name}/{fs - id}/transactions/1/_delta_log?upn=false&action=getStatus&timeout=90

Solution

Validate the offline store target defined in the feature store, using the SDK:

from azure.ai.ml import MLClient
from azure.ai.ml.identity import AzureMLOnBehalfOfCredential

fs_client = MLClient(
    AzureMLOnBehalfOfCredential(),
    featurestore_subscription_id,
    featurestore_resource_group_name,
    featurestore_name
)

featurestore = fs_client.feature_stores.get(name=featurestore_name)
featurestore.offline_store.target

The user can also check the target on the feature store UI overview page.

Verify that the target is in the following format and that both storage and container are present.

/subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.Storage/storageAccounts/{storage}/blobServices/default/containers/{container-name}
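A quick sanity check of the target string against the expected shape can be scripted. This validator is a sketch (the pattern only checks the path segments shown above, not whether the storage account and container actually exist):

```python
import re

# Expected ARM path shape of an offline store target (container resource ID)
_TARGET = re.compile(
    r"^/subscriptions/[^/]+/resourceGroups/[^/]+/providers/Microsoft\.Storage"
    r"/storageAccounts/[^/]+/blobServices/default/containers/[^/]+$"
)

def looks_like_offline_store_target(target: str) -> bool:
    """True when the target matches the container ARM path format shown above."""
    return bool(_TARGET.match(target))
```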

The materialization identity doesn't have proper RBAC permission on the feature store

Symptoms

The materialization job fails with the following error message in logs/azureml/driver/stdout:

Traceback (last call last): file "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/ai/ml/_restclient/v2022_12_01_preview/operations/_workspaces_operations.py " , line 633, in get raise HttpResponseError(response=response, model=error, error_format=ARMErrorFormat)azure.core.Exceptions.HttpResponseError: (AuthorizationFailed) Client "XXXX" with object ID "XXXX" is not authorized to run action " Microsoft.MachineLearningServices/workspaces/read" via scope "/subscriptions/XXXX/resourceGroups/XXXX/providers/Microsoft.MachineLearningServices/workspaces/XXXX" or the scope is invalid If access was recently granted, please update your credentials. Code: Authorization failure

Solution

Assign the AzureML Data Scientist role on the feature store to the materialization identity (the user-assigned managed identity) of the feature store.

AzureML Data Scientist is a recommended role. You can create your own custom role with the following actions:

  • Microsoft.MachineLearningServices/workspaces/datastores/listsecrets/action
  • Microsoft.MachineLearningServices/workspaces/featuresets/read
  • Microsoft.MachineLearningServices/workspaces/read

For more information, see Permissions required for the feature store materialization managed identity role.

The materialization identity does not have the appropriate RBAC permission to read from storage

Symptoms

The materialization job fails with the following error message in logs/azureml/driver/stdout:

An error occurred while calling o1025.parquet.
: java.nio.file.AccessDeniedException: Operation failed: "This request is not authorized to perform this operation using this permission.", 403, GET, https://{storage}.dfs.core.windows.net/test?upn=false&resource=filesystem&maxResults=5000&directory=datasources&timeout=90&recursive=false, AuthorizationPermissionMismatch, "This request is not authorized to perform this operation using this permission. RequestId:63013315-e01f-005e-577b-7c63b8000000 Time:2023-05-01T22:20:51.1064935Z"
at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.checkException(AzureBlobFileSystem.java:1203)
at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.listStatus(AzureBlobFileSystem.java:408)
at org.apache.hadoop.fs.Globber.listStatus(Globber.java:128)
at org.apache.hadoop.fs.Globber.doGlob(Globber.java:291)
at org.apache.hadoop.fs.Globber.glob(Globber.java:202)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:2124)

Solution

Assign the Storage Blob Data Reader role on the source storage to the feature store materialization identity (the user-assigned managed identity).

Storage Blob Data Reader is the minimum recommended access requirement. You can also assign roles with more privileges, like Storage Blob Data Contributor or Storage Blob Data Owner.

For more information about RBAC setup, see Permissions required for the feature store materialization managed identity role.

The materialization identity doesn't have proper RBAC permission to write data to the offline store

Symptoms

The materialization job fails with the following error message in logs/azureml/driver/stdout:

An error occurred while calling o1162.load.
: java.util.concurrent.ExecutionException: java.nio.file.AccessDeniedException: Operation failed: "This request is not authorized to perform this operation using this permission.", 403, HEAD, https://featuresotrestorage1.dfs.core.windows.net/offlinestore/fs_xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx_fsname/transactions/1/_delta_log?upn=false&action=getStatus&timeout=90
at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)
at com.google.common.cache.LocalCache$S...

Solution

Assign the Storage Blob Data Contributor role on the offline store storage to the feature store materialization identity (the user-assigned managed identity).

Storage Blob Data Contributor is the minimum recommended access requirement. You can also assign a role with more privileges, like Storage Blob Data Owner.


For more information about RBAC setup, see Permissions required for the feature store materialization managed identity role.
