Given the volume of meetings, interviews, and customer interactions in modern business environments, audio recordings play a crucial role in capturing valuable information. Manually transcribing and summarizing these recordings can be a time-consuming and tedious task. Fortunately, advancements in generative AI and automatic speech recognition (ASR) have paved the way for automated solutions that can streamline this process.
Customer service representatives receive a high volume of calls each day. Previously, calls were recorded and manually reviewed later for compliance, regulations, and company policies. Call recordings had to be transcribed, summarized, and then redacted for personal identifiable information (PII) before analyzing calls, resulting in delayed access to insights.
Redacting PII is a critical practice in security for several reasons. Maintaining the privacy and protection of individuals’ personal information is not only a matter of ethical responsibility, but also a legal requirement. In this post, we show you how to use Amazon Transcribe to get near real-time transcriptions of calls sent to Amazon Bedrock for summarization and sensitive data redaction. We’ll walk through an architecture that uses AWS Step Functions to orchestrate the process, providing seamless integration and efficient processing
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading model providers such as AI21 Labs, Anthropic, Cohere, Meta, Stability AI, Mistral AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. You can use Amazon Bedrock Guardrails to redact sensitive information such as PII found in the generated call transcription summaries. Clean, summarized transcripts are then sent to analysts. This provides quicker access to call trends while protecting customer privacy.
Solution overview
The architecture of this solution is designed to be scalable, efficient, and compliant with privacy regulations. It includes the following key components:
Recording – An audio file, such as a meeting or support call, to be transcribed and summarized
Step Functions workflow – Coordinates the transcription and summarization process
Amazon Transcribe – Converts audio recordings into text
Amazon Bedrock – Summarizes the transcription and removes PII
Amazon SNS – Delivers the summary to the designated recipient
Recipient – Receives the summarized, PII-redacted transcript
The following diagram shows the architecture overflow –
The workflow orchestrated by Step Functions is as follows:
An audio recording is provided as an input to the Step Functions workflow. This could be done manually or automatically depending on the specific use case and integration requirements.
The workflow invokes Amazon Transcribe, which converts the multi-speaker audio recording into a textual, speaker-partition transcription. Amazon Transcribe uses advanced speech recognition algorithms and machine learning (ML) models to accurately partition speakers and transcribe the audio, handling various accents, background noise, and other challenges.
The transcription output from Amazon Transcribe is then passed to Anthropic’s Claude 3 Haiku model on Amazon Bedrock through AWS Lambda. This model was chosen because it has relatively lower latency and cost than other models. The model first summarizes the transcript according to its summary instructions, and then the summarized output (the model response) is evaluated by Amazon Bedrock Guardrails to redact PII. To learn how it blocks harmful content, refer to How Amazon Bedrock Guardrails works. The instructions and transcript are both passed to the model as context.
The output from Amazon Bedrock is stored in Amazon Simple Storage Service (Amazon S3) and sent to the designated recipient using Amazon Simple Notification Service (Amazon SNS). Amazon SNS supports various delivery channels, including email, SMS, and mobile push notifications, making sure that the summary reaches the intended recipient in a timely and reliable manner
The recipient can then review the concise summary, quickly grasping the key points and insights from the original audio recording. Additionally, sensitive information has been redacted, maintaining privacy and compliance with relevant regulations.
The following diagram shows the Step Functions workflow –
Prerequisites
Follow these steps before starting:
Amazon Bedrock users need to request access to models before they’re available for use. This is a one-time action. For this solution, you need to enable access to Anthropic’s Claude 3 Haiku model on Amazon Bedrock. For more information, refer to Access Amazon Bedrock foundation models. Deployment, as described below, is currently supported only in the US West (Oregon) us-west-2 AWS Region. Users may explore other models if desired. You might need some customizations to deploy to alternative Regions with different model availability (such as us-east-1, which hosts Anthropic’s Claude 3.5 Sonnet). Make sure you consider model quality, speed, and cost tradeoffs before choosing a model.
Create a guardrail for PII redaction. Configure filters to block or mask sensitive information. This option can be found on the Amazon Bedrock console on the Add sensitive information filters page when creating a guardrail. To learn how to configure filters for other use cases, refer to Remove PII from conversations by using sensitive information filters.
Deploy solution resources
To deploy the solution, download an AWS CloudFormation template to automatically provision the necessary resources in your AWS account. The template sets up the following components:
A Step Functions workflow
Lambda functions
An SNS topic
An S3 bucket
AWS Key Management Service (AWS KMS) keys for data encryption and decryption
By using this template, you can quickly deploy the sample solution with minimal manual configuration. The template requires the following parameters:
Email address used to send summary – The summary will be sent to this address. You must acknowledge the initial Amazon SNS confirmation email before receiving additional notifications.
Summary instructions – These are the instructions given to the Amazon Bedrock model to generate the summary
Guardrail ID – This is the ID of your recently created guardrail, which can be found on the Amazon Bedrock Guardrails console in Guardrail overview
The Summary instructions are read into your Lambda function as an environment variable.
Deploy the solution
After you deploy the resources using AWS CloudFormation, complete these steps:
Add a Lambda layer.
Although AWS Lambda regularly updates the version of AWS Boto3 included, at the time of writing this post, it still provides version 1.34.126. To use Amazon Bedrock Guardrails, you need version 1.34.90 or higher, for which we’ll add a Lambda layer that updates the Boto3. You can follow the official developer guide on how to add a Lambda layer.
There are different ways to create a Lambda layer. A simple method is to use the steps outlined in Packaging the layer content, which references a sample application repo. You should be able to replace requests==2.31.0 within requirements.txt content to boto3, which will install the latest available version, then create the layer.
To add the layer to Lambda, make sure that the parameters specified in Creating the layer match the deployed Lambda. That is, you need to update compatible-architectures to x86_64.
Acknowledge the Amazon SNS email confirmation that you should receive a few moments after creating the CloudFormation stack
On the AWS CloudFormation console, find the stack you just created
On the stack’s Outputs tab, look for the value associated with AssetBucketName. It will look something like summary-generator-assetbucket-xxxxxxxxxxxxx.
On the Amazon S3 console, find your S3 assets bucket.
This is where you’ll upload your recordings. Valid file formats are MP3, MP4, WAV, FLAC, AMR, OGG, and WebM.
Upload your recording to the recordings folder in Amazon S3
Uploading recordings will automatically trigger the AWS Step Functions state machine. For this example, we use a sample team meeting recording from the sample recording.
On the AWS Step Functions console, find the summary-generator state machine. Choose the name of the state machine run with the status Running.
Here, you can watch the progress of the state machine as it processes the recording. After it reaches its Success state, you should receive an emailed summary of the recording. Alternatively, you can navigate to the S3 assets bucket and view the transcript there in the transcripts folder.
Expand the solution
Now that you have a working solution, here are some potential ideas to customize the solution for your specific use cases:
Try altering the process to fit your available source content and desired outputs:
For situations where transcripts are available, create an alternate AWS Step Functions workflow to ingest existing text-based or PDF-based transcriptions
Instead of using Amazon SNS to notify recipients through email, you can use it to send the output to a different endpoint, such as a team collaboration site or to the team’s chat channel
Try changing the summary instructions for the AWS CloudFormation stack parameter provided to Amazon Bedrock to produce outputs specific to your use case. The following are some examples:
When summarizing a company’s earnings call, you could have the model focus on potential promising opportunities, areas of concern, and things that you should continue to monitor
If you’re using the model to summarize a course lecture, it could identify upcoming assignments, summarize key concepts, list facts, and filter out small talk from the recording
For the same recording, create different summaries for different audiences:
Engineers’ summaries focus on design decisions, technical challenges, and upcoming deliverables
Project managers’ summaries focus on timelines, costs, deliverables, and action items
Project sponsors get a brief update on project status and escalations
For longer recordings, try generating summaries for different levels of interest and time commitment. For example, create a single sentence, single paragraph, single page, or in-depth summary. In addition to the prompt, you might want to adjust the max_tokens_to_sample parameter to accommodate different content lengths.
Clean up
Clean up the resources you created for this solution to avoid incurring costs. You can use an AWS SDK, the AWS Command Line Interface (AWS CLI), or the console.
Delete Amazon Bedrock Guardrails and the Lambda layer you created
Delete the CloudFormation stack
To use the console, follow these steps:
On the Amazon Bedrock console, in the navigation menu, select Guardrails. Choose your guardrail, then select Delete.
On the AWS Lambda console, in the navigation menu, select Layers. Choose your layer, then select Delete.
On the AWS CloudFormation console, in the navigation menu, select Stacks. Choose the stack you created, then select Delete.
Deleting the stack won’t delete the associated S3 bucket. If you no longer require the recordings or transcripts, you can delete the bucket separately. Amazon Transcribe is designed to automatically delete transcription jobs after 90 days. However, you can opt to manually delete these jobs before the 90-day retention period expires.
Conclusion
As businesses turn to data as a foundation for decision-making, having the ability to efficiently extract insights from audio recordings is invaluable. By using the power of generative AI with Amazon Bedrock and Amazon Transcribe, your organization can create concise summaries of audio recordings while maintaining privacy and compliance. The proposed architecture demonstrates how AWS services can be orchestrated using AWS Step Functions to streamline and automate complex workflows, enabling organizations to focus on their core business activities.
This solution not only saves time and effort, but also makes sure that sensitive information is redacted, mitigating potential risks and promoting compliance with data protection regulations. As organizations continue to generate and process large volumes of audio data, solutions like this will become increasingly important for gaining insights, making informed decisions, and maintaining a competitive edge.
About the authors
Yash Yamsanwar is a Machine Learning Architect at Amazon Web Services (AWS). He is responsible for designing high-performance, scalable machine learning infrastructure that optimizes the full lifecycle of machine learning models, from training to deployment. Yash collaborates closely with ML research teams to push the boundaries of what is possible with LLMs and other cutting-edge machine learning technologies.
Sawyer Hirt is a Solutions Architect at AWS, specializing in AI/ML and cloud architectures, with a passion for helping businesses leverage cutting-edge technologies to overcome complex challenges. His expertise lies in designing and optimizing ML workflows, enhancing system performance, and making advanced AI solutions more accessible and cost-effective, with a particular focus on Generative AI. Outside of work, Sawyer enjoys traveling, spending time with family, and staying current with the latest developments in cloud computing and artificial intelligence.