

## Intel<sup>®</sup> QuickAssist Technology (Intel<sup>®</sup> QAT) Software for Linux\*

Programmer's Guide – Customer Enabling Release

Revision 029

December 2023



## Legal Notices & Disclaimers

Performance varies by use, configuration and other factors. Learn more on Intel's Performance Index site.

Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.

Your costs and results may vary.

Intel technologies may require enabled hardware, software or service activation.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Code names are used by Intel to identify products, technologies, or services that are in development and not publicly available. These are not "commercial" names and not intended to function as trademarks.

See Intel's Legal Notices and Disclaimers.

© Intel Corporation. Intel, the Intel logo, Atom, Xeon, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

## Contents

| Section | Title |
|---|---|
| 1 | Introduction |
| 1.1 | Terminology |
| 1.2 | Typographical Conventions |
| 2 | Software Overview |
| 2.1 | Intel® Communications Chipset 8925 to 8955 Series Compatibility |
| 2.2 | Logical Instances |
| 2.2.1 | Response Processing |
| 2.2.1.1 | Interrupt Mode |
| 2.2.1.2 | Epolled Mode |
| 3 | Acceleration Drivers Overview |
| 3.1 | Hardware/Software Overview |
| 3.2 | Acceleration Driver Configuration File |
| 3.3 | Utility for Loading Configuration Files and Sending Events to the Driver - adf_ctl |
| 3.3.1 | |
| 3.3.2 | |
| 3.4 | … Memory Allocation |
| 3.4.1 | Thread Specific USDM |
| 3.5 | …nal Functions |
| 3.6 | … Endpoints Using qat_service |
| 3.7 | …bugfs entries |
| 3.7.1 | … /sys/kernel/debug/qat_* |
| 3.7.2 | …lriver queries (<i>qae_mem_slabs</i>) |
| 3.8 | …s Codes |
| 3.8.1 | … Compression API Errors |
| 3.9 | …on Unsupported |
| 3.10 | Stateles…ion Level Details |
| 3.10.1 | Compression Level Mapping |
| 3.10.2 | …on History Buffer Size (aka Deflate Window Size) |
| 3.11 | … Return Codes |
| 3.12 | …npression Unsupported |
| 3.13 | …y Feature |
| 3.14 | …ns as Non-Root User |
| 3.15 | …eneration |
| 3.16 | … Included Memory Driver |
| 3.17 | |
| 3.17.1 | …t Operation |
| 3.17.1.1 | Initialization |
| 3.17.1.2 | Heartbeat Monitoring |
| 3.17.1.3 | Resetting a Failed Device |
| 3.17.2 | …ting Heartbeat into Intel® QAT Applications |
| 3.17.3 | …eartbeat |
| 3.17.3.1 | Simulated Heartbeat Failure Configuration |
| 3.17.3.2 | Simulating Heartbeat Failure |

| Section | Title |
|---|---|
| 3.17.3.3 | System Virtual Files |
| 3.17.3.4 | Heartbeat Polling Frequencies |
| 3.18 | …ilures in a Virtualized Environment |
| 3.18.1 | Understanding System Messages and Warnings |
| 3.19 | Incorporating Dummy Responses into an Intel® QAT Application |
| 3.19.1 | … Availability, Serviceability |
| 3.19.2 | …d Data Integrity Support in QAT 1.8 |
| 3.20 | Rate Limiting |
| 3.20.1 | Service Level Agreement (SLA) |
| 3.20.2 | |
| 3.20.3 | …ager Application |
| 3.20.3.1 | Rate Limiting Commands |
| 3.21 | DU Man…ation |
| 3.21.1 | …ds to Fetch Device Utilization |
| 3.21.2 | |
| 3.21.3 | …e Algorithm |
| 3.22 | |
| 3.23 | Access to Legacy Algorithms |
| 4 | Acceleration Driver Configuration File |
| 4.1 | Configuration File Overview |
| 4.2 | General Section |
| 4.2.1 | …arameters |
| 4.3 | Logical I…ection |
| 4.3.1 | [KERNEL] Section |
| 4.3.1.1 | Enabling Linux* Kernel Crypto Framework (LKCF) |
| 4.3.2 | [KERNEL_QAT] Section |
| 4.3.3 | User Process [xxxxx] Sections |
| 4.3.3.1 | Maximum Number of Process Calculations |
| 4.3.3.2 | Increasing the Maximum Number of Processes/Instances |
| 4.3.3.3 | Configuring Instances for Virtual Functions |
| 4.3.4 | Cryptographic Logical Instance Parameters |
| 4.3.4.1 | LKCF-supported algorithms |
| 4.3.5 | Data Compression Logical Instance Parameters |
| 4.3.6 | …ne Core Affinity Parameter for a Logical Instance |
| 4.4 | …e Intel® QAT Endpoints in a System |
| 4.5 | …e Processes on a System with Multiple Intel® QAT Endpoints |
| 4.6 | Sample Configuration File |
| 5 | Secure Architecture Considerations |
| 5.1 | Terminology |
| 5.1.1 | Threat Categories |
| 5.1.2 | Attack Mechanism |
| 5.1.3 | Attacker Privilege |
| 5.1.4 | Deployment Models |
| 5.2 | Threat/…ors |
| 5.2.1 | … Mitigation |
| 5.2.2 | … Threats |
| 5.2.2.1 | DMA |
| 5.2.2.2 | Intentional Modification of IA Driver |


| Section | Title |
|---|---|
| 5.2.2.3 | Modification of the QAT Configuration File |
| 5.2.2.4 | Malicious Application Code |
| 5.2.2.5 | Denial of Service |
| 5.2.3 | Threats Specific to Cryptographic Service |
| 5.2.3.1 | Reading Cryptographic Keys |
| 6 | Supported APIs |
| 6.1 | |
| 6.1.1 | …T API Limitations |
| 6.1.1.1 | Resubmitting After Getting an Overflow Error |
| 6.1.1.2 | Dynamic Compression for Data Compression Service |
| 6.1.1.3 | Maximal Expansion with Auto Select Best Feature for Compression |
| 6.1.1.4 | Maximal Expansion and Destination Buffer Size in Compre… Direction |
| 6.1.2 | Data Plane APIs Overview |
| 6.1.2.1 | IA Cycle Count Reduction When Using Data Plane APIs |
| 6.1.2.2 | Usage Constraints on the Data Plane APIs |
| 6.1.2.3 | Cryptographic and Data Compression API Descriptions |
| 6.1.3 | Recovering from a Compress and Verify Error |
| 6.1.4 | … Recovered Compression Errors |
| 6.1.5 | …ss and Verify Error log in Sysfs |
| 6.1.6 | …ed Algorithms in LKCF |
| 6.2 | |
| 6.2.1 | IOMMU Remapping Functions |
| 6.2.1.1 | icp_sal_iommu_get_remap_size |
| 6.2.1.2 | icp_sal_iommu_map |
| 6.2.1.3 | icp_sal_iommu_unmap |
| 6.2.1.4 | IOMMU Remapping Function Usage |
| 6.2.2 | …unctions |
| 6.2.2.1 | icp_sal_pollBank |
| 6.2.2.2 | icp_sal_pollAllBanks |
| 6.2.2.3 | icp_sal_CyPollInstance |
| 6.2.2.4 | icp_sal_DcPollInstance |
| 6.2.2.5 | icp_sal_CyPollDpInstance |
| 6.2.2.6 | icp_sal_DcPollDpInstance |
| 6.2.3 | …ace Access Configuration Functions |
| 6.2.3.1 | icp_sal_userStart |
| 6.2.3.2 | icp_sal_userStop |
| 6.2.4 | …nformation Function |
| 6.2.4.1 | icp_sal_getDevVersionInfo |
| 6.2.5 | …evice Function |
| 6.2.5.1 | icp_sal_reset_device |
| 6.2.6 | …ess APIs |
| 6.2.6.1 | icp_sal_poll_device_events |
| 6.2.6.2 | icp_sal_find_new_devices |
| 6.2.7 | Compress and Verify (CnV) Related APIs |
| 6.2.7.1 | icp_sal_dc_get_dc_error() |
| 6.2.7.2 | icp_sal_dc_simulate_error() |
| 6.2.8 | |
| 6.2.8.1 | icp_sal_check_device() |


| Section | Title |
|---|---|
| 6.2.8.2 | icp_sal_check_all_devices() |
| 6.2.8.3 | icp_sal_heartbeat_simulate_failure() |
| 6.2.9 | Device Polling APIs |
| 6.2.9.1 | icp_sal_poll_device_events() |
| 6.2.9.2 | cpaCyInstanceSetNotificationCb |
| 6.2.9.3 | cpaDcInstanceSetNotificationCb |
| 6.2.10 | Congestion Management APIs |
| 6.2.10.1 | icp_sal_SymGetInflightRequests |
| 6.2.10.2 | icp_sal_AsymGetInflightRequests |
| 6.2.10.3 | icp_sal_dp_SymGetInflightRequests |
| 6.2.11 | Service Specific Polling APIs |
| 6.2.11.1 | icp_sal_CyPollSymRing |
| 6.2.11.2 | icp_sal_CyPollAsymRing |
| 6.2.12 | Check Device Availability APIs |
| 6.2.12.1 | icp_sal_userIsQatAvailable |
| 7 | Application Usage Guidelines |
| 7.1 | …g Service Instances to Engines on the Intel® QAT Endpoint |
| 7.1.1 | Processor and Intel® QAT Endpoint Communication |
| 7.1.2 | Service Instances and Interaction with the Hardware |
| 7.1.3 | Service Instance Configuration |
| 7.1.4 | Cryptographic Load Balancing Using Multiple Intel® QAT Instances |
| 7.2 | Cryptography Applications |
| 7.2.1 | IPsec and SSL VPNs |
| 7.2.2 | Encrypted Storage |
| 7.2.3 | Web Proxy Appliances |
| 7.3 | Data Compression Applications |
| 7.3.1 | Compression for Storage |
| 7.3.2 | Data Deduplication and WAN Acceleration |
| 8 | Black Box Debug Tool |
| 8.1 | Introduction |
| 8.1.1 | Overview |
| 8.1.1.1 | Security Considerations |
| 8.2 | Detailed… |
| 8.2.1 | |
| 8.2.2 | |
| 8.3 | Installat… |
| 8.3.1 | |
| 8.3.2 | |

| Section | Title |
|---|---|
| 8.4.1 | Configuration via QAT Device Configuration Files |
| 8.4.2 | Configuration via sysfs |
| 8.4.3 | …g Current Configuration Used by Driver |
| 8.5 | Usage Examples |
| 8.5.1 | Collecting Data – Sanity Check |
| 8.5.1.1 | Continuous Sync Enabled |
| 8.5.1.2 | Continuous Sync Disabled |
| 8.5.2 | Audit Physical Addresses – Sanity Check |
| 8.5.2.1 | Emulate Uncorrectable Error |
| 8.5.2.2 | Continuous Sync Enabled |
| 8.5.2.3 | Continuous Sync Disabled (Crash Dump Based) |
| 8.5.3 | Audit Cipher Buffers Alignment – Sanity Check |
| 8.5.3.1 | Emulate Slice Hang Caused by Incorrect Buffers Alignments |
| 8.5.3.2 | Slice Hang Handling with Continuous Sync Enabled |
| 8.5.3.3 | Slice Hang Handling with Continuous Sync Disabled |
| 8.5.4 | Audit Return Codes |
| 8.5.4.1 | Audit Return Codes – Continuous Sync Option |
| 8.6 | SR-IOV |
| 8.6.1 | Build instructions |
| 8.6.2 | Usage |
| 8.7 | Programming Guide |
| 8.7.1 | Physical to Virtual Translation Callback |

### Figures

| Figure | Title |
|---|---|
| Figure 1. | Kernel Space Response Ring Processing |
| Figure 2. | Intel® C62x Chipset (PCH) Acceleration Endpoint Configuration 1 |
| Figure 3. | Intel® C62x Chipset (PCH) Acceleration Endpoint Configuration 2 |
| Figure 4. | Incorporating Dummy Responses in an Intel® QAT Operation |
| Figure 5. | Dynamic Compression Data Path |
| Figure 6. | Amortizing the Cost of an MMIO Across Multiple Requests |
| Figure 7. | Service Instance Configuration |
| Figure 8. | Data Collection Architecture |
| Figure 9. | Typical Crash Dump Scenario |

### Tables

| Table | Title |
|---|---|
| Table 1. | Terminology |
| Table 2. | Reference Documents and Resources |
| Table 3. | Services |
| Table 4. | Intel® QuickAssist Technology /sys/kernel/debug Entries |
| Table 5. | qae_mem_slabs Commands Supported |
| Table 6. | Intel® QAT Compression API Errors |
| Table 7. | Compression Levels for QAT 1.7 Hardware |
| Table 8. | Compression Levels for QAT 1.8 Hardware |
| Table 9. | Acceleration Driver Return Codes |
| Table 10. | Acceleration Driver Return Codes for Linux* Device Driver Operations |
| Table 11. | AutoResetOnError Values |
| Table 12. | Heartbeat System Virtual Files |
| Table 13. | Supported Legacy Algorithms |
| Table 14. | General Default Configuration Parameters |
| Table 15. | General Parameters |
| Table 16. | [KERNEL] Section Parameters |
| Table 17. | [KERNEL_QAT] Section Parameters |
| Table 18. | [KERNEL_QAT] Section Parameters |
| Table 19. | Configuring Physical Functions and Virtual Functions |
| Table 20. | Cryptographic Logical Instance Parameters |
| Table 21. | Data Compression Logical Instance Parameters |
| Table 22. | System Threat Categories |
| Table 23. | Attack Mechanisms and Examples |
| Table 24. | Attacker Privilege |
| Table 25. | Deployment Models |
| Table 26. | Compression/Decompression Overflow Behavior |
| Table 27. | API Support for Compress and Verify and Recover |

## Revision History

| Document<br>Number | Revision<br>Number | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                  | Revision Date  |
|--------------------|--------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------|
| 336210             | 029                | <ul> <li>Updates for Intel<sup>®</sup> QAT Programmers Guide – Customer Enabling<br/>Release:</li> <li>Updated <u>Thread Specific USDM</u></li> <li>Added <u>Enabling Linux* Kernel Crypto Framework</u><br/>(LKCF)</li> <li>Added <u>3.18.1: Understanding System Messages and</u><br/><u>Warnings</u></li> <li>Updated Section <u>3.10.1: Compression Level Mapping</u></li> <li>Added note about enabling Black Box Debug Tool<br/>(BBDT) to Virtual Functions</li> </ul> | December 2023  |
| 336210             | 028                | <ul> <li>Updates for Intel<sup>®</sup> QAT Programmers Guide – Customer Enabling Release:</li> <li>Updated Section 6.1.1: Intel<sup>®</sup> QAT API Limitations with updated guidance on the Auto-Select-Best feature (ASB, i.e. CPA_DC_ASB_ENABLED)</li> <li>Updated Table 11: Access to Legacy Algorithms with Opt-In PKE Algorithms</li> <li>Updated Section 3.21: DU Manager Application for clarity</li> </ul> | May 2023       |
| 336210             | 027                | Updates for Intel <sup>®</sup> QAT Programmers Guide – Customer Enabling<br>Release, with Release v4.21:<br>• Added <u>3.4.1: Thread Specific USDM</u>                                                                                                                                                                                                                                                                                                                       | March 2023     |
| 336210             | 026                | Updates for Intel <sup>®</sup> QAT Programmers Guide – Customer Enabling<br>Release:<br>Updated <u>Section 3.7: Overview of QAT debugfs</u><br><u>entries</u> to include new <u>Section 3.7.2: Memory driver</u><br><u>queries ( qae_mem_slabs )</u><br>Updated Legal Notices & Disclaimers                                                                                                                                                                                  | February 2023  |
| 336210             | 025                | Updates for Intel <sup>®</sup> QAT Programmers Guide – Customer Enabling<br>Release:<br>• Name change, now supports 1.8 HW Gen lookaside<br>features (non-inline)                                                                                                                                                                                                                                                                                                            | December 2022  |
| 336210             | 024                | Updates for Intel <sup>®</sup> QAT Programmers Guide Hardware Version 1.7:<br>• Added <u>Section 3.22 Access to Legacy Algorithms</u><br>• Removed Section 6.2.1 Dynamic Instance Allocation Functions, as it is unsupported | September 2022 |
| 336210             | 023                | <ul> <li>Updates for Intel<sup>®</sup> QAT Programmers Guide Hardware Version 1.7:</li> <li>Updated Chapter 8: Black Box Debug Tool, changed numeration, added SR-IOV section, update report tool behavior</li> <li>Updated Sections 7.2.2: Encrypted Storage, &amp; 7.3: Data Compression Applications with minor edits to terminology and grammatical updates for clarity</li> </ul>                                                                    | April 2022    |
| 336210             | 022                | <ul> <li>Updates for Intel<sup>®</sup> QAT Programmers Guide Hardware Version 1.7:</li> <li>Updated Section 2.2.1.3: changed first note on epoll mode to improve clarity.</li> <li>Updated Section 3.16 Huge Pages with the Included Memory Driver</li> <li>Updated with IntelOne font</li> </ul>                                                                                                                                                         | March 2022    |
| 336210             | 021                | <ul> <li>Updates for Intel<sup>®</sup> QAT Programmers Guide Hardware Version 1.7:</li> <li>Updated Table 1: Terminology table</li> <li>Updated Section 3.3.1 Usage</li> <li>Added Section 3.3.2 Examples</li> <li>Updated Section 6.2 Additional APIs</li> <li>Added Section 6.2.13 Check Device Availability APIs</li> <li>Updated formatting on Section 6.2 Additional APIs</li> <li>Added Chapter 8: Black Box Debug Tool</li> </ul>                  | December 2021 |
| 336210             | 020                | <ul> <li>Updates for Intel<sup>®</sup> QAT Programmers Guide Hardware Version 1.7:</li> <li>Updated Section 3.14 Running Applications as Non-Root User</li> </ul>                                                                                                                                                                                                                                                                                         | November 2021 |
| 336210             | 019                | <ul> <li>Updates for Intel<sup>®</sup> QAT Programmers Guide Hardware Version 1.7:</li> <li>Updated Section 3.20 Rate Limiting</li> <li>Updated Section 3.21 DU Manager Application</li> <li>Updated Section 3.21.1 Commands to Fetch Device Utilization</li> <li>Updated formatting in Section 3.21.2 Durations</li> <li>Updated Section 4.2.1 General Parameters</li> <li>Updated Table 10. General Default Configuration Parameters</li> <li>Updated formatting for Table 11. General Parameters, Table 12. [KERNEL] Section Parameters, Table 13. [KERNEL_QAT] Section Parameters, Table 14. [KERNEL_QAT] Section Parameters, Table 15. Configuring Physical Functions and Virtual Functions, Table 16. Cryptographic Logical Instance Parameters</li> </ul> | October 2021  |
| 336210             | 018                | <ul> <li>Updates for Intel<sup>®</sup> QAT Programmers Guide Hardware Version 1.7:</li> <li>Section 3.20 Rate Limiting</li> <li>Section 3.20.1 Service Level Agreement (SLA)</li> <li>Section 3.20.2 SLA Units</li> <li>Section 3.20.3.1 Commands to Fetch Device Utilization</li> <li>Section 3.21 DU Manager Application</li> <li>Section 3.21.1 Commands to Fetch Device Utilization</li> <li>Section 6.1.1.3.5 CPA_DC_ASB_ENABLED</li> <li>Section 6.1.1.4 Maximal Expansion and Destination Buffer Size in compression direction</li> </ul> | September 2021 |
| 336210             | 017                | <ul> <li>Updates for Intel<sup>®</sup> QAT Programmers Guide Hardware Version 1.7</li> <li>Clarified "exception" vs "error" in the Section 6.1.1.1 title.</li> <li>Add new section 6.1.1.5: "Avoiding a Compression Overflow exception"</li> <li>Updated Table 10 under Section 4.2.1 ("General Parameters") to reflect a new capability of hashing being available with ServicesProfile = COMPRESSION</li> </ul>                                                                                                                                | May 2021       |
| 336210             | 016                | Updated guidance to enable rate limiting                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | March 2021     |
| 336210             | 015                | Updates for Intel® QAT Programmers Guide Hardware Version 1.7<br>Added new sections:<br>• Section 5 Secure Architecture Considerations                                                                                                                                                                                                                                                                                                                                                                                                           | December 2020  |
| 336210             | 014                | Updates for Intel® QAT Programmers Guide Hardware Version 1.7<br>Added new sections:<br>• Section 6.2.11 Congestion Management APIs<br>• Section 6.2.12 Service Specific Polling APIs                                                                                                                                                                                                                                                                                                                                                            | September 2020 |
| 336210             | 013                | Updates for Intel® QAT software v4.10.0 release:<br>• Revised Note in Section 2.2.1.3<br>• Section 3.8.1: Added Note before Table 5<br>• Revised Table 10<br>• Table 11: removed StorageEnabled and PkeServiceDisabled parameters | June 2020      |
| 336210             | 012                | Updates for Intel® QAT software v4.8.0 release:<br>• Revised Section 3.4, Application Payload Memory Allocation | February 2020  |
| 336210             | 011                | <ul> <li>Updated:</li> <li>Permissions for using huge pages with included memory driver</li> <li>Rate limiting and device utilization measurement impacts performance when active</li> </ul> | November 2019  |
| 336210             | 010                | <ul> <li>Updates for 4.7.0 release:</li> <li>Added virtual functions to list of configurable instances</li> <li>Rate limiting and device utilization measurement features</li> </ul>         | October 2019   |
| 336210             | 009                | Updated configuration options for concurrent requests (Tables 10, 14, 15)                                                                                                                    | July 2019      |
| 336210             | 008                | Updates for 4.6.0 release:<br>• Dummy responses added to Heartbeat feature<br>• Handling device failures in a virtualized environment | June 2019      |
| 336210             | 007                | Updates for 4.5.0 release: <ul> <li>Updated list of general parameters</li> <li>Updated list of Intel<sup>®</sup> QuickAssist entries in /sys/kernel/debug</li> </ul> | March 2019     |
| 336210             | 006                | Updates for 4.4.0 release:<br>• Updated list of Compression API Errors                                                                                                                       | December 2018  |
| 336210             | 005                | Updates for 4.3.0 release:<br>• Intel<sup>®</sup> QuickAssist API in kernel space<br>• Added epoll content<br>• Updates for the Compress and Verify and Recover feature<br>• Other minor changes | September 2018 |
| 336210             | 004                | Added description of Compress and Verify and Recover (CnVnR) capability.                                     | June 2018     |
| 336210             | 003                | Added Heartbeat description. Clarified explanations of stateless and stateful compression and decompression. | April 2018    |
| 336210             | 002                | Stateful compression is no longer supported by default.                                                      | April 2018    |
| 336210             | 001                | Initial public release.                                                                                      | August 2017   |




## 1 Introduction

This programmer's guide describes the architecture of the software and provides usage guidelines. Information on the use of the Intel<sup>®</sup> QuickAssist Technology (Intel<sup>®</sup> QAT) APIs, which provide the interface to the acceleration services (cryptographic and data compression), is documented in the related Intel<sup>®</sup> QAT software library documentation (refer to <u>Table 2</u>).

### 1.1 Terminology

In this document, for convenience:

- The software package is used as a generic term for the Intel® QAT software package for Hardware Version 1.7.
- Acceleration driver is used as a generic term for the software that allows the Intel<sup>®</sup> QAT Software Library APIs to access the Intel<sup>®</sup> QAT Endpoint(s).

#### Table 1. Terminology

| Term  | Description                             |
|-------|-----------------------------------------|
| ADF   | Acceleration Driver Framework           |
| AE    | Acceleration Engine                     |
| AES   | Advanced Encryption Standard            |
| ASIC  | Application Specific Integrated Circuit |
| AU    | Acceleration Unit                       |
| BDF   | Bus Device Function                     |
| BMSM  | Broad Market Switch Mode                |
| BnP   | Batch and Pack                          |
| BTS   | Base Transceiver Station                |
| CBC   | Cipher Block Chaining mode              |
| CCM   | Counter with CBC-MAC mode               |
| CnV   | Compress and Verify                     |
| CnVnR | Compress and Verify and Recover         |
| CPK   | Columbia Park                           |
| CY    | Cryptography                            |
| DC    | Data Compression                        |
| DID   | Device ID                               |
| DMA   | Direct Memory Access                    |
| DPDK                   | Data Plane Development Kit                                                                                                                                     |
| DRAM                   | Dynamic Random-Access Memory                                                                                                                                   |
| DSA                    | Digital Signature Algorithm                                                                                                                                    |
| DTLS                   | Datagram Transport Layer Security                                                                                                                              |
| ECC                    | Elliptic Curve Cryptography                                                                                                                                    |
| EVP                    | Envelope (OpenSSL* high-level cryptographic functions)                                                                                                         |
| FW                     | Firmware                                                                                                                                                       |
| GCM                    | Galois/Counter Mode                                                                                                                                            |
| GPL                    | General Public License                                                                                                                                         |
| HLP                    | Highland Park                                                                                                                                                  |
| HMAC                   | Hash-based Message Authentication Code                                                                                                                         |
| IA                     | Intel <sup>®</sup> Architecture                                                                                                                                |
| I/O                    | Input/Output                                                                                                                                                   |
| IDC                    | Inter Driver Communication                                                                                                                                     |
| IDS/IPS                | Intrusion Detection System/Intrusion Prevention System                                                                                                         |
| IEEE                   | Institute of Electrical and Electronics Engineers                                                                                                              |
| IKE                    | Internet Key Exchange                                                                                                                                          |
| Intel <sup>®</sup> QAT | Intel <sup>®</sup> QuickAssist Technology                                                                                                                      |
| IOCTL                  | Input Output Control function                                                                                                                                  |
| IOMMU                  | Input-Output Memory Management Unit                                                                                                                            |
| IOSF-SB                | Intel® On-chip System Fabric Side Band                                                                                                                         |
| IPSec                  | Internet Protocol Security                                                                                                                                     |
| LKCF                   | Linux* Kernel Cryptographic Framework                                                                                                                          |
| LTTng                  | Linux* Trace Toolkit Next Generation                                                                                                                           |
| MGF                    | Mask Generation Function                                                                                                                                       |
| MSI                    | Message Signaled Interrupts                                                                                                                                    |
| NAC                    | Network Acceleration Complex                                                                                                                                   |
| NUMA                   | Non-uniform Memory Access                                                                                                                                      |
| OP Data                | Operational Data                                                                                                                                               |
| PCH                    | Platform Controller Hub. In this manual, a Platform Controller Hub device includes standard interfaces and Intel <sup>®</sup> QAT Endpoint and I/O interfaces. |
| PCI    | Peripheral Component Interconnect     |
| PCIe*  | PCI Express*                          |
| PF     | Physical Function                     |
| PKE    | Public Key Encryption                 |
| RSA    | Rivest-Shamir-Adleman                 |
| RTE    | Run-Time Environment                  |
| SA     | Security Association                  |
| SADB   | Security Association Database         |
| SAL    | Service Access Layer                  |
| SATA   | Serial Advanced Technology Attachment |
| SGL    | Scatter-Gather List                   |
| SHA    | Secure Hash Algorithm                 |
| SKU    | Stock Keeping Unit                    |
| SoC    | System-on-a-Chip                      |
| SPI    | Serial Peripheral Interface           |
| SR-IOV | Single Root I/O Virtualization        |
| SSC    | Storage Subsystem Class               |
| SSL    | Secure Sockets Layer                  |
| SYM    | Symmetric Crypto                      |
| TCG    | Trusted Computing Group               |
| TLS    | Transport Layer Security              |
| TPM    | Trusted Platform Module               |
| USDM   | User Space DMA-able Memory            |
| VF     | Virtual Function                      |
| VPN    | Virtual Private Network               |
| WAN    | Wide Area Network                     |
| WQM    | Work Queue Manager                    |



#### Table 2. Reference Documents and Resources

| Document                                                                                                  | Document Number/<br>Location |
|-----------------------------------------------------------------------------------------------------------|------------------------------|
| Intel <sup>®</sup> QuickAssist Technology Software for Linux* CE Release Notes                            | 336211                       |
| Intel® QuickAssist Technology Software for Linux* Release Notes<br>(Hardware Version 1.8 for In-line)     | 613775                       |
| Intel <sup>®</sup> QuickAssist Technology Software for Linux* CE Getting Started                          | 336212                       |
| Intel <sup>®</sup> QuickAssist Technology API Programmer's Guide                                          | 330684                       |
| Intel <sup>®</sup> QuickAssist Technology Cryptographic API Reference Manual                              | 330685                       |
| Intel <sup>®</sup> QuickAssist Technology Data Compression API Reference Manual                           | 330686                       |
| Using Intel® Virtualization Technology (Intel® VT) with Intel® QuickAssist<br>Technology Application Note | 330689                       |

### 1.2 Typographical Conventions

The following font conventions are used in this manual:

- `Courier font`: file names, path names, executables, code examples, command line entries, API names, parameter names, and other programming constructs
- *Italic text*: key terms and publication titles
- **Bold text**: graphical user interface entries, buttons, keyboard keys, and Intel<sup>®</sup> software names





## 2 Software Overview

In addition to the hardware mentioned in <u>Section 3.1, Hardware/Software Overview</u>, the respective platforms have critical software components that are part of the offering. The software includes drivers and acceleration code that runs on the Intel<sup>®</sup> Architecture (IA) CPUs and Intel<sup>®</sup> QAT Endpoints.

### 2.1 Intel<sup>®</sup> Communications Chipset 8925 to 8955 Series Compatibility

While the focus of this document is on Intel<sup>®</sup> QAT software for Hardware Version 1.7, the Intel<sup>®</sup> Communications Chipset 8925 to 8955 Series is also supported.

### 2.2 Logical Instances

A logical instance may be thought of as a channel to the hardware. A logical instance allows an address domain (that is, kernel space and individual user space processes) to configure the rings to be used by that address domain and to define the behavior of that ring.

#### 2.2.1 Response Processing

In kernel space, each logical instance can be configured to operate in one of two modes:

- Interrupt mode
- Polled mode

In user space, each logical instance can be configured to operate in one of two modes:

- Polled mode
- Epolled mode

#### 2.2.1.1 Interrupt Mode

Interrupt mode is supported only in kernel space. It is no longer supported in user space, so a user space instance can no longer be configured with interrupts enabled.

When configured in interrupt mode, the Acceleration Driver Framework (ADF) registers an interrupt handler for response ring processing.

As the latency in servicing an interrupt may be costly, the hardware-assisted ring provides a mechanism to amortize the cost of interrupts into a single interrupt that may service multiple responses. The interrupt coalescing section of the configuration file allows the user to select the mechanism to amortize response interrupts using either a time-based interrupt scheme or a number-of-responses-based scheme.
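To make the amortization concrete, the following sketch counts how many interrupts a stream of responses would raise under a simplified coalescing model. It is a toy model for illustration only, not the hardware's algorithm. The function name `count_interrupts`, the rule that the timer starts when the first pending response arrives, and the convention that a threshold of 0 disables the number-of-responses scheme are all assumptions made here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy coalescing model (an assumption, not the device's real behavior):
 * an interrupt fires when either
 *   - timer_ns has elapsed since the first still-pending response, or
 *   - max_responses responses are pending (0 disables this rule).
 * arrival_ns must be sorted in ascending order.
 * Returns the number of interrupts needed to deliver all n responses.
 */
unsigned count_interrupts(const uint64_t *arrival_ns, size_t n,
                          uint64_t timer_ns, unsigned max_responses)
{
    unsigned interrupts = 0;
    unsigned pending = 0;
    uint64_t first_ns = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        assert(i == 0 || arrival_ns[i] >= arrival_ns[i - 1]);

        /* The timer expired before this response arrived: flush the batch. */
        if (pending > 0 && arrival_ns[i] - first_ns >= timer_ns) {
            interrupts++;
            pending = 0;
        }
        if (pending == 0) {
            first_ns = arrival_ns[i]; /* the timer starts with a new batch */
        }
        pending++;

        /* Count-based rule: enough responses have accumulated. */
        if (max_responses > 0 && pending >= max_responses) {
            interrupts++;
            pending = 0;
        }
    }
    if (pending > 0) {
        interrupts++; /* the timer eventually flushes the tail */
    }
    return interrupts;
}
```

In this model, six responses arriving at 0, 10, 20, 30, 1000 and 1010 ns raise six interrupts with a 5 ns timer, but only two with a 100 ns timer, at the cost of the earlier responses waiting for the timer to expire.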



The ADF registers an interrupt handler to service the ring bank interrupt. When an interrupt fires, the ADF services the interrupt and creates an interrupt handler bottom half to consume the responses from the response ring. When MSI-X is supported, the bottom half of the interrupt handler is created and affinitized to the configured core. Callbacks to the application code occur in the context of this tasklet. This sequence is shown in the following figure (the full sequence has been reduced for clarity).

**NOTE:** Linux\* (and other operating systems) split an interrupt handler into two halves. The so-called "top half" is the routine that responds to the interrupt, that is, the one you register with request\_irq. The "bottom half" is a routine that is scheduled by the top half to be executed later, at a safer time.

#### Figure 1. Kernel Space Response Ring Processing



If the cost of servicing an interrupt and scheduling the interrupt handler bottom half is not desired, a user can choose to disable interrupts and poll for responses. This mechanism can be configured on a per logical instance basis by setting the Dc/CyXIsPolled attribute of a logical instance in the configuration file to 1. When configured to 1, the ADF does not service interrupts for that logical instance.

The ADF provides a set of APIs to allow the client to poll a single bank or all banks on a given accelerator:

- icp\_sal\_pollBank - Poll the rings on the given bank number for a given accelerator.
- icp\_sal\_pollAllBanks - Poll the rings on all banks for a given accelerator.

The Service Access Layer (SAL) provides an API to poll on an individual logical instance:

- icp\_sal\_CyPollInstance - Poll a specific Cryptographic (CY) logical instance.
- icp\_sal\_DcPollInstance - Poll a specific Data Compression (DC) logical instance.

Refer to <u>Section 6.2.2, "Polling Functions"</u> for details on all the polling functions.

#### 2.2.1.2 Epolled Mode

The event-based poll mode is called "epoll mode". The Intel<sup>®</sup> QAT driver's new mode supports the Linux\* epoll interface. The Linux\* epoll is a scalable I/O event notification mechanism intended to replace the older select/poll system calls.

**NOTE:** For performance reasons, in epoll mode, only one instance (and one process) per bank should be used.

To use the Linux\* epoll, the user space application uses the following APIs:

- epoll\_create()/epoll\_create1() creates an epoll instance and returns a file descriptor referring to that instance.
- epoll\_ctl() registers the file descriptors to be polled.
- epoll\_wait() waits for I/O events on the file descriptors registered via epoll\_ctl(), blocking the calling thread if no events are currently available.

For more information, consult the Linux\* epoll manuals, here: <u>http://man7.org/linux/man-pages/man7/epoll.7.html</u>

**NOTE:** The Intel® QAT driver's epoll mode is used only by user space instances; it is not valid for kernel space.

The Intel® QAT driver's epoll mode consists of two parts: the kernel space part and the user space part.

The interrupt coalescing fields behave the same way in epoll mode: if the interrupt is delayed by changing the coalescing fields, event delivery to user space is delayed as well.

To enable the epoll mode, ensure the following steps are followed:

1. In the configuration file, set "IsPolled = 2" for the user space instance, for example:

   Cy0Name = "SSL0"

   Cy0IsPolled = 2

2. Whether the application uses the driver synchronously or asynchronously, it should create a thread that calls the Intel® QAT driver's epoll API and the standard Linux\* epoll interface.

The Intel<sup>®</sup> QAT driver's epoll API:

```
Crypto:      icp_sal_CyGetFileDescriptor() / icp_sal_CyPutFileDescriptor()
Compression: icp_sal_DcGetFileDescriptor() / icp_sal_DcPutFileDescriptor()
```

The Linux\* standard epoll interface:

```
epoll_create() / epoll_ctl() / epoll_wait()
```
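The wait loop run by the thread in step 2 can be sketched as follows. This is a minimal illustration, not the driver's own code: it uses Python's `select.epoll`, which wraps the epoll\_create()/epoll\_ctl()/epoll\_wait() calls, and it substitutes the read end of an ordinary pipe for the descriptor that a QAT application would obtain from icp\_sal\_CyGetFileDescriptor() or icp\_sal\_DcGetFileDescriptor(). The helper name `wait_for_events` is hypothetical.

```python
import os
import select


def wait_for_events(fd, timeout_s=1.0):
    """Block until fd is readable or the timeout expires; return True if ready."""
    ep = select.epoll()                       # epoll_create()
    try:
        ep.register(fd, select.EPOLLIN)       # epoll_ctl(EPOLL_CTL_ADD)
        events = ep.poll(timeout_s)           # epoll_wait()
        return any(ready == fd and (mask & select.EPOLLIN)
                   for ready, mask in events)
    finally:
        ep.close()


if __name__ == "__main__":
    # Assumption: a pipe stands in for the QAT instance file descriptor,
    # so the sketch runs on any Linux system without the driver.
    read_fd, write_fd = os.pipe()
    print("ready before any event:", wait_for_events(read_fd, 0.1))
    os.write(write_fd, b"x")                  # simulate a response notification
    print("ready after an event:  ", wait_for_events(read_fd, 0.1))
    os.close(read_fd)
    os.close(write_fd)
```

In a real application the thread would presumably retrieve the responses once the descriptor is reported ready (for example, with the instance polling functions listed earlier) and release the descriptor with the matching PutFileDescriptor call on shutdown; that sequencing is an assumption here, not something this sketch exercises.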



**NOTE:** For performance reasons, in epoll mode, only one instance (and one process) per bank should be used. The instance can be a crypto or compression instance.

#### For QAT 1.7 Generation Hardware:

When a bank is used for epoll mode, there is only one instance (crypto or compression) for that bank. While the instance is in use by a process, that process is the only user of the bank, and other processes cannot use the bank until the process releases the instance. Because there is only one instance per bank, no more than 16 user space instances are available on 1.7 HW when all the banks are configured for epoll mode, versus 128 user space instances on 1.8 HW. (For the Intel<sup>®</sup> Communications Chipset 8925-8955 series, up to 32 user space instances are available.)

#### For QAT 1.8 Generation Hardware:

If a process needs to provide compression and crypto services at the same time, it will need two instances, which means the process needs two banks. In such a scenario, no more than eight processes can be used for 1.7 HW vs 64 processes for 1.8 HW. (For the Intel<sup>®</sup> Communications Chipset 8925-8955 series, up to 16 processes can be used.)
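The process limits quoted above follow from dividing the number of epoll-mode instances (one per bank) by the number of instances each process needs. The short sketch below reproduces that arithmetic using only the figures stated in this section; the dictionary and function names are illustrative.

```python
# Epoll-mode user space instances available (one instance per bank),
# as stated in this section.
EPOLL_INSTANCES = {
    "QAT 1.7": 16,
    "QAT 1.8": 128,
    "Communications Chipset 8925-8955": 32,
}


def max_processes(instances_available, instances_per_process):
    """Each epoll-mode instance occupies a whole bank, so divide and round down."""
    if instances_per_process < 1:
        raise ValueError("a process needs at least one instance")
    return instances_available // instances_per_process


if __name__ == "__main__":
    for hw, instances in EPOLL_INSTANCES.items():
        # A process offering crypto and compression needs two instances (two banks).
        print(f"{hw}: {max_processes(instances, 2)} processes with crypto + compression")
```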

For comparison, when the CPU is idle, a user space instance in standard poll mode ("IsPolled = 1") still polls the empty rings periodically, and this polling consumes some CPU cycles (for instance, about 2% usage may be observed while the CPU is otherwise idle). With epoll mode, usage stays at 0% when the CPU is idle.

**NOTE:** The standard poll mode performs better when the CPU is in the high load state.

For user space instances, interrupt mode is no longer supported. Interrupt mode for the user space did not consume CPU cycles when there was no data in the response rings, unlike the polling mode, which continues to check at specified intervals. With the epoll support, standard Linux\* epoll APIs, such as epoll\_create()/epoll\_ctl()/epoll\_wait(), can be used.

Most web servers and socket-based applications, such as Nginx\* and Apache\*, use one of epoll/select/poll to be notified when a socket is available for reading or writing, and then take appropriate action. With epoll mode, the Intel<sup>®</sup> QuickAssist Technology driver integrates more seamlessly into existing applications, such as Nginx\*, because it uses a standard notification mechanism.



## 3 Acceleration Drivers Overview

Selected Intel<sup>®</sup> products support Intel<sup>®</sup> QAT. Depending on the product chosen, Intel<sup>®</sup> QAT accelerates either or both of two services: cryptography (both symmetric and public key) and data compression.

The Intel® QAT Endpoints are exposed as Peripheral Component Interconnect (PCI) devices. Applications running in the user space typically access these services via the Intel® QAT APIs. Support for applications that run only in the kernel space is planned for a future software release, but driver support for the Linux\* Kernel Cryptographic Framework (LKCF) API is present in this software release (disabled by default).

### 3.1 Hardware/Software Overview

Because the hardware is accessed using the Intel® QAT APIs, it is not necessary to know all the hardware and software architecture details, but some knowledge of the underlying hardware and software is helpful for performance optimization and debugging purposes. For example, to support customers with different acceleration performance requirements, the Intel® C62x Chipset is available in different SKUs and supports two different "fabric configurations". Figure 2 and Figure 3 show two possible configurations for the acceleration endpoints in one Intel® C62x Chipset die.



#### Figure 2. Intel<sup>®</sup> C62x Chipset (PCH) Acceleration Endpoint Configuration 1





#### Figure 3. Intel® C62x Chipset (PCH) Acceleration Endpoint Configuration 2

For a given platform, the specific internal connections and number of Intel<sup>®</sup> QAT Endpoints per die (for instance, up to three for Intel<sup>®</sup> C62x Chipset) are product-dependent, SKU-dependent, routing-dependent (i.e., how many lanes are routed), and configuration-dependent (e.g., with different fabric configuration soft-straps). For each Intel<sup>®</sup> QAT Endpoint (e.g., QAT [0]), hardware-assisted rings are used as the communication mechanism to transfer requests between the CPU and the Intel<sup>®</sup> QAT Endpoint(s) and vice versa. The 1.7 HW supports 256 rings per Intel<sup>®</sup> QAT Endpoint, versus 1024 rings for 1.8 HW, each with head and tail Configuration Status Register (CSR) pointers that are mapped to PCIe\* memory on the CPU. Rings are assigned by the provided software based on the Cryptography (CY) and Data Compression (DC) instances declared in the configuration files. Refer to <u>Section 3.2</u>, <u>Acceleration Driver Configuration File</u> for more information.

Each Intel<sup>®</sup> QAT Endpoint has multiple computation engines. For a given Intel<sup>®</sup> QAT Endpoint, all rings associated with that endpoint are shared, and the hardware load balances requests from these rings.

A user can write directly to the Intel<sup>®</sup> QAT APIs, or use Intel<sup>®</sup> QAT through frameworks that have been enabled by Intel and others (for example, zlib<sup>\*</sup>, OpenSSL<sup>\*</sup> libcrypto<sup>\*</sup>, and the Linux<sup>\*</sup> Kernel Crypto Framework).

The driver architecture supports simultaneous operation of multiple applications.



### 3.2 Acceleration Driver Configuration File

An acceleration driver has a configuration file that is used to configure the driver for runtime operation. There is a single configuration file for each Intel<sup>®</sup> QAT Endpoint in the system. If Single-Root Input/Output Virtualization (SR-IOV) is enabled, a separate configuration file is used for each virtual function, if applicable. The configuration file format is described in <u>Section 4.1, Configuration File Overview.</u>

### 3.3 Utility for Loading Configuration Files and Sending Events to the Driver - adf\_ctl

The adf\_ctl user space utility is separate from the driver and provides a mechanism for:

- Loading configuration file data to the kernel driver. The kernel space driver uses the data and provides it to the user space driver.
- Sending events to the driver to bring devices up and down.

The adf\_ctl provided with the Intel® QAT 1.7 & 1.8 drivers can be used to interface with Intel® QAT v1.6, 1.7 and 1.8 devices.

#### 3.3.1 Usage

- To bring up, down, restart or reset device(s): ./adf\_ctl [-c|--config] [config\_file\_path] [qat\_dev<N>] [up|down|restart|reset]
- To print device(s) status: ./adf\_ctl [qat\_dev<N>] status
- To use the specified configuration file: -c (--config) [config\_file\_path]

**NOTE:** If no device (physical or virtual) is selected, this file is used against all existing devices.

#### 3.3.2 Examples

- To bring device 0 down: ./adf\_ctl qat\_dev0 down
- To load device configuration from default path /etc/c4xxx\_dev1.conf, then bring device 1 up: ./adf\_ctl qat\_dev1 up
- To load device configuration from specified path ~/user\_c4xxx\_dev1.conf, then bring device 1 up: ./adf\_ctl -c ~/user\_c4xxx\_dev1.conf qat\_dev1 up
- To restart all devices with their default configuration files: ./adf\_ctl restart
- To restart all devices with specified configuration file ~/user\_c4xxx\_dev1.conf: ./adf\_ctl -c ~/user\_c4xxx\_dev1.conf restart
- To restart device 0 with specified configuration file ~/user\_c4xxx\_dev1.conf: ./adf\_ctl -c ~/user\_c4xxx\_dev1.conf qat\_dev0 restart
- To reset device 0: ./adf\_ctl qat\_dev0 reset
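When adf\_ctl is driven from a script, it can help to assemble the argument list in one place. The sketch below only builds the arguments according to the usage forms in Section 3.3.1; it does not run the utility. The function name `adf_ctl_args` and the default `./adf_ctl` location are assumptions made for illustration.

```python
ACTIONS = {"up", "down", "restart", "reset", "status"}


def adf_ctl_args(action, device=None, config=None, binary="./adf_ctl"):
    """Build the argument list for one adf_ctl call without executing it."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    args = [binary]
    if config is not None:
        args += ["-c", config]                    # -c (--config) [config_file_path]
    if device is not None:
        args.append(f"qat_dev{int(device)}")      # qat_dev<N>; omitted means all devices
    args.append(action)
    return args


if __name__ == "__main__":
    print(" ".join(adf_ctl_args("down", device=0)))
    print(" ".join(adf_ctl_args("up", device=1, config="~/user_c4xxx_dev1.conf")))
    print(" ".join(adf_ctl_args("restart")))
```

If the list is later passed to `subprocess.run()` without a shell, a leading `~` in the configuration path is not expanded; `os.path.expanduser()` can be applied first.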

### 3.4 Application Payload Memory Allocation

When performing offload operations through the Intel<sup>®</sup> QAT API, the payload data must be placed in a buffer that is resident, physically contiguous, and Direct Memory Access (DMA) accessible from the acceleration hardware. It is the application's responsibility to provide buffers that meet these constraints.

Buffers are passed to the API with virtual addresses. The API translates these addresses to the address information required by the hardware (see the following table).

| Service | API | Reference |
|---|---|---|
| Cryptographic service | cpaCySetAddressTranslation | See the Intel<sup>®</sup> QuickAssist Technology Cryptographic API Reference Manual (refer to <u>Table 2</u>) for details. |
| Data Compression service | cpaDcSetAddressTranslation | See the Intel<sup>®</sup> QuickAssist Technology Data Compression API Reference Manual (refer to <u>Table 2</u>) for details. |

#### Table 3. Services

When the software requires the physical address, it calls the registered function.

**NOTE:** This address translation function is called at least once per request. Consequently, for optimal performance, the implementation of this function should be optimized.

If using the Intel<sup>®</sup> QAT Data Plane API, buffers are passed to the Intel<sup>®</sup> QAT API as physical addresses. The library passes these addresses directly to the hardware, without the need for translation.

All these tasks can be performed using the User Space DMA-able Memory (USDM) driver supplied with the Intel® QAT driver package. The driver consists of kernel-mode and user-mode parts that allow allocation of 1k-aligned memory blocks, setting up address translation, and automatic block deallocation in case of a user application crash.

#### 3.4.1 Thread Specific USDM

By default, memory allocation uses the USDM slab allocator, which gives 2MB contiguous memory. The allocation has locks in the library to prevent a race condition in getting the memory from the slab. This lock has an impact on some multi-threaded applications and use cases, like HAProxy, causing a drop in performance. To mitigate this issue, thread specific USDM is implemented with the v4.21 release, which allocates and handles memory specific to threads. (For multi-thread apps, allocated memory information will be maintained separately for each thread.) This feature can be enabled by configuring with the configure flag --enable-icp-thread-specific-usdm.

In some use cases with thread specific USDM, using a 128K slab allocator instead of the default 2MB allocator could improve performance and reduce memory consumption for a large number of threads. This can be enabled by configuring with the configure flag --enable-128k-slab.

**NOTE:** There is a limitation with thread specific USDM: memory allocated in one thread should be freed only by the thread which allocates it. Incorrect cleanup can lead to a segmentation fault (segfault). Also, memory allocated in a thread is freed automatically when the thread exits/terminates, even if the user does not explicitly free the memory.

See the Getting Started Guide for more information on ./configure flags.

We have observed poor multithreaded performance with QAT\_Engine using OpenSSL\* at higher thread counts. Unfortunately, these issues appear to stem from the way OpenSSL\* implements its engine\_table\_select and locks. See the two relevant issues on the OpenSSL\* GitHub pages below:

- OpenSSL\* 1.1.1.x: Performance bottleneck with locks in engine\_table\_select() function #18509, https://github.com/openssl/openssl/issues/18509
- OpenSSL\* 3.0: 3.0 performance degraded due to locking #20286, https://github.com/openssl/openssl/issues/20286

### 3.5 User Space Additional Functions

To allow a user space process access to the Intel<sup>®</sup> QAT rings, the service access layer must be configured to expose logical instances to the user space process. Logical instances are configured using the per-device configuration files.

To allow each process to have separate logical instances, the configuration file groups a set of logical instances by name. The process then must call the icp\_sal\_userStart function (refer to Section 6.2.4.1) at initialization time with the name associated with the group of logical instances. Similarly, on process exit, to free the resources and make them available to other processes with the same name, the process must call the function icp\_sal\_userStop (refer to Section 6.2.2.1).

For example, the user can configure the driver to have two crypto logical instances available for the process called "SSL". The user space process may then access these logical instances by calling the cpaCyGetInstances function. The application may then initiate a session with these logical instances and perform a cryptographic operation. See the *Intel<sup>®</sup> QuickAssist Technology Cryptographic API Reference Manual* (refer to <u>Table 2</u>) for more information on the API functions available for use.

For this example, the logical instances section of the configuration file is as follows:

```
[SSL]
NumberCyInstances = 2
NumberDcInstances = 0
NumProcesses = 1
LimitDevAccess = 0
# Crypto - User instance #0
Cy0Name = "SSL0"
Cy0IsPolled = 1
# List of core affinities
Cy0CoreAffinity = 1
# Crypto - User instance #1
Cy1Name = "SSL1"
Cy1IsPolled = 1
# List of core affinities
Cy1CoreAffinity = 2
```

In this example, the user process Secure Sockets Layer (SSL) configures two logical instances (called "SSL0" and "SSL1").
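A section like the one above can be sanity-checked before it is loaded. The sketch below is an illustration only: it assumes the section is plain `key = value` text that Python's standard `configparser` can read, and the helper name `crypto_instances` is hypothetical. It lists the crypto instances a process section declares.

```python
import configparser

SAMPLE = """
[SSL]
NumberCyInstances = 2
NumberDcInstances = 0
NumProcesses = 1
LimitDevAccess = 0
# Crypto - User instance #0
Cy0Name = "SSL0"
Cy0IsPolled = 1
Cy0CoreAffinity = 1
# Crypto - User instance #1
Cy1Name = "SSL1"
Cy1IsPolled = 1
Cy1CoreAffinity = 2
"""


def crypto_instances(text, section):
    """Return name, polling mode and core affinity for each CyN entry."""
    parser = configparser.ConfigParser()
    parser.optionxform = str                  # keep keys such as Cy0Name case-sensitive
    parser.read_string(text)
    sec = parser[section]
    instances = []
    for i in range(sec.getint("NumberCyInstances")):
        instances.append({
            "name": sec[f"Cy{i}Name"].strip('"'),
            "is_polled": sec.getint(f"Cy{i}IsPolled"),
            "core_affinity": sec[f"Cy{i}CoreAffinity"],   # kept as text
        })
    return instances


if __name__ == "__main__":
    for inst in crypto_instances(SAMPLE, "SSL"):
        print(inst)
```

A missing `CyNName` or `CyNIsPolled` key raises `KeyError`, which flags a mismatch between NumberCyInstances and the instances actually declared.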

### 3.6 Managing Intel® QAT Endpoints Using qat\_service

The qat\_service script is installed with the software package in the /etc/init.d/ directory. The script allows a user to start, stop, shut down or query the status (up or down) of a single Intel<sup>®</sup> QAT Endpoint or all Intel<sup>®</sup> QAT Endpoints in the system.

Usage:

```
# ./qat_service start||stop||status||restart||shutdown
```

To view all Intel<sup>®</sup> QAT Endpoints in the system, use: # ./qat\_service status

If there are two Intel<sup>®</sup> QAT Endpoints in the system, the output will be as follows:

```
qat_dev0 - type: c6xx, inst_id: 0, bsf: 06:00:0, #accel: 5 #engines: 10 state: up
qat_dev1 - type: c6xx, inst_id: 1, bsf: 83:00:0, #accel: 5 #engines: 10 state: up
```
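Monitoring scripts often need this status output in structured form. The sketch below parses text in the format shown in this section; it assumes the field order and labels are exactly as printed here and tolerates line wrapping between fields. The names `STATUS_RE`, `parse_status`, and `devices_not_up` are illustrative.

```python
import re

# One match per device; \s+ absorbs any line wrapping between fields.
STATUS_RE = re.compile(
    r"(qat_dev\d+) - type: (\w+),\s+inst_id: (\d+),\s+bsf: (\S+?),\s+"
    r"#accel: (\d+)\s+#engines: (\d+)\s+state: (\w+)"
)

SAMPLE = """qat_dev0 - type: c6xx, inst_id: 0, bsf: 06:00:0, #accel: 5
#engines: 10 state: up
qat_dev1 - type: c6xx, inst_id: 1, bsf: 83:00:0, #accel: 5 #engines: 10
state: up
"""


def parse_status(text):
    """Return one dictionary per device found in the status text."""
    fields = ("device", "type", "inst_id", "bsf", "accel", "engines", "state")
    return [dict(zip(fields, match.groups())) for match in STATUS_RE.finditer(text)]


def devices_not_up(text):
    """Names of devices whose reported state is anything other than 'up'."""
    return [dev["device"] for dev in parse_status(text) if dev["state"] != "up"]


if __name__ == "__main__":
    for dev in parse_status(SAMPLE):
        print(dev["device"], dev["type"], dev["bsf"], dev["state"])
    print("not up:", devices_not_up(SAMPLE))
```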

For a system with multiple Intel<sup>®</sup> QAT Endpoints, you can start, stop or restart each device by passing the Intel<sup>®</sup> QAT Endpoint to be restarted or stopped as a parameter (qat\_dev<N>). For example, with the device number <N> equal to 0, use: # ./qat\_service stop qat\_dev0

The shutdown qualifier enables the user to bring down all Intel<sup>®</sup> QAT Endpoints and unload driver modules from the kernel. In contrast, the stop qualifier brings down one or more Intel<sup>®</sup> QAT Endpoints but does not unload kernel modules, so other Intel<sup>®</sup> QAT Endpoints can still run.

**NOTE:** In systems with more than three devices it might be necessary to change the qat\_service timeout in /etc/systemd/system/qat\_service.service.d/startup-timeout.conf.

### 3.7 Overview of QAT *debugfs* entries

Some useful debugging information for the driver and configuration is available via the Linux\* debugfs file system, with the entries /sys/kernel/debug/qat\_\* and /sys/kernel/debug/qae\_mem\_dbg/qae\_mem\_slabs.

For more information, see <u>Chapter 8: Black Box Debug Tool</u> or the <u>Intel® QuickAssist</u> <u>Technology Debugging Guide</u>.

#### 3.7.1 Entries in /sys/kernel/debug/qat\_\*

This includes:

#### Table 4. Intel® QuickAssist Technology /sys/kernel/debug Entries

| Entry | Description |
|---|---|
| cnv_errors | Indicates the number of Compress and Verify errors. Refer to <u>Section 6.1.5, Compress and Verify Error log in Sysfs</u>. |
| dev_cfg | Displays internal device configuration information |
| frequency | Displays frequency of Acceleration Engines |
| fw_counters | Displays Acceleration Engine firmware requests/responses |
| heartbeat<br>heartbeat_failed<br>heartbeat_sent | Refer to Section 3.17.3.3.1 System Virtual Files |
| transport | Contains firmware request/response data. Available only for kernel space instances. |
| version | Includes package version information |

#### 3.7.2 Memory driver queries (*qae\_mem\_slabs*)

Debug features are also available by reading and writing the file /sys/kernel/debug/qae\_mem\_dbg/qae\_mem\_slabs. When *reading*, the virtual and physical addresses, size, and slab id are shown together with the pid of the allocating process. *Writing* a string to the file executes a debug command.

For example:

```
# cat /sys/kernel/debug/qae_mem_dbg/qae_mem_slabs
Pid 78854, Slab Id 10550771712
Virtual address 00000000b39412d, Physical Address 274e00000, Size 2097152
Pid 78854, Slab Id 10309599232
Virtual address 00000003670dd45, Physical Address 266800000, Size 2097152
...
```
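To see how much slab memory each process holds, the read output can be summed per pid. The sketch below parses text in the format shown in the example above; it assumes that format is stable and works on captured text rather than on the debugfs file itself. The names `SLAB_RE` and `bytes_per_pid` are illustrative.

```python
import re
from collections import defaultdict

# Matches one "Pid ... / Virtual address ..." record; \s+ absorbs line breaks.
SLAB_RE = re.compile(
    r"Pid (\d+), Slab Id (\d+)\s+"
    r"Virtual address ([0-9a-fA-F]+), Physical Address ([0-9a-fA-F]+), Size\s+(\d+)"
)

SAMPLE = """Pid 78854, Slab Id 10550771712
Virtual address 00000000b39412d, Physical Address 274e00000, Size 2097152
Pid 78854, Slab Id 10309599232
Virtual address 00000003670dd45, Physical Address 266800000, Size
2097152
"""


def bytes_per_pid(text):
    """Sum the slab sizes reported for each allocating process."""
    totals = defaultdict(int)
    for pid, _slab_id, _virt, _phys, size in SLAB_RE.findall(text):
        totals[int(pid)] += int(size)
    return dict(totals)


if __name__ == "__main__":
    for pid, total in bytes_per_pid(SAMPLE).items():
        print(f"pid {pid}: {total} bytes in slabs")
```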

There are three commands supported, and the table below shows their output. Writing these strings will give the output when the file is read.

#### Table 5. qae\_mem\_slabs Commands Supported

| Command | Output |
|---|---|
| "**d** &lt;pid&gt; &lt;virtual or physical address&gt;" | 256 bytes in hex and ASCII, starting from the given address |
| "**c** &lt;pid&gt; &lt;slab id&gt;"<br>(pid should be the process id, which can be obtained by a previous read) | The allocation bit map for the given slab identifier |
| "**t**" | Total size of NUMA memory allocated in kernel space |

For example, by combining a *write* to the file with a subsequent *read*, you can see the total allocated NUMA memory:

```
# echo "t" > /sys/kernel/debug/qae_mem_dbg/qae_mem_slabs ; cat /sys/kernel/debug/qae_mem_dbg/qae_mem_slabs
Total allocated NUMA memory: 0 bytes
```

As above, the "d" and "c" commands will output their respective information.

### 3.8 Compression Status Codes

The CpaDcRqResults structure should be checked for compression status codes in the CpaDcReqStatus data field. The mapping of the error codes to the enums is included in the quickassist/include/dc/cpa\_dc.h file.

#### 3.8.1 Intel® QAT Compression API Errors

The Intel<sup>®</sup> QAT Compression APIs that send requests to the compression hardware can return the error codes shown in the following table. These APIs are:

- cpaDcCompressData()
- cpaDcDecompressData()
- cpaDcDpEnqueueOp()
- cpaDcDpEnqueueOpBatch()

**NOTE:** Decompression issues in the table below may also apply to the compression use case due to potential issues encountered during a Compress-and-Verify operation. In this case, the file(s) /sys/kernel/debug/qat\_\*/cnv\_errors may show these nested errors. In some cases, the suggested corrective action may need to be to store the block uncompressed or to compress the block with software.

#### Table 6. Intel® QAT Compression API Errors

| Error Code | Error Type | Description | Suggested Corrective Action(s) |
|---|---|---|---|
| 0 | CPA_DC_OK | No error detected by compression hardware. | None. |
| -1 | CPA_DC_INVALID_BLOCK_TYPE | Invalid block type (type = 3); invalid input stream detected for decompression. | Decompression error. Discard output. For a stateless session, resubmit affected request. For a stateful session, abort the session calling cpaDcRemoveSession(). |
| -2 | CPA_DC_BAD_STORED_BLOCK_LEN | Stored block length did not match one's complement; invalid input stream detected. | Decompression error. Discard output. For a stateless session, resubmit affected request. For a stateful session, abort the session calling cpaDcRemoveSession(). |
| -3 | CPA_DC_TOO_MANY_CODES | Too many length or distance codes; invalid input stream detected. | Decompression error. Discard output. For a stateless session, resubmit affected request. For a stateful session, abort the session calling cpaDcRemoveSession(). |
| -4 | CPA_DC_INCOMPLETE_CODE_LENS | Code length codes incomplete; invalid input stream detected. | Decompression error. Discard output. For a stateless session, resubmit affected request. For a stateful session, abort the session calling cpaDcRemoveSession(). |
| -5 | CPA_DC_REPEATED_LENS | Repeated lengths with no first length; invalid input stream detected. | Decompression error. Discard output. For a stateless session, resubmit affected request. For a stateful session, abort the session calling cpaDcRemoveSession(). |
| -6 | CPA_DC_MORE_REPEAT | Repeat more than specified lengths; invalid input stream detected. | Decompression error. Discard output. For a stateless session, resubmit affected request. For a stateful session, abort the session calling cpaDcRemoveSession(). |
| -7 | CPA_DC_BAD_LITLEN_CODES | Invalid literal/length code lengths; invalid input stream detected. | Decompression error. Discard output. For a stateless session, resubmit affected request. For a stateful session, abort the session calling cpaDcRemoveSession(). |
| -8 | CPA_DC_BAD_DIST_CODES | Invalid distance code lengths; invalid input stream detected. | Decompression error. Discard output. For a stateless session, resubmit affected request. For a stateful session, abort the session calling cpaDcRemoveSession(). |
| -9 | CPA_DC_INVALID_CODE | Invalid literal/length or distance code in fixed or dynamic block; invalid input stream detected. | Decompression error. Discard output. For a stateless session, resubmit affected request. For a stateful session, abort the session calling cpaDcRemoveSession(). |
| -10 | CPA_DC_INVALID_DIST | Distance is too far back in fixed or dynamic block; invalid input stream detected. | Decompression error. Discard output. For a stateless session, resubmit affected request. For a stateful session, abort the session calling cpaDcRemoveSession(). |
| -11 | CPA_DC_OVERFLOW | Overflow detected. This is not an error, but an exception. Overflow is supported and can be handled. | Resubmit with a larger output buffer when appropriate. <u>Table 22</u> in <u>Section 6.1.1</u> gives details on the various overflow exceptions. |
| -12 | CPA_DC_SOFTERR | Other non-fatal error detected. | Discard output. For a stateless session, resubmit affected request. For a stateful session, abort the session calling cpaDcRemoveSession(). |
| -13 | CPA_DC_FATALERR | Fatal error detected. | Discard output and abort the session calling cpaDcRemoveSession(). |
| -14 | CPA_DC_MAX_RESUBMITERR | On an error being detected, the firmware attempted to correct and resubmit the request; however, the maximum resubmit value was exceeded. The maximum value is internally set in the firmware to 10 attempts. This is a QAT 1.6 error only. This error code is considered a fatal error. | Discard output and abort the session calling cpaDcRemoveSession(). |
| -15 | CPA_DC_INCOMPLETE_FILE_ERR | This decompression error can be reported only by QAT 1.7 devices. However, it is not exposed to the application.<br>The input file is incomplete. This indicates that the request was submitted with a CPA_DC_FLUSH_FINAL. However, a BFINAL bit was not found in the request. | No corrective action is required as it is not exposed to the application. |
| -16 | CPA_DC_WDOG_TIMER_ERR | The request was not completed as a watchdog timer hardware event occurred. | Discard output and resubmit the affected request. |
| -17 | CPA_DC_EP_HARDWARE_ERR | This is a recoverable error available only with QAT 1.7 devices. Request was not completed as an end point hardware error occurred (for example, a parity error). | Discard output and abort the session calling cpaDcRemoveSession(). |
| -18 | CPA_DC_VERIFY_ERROR | Compress and Verify (CnV). This is a compression direction error only. During the decompression of the compressed payload, an error was detected and the deflate block produced is invalid. | Discard output; resubmit affected request. |
| -19 | CPA_DC_EMPTY_DYM_BLK | Decompression request contained an empty dynamic stored block (not supported). | Discard output. |
| -20 | CPA_DC_CRC_INTEG_ERR | Compression CRC data integrity check error detected. | Discard output; resubmit affected request or abort the session. |



- **NOTE:** Except for CPA\_DC\_OK, CPA\_DC\_OVERFLOW, CPA\_DC\_FATALERR, CPA\_DC\_MAX\_RESUBMITERR, CPA\_DC\_WDOG\_TIMER\_ERR, CPA\_DC\_VERIFY\_ERROR, and CPA\_DC\_EP\_HARDWARE\_ERR, all error codes can be treated as invalid input stream errors.
- **NOTE:** When the suggested corrective action is to discard the output, it implies that the application must also ignore the consumed data, the produced data, and the checksum values.
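The corrective actions for error codes -14 through -20 can be captured in a small lookup. The following is a minimal sketch, not the library's own handling: the `DC_ERR_*` constants are stand-ins copied from the table above (a real application would use the definitions in cpa\_dc.h), and `dc_action_t` and `dc_action_for_error()` are hypothetical names introduced for illustration.

```c
/* Stand-in constants copied from the error code table above.
 * A real application would use the definitions from cpa_dc.h. */
enum {
    DC_ERR_MAX_RESUBMIT    = -14,
    DC_ERR_INCOMPLETE_FILE = -15,
    DC_ERR_WDOG_TIMER      = -16,
    DC_ERR_EP_HARDWARE     = -17,
    DC_ERR_VERIFY          = -18,
    DC_ERR_EMPTY_DYM_BLK   = -19,
    DC_ERR_CRC_INTEG       = -20
};

/* Hypothetical action categories used only by this sketch. */
typedef enum {
    DC_ACTION_NONE,             /* not exposed to the application        */
    DC_ACTION_DISCARD,          /* discard output                        */
    DC_ACTION_DISCARD_RESUBMIT, /* discard output, resubmit the request  */
    DC_ACTION_DISCARD_ABORT,    /* discard output, remove the session    */
    DC_ACTION_UNKNOWN           /* code not covered by this sketch       */
} dc_action_t;

/* Map an error code from the table to its suggested corrective action. */
dc_action_t dc_action_for_error(int code)
{
    switch (code) {
    case DC_ERR_INCOMPLETE_FILE:
        return DC_ACTION_NONE;
    case DC_ERR_EMPTY_DYM_BLK:
        return DC_ACTION_DISCARD;
    case DC_ERR_WDOG_TIMER:
    case DC_ERR_VERIFY:
    case DC_ERR_CRC_INTEG: /* the table also allows aborting the session */
        return DC_ACTION_DISCARD_RESUBMIT;
    case DC_ERR_MAX_RESUBMIT:
    case DC_ERR_EP_HARDWARE:
        return DC_ACTION_DISCARD_ABORT;
    default:
        return DC_ACTION_UNKNOWN;
    }
}
```

Whenever the result is one of the discard actions, the application would also ignore the consumed data, produced data, and checksum values reported for that request.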

### 3.9 Stateful Compression Unsupported

Stateful compression is no longer supported.

### 3.10 Stateless Compression Level Details

The throughput and compression ratio for stateless compression can be adjusted with the compression levels to achieve particular requirements. The most recent software packages now support four compression levels, and the history buffer size is ignored.

#### 3.10.1 Compression Level Mapping

#### 3.10.1.1.1 QAT 1.7 hardware:

Compression levels 1 to 4 translate to search depth 1, 4, 8, and 16, respectively.

Compression levels 5 to 9 are retained for backward compatibility, but map to level 4.

#### Table 7. Compression Levels for QAT 1.7 Hardware

| Compression Level (at the QAT API) | Search Depth | HB<sup>1</sup> Size (KB) | Stateful Context HB<sup>1</sup> (KB) | Stateful Context HT<sup>2</sup> (KB) | Stateful Context LL<sup>3</sup> (KB) | Stateful Context Total (KB) |
|------------------------------------|--------------|--------------------------|--------------------------------------|--------------------------------------|--------------------------------------|-----------------------------|
| 1                                  | 1            | 32                       | 32                                   | 16                                   | 0                                    | 48                          |
| 2                                  | 4            | 16                       | 16                                   | 16                                   | 32                                   | 64                          |
| 3                                  | 8            | 16                       | 16                                   | 16                                   | 32                                   | 64                          |
| 4 through 9                        | 16           | 16                       | 16                                   | 16                                   | 32                                   | 64                          |

1. History Buffer. For a search depth of 1, this is 32KB and uses Banks A, B, C and D. For other search depths, this is 16KB and uses Banks A and B.
2. Hash Table. Regardless of search depth, this is 16KB and uses Banks F and G.
3. Linked List. For a search depth of 1, this is not used. For other search depths, this is 32KB and uses Banks C, D, H and I.



#### 3.10.1.1.2 QAT 1.8 hardware:

Compression levels 1 to 5 translate to search depth 1, 4, 8, 16, and 128, respectively.

Compression levels 6 to 9 are retained for backward compatibility but map to level 5.

#### Table 8. Compression Levels for QAT 1.8 Hardware

| Compression Level (at the QAT API) | Search Depth | HB<sup>1</sup> Size (KB) | Stateful Context HB<sup>1</sup> (KB) | Stateful Context HT<sup>2</sup> (KB) | Stateful Context LL<sup>3</sup> (KB) | Stateful Context Total (KB) |
|------------------------------------|--------------|--------------------------|--------------------------------------|--------------------------------------|--------------------------------------|-----------------------------|
| 1                                  | 1            | 32                       | 32                                   | 16                                   | 0                                    | 48                          |
| 2                                  | 4            | 16                       | 16                                   | 16                                   | 32                                   | 64                          |
| 3                                  | 8            | 16                       | 16                                   | 16                                   | 32                                   | 64                          |
| 4                                  | 16           | 16                       | 16                                   | 16                                   | 32                                   | 64                          |
| 5 through 9                        | 128          | 16                       | 16                                   | 16                                   | 32                                   | 64                          |

1. History Buffer. For a search depth of 1, this is 32KB and uses Banks A, B, C and D. For other search depths, this is 16KB and uses Banks A and B.
2. Hash Table. Regardless of search depth, this is 16KB and uses Banks F and G.
3. Linked List. For a search depth of 1, this is not used. For other search depths, this is 32KB and uses Banks C, D, H and I.
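The level-to-search-depth mappings for both hardware generations can be expressed as small lookup functions. This is only an illustrative sketch of the two tables; the function names are hypothetical and are not part of the QAT API.

```c
/* Look up the search depth for an API compression level (1..9).
 * Levels above the last table entry map to that last entry.
 * Returns -1 for a level outside 1..9. */
static int depth_from_table(const int *depths, int count, int level)
{
    if (level < 1 || level > 9)
        return -1;
    if (level > count)
        level = count;
    return depths[level - 1];
}

/* QAT 1.7: levels 1..4 map to depths 1, 4, 8, 16; levels 5..9 map to level 4. */
int qat17_search_depth(int level)
{
    static const int depths[] = { 1, 4, 8, 16 };
    return depth_from_table(depths, 4, level);
}

/* QAT 1.8: levels 1..5 map to depths 1, 4, 8, 16, 128; levels 6..9 map to level 5. */
int qat18_search_depth(int level)
{
    static const int depths[] = { 1, 4, 8, 16, 128 };
    return depth_from_table(depths, 5, level);
}

/* History buffer size in KB: 32 at level 1, 16 at every other level. */
int qat_history_buffer_kb(int level)
{
    if (level < 1 || level > 9)
        return -1;
    return (level == 1) ? 32 : 16;
}
```

For example, `qat17_search_depth(7)` and `qat17_search_depth(4)` both yield 16, which reflects that levels 5 to 9 are accepted only for backward compatibility.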

#### 3.10.2 Limitation on History Buffer Size (aka Deflate Window Size)

**NOTE:** These details are specific to QAT 1.x hardware.

**NOTE:** The history buffer size is also known as the deflate window size.

There are rare use cases where compressible files may have *worse* compression at higher compression levels, and this section explains those rare cases.

The issue is related to the history buffer size used during the compression process.

At compression levels 2 (L2) through 9 (L9), the history buffer size is limited to 16KB. This buffer (or window) stores previously processed data, which is searched for matches to achieve better compression. For files *smaller* than 16KB, higher compression levels will usually achieve better compression. However, for larger files, a 16KB history buffer means the compression algorithm cannot find matches at distances greater than 16KB. This limitation affects the compression ratio at L2 and above.

For example, some files have most of their compressible matches at distances >= the 16KB buffer size, none of which can be reached at L2 or above. For these files, compression level L1 provides the best compression results. While this is generally uncommon, it could be common in specific datasets.



<u>Table 7</u> and <u>Table 8</u> (above) help explain the relationship between compression levels, history buffer size, and compression performance. They show the history window, search depth, and context size for the different compression levels.

### 3.11 Acceleration Driver Return Codes

The following table shows the return codes used by various components of the acceleration driver, defined in quickassist/include/cpa.h.

#### Table 9. Acceleration Driver Return Codes

| Return Type              | Return Code | Description |
|--------------------------|-------------|-------------|
| CPA_STATUS_SUCCESS       | 0           | Requested operation was successful. |
| CPA_STATUS_FAIL          | -1          | A general or unspecified error occurred. Refer to the console log for user space applications, or to /var/log/messages for kernel space, for more details of the failure. |
| CPA_STATUS_RETRY         | -2          | Recoverable errors occurred. Refer to relevant sections of the API for specifics on the suggested course of action. |
| CPA_STATUS_RESOURCE      | -3          | The resource that has been requested is unavailable. Refer to relevant sections of the API for specifics on the suggested course of action. |
| CPA_STATUS_INVALID_PARAM | -4          | Invalid parameter has been passed in. |
| CPA_STATUS_FATAL         | -5          | A fatal (serious) error has occurred. The recommended course of action is to shut down and restart the component. |
| CPA_STATUS_UNSUPPORTED   | -6          | The function is not supported, at least not with the specific parameters supplied. This may be because the current implementation does not support a particular capability. |
| CPA_STATUS_RESTARTING    | -7          | The API implementation is restarting. Restarting may be reported if, for example, a hardware implementation is undergoing a reset. |
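When logging failures it is often convenient to print the symbolic name of a return code rather than its number. The sketch below is an assumption-labeled helper, not part of the driver: `status_name()` is a hypothetical function, and the numeric values are copied from the table above instead of being taken from cpa.h.

```c
/* Return the symbolic name for an acceleration driver return code.
 * The values are copied from the return code table above; a real
 * application would compare against the macros defined in cpa.h. */
const char *status_name(int status)
{
    switch (status) {
    case 0:  return "CPA_STATUS_SUCCESS";
    case -1: return "CPA_STATUS_FAIL";
    case -2: return "CPA_STATUS_RETRY";
    case -3: return "CPA_STATUS_RESOURCE";
    case -4: return "CPA_STATUS_INVALID_PARAM";
    case -5: return "CPA_STATUS_FATAL";
    case -6: return "CPA_STATUS_UNSUPPORTED";
    case -7: return "CPA_STATUS_RESTARTING";
    default: return "UNKNOWN";
    }
}
```

A caller might use it as `fprintf(stderr, "request failed: %s\n", status_name(rc));` after any API call that returns a status.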

The following table shows the return codes used by the acceleration driver to handle the Linux\* device driver operations.

#### Table 10. Acceleration Driver Return Codes for Linux\* Device Driver Operations

| Return Type | Return Code | Description                                                                                                                                         |
|-------------|-------------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
| SUCCESS     | 0           | The operation was successful.                                                                                                                       |
| FAIL        | 1           | A general error occurred. Refer to the console log for user space applications, or to /var/log/messages for kernel space, for more details of the failure. |
| -EPERM      | -1          | Operation is not permitted. Used during ioctl operations.                                                                                           |
| -EIO        | -5          | Input/Output error occurred. Used when copying configuration data to and from user space.                                                           |
| -EBADF      | -9          | Bad File Number. Used when an invalid file descriptor is detected.                                                                                  |
| -EAGAIN     | -11         | Try Again. Used when a recoverable operation occurred.                                                                                              |
| -ENOMEM     | -12         | Out of Memory. A memory resource that has been requested is not available.                                                                          |
| -EACCES     | -13         | Permission Denied. Used when the operation failed to connect to a process or open a device.                                                         |
| -EFAULT     | -14         | Bad Address. Used when an operation detects invalid parameter data.                                                                                 |
| -ENODEV     | -19         | No Such Device. Used when an operation detects invalid device id.                                                                                   |
| -ENOTTY     | -25         | Invalid Command Type. Used when an ioctl operation detects an invalid command type.                                                                 |
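The negative entries in this table are negated Linux errno values, so their text can be recovered with the standard `strerror()` function. The following is a small sketch under that assumption; `describe_driver_rc()` is a hypothetical helper, and the 4095 bound reflects the usual Linux errno range and is not something the driver documents.

```c
#include <stdio.h>
#include <string.h>

/* Write a human-readable description of a device driver return code
 * into buf. 0 and 1 are the SUCCESS and FAIL codes from the table;
 * negative values are treated as negated errno numbers. */
void describe_driver_rc(int rc, char *buf, size_t len)
{
    if (rc == 0)
        snprintf(buf, len, "SUCCESS");
    else if (rc == 1)
        snprintf(buf, len, "FAIL");
    else if (rc < 0 && rc >= -4095)
        snprintf(buf, len, "%s (errno %d)", strerror(-rc), -rc);
    else
        snprintf(buf, len, "unknown return code %d", rc);
}
```

The exact message text comes from the C library, so it may differ slightly from the wording used in the table.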

### 3.12 Batch and Pack Compression Unsupported

Batch and Pack (BnP) compression is no longer supported.

### 3.13 Compress and Verify Feature

The Compress and Verify (CnV) feature checks and ensures data integrity in the compression operation of the Data Compression API. This feature introduces an independent capability to verify the compression transformation.

Refer to Intel® QuickAssist Technology Data Compression API Reference Manual.

#### NOTE:

- 1. CnV is always enabled via the cpaDcCompressData() API.
- 2. CnV supports compression operations only.
- 3. The compressAndVerify flag in the CpaDcDpOpData structure should be set to CPA\_TRUE when using the cpaDcDpEnqueueOp() or cpaDcDpEnqueueOpBatch() API. These APIs are declared in the API file cpa\_dc\_dp.h.
- 4. The compressAndVerify flag in the CpaDcOpData structure should be set to CPA\_TRUE when using the cpaDcCompressData2() API. This API is declared in the API file cpa\_dc.h.

The CnV functionality is implemented in the Data Compression APIs cpaDcCompressData(), cpaDcCompressData2(), cpaDcDpEnqueueOp() and cpaDcDpEnqueueOpBatch() for the compression path only.

These APIs are declared and documented in the API files cpa\_dc.h and cpa\_dc\_dp.h.

**NOTE:** It is possible to recover from Compress and Verify errors in a seamless manner. Refer to the Compress and Verify and Recover discussion in <u>Section 6.1.3</u>.

### 3.14 Running Applications as Non-Root User

The installation of Intel<sup>®</sup> QAT software package configures the driver to allow applications to run as Non-Root User. The users must be added to the 'qat' group.

When make install is performed in the directory where the Intel<sup>®</sup> QAT package is installed, the following udev file is created; it is responsible for setting up non-root access.

```
KERNEL=="qat_adf_ctl" MODE="0660" GROUP="qat" RUN+="/bin/chgrp qat /usr/local/bin/adf_ctl"
KERNEL=="qat_dev_processes" MODE="0660" GROUP="qat"
KERNEL=="usdm_drv" MODE="0660" GROUP="qat"
ACTION=="add", DEVPATH=="/module/usdm_drv" SUBSYSTEM=="module" RUN+="/bin/mkdir /dev/hugepages/qat"
ACTION=="add", DEVPATH=="/module/usdm_drv" SUBSYSTEM=="module" RUN+="/bin/chgrp qat /dev/hugepages/qat"
ACTION=="add", DEVPATH=="/module/usdm_drv" SUBSYSTEM=="module" RUN+="/bin/chmod 0770 /dev/hugepages/qat"
ACTION=="remove", DEVPATH=="/module/usdm_drv" SUBSYSTEM=="module" RUN+="/bin/rmdir /dev/hugepages/qat"
KERNEL=="uio*", ATTRS{vendor}=="0x8086", ATTRS{device}=="0x0435" MODE="0660" GROUP="qat"
KERNEL=="uio*", ATTRS{vendor}=="0x8086", ATTRS{device}=="0x0443" MODE="0660" GROUP="qat"
KERNEL=="uio*", ATTRS{vendor}=="0x8086", ATTRS{device}=="0x37c8" MODE="0660" GROUP="qat"
KERNEL=="uio*", ATTRS{vendor}=="0x8086", ATTRS{device}=="0x37c9" MODE="0660" GROUP="qat"
KERNEL=="uio*", ATTRS{vendor}=="0x8086", ATTRS{device}=="0x6f54" MODE="0660" GROUP="qat"
KERNEL=="uio*", ATTRS{vendor}=="0x8086", ATTRS{device}=="0x6f55" MODE="0660" GROUP="qat"
KERNEL=="uio*", ATTRS{vendor}=="0x8086", ATTRS{device}=="0x19e2" MODE="0660" GROUP="qat"
KERNEL=="uio*", ATTRS{vendor}=="0x8086", ATTRS{device}=="0x19e3" MODE="0660" GROUP="qat"
```

The updates to the udev rules are performed during the installation of the Intel® QAT driver.

The following steps need to be manually applied:

Change the maximum amount of locked memory (64 by default) for the users in the group. This can be done by specifying the limit in /etc/security/limits.conf, for example: `@qat - memlock 4096`
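An application can confirm at startup that the new limit is in effect for its user. The sketch below uses the POSIX `getrlimit()` call with `RLIMIT_MEMLOCK`; the helper names are hypothetical. It assumes the limits.conf memlock value is expressed in KB, whereas `getrlimit()` reports bytes.

```c
#include <sys/resource.h>

/* Return 1 if a soft RLIMIT_MEMLOCK value (in bytes) covers the
 * required amount (in KB, as written in limits.conf), else 0. */
int memlock_sufficient(rlim_t soft_limit_bytes, unsigned long required_kb)
{
    if (soft_limit_bytes == RLIM_INFINITY)
        return 1;
    return soft_limit_bytes >= (rlim_t)required_kb * 1024u;
}

/* Check the calling process. Returns 1 if the limit is sufficient,
 * 0 if it is too low, and -1 if getrlimit() itself fails. */
int memlock_check_current(unsigned long required_kb)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_MEMLOCK, &lim) != 0)
        return -1;
    return memlock_sufficient(lim.rlim_cur, required_kb);
}
```

Calling `memlock_check_current(4096)` before initializing the QAT instances gives an early, explicit failure if the user was not added to the group or the limit was not applied.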

### 3.15 Random Number Generation

Starting with Intel<sup>®</sup> QAT Hardware version 1.7, Intel<sup>®</sup> QAT no longer includes random number generation capability, because this capability is already included in the CPU and is available via the RDRAND and RDSEED instructions.

### 3.16 Huge Pages with the Included Memory Driver

The included User Space DMA-able Memory driver (usdm\_drv.ko) supports 2 MB pages. The driver allows direct access to main memory by devices other than the CPU. When huge pages are enabled, the maximum memory size supported in one individual allocation is 2 MB minus 5 KB, where the 5 KB is used by the memory driver for memory management. The use of 2 MB pages provides benefits, but also requires additional configuration. Use of this capability assumes that enough huge pages are allocated in the operating system for the particular use case and configuration.

Here are some example use cases:

`# insmod ./usdm_drv.ko`

Default settings applied.

`# insmod ./usdm_drv.ko max_mem_numa=32768`

The maximum amount of Non-uniform Memory Access (NUMA) type memory that the User Space DMA-able Memory (USDM) driver can allocate is 32 MB in total for all processes. Huge pages are disabled.

`# insmod ./usdm_drv.ko max_huge_pages=50 max_huge_pages_per_process=5`

The maximum number of huge pages that the USDM can allocate is 50 in total and 5 per process (up to 10 processes, 0 for the next processes).

`# insmod ./usdm_drv.ko max_huge_pages=3 max_huge_pages_per_process=5`

An erroneous configuration: the maximum number of huge pages that the USDM can allocate is 3 in total, so the first process gets 3 and the next processes get 0.

`# insmod ./usdm_drv.ko max_huge_pages_per_process=5`

An invalid configuration: huge pages are disabled because max\_huge\_pages is 0 by default.

`# insmod ./usdm_drv.ko max_huge_pages=5`

An invalid configuration: huge pages are disabled because max\_huge\_pages\_per\_process is 0 by default.
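The arithmetic behind these examples can be checked with a short sketch. It is a simplified model, not the driver's allocation logic: it assumes processes start one after another and each takes its full per-process quota, and the function names are hypothetical.

```c
#define USDM_HUGE_PAGE_BYTES     (2UL * 1024UL * 1024UL) /* 2 MB page          */
#define USDM_MGMT_OVERHEAD_BYTES (5UL * 1024UL)          /* 5 KB of management */

/* Largest single allocation with huge pages enabled: 2 MB minus 5 KB. */
unsigned long usdm_max_single_alloc_bytes(void)
{
    return USDM_HUGE_PAGE_BYTES - USDM_MGMT_OVERHEAD_BYTES;
}

/* Huge pages available to the process with the given zero-based start
 * order, assuming every earlier process took its full quota.
 * Either parameter being 0 means huge pages are disabled. */
unsigned int usdm_huge_pages_for_process(unsigned int max_huge_pages,
                                         unsigned int per_process,
                                         unsigned int process_index)
{
    unsigned long long taken;
    unsigned int remaining;

    if (max_huge_pages == 0 || per_process == 0)
        return 0;
    taken = (unsigned long long)per_process * process_index;
    if (taken >= max_huge_pages)
        return 0;
    remaining = max_huge_pages - (unsigned int)taken;
    return (remaining < per_process) ? remaining : per_process;
}
```

With `max_huge_pages=50` and `max_huge_pages_per_process=5`, processes 0 through 9 each get 5 pages and process 10 gets none, which matches the third example above.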

*NOTE:* The use of huge pages may not be supported for all use cases. For instance, depending on the driver version, some limitations may exist for an Input/Output Memory Management Unit (IOMMU).



### 3.17 Heartbeat

Under some circumstances, firmware in the Intel<sup>®</sup> QAT devices could become unresponsive, requiring a device reset to recover. The Intel<sup>®</sup> QAT Heartbeat feature provides a mechanism for the customer application to detect and reset unresponsive devices. It also notifies the application processes of the start and end of the reset operation and suspends all Intel<sup>®</sup> QAT instances between the events.

#### 3.17.1 Heartbeat Operation

The firmware of a Heartbeat-enabled Intel<sup>®</sup> QAT device periodically writes counters to a specified physical memory location. A pair of counters per thread is incremented at the start and end of the main processing loop within the firmware. Checking for Heartbeat consists of checking the validity of the pair of counter values for each thread. Stagnant counters indicate a firmware hang.

#### 3.17.1.1 Initialization

At startup, the Intel<sup>®</sup> QAT device driver allocates memory for the counter pairs to be written by the firmware and then sends a message to the firmware to start the heartbeat functionality.

#### 3.17.1.2 Heartbeat Monitoring

Heartbeat check/monitoring refers to the invocation of one of the two API calls that check if the device is responsive. Heartbeat failure refers to the API returning failure.

The Intel<sup>®</sup> QAT driver does not monitor for Heartbeat. It should be initiated by a Heartbeat management thread calling one of the following APIs periodically:

- icp\_sal\_check\_device(Cpa32U accelId);
- icp\_sal\_check\_all\_devices(void);

A failure return code implies the device has failed or hung.

The Heartbeat management thread should satisfy the following conditions:

- For any given device, only one such process/thread should monitor.
- One process can monitor one or more devices.
- The monitor can be a user application that uses Intel<sup>®</sup> QAT services, or a separate management/control plane process.
- In a virtualized environment, the monitoring process(es)/thread(s) must run in the context of the host or hypervisor.

#### 3.17.1.3 Resetting a Failed Device

A device can be configured for automatic reset by the Intel<sup>®</sup> QAT framework or manually reset by the application by using the AutoResetOnError field in the device configuration file /etc/<device>.conf, as shown in the following table.

#### Table 11. AutoResetOnError Values

| AutoResetOnError Value | Action on Heartbeat Failure    |
|------------------------|--------------------------------|
| 0 (default)            | Do not reset the device        |
| 1                      | Reset the device automatically |

If an Intel<sup>®</sup> QAT device is not configured for automatic reset, the management thread should reset it using the icp\_sal\_reset\_device(Cpa32U accelId) API.

The icp\_sal\_reset\_device() function starts an asynchronous reset sequence and returns immediately. To avoid a reset storm, the reset function should not be called again until the device has completed the reset. The icp\_sal\_check\_device(<device id>) function could be called in a loop to check if the device reset is still in progress.

If the application devices are all configured for automatic reset, then the icp\_sal\_check\_all\_devices() function could be used; otherwise, the function should not be used because it does not return the identity of the failed device, which is a required parameter for the icp\_sal\_reset\_device() function.
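The monitoring and reset rules above can be sketched as a single step that a management thread calls periodically for each device. This is an illustration under stated assumptions: `hb_monitor_t`, `hb_monitor_step()` and the callback types are hypothetical, and the `sim_*` functions simulate a device so the block is self-contained. In a real application the callbacks would wrap icp\_sal\_check\_device() and icp\_sal\_reset\_device(), whose exact signatures are given in Section 6.2.

```c
/* Hypothetical callback types; both return 0 on success. */
typedef int (*hb_check_fn)(unsigned int accel_id);
typedef int (*hb_reset_fn)(unsigned int accel_id);

typedef struct {
    unsigned int accel_id;      /* device being monitored               */
    int auto_reset;             /* mirrors AutoResetOnError (0 or 1)    */
    int reset_in_progress;      /* set after this monitor issues reset  */
    unsigned int resets_issued; /* number of manual resets started      */
} hb_monitor_t;

/* One monitoring pass for one device. A manual reset is issued only
 * when the device fails its check, automatic reset is off, and no
 * earlier reset is still pending (this avoids a reset storm). */
void hb_monitor_step(hb_monitor_t *m, hb_check_fn check, hb_reset_fn reset)
{
    if (check(m->accel_id) == 0) {
        m->reset_in_progress = 0; /* device is responsive again */
        return;
    }
    if (m->auto_reset || m->reset_in_progress)
        return; /* framework resets it, or our reset is not done yet */
    if (reset(m->accel_id) == 0) {
        m->reset_in_progress = 1;
        m->resets_issued++;
    }
}

/* Simulated device used in place of real hardware for this sketch. */
int sim_device_healthy = 1;

int sim_check(unsigned int accel_id)
{
    (void)accel_id;
    return sim_device_healthy ? 0 : -1;
}

int sim_reset(unsigned int accel_id)
{
    (void)accel_id;
    return 0; /* reset request accepted; completion is asynchronous */
}
```

Because the step takes a device ID, it pairs naturally with the per-device check; a monitor built on icp\_sal\_check\_all\_devices() could not decide which device to reset.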

#### 3.17.1.3.1 Function Signatures

The details of the above functions, parameters, and return values can be found in <u>Section 6.2,</u> <u>Additional APIs</u>.

#### 3.17.2 Incorporating Heartbeat into Intel® QAT Applications

A typical Intel® QAT user application consists of two tasks:

- The first task is typically an application thread that initializes Intel<sup>®</sup> QAT instances and sessions, and then submits service requests for Intel<sup>®</sup> QAT crypto or compression.
- The second task receives the service responses. If an application employs polling to receive Intel<sup>®</sup> QAT service responses, then this task is also an application thread. Alternatively, responses are received in an interrupt handler.

Two more tasks are required to support Heartbeat:

- The first is a management task that monitors the devices for failure or hang and resets them when required. As discussed earlier, this could be an application thread or an independent management process.
- The second task is an application thread that polls for device reset events:
  - CPA\_INSTANCE\_EVENT\_RESTARTING (device is restarting)
  - CPA\_INSTANCE\_EVENT\_RESTARTED (device restart is complete)

If the application employs polling to receive Intel<sup>®</sup> QAT service responses, then this task could be included in the same polling loop.

The polling for device events is done using the API:



- icp\_sal\_poll\_device\_events()

The two callback functions for crypto and compression are registered using the following APIs:

- cpaCyInstanceSetNotificationCb
- cpaDcInstanceSetNotificationCb

The details of the above functions, parameters, and return values can be found in <u>Section 6.2,</u> <u>Additional APIs</u>.

#### 3.17.2.1.1 Restart Sequence

During the restart sequence, the user space library releases the memory used for rings and other data structures as part of the shutdown and reallocates them when the restart is completed. The process is transparent to the user application, so it can continue to use the same logical instance after reset to submit Intel<sup>®</sup> QAT service requests. Any memory allocated by the user application for the Intel<sup>®</sup> QAT service is untouched during device reset.

A typical Heartbeat error use-case is as follows:

- 1. The driver and the firmware are loaded, initialized, and started.
- 2. The user-space application registers to receive instance notifications by calling cpaCyInstanceSetNotificationCb and cpaDcInstanceSetNotificationCb.
- 3. The management thread monitors for the device's Heartbeat. When a device is unresponsive, a device reset is initiated by the management thread or by the Intel<sup>®</sup> QAT framework, depending on the device configuration.
- 4. The kernel-space process sends the Restarting event to the user-space process.
- 5. The user-space driver passes the device restarting event to all the registered application instances. It also frees memory and rings associated with the registered instances.
- 6. The kernel-space driver triggers the device reset.
- 7. During reset, the Intel<sup>®</sup> QAT service requests made by the user application return one of:
  - CPA\_STATUS\_FAIL
  - CPA\_STATUS\_RETRY
  - CPA\_STATUS\_RESTARTING
- 8. When the device reset is complete, the kernel-space driver sends a device Restarted event to the user space driver.
- 9. The user space driver allocates the memory and rings and then forwards the device Restarted event to each of the registered instances.
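On the application side, the sequence above suggests keeping a small piece of per-instance state that the notification callback updates and the submission path consults. The following sketch uses stand-in types so that it is self-contained: `app_instance_event_t`, `app_instance_state_t` and both functions are hypothetical, and a real callback would use the event type and callback signature declared in the API headers.

```c
/* Stand-in for the RESTARTING / RESTARTED instance events. */
typedef enum {
    APP_EVENT_RESTARTING,
    APP_EVENT_RESTARTED
} app_instance_event_t;

typedef struct {
    int restarting;                  /* 1 between the two events        */
    unsigned int restarts_completed; /* number of completed restarts    */
} app_instance_state_t;

/* Body of a notification callback: record the reset window so the
 * submission path can pause instead of collecting failed requests. */
void app_on_instance_event(app_instance_state_t *s, app_instance_event_t ev)
{
    switch (ev) {
    case APP_EVENT_RESTARTING:
        s->restarting = 1;
        break;
    case APP_EVENT_RESTARTED:
        if (s->restarting) {
            s->restarting = 0;
            s->restarts_completed++;
        }
        break;
    }
}

/* Submission gate: the same logical instance is usable again once
 * the RESTARTED event has been delivered. */
int app_can_submit(const app_instance_state_t *s)
{
    return !s->restarting;
}
```

If the callback and the submission path run on different threads, the state would need to be protected with a lock or accessed atomically; that is omitted here for brevity.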

#### 3.17.2.1.2 Status of Packets in Flight (Crypto Applications Only)

When a device has fatal errors, the application ordinarily cannot determine whether or not inflight requests have been processed successfully.

The current Intel<sup>®</sup> QAT release includes a dummy response feature that creates mock responses to all requests submitted during a fatal error condition, so the application can detect them and, therefore, know which requests need to be resubmitted to the available devices or to the software.

**NOTE:** The sequence of dummy responses will match the sending request sequence for all requests submitted during a fatal error.

Because the dummy response feature supports only Public Key Encryption (PKE), dummy responses are generated only when the icp\_sal\_CyPollInstance() function is called, as this is the polling function for crypto services.

The application should also call the icp\_sal\_poll\_device\_events() function, so that it is notified when the device encounters a failure and dummy responses are generated for the inflight requests when icp\_sal\_CyPollInstance() is called.

#### 3.17.2.1.3 Determining Device ID

The <device id> that is passed as a parameter to several Heartbeat APIs is the numeric suffix of the device name displayed by the following command. In the example output below, the device name is qat\_dev0, so the device ID is 0.

```
# service qat_service status
There is 1 QAT acceleration device(s) in the system:
qat_dev0 - type: c3xxx, inst_id: 0, node_id: 0, bsf: 01:00.0, #accel: 3 #engines: 6 state: up
```

The Intel® QAT library has no API to discover the device number easily. However, an application can use the IOCTLs IOCTL\_GET\_NUM\_DEVICES and IOCTL\_STATUS\_ACCEL\_DEV to find the device\_id of a particular device if it knows the device's Bus Device Function (BDF). Refer to perform\_query\_dev() in ./adf\_ctl.cpp.
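When only the device name from the status output is available, the numeric suffix can be extracted with standard C string functions. This is a minimal sketch; `qat_device_id_from_name()` is a hypothetical helper and not part of the QAT library, and it assumes names of the form `qat_dev<N>` as shown in the output above.

```c
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Extract the numeric suffix from a device name such as "qat_dev0".
 * Returns the device ID, or -1 if the name does not have the form
 * "qat_dev" followed only by decimal digits. */
int qat_device_id_from_name(const char *name)
{
    static const char prefix[] = "qat_dev";
    const size_t plen = sizeof(prefix) - 1;
    char *end;
    unsigned long id;

    if (name == NULL || strncmp(name, prefix, plen) != 0)
        return -1;
    if (!isdigit((unsigned char)name[plen]))
        return -1;
    id = strtoul(name + plen, &end, 10);
    if (*end != '\0' || id > (unsigned long)INT_MAX)
        return -1;
    return (int)id;
}
```

The returned value is what would be passed as the accelId argument of the Heartbeat check and reset functions.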

#### 3.17.2.1.4 Setting Polling Minimal Period

The QAT driver allows the Heartbeat poll period to be set in the configuration file through the HeartbeatTimer parameter (see Table 16):

- The minimum acceptable HeartbeatTimer value is 100 ms, due to a firmware limitation.
- If the value is not set in the configuration file, the default Heartbeat poll period is 500 ms.

Reading the Heartbeat value (for example, `cat /sys/kernel/debug/qat_c4xxx_0000\:f4\:00.0/heartbeat`) more frequently than once per Heartbeat poll period returns -1 and produces the kernel log message "HB poll frequency is higher than configured HB timer".
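A monitoring tool can apply the same limits before it reads the Heartbeat value, so that it never polls faster than the configured period. The sketch below is illustrative only: the function and macro names are hypothetical, `HB_TIMER_UNSET` is a convention of this sketch for a missing HeartbeatTimer entry, and treating values below 100 ms as rejected is an assumption about how such values should be handled.

```c
#define HB_TIMER_MIN_MS     100  /* minimum acceptable HeartbeatTimer   */
#define HB_TIMER_DEFAULT_MS 500  /* used when the parameter is not set  */
#define HB_TIMER_UNSET      (-1) /* sketch convention: no config entry  */

/* Return the poll period in ms that applies for a configured value,
 * or 0 if the configured value is below the minimum. */
int hb_effective_period_ms(int configured_ms)
{
    if (configured_ms == HB_TIMER_UNSET)
        return HB_TIMER_DEFAULT_MS;
    if (configured_ms < HB_TIMER_MIN_MS)
        return 0;
    return configured_ms;
}

/* Return 1 if at least one full poll period has elapsed since the
 * previous read, so a new read will not be rejected with -1. */
int hb_read_allowed(unsigned long last_read_ms, unsigned long now_ms,
                    int period_ms)
{
    return (now_ms - last_read_ms) >= (unsigned long)period_ms;
}
```

The timestamps are expected to come from a monotonic millisecond clock chosen by the caller.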

#### 3.17.3 Testing Heartbeat

Two debug capabilities are available to assist the developers incorporating Heartbeat into their applications:

- Simulation of Heartbeat failure
- System virtual files under /sys/kernel/debug/



#### 3.17.3.1 Simulated Heartbeat Failure Configuration

The Heartbeat feature is always enabled in the package. However, a debug capability that simulates device failure can be enabled during the configure step as follows: `# ./configure --enable-icp-hb-fail-sim`

#### 3.17.3.2 Simulating Heartbeat Failure

Simulating Heartbeat failure can be accomplished using two methods:

- Using the API icp\_sal\_heartbeat\_simulate\_failure(<device id>)
- Executing the command: `# cat /sys/kernel/debug/<device>/heartbeat_sim_fail`

#### 3.17.3.3 System Virtual Files

**NOTE:** The Heartbeat /sys/kernel/debug files are associated with the QAT Physical Function (PF).

The Heartbeat feature implements the following system virtual files under the /sys/kernel/debug/qat\_cxxx\_<your\_device\_BDF>/ directory.

#### Table 12. Heartbeat System Virtual Files

| File             | Content                                                                  |
|------------------|--------------------------------------------------------------------------|
| heartbeat        | 0: Device is responsive.<br>-1: Device is NOT responsive.                |
| heartbeat_failed | Number of times the device became unresponsive.                          |
| heartbeat_sent   | Number of times the control process checked if the device is responsive. |

A developer could simulate the Heartbeat management process by running the following script in the background:

```
#!/bin/bash
# Read the heartbeat file once per second; each read triggers a check.
while :
do
    cat /sys/kernel/debug/<device>/heartbeat > /dev/null
    sleep 1
done
```
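A test harness written in C can read the same file and interpret its content according to Table 12. The following is a sketch under the assumption that the file holds a single decimal value (0 or -1); the function names are hypothetical and the path must be supplied by the caller.

```c
#include <stdio.h>
#include <stdlib.h>

/* Interpret the text read from the heartbeat file.
 * Returns 1 if the device is responsive ("0"), 0 if it is not
 * responsive ("-1"), and -1 if the text is not one of these values. */
int heartbeat_parse(const char *text)
{
    char *end;
    long value = strtol(text, &end, 10);

    if (end == text)
        return -1; /* no digits found */
    if (value == 0)
        return 1;
    if (value == -1)
        return 0;
    return -1;
}

/* Read and interpret a heartbeat file. Returns the same values as
 * heartbeat_parse(), or -1 if the file cannot be opened or read. */
int heartbeat_read(const char *path)
{
    char buf[32];
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return heartbeat_parse(buf);
}
```

As with the shell loop, each read counts as a Heartbeat check, so the caller should not read more often than the configured Heartbeat poll period.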

#### 3.17.3.4 Heartbeat Polling Frequencies

The application developer should decide on the following two Heartbeat polling frequencies:

- Device Heartbeat monitoring
- Checking for device reset events

#### 3.17.3.4.1 Device Heartbeat Monitoring

Consider the following points when determining the frequency of Heartbeat monitoring:



- Increasing the Heartbeat monitoring frequency minimizes the customer's system downtime.
- However, since device unresponsiveness should be an infrequent event, high frequency Heartbeat monitoring wastes CPU cycles.
- Also, if there are large Intel<sup>®</sup> QAT service requests that take some time to complete, high frequency Heartbeat monitoring could result in false reports of unresponsiveness.

#### 3.17.3.4.2 Checking for Device Reset Events

If the application uses polling to read Intel® QAT service responses, there is no value in checking for reset events more frequently than it polls for responses. Since device unresponsiveness is an infrequent occurrence, the frequency of checking for reset events could be a fraction of the frequency of polling for Intel® QAT service responses.

### 3.18 Handling Device Failures in a Virtualized Environment

The Heartbeat feature in the acceleration software can be used in a virtualized environment. Refer to the Using Intel® Virtualization Technology (Intel® VT) with Intel® QuickAssist Technology Application Note (refer to Table 2) for more details on enabling SR-IOV and the creation of Virtual Functions (VFs) from a single Intel® QuickAssist Technology acceleration device to support acceleration for multiple Virtual Machines (VMs).

The following sequence describes a possible use case for using the Heartbeat feature in a virtualized environment:

- 5. The Intel® QAT Physical Function driver (PF driver) is loaded, initialized and started.
- 6. The Intel<sup>®</sup> QAT Virtual Function driver (VF driver) is loaded, initialized and started in the Guest OS in the VM.
- *NOTE:* For Intel® Communications Chipset 8900 to 8920 Series Software (aka Cave Creek): the PF driver detects that the firmware is unresponsive using either of the following methods: User Proc Entry Read (not enabled by default) or User Application Heartbeat APIs (not enabled by default).
- 7. The PF driver sends the "Restarting" event message to the VF via the internal PF to VF communication messaging mechanism.
- 8. The VF driver sends the "Restarting" event to the application's registered callback. The callback is registered using either of the Intel<sup>®</sup> QAT API functions cpaDcInstanceSetNotificationCb() or cpaCyInstanceSetNotificationCb() in the Guest OS. The application's callback function may perform any application-level cleanup.
- 9. The PF driver starts the reset sequence (save state, initiate reset, and restore state).
- 10. The user restarts the Guest OS and loads the VF driver and application in the Guest OS.
- *NOTE:* If the Heartbeat feature in the acceleration software is not enabled, the PF driver will not notify the VF driver that the firmware is unresponsive.



**NOTE:** The error detection mechanisms are not available on the VF driver in the VM, but device errors caused by any of the software running on the VM will be detected by the PF driver using the above mechanisms.

### 3.18.1 Understanding System Messages and Warnings

During the operation of Intel<sup>®</sup> QAT hardware, the system may log various messages that help diagnose configuration and performance issues. One such message is:

[ 17.730925] QAT: Could not find a device on node 1

This message is informational only, and indicates that a kernel application is attempting to use a QAT device on a specific node, but no QAT device is directly attached to that node. As a result, the application may experience reduced performance due to using a QAT device on a remote node.

This message is not indicative of an error, but rather a potential performance consideration. It is most commonly seen during the early stages of driver loading and the crypto self-test. The message is rate-limited and, as of kernel version 6.3, is logged as a debug message to avoid excessive entries in the system logs.

If you do not observe this message on your platform, it may be due to one of the following reasons:

- 1. The platform does not have remote nodes, i.e., it is a single-socket system.
- 2. Kernel tests are running on a core that has a local QAT accelerator attached, avoiding the need for remote node access.
- 3. The kernel configuration suppresses the printing of informational and debug messages.

For optimal performance, it is recommended to run applications on cores that have local access to QAT devices. Please refer to the system topology and QAT device distribution to ensure proper application and QAT device affinity.
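As an illustration of the affinity check described above, the following sketch reads the NUMA node of a PCI device from sysfs so that an application can be placed on cores local to that device. It assumes the standard Linux numa\_node attribute under /sys/bus/pci/devices; the function name, the SYSFS\_PCI variable, and the device address used in the note below are hypothetical.

```bash
#!/bin/bash
# Root of the PCI device tree in sysfs (overridable, e.g. for testing).
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

# Print the NUMA node of a PCI device given its full address
# (domain:bus:device.function). The kernel reports -1 when the platform
# does not expose NUMA information for the device.
pci_numa_node() {
    local bdf="$1"
    local attr="$SYSFS_PCI/$bdf/numa_node"
    if [ ! -r "$attr" ]; then
        echo "no numa_node attribute for $bdf" >&2
        return 1
    fi
    cat "$attr"
}
```

For example, the value printed by pci\_numa\_node 0000:3d:00.0 could be passed to numactl --cpunodebind (where numactl is installed) to start an application on the node that hosts the device.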

# 3.19 Incorporating Dummy Responses into an Intel<sup>®</sup> QAT Application

The dummy response feature has been incorporated in a scenario with the Intel<sup>®</sup> QAT engine and Nginx<sup>\*</sup>. Figure 4 below illustrates how it works. This scenario can serve as a reference for so-called "software fallback."

The Intel<sup>®</sup> QAT engine is a shim layer between OpenSSL\* libcrypto\* and the Intel<sup>®</sup> QAT Library. The Intel<sup>®</sup> QAT Library will generate failover responses.

The Heartbeat Monitoring Daemon is a single-process daemon that periodically checks the device status and triggers the driver to reset the device when a Heartbeat failure occurs. Its only activity is calling icp\_sal\_check\_device() or icp\_sal\_check\_all\_devices() periodically.

The Intel<sup>®</sup> QAT Engine polls for and handles "device error" and "device ok" events (via udev). It keeps track of the number of devices which are active.



- If some, but not all, Intel<sup>®</sup> QAT devices encounter errors, switch to the remaining available devices: resubmit the in-flight requests that were answered with dummy responses, and submit new requests, to the available devices.
- If the number of active Intel<sup>®</sup> QAT devices goes to zero, switch to software: resubmit the in-flight requests that were answered with dummy responses, and submit new requests, to the software implementation.
- If the number of active Intel® QAT devices goes positive again, switch back to hardware.





### 3.19.1 Reliability, Availability, Serviceability

The Reliability, Availability, Serviceability (RAS) features are designed to limit the impact of errors within QAT. This section describes the software element required to support the QAT RAS capabilities. As background, the RAS terms are summarized as follows:

- **Reliability**: Refers to how often errors occur in a system and whether the system can recover from an error condition.
- **Availability**: Refers to how flexibly the system resources can be allocated or redistributed for system utilization and for system recovery from errors.
- **Serviceability**: Refers to how well the system reports and handles events related to errors.

### 3.19.2 End-to-End Data Integrity Support in QAT 1.8

*NOTE:* This End-to-End Data Integrity Support is not available in QAT Hardware Generation 1.7 and earlier devices.

In QuickAssist Hardware Generation 1.8, additional CRCs have been added to the compression service to provide end-to-end data integrity support for performing payload verification throughout the compression pipeline. CRCs are generated for both the input and the output data. The Compress and Verify feature supported in previous generations of QAT SW forms part of the overall data integrity feature. The Compress and Verify feature is used to verify that the compressed output from a compression job can be successfully decompressed. The additional CRCs in QAT 1.8 add protection for the data as it is transferred between Dynamic Random-Access Memory (DRAM) and the QAT and as it flows through the compression processing pipeline.

# 3.20 Rate Limiting

Rate Limiting is implemented by monitoring the utilization of the device on a per-VF, per-service basis and comparing that to the SLA allocated to that VF and service. Resources are shared across guests and the resource utilization of each guest is measured relative to the capacity of the physical function.

The feature is supported only in rate limiting firmware for cryptographic or compression services.

To enable the Rate Limiting feature:

- 1. Install the driver package on the host with Single-Root Input/Output Virtualization (SR-IOV) enabled.
- 2. Update the physical function configuration file. Depending on your device type, either set the ServicesProfile parameter to a value that supports rate limiting (e.g., CRYPTO, CUSTOM1, COMPRESSION) or set the RateLimitingEnabled parameter to 1.
- 3. Set ServicesEnabled to cy or sym or asym or dc.
- 4. Perform qat\_service shutdown and qat\_service start.

This procedure also enables Device Utilization measurement (refer to <u>Section 3.21</u>). Rate limiting requires a virtualized environment, but device utilization can be used without virtualization.

The RateLimitingEnabled flag is used only for the c4xxx driver. For the 200xx, c3xxx, c6xx, d15xx, and dh895xcc drivers, the Rate Limiting feature is enabled by the ServicesProfile parameter and proper image selection (CRYPTO, COMPRESSION, or CUSTOM1).

When a ServicesProfile parameter value that supports rate limiting is defined, internal resources are reallocated to administering Rate Limiting/Device Utilization. This reduces performance for symmetric crypto and data compression by roughly 10%.
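The configuration requirement in the procedure above can be checked before restarting the service. The following is a minimal sketch, assuming the configuration file uses "Parameter = value" lines (as in the "ServicesProfile = COMPRESSION" example in Section 4.2.1); the helper names conf\_get and rate\_limiting\_configured are hypothetical and are not part of the driver package.

```bash
#!/bin/bash
# Print the first value assigned to KEY in a "KEY = value" style file.
conf_get() {
    local key="$1" file="$2"
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" \
        | sed 's/[[:space:]]*$//' | head -n 1
}

# Succeed if the file selects a configuration that supports rate limiting:
# a ServicesProfile of CRYPTO, COMPRESSION, or CUSTOM1, or
# RateLimitingEnabled = 1 (c4xxx driver).
rate_limiting_configured() {
    local file="$1"
    case "$(conf_get ServicesProfile "$file")" in
        CRYPTO|COMPRESSION|CUSTOM1) return 0 ;;
    esac
    [ "$(conf_get RateLimitingEnabled "$file")" = "1" ]
}
```

If the check succeeds, the change takes effect after qat\_service shutdown and qat\_service start, as in step 4.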

### 3.20.1 Service Level Agreement (SLA)

Service Level Agreement enforcement allocates a specified amount of capacity for a specified service to a specified VF.

Max SLAs enforced = (number of VFs) × (number of services), where:

- Number of VFs varies based on device type
- Number of services = 3 (asymmetric, symmetric, or compression)

NOTE: The number of VFs supporting rate limiting is limited to 32 by the firmware.

### 3.20.2 SLA Units

SLA units are measured as follows:

- Symmetric Crypto/Compression: 1 Mbps of the reference operation
- Asymmetric Crypto: 1 operation per second (ops) of the reference operation

NOTE: Enforced SLAs are rounded up to the next multiple of 1000 units.
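The rounding rule in the note above can be sketched as integer arithmetic. This is an illustration only: the function name sla\_round\_up is hypothetical, and the sketch assumes that a rate which is already a multiple of 1000 is left unchanged.

```bash
#!/bin/bash
# Granularity of enforced SLAs, in SLA units.
SLA_STEP=1000

# Round a requested rate up to the next multiple of SLA_STEP.
# Assumption: a rate that is already a multiple stays as it is.
sla_round_up() {
    local rate="$1"
    echo $(( (rate + SLA_STEP - 1) / SLA_STEP * SLA_STEP ))
}
```

Under these assumptions, a request for 1500 units would be enforced as 2000 units, and a request for 2000 units would be enforced as 2000 units.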

### 3.20.3 SLA Manager Application

The sla\_mgr tool is used to create, update, delete, and list SLAs, and to query SLA capabilities.

The SLA Manager executable is available in *$ICP\_ROOT/build/sla\_mgr* after the package is built and installed using the ./configure; make install commands.

#### 3.20.3.1 Rate Limiting Commands

- Create SLA: ./sla\_mgr create <vf\_addr> <rate\_in\_sla\_units> <service>
- Update SLA: ./sla\_mgr update <pf\_addr> <sla\_id> <rate\_in\_sla\_units>
- Delete SLA: ./sla\_mgr delete <pf\_addr> <sla\_id>
- Delete all SLAs: ./sla\_mgr delete\_all <pf\_addr>
- Query SLA capabilities:
- Query list of SLAs: ./sla\_mgr list <pf\_addr>

#### Options:

- pf\_addr: Physical function address in bus:device.function (xx:xx.x) format
- vf\_addr: Virtual function address in bus:device.function (xx:xx.x) format
- service: Asym(=0) or Sym(=1) or Dc(=2)
- rate\_in\_sla\_units: [0-MAX]. MAX is found by querying the capabilities. 1 rate\_in\_sla\_units is equal to:
  - 1 operation per second for asymmetric service
  - 1 Megabit per second for symmetric service/compression service
- sla\_id: Value returned by the create command
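To show how the create command and its options fit together, the following sketch wraps it in a small function that validates the arguments first. The wrapper names (is\_bdf, service\_code, create\_sla) and the SLA\_MGR variable are hypothetical, and passing the numeric service code (0, 1, or 2) on the command line is an assumption based on the Options list above.

```bash
#!/bin/bash
# Path to the tool; override if it is installed elsewhere.
SLA_MGR="${SLA_MGR:-./sla_mgr}"

# Succeed if the argument looks like bus:device.function (xx:xx.x).
is_bdf() {
    [[ "$1" =~ ^[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}\.[0-7]$ ]]
}

# Map a service name to the numeric code from the Options list.
service_code() {
    case "$1" in
        asym) echo 0 ;;
        sym)  echo 1 ;;
        dc)   echo 2 ;;
        *)    return 1 ;;
    esac
}

# create_sla <vf_addr> <rate_in_sla_units> <asym|sym|dc>
create_sla() {
    local vf="$1" rate="$2" code
    is_bdf "$vf" || { echo "bad VF address: $vf" >&2; return 1; }
    [[ "$rate" =~ ^[0-9]+$ ]] || { echo "bad rate: $rate" >&2; return 1; }
    code=$(service_code "$3") || { echo "bad service: $3" >&2; return 1; }
    "$SLA_MGR" create "$vf" "$rate" "$code"
}
```

With these definitions, create\_sla 3d:01.0 2000 sym would run ./sla\_mgr create 3d:01.0 2000 1; the address shown is only an example.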

# 3.21 DU Manager Application

Device Utilization (DU) is a way to measure utilization of acceleration hardware that corresponds to the throughput of cryptographic or compression services on a given physical or virtual function. This can vary between different device types and generations.

The du\_mgr tool is used to measure the utilization of cryptographic or compression service for a given physical or virtual function.

The DU execution tool is available in <code>$ICP\_ROOT/build/du\_mgr</code> after the package is built and installed using the ./configure; make install commands.

To enable the Device Utilization feature:

- 1. Install the driver package on the host with SR-IOV enabled.
- 2. Update the physical function configuration file to set the ServicesProfile parameter to a value that supports rate limiting (e.g., CRYPTO, CUSTOM1, or COMPRESSION).
- 3. Set ServicesEnabled to cy or sym or asym or dc.
- 4. Perform qat\_service shutdown and qat\_service start.
- **NOTE:** When a ServicesProfile parameter value that supports rate limiting is defined, internal resources are reallocated to administering Rate Limiting/Device Utilization. This reduces performance for symmetric crypto or data compression by roughly 10%.
- *NOTE:* The maximum SLA that can be set for a device is the maximum DU for that device. For various reasons, the acceptable margin of error for device utilization is 15%; therefore, the tool may report percentages over 100% (allowable range is 85-110%). This margin of error is much greater if durations over 5 seconds are used, as mentioned below.

### 3.21.1 Commands to Fetch Device Utilization

Start or stop the device measurement: ./du\_mgr ( start / stop ) <pf\_addr>

Query utilization for a physical function: ./du\_mgr query <pf\_addr> <service>

Query utilization for a virtual function: ./du\_mgr query\_vf <pf\_addr> <vf\_addr> <service>

#### Options:

- pf\_addr: Physical function address in bus:device.function (xx:xx.x) format
- vf\_addr: Virtual function address in bus:device.function (xx:xx.x) format
- service: Asym(=0) or Sym(=1) or Dc(=2)

### 3.21.2 Durations

The duration between the start and stop commands should be between 5 and 10 seconds.

A duration of more than 10 seconds may give inconsistent query results.

The device utilization query and query\_vf commands report utilization between the last start and stop commands.

For a given physical or virtual function, the device utilization reported would be in relation to the maximum device capacity.
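The start, stop, and query commands can be combined into one measurement window. The sketch below is illustrative: the function name measure\_du and the DU\_MGR variable are hypothetical, the service is passed as the numeric code from the Options list (an assumption), and the window is restricted to the 5 to 10 second range recommended above.

```bash
#!/bin/bash
# Path to the tool; override if it is installed elsewhere.
DU_MGR="${DU_MGR:-./du_mgr}"

# measure_du <pf_addr> <service> [seconds]
# Runs one start/stop measurement window, then queries the PF utilization.
measure_du() {
    local pf="$1" service="$2" secs="${3:-5}"
    if ! [[ "$secs" =~ ^[0-9]+$ ]] || [ "$secs" -lt 5 ] || [ "$secs" -gt 10 ]; then
        echo "duration must be between 5 and 10 seconds" >&2
        return 1
    fi
    "$DU_MGR" start "$pf" || return 1
    sleep "$secs"
    "$DU_MGR" stop "$pf" || return 1
    "$DU_MGR" query "$pf" "$service"
}
```

For example, measure\_du 3d:00.0 1 5 would measure symmetric crypto utilization on the physical function at the (example) address 3d:00.0 over a 5-second window.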

### 3.21.3 Reference Algorithm

The reference symmetric crypto algorithm for Intel<sup>®</sup> QAT 1.7 devices is AES128-CBC HMACSHA1 with a packet size of 1024 bytes.

The reference symmetric crypto algorithm for Intel<sup>®</sup> QAT 1.6 devices is AES128-CBC HMACSHA2-256.

The reference asymmetric crypto algorithm for both is RSA with a 2048-bit modulus.

# 3.22 Cipher-CRC

Cipher-CRC is a feature that enables offloading of cryptographic processing along with CRC operations to a QAT 1.8 device.

This feature is supported **only** through the DPDK (Data Plane Development Kit) API and cannot be used with the QAT 1.8 package alone. It is supported only in Cipher-CRC firmware for the cryptographic (cy) service (other services might be used, but the cryptographic service is required). Cipher-CRC cannot be used in combination with the Rate Limiting feature.



To enable the Cipher-CRC feature:

- 1. Install DPDK software and QAT 1.8 driver package according to DPDK instructions
- 2. Update the physical function configuration file by adding CipherCRCEnabled parameter in [GENERAL] section and set it to 1 to enable Cipher-CRC.
- 3. Set ServicesEnabled to cy. Other services (inline or dc) might also be set, but cy is required for Cipher-CRC feature.
- 4. Perform qat\_service stop and qat\_service start.
- 5. Follow DPDK instructions to use Cipher-CRC with DPDK API.
- **NOTE:** If the cryptographic service is not enabled in ServicesEnabled, Cipher-CRC is disabled regardless of the CipherCRCEnabled setting.
- **NOTE:** The Cipher-CRC feature cannot be used in combination with Rate Limiting. If both the CipherCRCEnabled and RateLimitingEnabled parameters are set to 1, the device will not start, unless Cipher-CRC has already been disabled because of an incorrect platform type or ServicesEnabled setting.
- **NOTE:** CipherCRCEnabled flag is used **only** for c4xxx driver and **only** with DPDK API.
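The constraints in the steps and notes above can be checked against a configuration file before the service is restarted. The following is a sketch under stated assumptions: the file uses "Parameter = value" lines, the helper names conf\_get and cipher\_crc\_config\_ok are hypothetical, and for simplicity the lookup does not verify that a parameter sits inside the [GENERAL] section.

```bash
#!/bin/bash
# Print the first value assigned to KEY in a "KEY = value" style file.
conf_get() {
    local key="$1" file="$2"
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" \
        | sed 's/[[:space:]]*$//' | head -n 1
}

# Succeed only if the Cipher-CRC related settings are consistent.
cipher_crc_config_ok() {
    local file="$1" services
    if [ "$(conf_get CipherCRCEnabled "$file")" != "1" ]; then
        echo "CipherCRCEnabled is not set to 1" >&2
        return 1
    fi
    # ServicesEnabled is a ';'-separated list that must contain cy.
    services=";$(conf_get ServicesEnabled "$file" | tr -d '[:space:]');"
    case "$services" in
        *";cy;"*) ;;
        *) echo "ServicesEnabled does not include cy" >&2; return 1 ;;
    esac
    if [ "$(conf_get RateLimitingEnabled "$file")" = "1" ]; then
        echo "Cipher-CRC cannot be combined with Rate Limiting" >&2
        return 1
    fi
    return 0
}
```

A file that sets CipherCRCEnabled = 1 and ServicesEnabled = cy;dc passes the check; adding RateLimitingEnabled = 1 makes it fail.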

# 3.23 Access to Legacy Algorithms

By default, legacy algorithms are now disabled. To enable those algorithms, use the compilation flag *--enable-legacy-algorithms* (Getting Started Guide), which enables all legacy algorithms. Also see associated functions in our Cryptographic API Reference manual: *cpaCyQueryCapabilities()*, *CpaCySymCapabilitiesInfo()*, *cpaCySymQueryCapabilities()*, etc.

The following are the legacy algorithms now disabled by default.

Cipher Algorithms:

- ARC4
- AES-ECB
- AES-F8
- DES-ECB
- DES-CBC
- 3DES-ECB
- 3DES-CBC
- 3DES-CTR
- SM4-ECB

Hash Algorithms:

- MD5
- SHA1
- SHA224
- SHA3\_224

PKE Algorithms:

- RSA with key lengths less than 2048 bits
- DSA
- DH
- ECC with curve length less than 256 bits
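A build or deployment script may want to know whether an application's algorithms fall into the lists above, and therefore whether the package must be configured with --enable-legacy-algorithms. The sketch below encodes those lists; the function names are hypothetical, the algorithm spellings follow the lists as written, and ECC curves are omitted for brevity.

```bash
#!/bin/bash
# Succeed if the named algorithm appears in the "disabled by default" lists.
is_legacy_algorithm() {
    case "$1" in
        ARC4|AES-ECB|AES-F8|DES-ECB|DES-CBC|3DES-ECB|3DES-CBC|3DES-CTR|SM4-ECB)
            return 0 ;;   # cipher algorithms
        MD5|SHA1|SHA224|SHA3_224)
            return 0 ;;   # hash algorithms
        DSA|DH)
            return 0 ;;   # PKE algorithms without a size condition
        *)
            return 1 ;;
    esac
}

# Succeed if an RSA key of the given bit length requires the opt-in build.
rsa_needs_opt_in() {
    [ "$1" -lt 2048 ]
}
```

For instance, is\_legacy\_algorithm MD5 succeeds, while is\_legacy\_algorithm AES-CBC fails because AES-CBC is enabled in the default build.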

### Table 13. Supported Legacy Algorithms

| Cipher Algorithm | QAT 1.6 | QAT 1.7x | QAT 1.8 | QAT 1.9 |
|------------------|---------|----------|---------|---------|
| NULL             | Y       | Y        | Y       | Y       |
| ARC4             | Opt-in  | Opt-in   | Opt-in  | Opt-in  |
| AES_ECB          | Opt-in  | Opt-in   | Opt-in  | Opt-in  |
| AES_CBC          | Y       | Y        | Y       | Y       |
| AES_CTR          | Y       | Y        | Y       | Y       |
| AES_CCM          | Y       | Y        | Y       | Y       |
| AES_GCM          | Y       | Y        | Y       | Y       |
| AES_F8           | Opt-in  | Opt-in   | Opt-in  | Opt-in  |
| AES_XTS          | Y       | Y        | Y       | Y       |
| DES_ECB          | Opt-in  | Opt-in   | Opt-in  | Opt-in  |
| DES_CBC          | Opt-in  | Opt-in   | Opt-in  | Opt-in  |
| 3DES_ECB         | Opt-in  | Opt-in   | Opt-in  | Opt-in  |
| 3DES_CBC         | Opt-in  | Opt-in   | Opt-in  | Opt-in  |
| 3DES_CTR         | Opt-in  | Opt-in   | Opt-in  | Opt-in  |
| KASUMI_F8        | Y       | Y        | Y       | Y       |
| SNOW3G-UEA2      | Y       | Y        | Y       | Y       |
| ZUC_EEA3         | Y       | Y        | Y       | Y       |
| CHACHA           |         |          | Y       | Y       |
| SM4_ECB          |         |          | Opt-in  | Opt-in  |
| SM4_CBC          |         |          | Y       | Y       |
| SM4_CTR          |         |          | Y       | Y       |

| Hash Algorithm | QAT 1.6 | QAT 1.7x | QAT 1.8 | QAT 1.9 |
|----------------|---------|----------|---------|---------|
| MD5            | Opt-in  | Opt-in   | Opt-in  | Opt-in  |
| SHA1           | Opt-in  | Opt-in   | Opt-in  | Opt-in  |
| SHA224         | Opt-in  | Opt-in   | Opt-in  | Opt-in  |
| SHA256         | Y       | Y        | Y       | Y       |
| SHA384         | Y       | Y        | Y       | Y       |
| SHA512         | Y       | Y        | Y       | Y       |
| SHA3_224       |         |          | Opt-in  | Opt-in  |
| SHA3_256       |         | Y        | Y       | Y       |
| SHA3_384       |         |          | Y       | Y       |
| SHA3_512       |         |          | Y       | Y       |
| AES_XCBC       | Y       | Y        | Y       | Y       |
| AES_CBC_MAC    | Y       | Y        | Y       | Y       |
| AES_CCM        | Y       | Y        | Y       | Y       |
| AES_GCM        | Y       | Y        | Y       | Y       |
| AES_GMAC       | Y       | Y        | Y       | Y       |
| AES_CMAC       | Y       | Y        | Y       | Y       |
| KASUMI_F9      | Y       | Y        | Y       |         |
| SNOW3G_UIA2    | Y       | Y        | Y       | Y       |
| ZUC_EIA3       | Y       | Y        | Y       | Y       |
| SHAKE_128      |         |          |         |         |
| SHAKE_256      |         |          |         |         |
| POLY           |         |          | Y       | Y       |
| SM3            |         |          | Y       | Y       |

| PKE                    | QAT 1.6 | QAT 1.7x | QAT 1.8 | QAT 1.9 |
|------------------------|---------|----------|---------|---------|
| RSA-512                | Opt-in  | Opt-in   | Opt-in  | Opt-in  |
| RSA-1024               | Opt-in  | Opt-in   | Opt-in  | Opt-in  |
| RSA-1536               | Opt-in  | Opt-in   | Opt-in  | Opt-in  |
| RSA-2048               | Y       | Y        | Y       | Y       |
| RSA-3072               | Y       | Y        | Y       | Y       |
| RSA-4096               | Y       | Y        | Y       | Y       |
| RSA-8192               |         |          |         |         |
| DH                     | Opt-in  | Opt-in   | Opt-in  | Opt-in  |
| DSA                    | Opt-in  | Opt-in   | Opt-in  | Opt-in  |
| SM2                    |         |          |         | Y       |
| ECC key < 256-bit      | Opt-in  | Opt-in   | Opt-in  | Opt-in  |
| ECDH Point<br>Multiply |         | Y        | Y       | Y       |
| ECDSA Sign             |         | Y        | Y       | Y       |
| ECDSA Verify           |         | Y        | Y       | Y       |
| x25519                 |         |          | Y       | Y       |
| x448                   |         |          | Y       | Y       |



"Opt-in" means that the algorithm is supported by SW/FW, but is not enabled with the default build configuration. Customers must use the opt-in build flag *--enable-legacy-algorithms* when building the SW library/driver to enable support for these legacy algorithms.




# *4 Acceleration Driver Configuration File*

This chapter describes the configuration file(s) that allows the customization of runtime operation. The configuration file(s) must be tuned to meet the performance needs of the target application.

**NOTE:** The software package includes a default configuration file, which may not provide optimal performance on all platforms. Consider performance implications as well as the configuration details provided in this chapter if your system requires modifications to the default configuration file.

# 4.1 Configuration File Overview

There is a single configuration file for each Intel<sup>®</sup> QAT Endpoint (and there may be multiple Intel<sup>®</sup> QAT Endpoints for a given hardware).

NOTE: Depending on the model number, a device may also contain no Intel® QAT Endpoints.

The configuration file is split into a number of different sections: a general section and one or more Logical Instance sections.

The **General** section includes parameters that allow the user to specify:

- Which services are enabled?
- Concurrent request default configuration.
- Interrupt coalescing configuration (optional).
- Statistics gathering configuration.

Additional details are included in Section 4.2, General Section.

**NOTE:** The concurrent request parameters include both Transmit (Tx) and Receive (Rx) requests.

**Logical Instances** sections (there may be one or more) include parameters that allow the user to set:

- The number of cryptography or data compression instances being managed.
- For each instance, the name of the instance, whether polling is enabled, and the core to which an instance is affinitized.

Additional details are included in Section 4.3, Logical Instances Section.

A sample configuration file is included in the package in the <code>quickassist/utilities/adf\_ctl/conf\_files</code> directory.

Available Sample Configuration per SKUs:

Sample configurations are broadly divided into services and SKUs that Intel® QAT 1.8 platforms can support. Intel® QAT 1.8 offers these services:

- Cryptography (cy)
- Symmetric cryptography (sym)
- Asymmetric cryptography (asym)
- Compression (dc)

# 4.2 General Section

The General section of the configuration file contains general parameters and statistics parameters.

### 4.2.1 General Parameters

The ServicesProfile parameter (see <u>Table 14</u>) defines the services that are available when the driver loads. For example, if "ServicesProfile = COMPRESSION" is in the GENERAL section, compression and decompression are available, along with service chaining, but cryptography is not.

- **NOTE:** The ServicesProfile parameter is used for all drivers excluding c4xxx, which uses RateLimitingEnabled.
- **NOTE:** When a ServicesProfile parameter value that supports rate limiting is defined, internal resources are reallocated to administering Rate Limiting/Device Utilization. This reduces performance by roughly 5%.

| Service                   | DEFAULT | CRYPTO | COMPRESSION | CUSTOM1 |
|---------------------------|---------|--------|-------------|---------|
| Asymmetric Crypto         | YES     | YES    |             | YES     |
| Symmetric Crypto          | YES     | YES    |             | YES     |
| Hash                      | YES     | YES    | YES         | YES     |
| Cipher                    | YES     | YES    |             | YES     |
| MGF KeyGen                | YES     | YES    |             |         |
| SSL/TLS KeyGen            | YES     | YES    |             | YES     |
| HKDF                      |         | YES    |             | YES     |
| Compression               | YES     |        | YES         | YES     |
| Decompression (stateless) | YES     |        | YES         | YES     |
| Decompression (stateful)  | YES     |        | YES         |         |
| Service Chaining          |         |        | YES         |         |
| Device Utilization        |         | YES    | YES         | YES     |
| Rate Limiting             |         | YES    | YES         | YES     |

**NOTE:** Set the ServicesProfile to determine available features **excluding** c4xxx, which uses RateLimitingEnabled.

The following table describes the other parameters that can be included in the General section.

#### Table 15. General Parameters

| Parameter | Description | Default | Range |
|---|---|---|---|
| ServicesEnabled | Defines the service(s) available (cryptographic [cy], data compression [dc], symmetric cryptography only [sym], asymmetric cryptography only [asym]; <i>Note</i>: [sym] and [asym] are mutually exclusive with [cy]). | cy;dc | cy, dc, sym and asym<br><i>Note:</i> Multiple values permitted, use ";" as the delimiter.<br>For exceptions, see <u>Section 4.3.3.2, "Increasing the Maximum Number of Processes/Instances".</u> |
| CyNumConcurrentSymRequests | Specifies the number of concurrent symmetric requests for cryptographic instances in general. | 512 | 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, or 65536 |
| CyNumConcurrentAsymRequests | Specifies the number of concurrent asymmetric requests for cryptographic instances in general. | 64 | 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, or 65536 |
| DcNumConcurrentRequests | Specifies the number of concurrent requests for data compression instances in general. | 512 | 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, or 65536 |
| DcIntermediateBufferSizeInKB | Specifies the size in KB of each intermediate buffer in on-chip memory for dynamic compression. | 64 | 32 or 64 |
| AutoResetOnError    | Automatically resets<br>the device in case of<br>fatal error or Heartbeat<br>failure. | 0                                        | 0 or 1                                                              |
| NumInlineAccelUnits | Define AU number for the inline service                                               | 0                                        | <i>Note</i> : Inline feature is only supported on specific packages |
| NumCyAccelUnits     | Define AU number for<br>the crypto service                                            | 4, 2, or 1<br>(depends on<br>the SKU)    | 0 to 6                                                              |
| NumDcAccelUnits     | Define AU number for<br>the data compression<br>service                               | 2 or 1<br>(depends on<br>the SKU)        | 0 to 6                                                              |
| RateLimitingEnabled | This flag is to enable<br>Rate Limiting                                               | 0                                        | 0 or 1                                                              |
| HeartbeatTimer      | Sets the minimum<br>Heartbeat polling<br>period                                       | 500                                      | >=100                                                               |

- **NOTE:** Not all parameters listed are available on all device types. RateLimitingEnabled parameter is used **only** for c4xxx driver and visible **only** inside c4xxx\_.conf.<services>.<SKU> configuration files.
- *NOTE:* "Default" denotes the value in the configuration file when shipped or the value used if not specified in the configuration file.

For all the services enabled, NumConcurrentRequests must be set in the configuration file to one of the following values: 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768 and 65536.

The number of concurrent requests registered by the Intel<sup>®</sup> QAT driver is set to NumConcurrentRequests - 2.

This implementation ensures that the request ring will never be full and avoids the need for a Memory Mapped IO (MMIO) read. This implementation maximizes throughput performance.
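The permitted values listed above are the powers of two from 64 to 65536, and the driver registers two fewer requests than the configured value. The sketch below checks a candidate value and prints the resulting number of registered requests; the function names are hypothetical.

```bash
#!/bin/bash
# Succeed if the value is one of the permitted settings
# (a power of two from 64 to 65536).
valid_concurrent_requests() {
    local n="$1"
    [[ "$n" =~ ^[1-9][0-9]*$ ]] || return 1
    [ "$n" -ge 64 ] && [ "$n" -le 65536 ] && [ $(( n & (n - 1) )) -eq 0 ]
}

# Print the number of requests the driver registers for a configured
# value (NumConcurrentRequests - 2), or fail if the value is not permitted.
registered_requests() {
    valid_concurrent_requests "$1" || return 1
    echo $(( $1 - 2 ))
}
```

For a configured value of 512, registered\_requests prints 510; a value such as 100 is rejected because it is not a power of two.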

# 4.3 Logical Instances Section

This section allows the configuration of logical instances in each address domain (kernel space and individual user space processes).

The address domains are in the following format:



- For the kernel address domain: [KERNEL] targeted to Linux\* Kernel Crypto Framework (LKCF)
- For the Intel<sup>®</sup> QAT API in Kernel address domain: [KERNEL\_QAT]
- For user process address domains: [xxxxx], where xxxxx may be any ASCII value that uniquely identifies the user mode process.

In user space, to allow the driver to configure the logical instances associated with a user process correctly, the process must call the function icp\_sal\_userStart, passing the xxxxx string, during process initialization. When the user space process is finished, it must call the function icp\_sal\_userStop to free resources. Refer to Section 6.2.4, User Space Access Configuration Functions for more information.

A single Virtual Function (VF) configured for the SR-IOV use case cannot have both user space instances and kernel space instances. Separate VFs must be created for user space and kernel space.

The NumProcesses parameter (in the User Process section) indicates the max number of user space processes within that section name with access to instances on this device. Refer to <u>Section 6.2.4.2, icp\_sal\_userStop</u> for more information.

The items that can be configured for a logical instance are:

- The name of the logical instance
- The polling mode
- The core to which the instance is affinitized (optional)

### 4.3.1 [KERNEL] Section

In the [KERNEL] section of the configuration file, information about the number and type of kernel instances supporting Linux\* Kernel Crypto Framework can be defined.

This section is different from the [KERNEL\_QAT] section. The [KERNEL] section in the configuration file defines instances to register the Intel<sup>®</sup> QuickAssist Acceleration with Linux\* Kernel Crypto Framework (LKCF) while the instances defined in the [KERNEL\_QAT] section are exclusively targeted to be used with the Intel<sup>®</sup> QuickAssist API.

LKCF can be used with all devices supported within this software package.

The following table describes the parameters that determine the number of kernel instances for each service.

- **NOTE:** The maximum number of cryptographic instances supported per Intel® QAT Endpoint is 32; for exceptions, refer to <u>Section 4.3.3.2</u>, <u>Increasing the Maximum Number of Processes/Instances</u>.
- **NOTE:** The NumberDcInstances parameter is ignored in this section and is set to 0.



#### Table 16. [KERNEL] Section Parameters

| Parameter         | Description                                                                                                                    | Default | Range   |
|-------------------|--------------------------------------------------------------------------------------------------------------------------------|---------|---------|
| NumberCyInstances | Specifies the number of<br>cryptographic instances.<br><i>Note:</i> Depends on the number of<br>allocations to other services. | 0       | 0 to 32 |

#### 4.3.1.1 Enabling Linux\* Kernel Crypto Framework (LKCF)

To enable the Linux\* Kernel Crypto Framework (LKCF), add the --enable-qat-lkcf flag during the ./configure step, and enable at least one Cy instance in the [KERNEL] section of the configuration file. Dc instances are not used in the [KERNEL] section.

After installation, to confirm which QAT algorithms were registered with LKCF, run cat /proc/crypto and look for algorithms with their module set to intel\_qat.
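The check above can also be scripted. The following is a minimal sketch, assuming the usual /proc/crypto layout (entries separated by blank lines, each made of `key : value` lines) and assuming the module field reads intel_qat. The function name `qat_algorithms` is hypothetical and is not part of the Intel® QAT package.

```python
def qat_algorithms(proc_crypto_text, module="intel_qat"):
    """Return the 'name' of every /proc/crypto entry whose 'module' matches."""
    names = []
    # Entries are separated by a blank line (assumed layout).
    for entry in proc_crypto_text.strip().split("\n\n"):
        fields = {}
        for line in entry.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if fields.get("module") == module:
            names.append(fields.get("name", "?"))
    return names
```

On a target system, the text would typically be obtained by reading /proc/crypto, for example `qat_algorithms(open("/proc/crypto").read())`.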

# 4.3.2 [KERNEL\_QAT] Section

The [KERNEL\_QAT] section defines instances that can be used by the Intel<sup>®</sup> QuickAssist API in the kernel space domain.

The table below describes the parameters for the [KERNEL\_QAT] section.

- *NOTE:* Intel<sup>®</sup> QuickAssist API is not supported by Intel<sup>®</sup> QAT 1.8 devices.
- **NOTE:** The maximum number of cryptographic and data compression instances supported is 32 per Intel® QAT Endpoint; for exceptions, refer to <u>Section 4.3.3.2</u>, <u>Increasing the</u> <u>Maximum Number of Processes/Instances</u>.

#### Table 17. [KERNEL\_QAT] Section Parameters

| Parameter         | Description                                                                                                                       | Default | Range   |
|-------------------|-----------------------------------------------------------------------------------------------------------------------------------|---------|---------|
| NumberCyInstances | Specifies the number of<br>cryptographic instances.<br><i>Note:</i> Depends on the number of<br>allocations to other services.    | 6       | 0 to 32 |
| NumberDcInstances | Specifies the number of data<br>compression instances.<br><i>Note:</i> Depends on the number of<br>allocations to other services. | 2       | 0 to 32 |

1. NumberCyInstances depends on the number of allocations to the other two services.

2. "Default" denotes the value in the configuration file when shipped.



# 4.3.3 User Process [xxxxx] Sections

There is one [xxxxx] section of the configuration file for each Intel<sup>®</sup> QAT Endpoint to be configured.

**NOTE:** Check the SKU information for your specific device to determine how many Intel<sup>®</sup> QAT Endpoints the device contains. There can be up to three Intel<sup>®</sup> QAT Endpoints per device.

In each [xxxxx] section of the configuration file, user space access to the Intel<sup>®</sup> QAT Endpoint can be configured.

The table below shows the parameters in the configuration file that can be set for user process [xxxxx] sections.

Parameters for each user process instance can also be defined. The parameters that can be included for each specific user process instance are like those in <u>Section 4.3, Logical Instances</u> <u>Section</u>.

#### Table 18. User Process [xxxxx] Section Parameters

| Parameter         | Description | Default | Range |
|-------------------|-------------|---------|-------|
| NumProcesses      | The number of user space processes with section name [xxxxx] that have access to this device.<br>This is the maximum number of processes that can call icp_sal_userStart and be active at any one time. Refer to <u>Section 6.2.4.1, "icp_sal_userStart"</u> for more information.<br><b>Caution:</b> Resources are pre-allocated. If this parameter value is set too high, the driver fails to load. | 1 | For constraints, see Section 4.3.3.1, Maximum Number of Process Calculations.<br>For exceptions, see Section 4.3.3.2, Increasing the Maximum Number of Processes/Instances. |
| LimitDevAccess    | Indicates whether the user space processes in this section are limited to accessing instances on this Intel® QAT Endpoint only. | 0 | 0 (disabled, processes in this section can access multiple Intel® QAT Endpoints) or 1 (enabled, processes in this section can only access this Intel® QAT Endpoint). For additional information, see <u>Section 4.5, Configuring Multiple Processes on a System with Multiple Intel® QAT Endpoints</u>. |
| NumberCyInstances | Specifies the number of cryptographic instances.<br><b>Note:</b> Depends on the number of allocations to other services. | 6 | 0 to 32. For exceptions, see Section 4.3.3.2, Increasing the Maximum Number of Processes/Instances. |
| NumberDcInstances | Specifies the number of data compression instances.<br><b>Note:</b> Depends on the number of allocations to other services. | 2 | 0 to 32 |

#### 4.3.3.1 Maximum Number of Process Calculations

The NumProcesses parameter is the number of user space processes per service within the [xxxxx] section domain with access to this Intel<sup>®</sup> QAT Endpoint.

The value to which this parameter can be set is determined by a number of factors, most significantly the number of cryptography instances and/or data compression instances in the process section. The total number of instances, per service, created by the driver is given by the expression (for example, for cryptography):

(NumProcesses) × (NumberCyInstances)

In Intel<sup>®</sup> QAT 1.7 devices, there are 16 ring banks per Intel<sup>®</sup> QAT Endpoint and a maximum of two cryptography instances and two compression instances per bank. The maximum number of instances per device is 32 for cryptography and 32 for compression. For exceptions, refer to <u>Section 4.3.3.2</u>, Increasing the Maximum Number of Processes/Instances.

The following code example illustrates the maximum number of possible processes per device in polling mode:

```
NumProcesses = 32
NumberCyInstances = 1
NumberDcInstances = 1
```
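The constraint can be checked before editing the configuration file. The following is a minimal sketch, assuming the default limit of 32 instances per service per Intel® QAT Endpoint described above and assuming no kernel instances are configured on the same endpoint. The function names are hypothetical.

```python
MAX_INSTANCES_PER_SERVICE = 32  # default limit per Intel QAT Endpoint (this section)


def total_instances(num_processes, instances_per_process):
    """Instances the driver must create for one service in one section."""
    return num_processes * instances_per_process


def fits_endpoint(num_processes, number_cy_instances, number_dc_instances,
                  limit=MAX_INSTANCES_PER_SERVICE):
    """True if both the Cy and the Dc totals stay within the per-service limit."""
    return (total_instances(num_processes, number_cy_instances) <= limit
            and total_instances(num_processes, number_dc_instances) <= limit)
```

For the configuration shown above, `fits_endpoint(32, 1, 1)` holds, while raising NumProcesses to 33 exceeds the limit.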

#### 4.3.3.2 Increasing the Maximum Number of Processes/Instances

**NOTE:**

1. One bank is used per Intel® QAT virtual function (VF).
2. This section only applies when the instances make use of polled mode.

It is possible to increase the number of processes supported by the software. Intel® QAT 1.7 devices have 16 ring banks per Intel® QAT Endpoint, whereas Intel® QAT 1.8 devices have 128, with a maximum of two cryptography instances and two compression instances per bank (or per VF) when the configuration file has ServicesEnabled equal to cy;dc. However, the maximum number of instances can be increased by carefully selecting the value of the ServicesEnabled parameter.



Compression, symmetric cryptography, and asymmetric cryptography each require two rings per instance out of the rings available in a ring bank: 16 for Intel<sup>®</sup> QAT 1.7 devices versus 8 for Intel<sup>®</sup> QAT 1.8 devices. By selecting only the services needed, the number of instances can be increased.

**NOTE:** Not all versions of the Intel<sup>®</sup> QAT software package support the ability to increase the number of processes.

Here are the variations:

- With ServicesEnabled equal to sym, only two rings are used for each instance, so for Intel® QAT 1.7 devices, eight instances can be used per bank (or per VF), or 128 instances per Intel® QAT Endpoint. For Intel® QAT 1.8 devices, four instances can be used per bank (or per VF), or 512 instances per Intel® QAT Endpoint. In this case, compression and asymmetric crypto services will not be available.
- With ServicesEnabled equal to asym, only two rings are used for each instance, so for Intel® QAT 1.7 devices, eight instances can be used per bank (or per VF), or 128 instances per Intel® QAT Endpoint. For Intel® QAT 1.8 devices, four instances can be used per bank (or per VF), or 512 instances per Intel® QAT Endpoint. In this case, compression and symmetric crypto services will not be available.
- With ServicesEnabled equal to cy, only four rings are used for each instance (two each for asymmetric and symmetric crypto), so for Intel® QAT 1.7 devices, four instances can be used per bank (or per VF), or 64 instances per Intel® QAT Endpoint. For Intel® QAT 1.8 devices, two instances can be used per bank (or per VF), or 256 instances per Intel® QAT Endpoint. In this case, compression services will not be available.
- With ServicesEnabled equal to dc, only two rings are used for each instance, so for Intel® QAT 1.7 devices, eight instances can be used per bank (or per VF), or 128 instances per Intel® QAT Endpoint. For Intel® QAT 1.8 devices, four instances can be used per bank (or per VF), or 512 instances per Intel® QAT Endpoint. In this case, asymmetric and symmetric crypto services will not be available.
- With ServicesEnabled equal to dc;asym, only four rings are used for each instance (two each for compression and asymmetric crypto), so for Intel® QAT 1.7 devices, four instances can be used per bank (or per VF), or 64 instances per Intel® QAT Endpoint. For Intel® QAT 1.8 devices, two instances can be used per bank (or per VF), or 256 instances per Intel® QAT Endpoint. In this case, symmetric crypto services will not be available.
- With ServicesEnabled equal to dc;sym, only four rings are used for each instance (two each for compression and symmetric crypto), so for Intel® QAT 1.7 devices, four instances can be used per bank (or per VF), or 64 instances per Intel® QAT Endpoint. For Intel® QAT 1.8 devices, two instances can be used per bank (or per VF), or 256 instances per Intel® QAT Endpoint. In this case, asymmetric crypto services will not be available.

**NOTE:** The ServicesProfile parameter value may also need to be changed. See Section 4.2.1.
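The per-bank and per-endpoint figures in the variations above follow from simple ring arithmetic. The sketch below reproduces them, assuming two rings per enabled service per instance and integer division of the rings in a bank. The device labels, the table, and the function names are illustrative only and are not taken from the driver.

```python
RINGS_PER_SERVICE = 2  # sym, asym, and dc each need two rings per instance

# (ring banks per endpoint, rings per bank), as stated in this section
DEVICES = {"qat1.7": (16, 16), "qat1.8": (128, 8)}


def expand(services_enabled):
    """Split a ServicesEnabled value; 'cy' is shorthand for sym plus asym."""
    services = set()
    for token in services_enabled.split(";"):
        token = token.strip()
        if not token:
            continue
        services |= {"sym", "asym"} if token == "cy" else {token}
    return services


def max_instances(services_enabled, device):
    """Return (instances per bank, instances per endpoint)."""
    banks, rings_per_bank = DEVICES[device]
    rings_per_instance = RINGS_PER_SERVICE * len(expand(services_enabled))
    per_bank = rings_per_bank // rings_per_instance
    return per_bank, per_bank * banks
```

For example, `max_instances("sym", "qat1.7")` yields eight instances per bank and 128 per endpoint, and `max_instances("cy", "qat1.8")` yields two per bank and 256 per endpoint, matching the list above.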

#### 4.3.3.3 Configuring Instances for Virtual Functions

To configure the number of instances for a virtual function:

1. Install the driver package on the host with SR-IOV enabled.
2. Update the physical function configuration file to set ServicesEnabled (refer to Section 4.3.3.2, Increasing the Maximum Number of Processes/Instances).
3. Perform qat\_service shutdown and qat\_service start.
4. Update the virtual function configuration file to set ServicesEnabled (refer to Section 4.3.3.2, Increasing the Maximum Number of Processes/Instances).
5. Restart qat\_service.

The value of ServicesEnabled in the VF configuration file should be the same as the value of ServicesEnabled in the PF configuration file, or a subset of that value as shown in <u>Table 15</u>. For instance, if a PF is configured as cy, allowable VF configurations related to that PF can only be cy, asym, or sym. VF device restart will fail if a VF configuration is not allowed for that related PF.

If a VF service is configured to a subset of the PF service, the number of VF instances is limited to the number allowed for that PF service as described in <u>Section 4.3.3.2, Increasing the Maximum Number of Processes/Instances</u>. For example, if the PF configuration file has ServicesEnabled=dc;asym, only four (not eight) dc instances are enabled if the VF is configured for dc only.

#### Table 19. Configuring Physical Functions and Virtual Functions

| Configured PF Service | Available VF Services                     |
|-----------------------|-------------------------------------------|
| cy;dc                 | cy;dc, cy, dc, sym, asym, dc;sym, dc;asym |
| cy                    | cy, sym, asym                             |
| dc;asym               | dc;asym, asym, dc                         |
| dc;sym                | dc;sym, sym, dc                           |
| asym                  | asym                                      |
| sym                   | sym                                       |
| dc                    | dc                                        |
### 4.3.4 Cryptographic Logical Instance Parameters

The following table shows the parameters that can be set for cryptographic logical instances.

| Parameter                       | Description                                                                                                      | Default                                                                | Range                                                                                                                                                                                                                                              |
|---------------------------------|------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| CyXName                         | Specifies the name<br>of cryptographic<br>instance number X.                                                     | IPSec0 for KERNEL and<br>KERNEL_QAT sections.<br>SSL0 for user section | String (max. 64 characters)                                                                                                                                                                                                                        |
| CyXIsPolled | Specifies whether cryptographic instance number X works in poll mode, interrupt mode, or epoll mode. | 0 for kernel space instances<br>1 for user space instances | 0 (interrupt mode) for instances in the KERNEL and KERNEL_QAT sections<br>1 (poll mode) for instances in the KERNEL_QAT and user space sections<br>2 (epoll mode, event-based polling mode) for instances in the user space section |
| CyXCoreAffinity | Specifies the core to which the instance should be affinitized. | Varies depending on the value of X. | 0 to max. number of cores in the system |
| CyNumConcurrentSymRequests | Specifies the number of concurrent symmetric requests for cryptographic instance X. | 512 | 64, 128, 256, 512, 1024, 2048, or 4096 |
| CyNumConcurrentAsymRequests | Specifies the number of concurrent asymmetric requests for cryptographic instance X. | 64 | 64, 128, 256, 512, 1024, 2048, or 4096 |

#### Table 20. Cryptographic Logical Instance Parameters

**NOTE:** "Default" denotes the value in the configuration file when shipped.

#### 4.3.4.1 LKCF-Supported Algorithms

See <u>Supported Algorithms in LKCF</u> for a full list.



# 4.3.5 Data Compression Logical Instance Parameters

The following table shows the parameters in the configuration file that can be set for data compression logical instances.

**NOTE:** The maximum number of data compression instances supported is 64.

| Parameter                    | Description                                                                                                   | Default                                                     | Range                                                                                                                                                                                                                                           |
|------------------------------|---------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| DcXName                      | Specifies the name of data<br>compression instance<br>number X.                                               | IPComp0                                                     | String (max. 64 characters)                                                                                                                                                                                                                     |
| DcXIsPolled | Specifies whether data compression instance number X works in poll mode, interrupt mode, or epoll mode. | 0 for kernel space instances<br>1 for user space instances | 0 (interrupt mode) for instances in the KERNEL and KERNEL_QAT sections<br>1 (poll mode) for instances in the KERNEL_QAT and user space sections<br>2 (epoll mode, event-based polling mode) for instances in the user space section |
| DcXCoreAffinity | Specifies the core to which the data compression instance should be affinitized. | Varies depending on the value of X. | 0 to max. number of cores in the system |
| DcXNumConcurrentRequests | Specifies the number of concurrent data requests for compression instance X. | 512 | 64, 128, 256, 512, 1024, 2048, or 4096 |

#### Table 21. Data Compression Logical Instance Parameters

**NOTE:** "Default" denotes the value in the configuration file when shipped.
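The ranges in the cryptographic and data compression parameter tables lend themselves to a quick sanity check of a configuration before the driver is restarted. The sketch below encodes only the IsPolled and concurrent-request ranges listed in those tables. The section labels (`"KERNEL"`, `"KERNEL_QAT"`, `"USER"`) and function names are hypothetical conveniences, not driver identifiers.

```python
# IsPolled values permitted per section type, per the Range columns above
ALLOWED_IS_POLLED = {
    "KERNEL": {0},         # interrupt mode only
    "KERNEL_QAT": {0, 1},  # interrupt or poll mode
    "USER": {1, 2},        # poll or epoll mode
}

# Permitted values for the concurrent-request parameters
VALID_CONCURRENT_REQUESTS = {64, 128, 256, 512, 1024, 2048, 4096}


def is_polled_valid(section_kind, value):
    """True if the IsPolled value is listed for this kind of section."""
    return value in ALLOWED_IS_POLLED[section_kind]


def concurrent_requests_valid(value):
    """True if the value is one of the permitted request-ring sizes."""
    return value in VALID_CONCURRENT_REQUESTS
```

A check such as `is_polled_valid("USER", 0)` fails, since interrupt mode (0) is listed only for the kernel sections.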

### 4.3.6 Setting the Core Affinity Parameter for a Logical Instance

When instances are configured with IsPolled = 1 (Polling mode), the parameter CoreAffinity does not have any impact.

Although not used, it is a valid parameter, and applications can query the value using cpaCyInstanceGetInfo2 (see the coreAffinity bitmask in CpaInstanceInfo2). For example, the sample code affinitizes the thread that uses an instance to the core indicated by CoreAffinity in the configuration file for that instance.

For instances configured in Interrupt Mode (IsPolled = 2 in user space (epoll) and IsPolled = 0 in kernel space), the value of CoreAffinity is used to affinitize the interrupt handler to that core.



# 4.4 Configuring Multiple Intel<sup>®</sup> QAT Endpoints in a System

A platform may include more than one Intel® QAT Endpoint. Each device must have its own configuration file. The format and structure of the configuration file are the same for all devices. Consequently, the configuration file for Intel® QAT Endpoint 0 (c6xx\_dev0.conf for the Intel® C62x Chipset, c3xxx\_dev0.conf for the Intel® Atom® C3000 Processor Family SoC, or d15xx\_dev0.conf for the Intel® Xeon® Processor D Family) can be cloned for use with other Intel® QAT Endpoints.

All the configuration files are located in the /etc folder following the installation of the Intel<sup>®</sup> QAT package.

Make a copy of the file and rename it by changing the dev0 part of the file name. For example, for a second Intel<sup>®</sup> C62x Chipset Intel<sup>®</sup> QAT Endpoint, change the file name to c6xx\_dev1.conf; for a third Intel<sup>®</sup> QAT Endpoint, change the file name to c6xx\_dev2.conf. Then tailor the configuration of each Intel<sup>®</sup> QAT Endpoint by editing the corresponding configuration file accordingly.

**NOTE:** If a configuration file does not exist for an Intel<sup>®</sup> QAT Endpoint, that endpoint will not start, and an error is displayed indicating that a configuration file was not found.

To determine the number of Intel<sup>®</sup> QAT Endpoints in a system, use the lspci utility:

lspci -nn | egrep -e '8086:37c8|8086:19e2|8086:0435|8086:6f54'

The output from a system with a high-end Intel<sup>®</sup> C62x Chipset SKU is similar to the following:

88:00.0 Co-processor [0b40]: Intel Corporation Device [8086:37c8] (rev 03)

8a:00.0 Co-processor [0b40]: Intel Corporation Device [8086:37c8] (rev 03)

8c:00.0 Co-processor [0b40]: Intel Corporation Device [8086:37c8] (rev 03)
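The same count can be obtained programmatically, for example to verify that one configuration file exists per endpoint. The following is a minimal sketch that counts matching lines in lspci -nn output, using the vendor:device IDs from the command above. The function name is hypothetical.

```python
import re

# Vendor:device IDs taken from the egrep pattern in this section
QAT_DEVICE_IDS = ("8086:37c8", "8086:19e2", "8086:0435", "8086:6f54")


def count_qat_endpoints(lspci_output):
    """Count lines of `lspci -nn` output that mention a listed device ID."""
    pattern = re.compile("|".join(re.escape(dev_id) for dev_id in QAT_DEVICE_IDS))
    return sum(1 for line in lspci_output.splitlines() if pattern.search(line))
```

The input text could be captured with `subprocess.run(["lspci", "-nn"], capture_output=True, text=True).stdout`. For the three-line output shown above, the function returns 3.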

Then, after the driver is loaded, the user can use the qat\_service script to determine the name of each Intel<sup>®</sup> QAT Endpoint and its status. For example:

```
# service qat_service status
qat_dev0 - type: c6xx, inst_id: 0, bsf: 06:00:0, #accel: 5 #engines: 10 state: up
qat_dev1 - type: c6xx, inst_id: 1, bsf: 85:00:0, #accel: 5 #engines: 10 state: up
qat_dev2 - type: c6xx, inst_id: 2, bsf: 87:00:0, #accel: 5 #engines: 10 state: up
```

The <code>qat\_service</code> script can start, stop, restart, and shut down each device separately or all Intel<sup>®</sup> QAT Endpoints together. Refer to <u>Section 3.6, Managing Intel QuickAssist Technology Endpoints Using qat\_service</u> for more information.

Some important configuration file information when using multiple Intel® QAT Endpoints:

• When specifying kernel and user space instances in the configuration file, the Cy<Number>Name and Dc<Number>Name parameters must be unique only within the context of their section. For example, it is valid to have a parameter called Cy0Name in both a kernel instance section (if supported) and a user instance section in the same configuration file. The parameter names also do not need to be unique at a system-wide level. For example, it is valid to have a parameter called Cy0Name in both the configuration file for dev0 and the configuration file for dev1.



• For Intel® QAT Endpoints with configuration files that have the same section name (for example, [SSL]) and the same data in that section, it is necessary to use the cpaCyInstanceGetInfo2() function to distinguish between Intel® QAT Endpoints. The cpaCyInstanceGetInfo2() function allows the user of the API to query which Intel® QAT Endpoint a cryptography instance handle belongs to. In addition, for any application domain defined in the configuration files (for example, [SSL]), a call to cpaCyGetNumInstances() returns the number of cryptography instances defined for that domain across all configuration files. A subsequent call to cpaCyGetInstances() obtains these instance handles.

# 4.5 Configuring Multiple Processes on a System with Multiple Intel<sup>®</sup> QAT Endpoints

As an example, consider a system with two Intel<sup>®</sup> QAT Endpoints where it is necessary to configure two user space sections. One section is identified as <u>SSL</u> and the other is identified as Internet Protocol Security (<u>IPSec</u>).

- For the SSL section, configure eight processes, where each process has access to one acceleration instance.
- For the IPSec section, configure one process, with access to eight acceleration instances, four per Intel<sup>®</sup> QAT Endpoint.

In this scenario, the user space section of the configuration files would look like the following.

For /etc/c6xx\_dev0.conf:

```
[SSL] # User space section name
NumProcesses=4 # There are 4 user space processes with section name SSL with access to this device
LimitDevAccess=1 # These 4 SSL user space processes only use this device
NumberCyInstances=1 # Each process has access to 1 Cy instance on this device
NumberDcInstances=0 # Each process has access to 0 Dc instances on this device

# Crypto - User instance #0
Cy0Name = "SSL0"
Cy0IsPolled = 1
Cy0CoreAffinity = 0 # Core affinity not used for polled instance

[IPSec] # User space section name
NumProcesses=1 # There is 1 user space process with section name IPSec with access to this device
LimitDevAccess=0 # This IPSec user space process may have access to other devices
NumberCyInstances=4 # The IPSec process has access to 4 Cy instances on this device
NumberDcInstances=0 # The IPSec process has access to 0 Dc instances on this device

# Crypto - User instance #0
Cy0Name = "IPSec0"
Cy0IsPolled = 1
Cy0CoreAffinity = 0 # Core affinity not used for polled instance

# Crypto - User instance #1
Cy1Name = "IPSec1"
Cy1IsPolled = 1
Cy1CoreAffinity = 0 # Core affinity not used for polled instance

# Crypto - User instance #2
Cy2Name = "IPSec2"
Cy2IsPolled = 1
Cy2CoreAffinity = 0 # Core affinity not used for polled instance

# Crypto - User instance #3
Cy3Name = "IPSec3"
Cy3IsPolled = 1
Cy3CoreAffinity = 0 # Core affinity not used for polled instance
```

For /etc/c6xx\_dev1.conf:

```
[SSL] # User space section name
NumProcesses=4 # There are 4 user space processes with section name SSL with access to this device
LimitDevAccess=1 # These 4 SSL user space processes only use this device
NumberCyInstances=1 # Each process has access to 1 Cy instance on this device
NumberDcInstances=0 # Each process has access to 0 Dc instances on this device

# Crypto - User instance #0
Cy0Name = "SSL0"
Cy0IsPolled = 1
Cy0CoreAffinity = 0 # Core affinity not used for polled instance

[IPSec] # User space section name
NumProcesses=1 # There is 1 user space process with section name IPSec with access to this device
LimitDevAccess=0 # This IPSec user space process may have access to other devices
NumberCyInstances=4 # The IPSec process has access to 4 Cy instances on this device
NumberDcInstances=0 # The IPSec process has access to 0 Dc instances on this device

# Crypto - User instance #0
Cy0Name = "IPSec0"
Cy0IsPolled = 1
Cy0CoreAffinity = 0 # Core affinity not used for polled instance

# Crypto - User instance #1
Cy1Name = "IPSec1"
Cy1IsPolled = 1
Cy1CoreAffinity = 0 # Core affinity not used for polled instance

# Crypto - User instance #2
Cy2Name = "IPSec2"
Cy2IsPolled = 1
Cy2CoreAffinity = 0 # Core affinity not used for polled instance

# Crypto - User instance #3
Cy3Name = "IPSec3"
Cy3IsPolled = 1
Cy3CoreAffinity = 0 # Core affinity not used for polled instance
```

Eight processes (with section name SSL) can call the icp\_sal\_userStart("SSL") function to get access to one crypto instance each. One process (with section name IPSec) can call the icp\_sal\_userStart("IPSec") function to get access to eight crypto instances.

Internally in the driver, this works as follows:

1. When the driver is configured (that is, the service qat\_service is called), the driver reads the configuration file for the device and populates an internal configuration table.
2. Reading the configuration file for dev0:
   - For the section named [SSL], the driver determines that four processes are required and that these processes are limited to accessing this device only. In this case, the driver creates four internal sections that it labels SSL\_DEV0\_INT\_0, SSL\_DEV0\_INT\_1, SSL\_DEV0\_INT\_2 and SSL\_DEV0\_INT\_3. Each section is given access to one crypto instance as described.
   - For the section named [IPSec], the driver determines that one process is required and that this process is not limited to accessing this device only (that is, it may access instances on other devices). In this case, the driver creates one internal section that it labels IPSec\_INT\_0 and gives it access to four crypto instances on this device.
3. Reading the configuration file for dev1:
   - For the section named [SSL], the driver determines that four processes are required and that these processes are limited to accessing this device only. In this case, the driver creates four internal sections that it labels SSL\_DEV1\_INT\_0, SSL\_DEV1\_INT\_1, SSL\_DEV1\_INT\_2 and SSL\_DEV1\_INT\_3. Each section is given access to one crypto instance as described.
   - For the section named [IPSec], the driver determines that one process is required and that this process may have access to instances on other devices. In this case, the driver creates one internal section that it labels IPSec\_INT\_0 and gives it access to four crypto instances on this device.

   *NOTE:* This section name now appears in both devices' internal configuration and, therefore, the process that gets assigned this section name will have access to instances on both devices.
4. In total, there are nine separate sections (SSL\_DEV0\_INT\_0, SSL\_DEV0\_INT\_1, SSL\_DEV0\_INT\_2, SSL\_DEV0\_INT\_3, SSL\_DEV1\_INT\_0, SSL\_DEV1\_INT\_1, SSL\_DEV1\_INT\_2, SSL\_DEV1\_INT\_3 and IPSec\_INT\_0) with access to crypto instances.

When a process calls the icp\_sal\_userStart("SSL") function, the driver locates the next available section of the form SSL\_DEV\<m\>\_INT\_<...> (of which there are eight in total in this example) and assigns this section to the process. This gives the process access to the corresponding crypto instances.

When a process calls the icp\_sal\_userStart("IPSec") function, the driver locates the next available section of the form IPSec\_INT\_<...> (of which there is only one in total for this example) and assigns this section to the process. This gives the process access to the corresponding crypto instances.

The icp\_sal\_userStartMultiProcess() function has been deprecated. The API still exists, but it simply calls icp\_sal\_userStart().

# 4.6 Sample Configuration File

Sample configuration files are available in quickassist/utilities/adf\_ctl/conf\_files. Depending on the product and configuration, one or more of these will be copied to /etc during the package installation.

**NOTE:** The previous "v1" configuration file format is not supported.

§

# 5 Secure Architecture Considerations

This chapter describes the potential threats identified as part of the secure architecture analysis of the Intel<sup>®</sup> QuickAssist Technology acceleration complex within the Intel<sup>®</sup> Communications C62x Chipset family and the actions that can be taken to protect against these threats.

This chapter concentrates on the acceleration complex. First, the terminology covering the main threat categories and mechanisms, attacker privilege, and deployment models is presented. Then, some common mitigation actions that can be applied to many of these threat categories and mechanisms are discussed. Finally, more specific threat/attack vectors, including attacks against specific services of the PCH device, are described.

# 5.1 Terminology

Each of the potential threat/attack vectors discussed may be described in terms of the following:

- <u>Threat Categories</u>
- Attack Mechanism
- <u>Attacker Privilege</u>
- Deployment Models

# 5.1.1 Threat Categories

System threats can be classified into the categories in the following table.

#### Table 22. System Threat Categories

| Category             | Nature of Threat and Examples |
|----------------------|-------------------------------|
| Exposure of Data     | <ul> <li>Attacker reads data to which they should not have read access</li> <li>Attacker reads cryptographic keys</li> </ul> |
| Modification of Data | <ul> <li>Attacker overwrites data to which they should not have write access</li> <li>Attacker overwrites cryptographic keys</li> </ul> |
| Denial of Service    | <ul> <li>Attacker causes application or driver software (running on an IA core) to fail or terminate</li> <li>Attacker causes Intel<sup>®</sup> QuickAssist Accelerator firmware to hang, temporarily impeding service</li> <li>Attacker causes excessive use of a resource (IA core, Intel<sup>®</sup> QuickAssist Accelerator firmware thread, silicon slice, PCIe\* bandwidth, and so on), thereby reducing availability of the service to legitimate clients</li> </ul> |

# 5.1.2 Attack Mechanism

Some of the mechanisms by which an attacker can carry out an attack are listed in the following table.

#### Table 23. Attack Mechanisms and Examples

| Mechanism | Examples |
|-----------|----------|
| Contrived Packet Stream | Attacker crafts a packet stream that exploits known vulnerabilities in the software, firmware, or hardware. This could include vulnerabilities such as buffer overflow bugs, lack of parameter validation, and so on. |
| Compromised Application Software | Attacker modifies the application code calling the Intel<sup>®</sup> QuickAssist Technology API to exploit known vulnerabilities in the driver/hardware. |
| Application Malware | In an environment where an attacker may be able to run their own application, separate from the main application software, they may invoke the Intel<sup>®</sup> QuickAssist Technology API to exploit known vulnerabilities in the driver/hardware. |
| Compromised IA driver software | Attacker modifies the IA driver to exploit known vulnerabilities in the driver/hardware. |
| Defect | It is also possible that the attack is not malicious, but rather an unintentional defect. |

# 5.1.3 Attacker Privilege

The following table describes the privileges that an attacker may have. The table describes the case of a non-virtualized system.

#### Table 24. Attacker Privilege

| Privilege | Comments |
|-----------|----------|
| Physical access | There is no attempt to protect against threats, such as signal probes, where the attacker has physical access to the system. Customers can protect their systems using physical locks, tamper-proof enclosures, Faraday cages, and so on. |
| Logged in as privileged user | There is no attempt to protect against threats where the attacker is logged in as a privileged user. Customers can protect their systems using strong, frequently changed passwords, and so on. |
| Logged in as unprivileged user | If the attacker is logged into a platform as an unprivileged user, it is important to ensure that they cannot use the services of the PCH to access (read or write) any data to which they would not otherwise have access. |
| Ability to send packets | In almost all deployments, attackers have the ability to send arbitrary packets from the network into the system. It is assumed that threats (for example, denial of service attacks) may arrive in this way. |

# 5.1.4 Deployment Models

Some of the possible deployment models are given in the following table.

#### Table 25. Deployment Models

| Deployment Model                        | Examples                                                                   |
|-----------------------------------------|----------------------------------------------------------------------------|
| System with no untrusted users          | <ul><li>Network security appliance</li><li>Server in data center</li></ul> |
| System with potentially untrusted users | Server in data center                                                      |

# 5.2 Threat/Attack Vectors

A thorough analysis has been conducted by considering each of the threat categories, attack mechanisms, attacker privilege levels, and deployment models. As a result, the following threats have been identified. Also described are the steps a user of the PCH chipset can take to mitigate against each threat. Some general practices that mitigate many of the common threats are considered first. Thereafter, threats on specific services and mitigation against those threats are described.

# 5.2.1 General Mitigation

The following mitigation techniques are generic to different threats and attack vectors:

- Ensure that all software running on the platform that has access to Intel<sup>®</sup> QuickAssist Technology devices is within the trust boundary of the platform owner. This mitigation includes software running in virtual machines and containers.
- Intel<sup>®</sup> follows Secure Coding guidelines, including performing code reviews and running static analysis on its driver software and firmware, to ensure its compliance with security guidelines. It is recommended that customers follow similar guidelines when developing application code. This should include the use of tools such as static analysis, fuzzing, and so on.
- Ensure each hardware component, including the PCH chipset, processor, and DRAM, is physically secured from attackers. This can include such examples as physical locks, tamper proofing, and Faraday cages (to prevent side-channel attacks via electromagnetic radiation).
- Ensure that network services not required on the module are not operating and that the corresponding network ports are locked down.
- Use strong passwords to protect against dictionary and other attacks on administrative and other login accounts.

# 5.2.2 General Threats

General threats include the following:

- <u>DMA</u>
- Intentional Modification of IA Driver
- Modification of the QAT Configuration File
- Malicious Application Code
- Denial of Service

# 5.2.2.1 DMA

**Threat:** The PCH can perform Direct Memory Access (DMA, the copying of data) between defined memory locations. Once an attacker has sufficient privilege to invoke the Intel<sup>®</sup> QuickAssist Technology API, or to write to/read from the hardware rings used by the driver to communicate with the device, they can send requests to the Intel<sup>®</sup> QuickAssist Accelerator to perform such DMA, passing arbitrary physical memory addresses as the source and/or destination addresses, thereby exposing or modifying regions of memory to which they would otherwise not have access.

**Mitigation:** Ensure that only trusted users are granted permissions to access the Intel<sup>®</sup> QuickAssist Technology API, or to write to and read from the hardware rings. Specifically, the PCH configuration file describes logical instances of acceleration services and the set of hardware rings to be used for each such instance. User processes can ask the kernel driver to map these rings into their address spaces. To access a given device (identified by the number in the filenames below), the user must be granted read/write access to the following files, which may be in /dev:

- uio<0..N> (where "0..N" are the qat uio device numbers)
- qat\*
- usdm\_drv

# 5.2.2.2 Intentional Modification of IA Driver

**Threat**: An attacker can potentially modify the IA driver to behave maliciously. This may lead to a denial of service of Intel<sup>®</sup> QuickAssist Technology services.

**Mitigation**: The driver object/executable file on disk should be protected using the normal file protection mechanisms so that it is writable only by trusted users, for example, a privileged user or an administrator. Specifically, the Intel<sup>®</sup> QuickAssist Technology kernel objects and libraries should not be writable by unprivileged users. If the *qat* user group is being used to provide access to Intel<sup>®</sup> QuickAssist Technology services, then this group should not have write permission to the binaries.



# 5.2.2.3 Modification of the QAT Configuration File

**Threat**: The QAT configuration file is read at initialization time by the driver and specifies what instances of each service (cryptographic, data compression) should be created, and which rings each service instance will use. Modifying this file could lead to denial of service by deleting required instances or could be used to attempt to create additional instances that the attacker could subsequently attempt to access for malicious purposes.

**Mitigation**: The configuration file should be protected using the normal file protection mechanisms so that it is writable only by trusted users, for example, a privileged user or an administrator.

**NOTE:** By default, the configuration file is stored in the /etc directory and has a name such as c6xxx\_dev0.conf. With the default permissions, it is readable and writable only by the root user and the qat group.

# 5.2.2.4 Malicious Application Code

**Threat**: An attacker who can gain access to the Intel<sup>®</sup> QuickAssist Technology API may be able to exploit the following features of the API:

- Buffers passed to the API have a length that is specified in a field of up to 32 bits. By specifying excessive lengths, an attacker may be able to cause denial of service by overwriting data beyond the end of a buffer.
- Buffer lists passed to the API consist of a scatter gather list (array of buffers). An attacker may incorrectly specify the number of buffers, causing denial of service due to the reading or writing of incorrect buffers.

**Mitigation**: Platform management can include the Rate Limiting feature to mitigate against Noisy Neighbors. Only trusted users and applications should be allowed to access the Intel<sup>®</sup> QuickAssist Technology API, as described in General Mitigation.

# 5.2.2.5 Denial of Service

**Threat**: An attacker may construct a service request that does not conform to the specification, resulting in loss of service due to service timeouts, halting of the QuickAssist service, or undesired platform-level conditions.

**Mitigation**: The current generation of Intel<sup>®</sup> QuickAssist Technology has been designed for performance, providing direct access to hardware via PCIe<sup>\*</sup> MMIO space. Misuse of hardware registers is to be avoided, and the threat of intentional misuse must be mitigated by ensuring all software on the platform is trusted.

An attacker may attempt to contrive a packet stream that monopolizes the acceleration services, thereby denying service to legitimate users. This may consist of one or more of the following:

- Sending packets that are compressed (for example, using IPComp) or encrypted (for example, using IPsec), thereby reducing the availability of these services to legitimate traffic.
- Sending excessively large packets, causing some latency for legitimate packets.
- Sending small packets at a high packet rate, causing extra bandwidth utilization on the PCI Express\* bus connecting the device to the processor.

**Mitigation**: Proper monitoring of Device Usage (DU) and the construction of Service Level Agreements (SLA) are now available as part of the Rate Limiting feature.

# 5.2.3 Threats Specific to Cryptographic Service

Threats against the cryptographic service include:

- <u>Reading Cryptographic Keys</u>

# 5.2.3.1 Reading Cryptographic Keys

**Threat**: Cryptographic keys are stored in DRAM. An attacker who can determine where these are stored could read the DRAM to get access to the keys or could write the DRAM to use keys known by the attacker, thereby compromising the confidentiality of data protected by these keys. Some cryptographic keys have long lives. The impact of an attacker obtaining the key may exist for the lifetime of the key itself.

**Mitigation**: DRAM is considered inside the cryptographic boundary (as defined by FIPS 140-2). The normal memory protection schemes provided by the Intel<sup>®</sup> architecture processor and memory controller, and by the operating system, prevent unauthorized access to these memory regions.

§

# 6 Supported APIs

The supported APIs are described in two categories:

- Intel<sup>®</sup> QuickAssist Technology APIs
- Additional APIs

# 6.1 Intel<sup>®</sup> QAT APIs

The platforms described in this manual support the following Intel® QAT API libraries:

- Cryptographic API definitions are located in: \$ICP\_ROOT/quickassist/include/lac, where \$ICP\_ROOT is the directory where the Acceleration software is unpacked. See the *Intel® QuickAssist Technology Cryptographic API Reference Manual* (refer to <u>Table 2</u>) for details.
- Data Compression API definitions are located in: \$ICP\_ROOT/quickassist/include/dc. See the *Intel® QuickAssist Technology Data Compression API Reference Manual* (refer to <u>Table 2</u>) for details.

Base API definitions that are common to the API libraries are located in: \$ICP\_ROOT/quickassist/include. See also the *Intel® QuickAssist Technology API Programmer's Guide* (refer to <u>Table 2</u>) for guidelines and examples that demonstrate how to use the APIs.

# 6.1.1 Intel® QAT API Limitations

The following limitations apply when using the Intel<sup>®</sup> QAT APIs on the platforms described in this manual:

- For all services, the maximum size of a single perform request is 4 GB.
- For all services, data structures that contain data required by the Intel® QAT Endpoint should be on a 64-byte-aligned address to maximize performance. This alignment helps minimize latency when transferring data from DRAM to an Intel® QAT Endpoint integrated in the PCH device.
- For the key generation cryptographic API, the following limitations apply:

| Caution: | Secure Sockets Layer (SSL) key generation op-data:                         |
|----------|----------------------------------------------------------------------------|
|          | Maximum secret length is 512 bytes                                         |
|          | Maximum userLabel length is 136 bytes                                      |
|          | Maximum generatedKeyLenInBytes is 248                                      |
| Caution: | Transport Layer Security (TLS) key generation op-data:                     |
|          | Secret length must be <128 bytes for TLS v1.0/1.1; <512 bytes for TLS v1.2 |
|          | userLabel length must be <256 bytes                                        |
|          | Maximum seed size is 64 bytes                                              |
|          | Maximum generatedKeyLenInBytes is 248 bytes                                |
| Caution: | Mask Generation Function (MGF) op-data:                                    |
|          | Maximum seed length is 255 bytes                                           |
|          | Maximum maskLenInBytes is 65528                                            |



- For the cryptographic service, SNOW 3G and KASUMI\* operations are not supported when CpaCySymPacketType is set to CPA\_CY\_SYM\_PACKET\_TYPE\_PARTIAL. The error returned in this case is CPA\_STATUS\_INVALID\_PARAM.
- For the cryptographic service, when using the asymmetric crypto APIs, the buffer size passed to the API should be rounded up to the next power of 2, or to the next value that is 3 times a power of 2, for optimum performance.
- For the data compression service, the size of every stateful decompression request must be a multiple of two, with the exception of the last request.
- For the data compression service, the CpaDcFileType field in the CpaDcSessionSetupData data structure is ignored (previously this was considered for semi-dynamic compression/decompression).
- For static compression, the maximum expansion during compression is ceiling (9\*Total\_Input\_Byte/8)+7 bytes. If CPA\_DC\_ASB\_ENABLED is selected, the maximum expansion during compression is the input buffer size plus ceiling (Total\_Input\_Byte/65535) \* 5 bytes.
- *NOTE:* Due to the need for a skid pad and the way the checksum is calculated in the stored block case to prevent compression overflow, an output buffer size of ceiling (9\*Total\_Input\_Byte/8) + 55 bytes needs to be supplied (even though the stored block output size might be less).
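The asymmetric crypto sizing recommendation in the list above can be sketched as a small helper. This assumes the recommendation means "the smallest value of the form 2^k or 3·2^k that is not less than the requested size"; the function name and the 2^30 upper limit are illustrative choices, not part of the Intel<sup>®</sup> QAT API.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the smallest value >= n that is either a power of 2 or
 * 3 times a power of 2 (1, 2, 3, 4, 6, 8, 12, 16, 24, ...). */
static size_t round_up_asym_size(size_t n)
{
    assert(n > 0 && n <= ((size_t)1 << 30));
    for (size_t p = 1;; p *= 2) {
        if (n <= p)
            return p;             /* next power of 2 */
        if (p >= 2 && n <= p + p / 2)
            return p + p / 2;     /* 3 * (p / 2), which lies between p and 2p */
    }
}
```

For example, under this reading a 90-byte operand would be padded to 96 bytes (3 × 32) rather than 128 bytes.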

The decompression service can report various error conditions, most of which arise from processing dynamic Huffman code trees that are ill-formed. These soft error conditions are reported at the Intel<sup>®</sup> QAT API using the CpaDcReqStatus enumeration. At the point of a soft error, the hardware state is not accurate enough to allow recovery. Therefore, in this case, the Intel<sup>®</sup> QAT software rolls back to the previous known good state and reports that no input has been processed and no output produced. This allows an application to correct the source of the error and resubmit the request.

For example, if the following source and destination buffers were submitted to the Intel<sup>®</sup> QAT API:

| Buffer     | Data Length | Contents                                                       |
|------------|-------------|----------------------------------------------------------------|
| pSrcBuffer | 16K         | Valid Deflate Block, Valid Deflate Block, Corrupt Deflate Data |
| pDstBuffer | 18K         |                                                                |

The result would be:

| Buffer     | Data Length | Contents                                                       |
|------------|-------------|----------------------------------------------------------------|
| pSrcBuffer | 16K         | Valid Deflate Block, Valid Deflate Block, Corrupt Deflate Data |
| pDstBuffer | 18K         | Some uncompressed Data; Produced=0                             |

**Behavior when build flag ICP\_DC\_RETURN\_COUNTERS\_ON\_ERROR is defined.** In some specialized applications, when a decompression soft error occurs, the application has no way of correcting the source of the error and resubmitting the request, so the session must be invalidated and terminated. In this case, it is more useful for the application to receive the uncompressed data produced up to the point of the soft error before terminating the session. The compile-time build flag ICP\_DC\_RETURN\_COUNTERS\_ON\_ERROR selects this mode of operation. The behavior of decompression on a soft error when this build flag is used is as follows.

If the following source and destination buffers were submitted to the Intel<sup>®</sup> QAT API:



The result would be:



It is important to note in this case:

**Caution:** The consumed value returned in the CpaDcRqResults structure is not reliable.

**Caution:** No further requests can be submitted on this session.

- For stateful decompression, the maximum output size is 4.29 GB (2<sup>32</sup> bytes).

# 6.1.1.1 Resubmitting After Getting an Overflow Error

The following table describes the behavior of the Intel<sup>®</sup> QAT compression service when an overflow occurs during a compression or decompression operation. It also describes the expected behavior of an application when an overflow occurs.



| API             | Operation               | Overflow Supported | Input Data Consumed?                           | Valid Data Produced?                           | Status Returned in Results | Note                                   |
|-----------------|-------------------------|--------------------|------------------------------------------------|------------------------------------------------|----------------------------|----------------------------------------|
| Traditional API | Stateless compression   | YES                | Possible - indicated in results consumed field | Possible - indicated in results produced field | -11                        | Overflow is considered as an exception |
| Traditional API | Stateless decompression | NO                 | NO                                             | NO                                             | -11                        | Overflow is considered as an error     |
| Traditional API | Stateful decompression  | YES                | Possible - indicated in results consumed field | Possible - indicated in results produced field | -11                        | Overflow is considered as an exception |
| Data Plane API  | Stateless compression   | NO                 | NO                                             | NO                                             | -11                        | Overflow is considered as an error     |
| Data Plane API  | Stateful decompression  | NO                 | NO                                             | NO                                             | -11                        | Overflow is considered as an error     |

The Intel<sup>®</sup> QAT releases enable the Compress and Verify feature by default for compression requests. The Compress and Verify feature implies that sessions can only be **Stateless** in the compression direction.

# 6.1.1.1.1 Overflow Exception in the Traditional API

Stateless sessions support overflow as an exception for the Traditional API in the compression direction only. This means that the application can rely on cpaDcRqResults.consumed to resubmit from where the overflow occurred. For stateless sessions, an overflow in the decompression direction must be treated as an error.

In this case, the application must resubmit the request with a larger buffer as described in the procedure for handling overflow errors. For stateful sessions, overflow is supported only in the decompression direction.

# 6.1.1.1.2 Overflow Error in the Data Plane API

The Data Plane API considers overflow status as an error. If an overflow occurs with the data plane API, the driver will output the following error message to the user:

"Unrecoverable error: stateless overflow. You may need to increase the size of your destination buffer"

In this case, cpaDcRqResults.consumed, .produced and .checksum should be ignored. If length and checksum are required, they must be tracked in the application, because they are not maintained in the session.

# 6.1.1.1.3 Procedure for Handling Overflow Errors

Resubmit the request with the following data:

- Use the same Source buffer.
- Allocate a bigger Destination buffer.
- Put the checksum from the previous successful request into the cpaDcRqResults struct.
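The resubmission steps above can be sketched as a retry loop. The sketch below does not call the Intel<sup>®</sup> QAT API: `fake_submit()` and `results_t` are hypothetical stand-ins for a compression request and for the consumed, produced, and checksum fields of the results structure, and the doubling strategy and retry limit are arbitrary choices made for illustration. Only the status value -11 is taken from the overflow table.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define STATUS_OK        0
#define STATUS_OVERFLOW  (-11)  /* status value listed in the overflow table */
#define FAKE_OVERHEAD    8u     /* hypothetical framing bytes added by the stand-in */

/* Hypothetical stand-in for the result fields used in this sketch. */
typedef struct {
    int      status;
    uint32_t consumed;
    uint32_t produced;
    uint32_t checksum;  /* in: running checksum, out: updated checksum */
} results_t;

/* Stand-in for a request: copies src behind a small header and reports an
 * overflow when dst is too small. This is not a QAT function. */
static void fake_submit(const uint8_t *src, size_t src_len,
                        uint8_t *dst, size_t dst_len, results_t *res)
{
    if (dst_len < src_len + FAKE_OVERHEAD) {
        res->status = STATUS_OVERFLOW;
        res->consumed = res->produced = 0;
        res->checksum = 0;  /* result fields are treated as unusable */
        return;
    }
    memset(dst, 0, FAKE_OVERHEAD);
    memcpy(dst + FAKE_OVERHEAD, src, src_len);
    for (size_t i = 0; i < src_len; i++)
        res->checksum += src[i];
    res->status = STATUS_OK;
    res->consumed = (uint32_t)src_len;
    res->produced = (uint32_t)(src_len + FAKE_OVERHEAD);
}

/* Resubmits the same source with a larger destination until it fits.
 * Returns the destination buffer (caller frees) or NULL on failure. */
static uint8_t *submit_with_retry(const uint8_t *src, size_t src_len,
                                  size_t dst_len, uint32_t prev_checksum,
                                  results_t *res)
{
    uint8_t *dst = NULL;

    assert(src != NULL && res != NULL && dst_len > 0);
    for (int attempt = 0; attempt < 8; attempt++) {
        uint8_t *bigger = realloc(dst, dst_len);  /* bigger destination */
        if (bigger == NULL)
            break;
        dst = bigger;
        res->checksum = prev_checksum;  /* checksum of the last good request */
        fake_submit(src, src_len, dst, dst_len, res);  /* same source buffer */
        if (res->status != STATUS_OVERFLOW)
            return dst;
        dst_len *= 2;
    }
    free(dst);
    return NULL;
}
```

The point of the sketch is the order of operations on each retry: keep the source unchanged, grow the destination, and restore the checksum before resubmitting.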

# 6.1.1.1.4 Compression Overflow Support in A Virtualized Environment

In a virtual environment, the guest does not download the firmware. Only the host downloads the firmware.

Therefore, if the guest runs a newer Intel<sup>®</sup> QAT driver than the host, the guest application might experience false CNV errors. The correct course of action would be to update the host with the latest Intel<sup>®</sup> QAT driver.

# 6.1.1.1.5 Avoiding a Compression Overflow Exception

Overflow happens for two reasons:

- 1. The application allocated a destination buffer that was too small to receive the compressed data.
- 2. A recovery occurred after a Compress and Verify error with an input payload greater than 65,535 bytes.

To minimize the impact of resubmitting data after an overflow exception, the API cpaDcDeflateCompressBound() has been added to the Intel® QAT driver. This API provides the application with a recommended destination buffer size to avoid the exception. This API must be called by the application before allocating the destination buffer.

The cpaDcDeflateCompressBound() API requires the instance handle so that the formula that it uses is tailored to the device generation.

# 6.1.1.2 Dynamic Compression for Data Compression Service

Dynamic compression involves feeding the data produced by the compression hardware block to the translator hardware block. <u>Figure 5</u> shows the dynamic compression data path.



#### Figure 5. Dynamic Compression Data Path



When the application sets the Huffman type to CPA\_DC\_HT\_FULL\_DYNAMIC in the session and the auto select best feature is set to CPA\_DC\_ASB\_DISABLED, the compression service may not always produce a deflate stream with dynamic Huffman trees.

In the case of Stateful decompression requests, if the service returns an exception (e.g., overflow status in the results), it is recommended to examine the bytes consumed and returned in the CpaDcRqResults structure to verify if all the data in the source data buffer has been processed. Unprocessed data can be submitted in a subsequent request that uses the offset reported by the consumed field in the CpaDcRqResults structure.

# 6.1.1.3 Maximal Expansion with Auto Select Best Feature for Compression

Some input data may lead to a lower-than-expected compression ratio. This is because the input data may not be very compressible.

To achieve a maximum compression ratio, the acceleration unit provides an auto select best (ASB) feature. In this mode, the Intel<sup>®</sup> QuickAssist Technology hardware will first execute static compression followed by dynamic compression and then select the output that yields the best compression ratio.

However, if both the static and the dynamic operations produce more data than the uncompressed source data plus the stored block headers, the source data will be used as a stored block.

A 5-byte stored block header is always prepended to the stored block.

To use the ASB feature, configure the autoSelectBestHuffmanTree enum during the session creation.

Regardless of the ASB setting selected, dynamic compression will only be attempted if the session is configured for dynamic compression.
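The selection rule described above can be modeled in a few lines of C. This is a software model for illustration only; the actual selection is performed by the hardware. The names are hypothetical, and the stored-block size is assumed to be the source size plus one 5-byte header per 65,535 bytes of input, following the expansion formula given for CPA\_DC\_ASB\_ENABLED in the API limitations.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { BLOCK_STATIC, BLOCK_DYNAMIC, BLOCK_STORED } block_kind_t;

/* Picks the smallest candidate output. Dynamic output is considered only
 * when the session is configured for dynamic compression. */
static block_kind_t asb_pick(uint64_t static_len, uint64_t dynamic_len,
                             uint64_t src_len, int dynamic_configured)
{
    /* Assumed stored size: source plus a 5-byte header per 65,535 bytes. */
    uint64_t stored_len = src_len + 5 * ((src_len + 65534) / 65535);
    block_kind_t best = BLOCK_STATIC;
    uint64_t best_len = static_len;

    assert(src_len > 0);
    if (dynamic_configured && dynamic_len < best_len) {
        best = BLOCK_DYNAMIC;
        best_len = dynamic_len;
    }
    if (stored_len < best_len)
        best = BLOCK_STORED;  /* compression expanded the data */
    return best;
}
```

With a 1,000-byte source, static output of 1,100 bytes, and dynamic output of 1,050 bytes, the model picks the stored block (1,005 bytes).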

There are five possible settings available for the autoSelectBestHuffmanTree when creating a session. Based on the ASB settings described below, the produced data returned in the CpaDcRqResults structure will vary.

# 6.1.1.3.1 CPA\_DC\_ASB\_DISABLED

ASB mode is disabled.

# 6.1.1.3.2 CPA\_DC\_ASB\_STATIC\_DYNAMIC

**This setting is deprecated.** To avoid incompatibility with older applications, it is internally redirected to CPA\_DC\_ASB\_ENABLED. Redirecting to CPA\_DC\_ASB\_ENABLED effectively means that, despite what the old enum name suggests, QAT is now allowed to return an uncompressed block with this option (if it is smaller than the compressed block would be).

# 6.1.1.3.3 CPA\_DC\_ASB\_UNCOMP\_STATIC\_DYNAMIC\_WITH\_STORED\_HDRS

**This setting is deprecated**. To avoid incompatibility with older applications, it is internally redirected to CPA\_DC\_ASB\_ENABLED.

# 6.1.1.3.4 CPA\_DC\_ASB\_UNCOMP\_STATIC\_DYNAMIC\_WITH\_NO\_HDRS

**This setting is deprecated**. To avoid incompatibility with older applications, it is internally redirected to CPA\_DC\_ASB\_ENABLED.

For QAT 1.6/1.7 Hardware, deprecation means it is no longer possible to return an uncompressed stored block without a compliant DEFLATE header.

# 6.1.1.3.5 CPA\_DC\_ASB\_ENABLED

ASB mode is enabled. When CPA\_DC\_ASB\_ENABLED is used, the output will be a format compliant block with a proper header, whether the data is compressed or uncompressed. QAT is allowed to return an uncompressed or compressed (static/dynamic) block, whichever is smaller.

CPA\_DC\_ASB\_ENABLED behaves the same as CPA\_DC\_ASB\_UNCOMP\_STATIC\_DYNAMIC\_WITH\_STORED\_HDRS did in previous versions of QAT.
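The selection described in this section can be pictured with a small sketch. It is not the hardware or driver implementation: the names `ex_block_kind`, `ex_asb_select`, and `EX_STORED_HDR_BYTES` are hypothetical, and the sketch assumes a single stored block with the 5-byte header mentioned above.

```c
#include <stddef.h>

/* Illustrative stand-ins; these are NOT QAT API types. */
typedef enum {
    EX_BLOCK_STATIC,
    EX_BLOCK_DYNAMIC,
    EX_BLOCK_STORED
} ex_block_kind;

#define EX_STORED_HDR_BYTES 5u /* stored block header size from this section */

/*
 * Pick the smallest candidate output, as described for ASB.
 *   static_len, dynamic_len  produced sizes of the two compressed candidates
 *   dynamic_allowed          non-zero only if the session is configured
 *                            for dynamic compression
 *   src_len                  size of the uncompressed source data
 *   out_len                  optional; receives the size of the chosen output
 */
ex_block_kind ex_asb_select(size_t static_len, size_t dynamic_len,
                            int dynamic_allowed, size_t src_len,
                            size_t *out_len)
{
    ex_block_kind kind = EX_BLOCK_STATIC;
    size_t best = static_len;

    if (dynamic_allowed && dynamic_len < best) {
        kind = EX_BLOCK_DYNAMIC;
        best = dynamic_len;
    }
    /* Use a stored block only when every compressed candidate is larger. */
    if (best > src_len + EX_STORED_HDR_BYTES) {
        kind = EX_BLOCK_STORED;
        best = src_len + EX_STORED_HDR_BYTES;
    }
    if (out_len != NULL) {
        *out_len = best;
    }
    return kind;
}
```

For a 200-byte source whose static and dynamic outputs would be 230 and 220 bytes, the sketch selects a stored block of 205 bytes.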

# 6.1.1.4 Maximal Expansion and Destination Buffer Size in Compression Direction

For static compression operations, the worst-case possible expansion can be expressed as:

Max Static Produced data in bytes = ceil(9 \* Total input bytes / 8) + 7

The memory requirement for the destination buffer is expressed by the following formula:

Destination buffer size in bytes = ceil((9 \* Total input bytes + (8-1))/8) + 55 bytes + N bytes

With:

- ceil(x,y) = (x + (y - 1)) / y, evaluated with integer division
- N = 8 - (total input byte count) when total input byte count < 8
- N = 0 when total input byte count >= 8

The destination buffer size must account for the worst-case possible maximal expansion + 55 bytes + N bytes.

Example 1 with an input source size of 111,261 bytes:

Memory required for destination buffer = ceil((9 \* 111261 + (8 - 1)) / 8) + 55 + (111261 < 8 ? (8 - 111261) : 0)

= ceil (125169.5) + 55 + 0

= 125169 + 55 + 0

= 125224 bytes to be allocated

Example 2 with a 7-byte input source size:

```
Memory required for destination buffer = ceil((9 * 7 + (8 - 1)) / 8) + 55
+ (7 < 8 ? (8 - 7) : 0)
= ceil (8.75) + 55 + 1
= 8 + 55 + 1
= 64 bytes to be allocated
```

**NOTE:** Regardless of the ASB settings, the memory must be allocated for the worst case. If an overflow occurs, using either static or dynamic compression, then the returned counters, status, and expected application behavior are as shown in <u>Table 27</u>.
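The destination buffer formula can be written as a small helper. This is a minimal sketch rather than driver code: the function name `ex_dc_dest_buffer_size` is hypothetical, the division is integer division as in the worked examples, and the sketch does not guard against arithmetic overflow for very large inputs.

```c
#include <stddef.h>

/* Worst-case destination buffer size in bytes for a given input size. */
size_t ex_dc_dest_buffer_size(size_t total_input_bytes)
{
    /* ceil(9 * n, 8) with ceil(x, y) = (x + (y - 1)) / y, integer division */
    size_t max_static = (9 * total_input_bytes + (8 - 1)) / 8;

    /* N pads inputs that are shorter than 8 bytes */
    size_t n = (total_input_bytes < 8) ? (8 - total_input_bytes) : 0;

    return max_static + 55 + n;
}
```

With this helper, an input of 111,261 bytes gives 125,224 bytes and an input of 7 bytes gives 64 bytes, matching Example 1 and Example 2.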

# 6.1.2 Data Plane APIs Overview

The Intel<sup>®</sup> QAT Cryptographic API Reference Manual and the Intel<sup>®</sup> QAT Data Compression API Reference Manual (refer to <u>Table 2</u>) contain information on the APIs that are specific to data plane applications.

The APIs are recommended for applications that are executing in a data plane environment where the cost of offload (that is, the cycles consumed by the driver sending requests to the hardware) needs to be minimized. To minimize the cost of offload, several constraints have been placed on the APIs. If these constraints are too restrictive for your application, the traditional APIs can be used instead (at a cost of additional IA cycles).

The definition of the Cryptographic Data Plane APIs is contained in: \$ICP\_ROOT/quickassist/include/lac/cpa\_cy\_sym\_dp.h

The definition of the Data Compression Data Plane APIs is contained in: \$ICP\_ROOT/quickassist/include/dc/cpa\_dc\_dp.h

# 6.1.2.1 IA Cycle Count Reduction When Using Data Plane APIs

From an IA cycle count perspective, the Data Plane APIs are more performant than the traditional APIs (for example, the symmetric cryptographic APIs defined in \$ICP\_ROOT/quickassist/include/lac/cpa\_cy\_sym.h). The majority of the cycle count reduction is realized by the reduction of supported functionality in the Data Plane APIs and the application of constraints on the calling application (refer to <u>Section 6.1.2.2</u>, <u>Usage Constraints</u> on the Data Plane APIs).

In addition, to further improve performance, the Data Plane APIs attempt to amortize the cost of an MMIO access when sending requests to, and receiving responses from, the hardware.

A typical usage is to call the <code>cpaCySymDpEnqueueOp()</code> or the <code>cpaDcDpEnqueueOp()</code> function multiple times with requests to process and the <code>performOpNow</code> flag set to CPA\_FALSE. Once multiple requests have been enqueued, the <code>cpaCySymDpEnqueueOp()</code> or <code>cpaDcDpEnqueueOp()</code> function may be called with the <code>performOpNow</code> flag set to CPA\_TRUE. This sends the requests to the Intel<sup>®</sup> QAT Endpoint for processing. This sequence is shown in <u>Figure 6</u>.



#### Figure 6. Amortizing the Cost of an MMIO Across Multiple Requests

The Intel® QAT API returns a CPA\_STATUS\_RETRY when the ring becomes full.

The number of requests to place on the ring is application dependent and it is recommended that performance testing be conducted with tunable parameter values.

Two functions, cpaCySymDpPerformOpNow() and cpaDcDpPerformOpNow(), are also provided that allow queued requests to be sent to the hardware without the need for queuing an additional request. This is typically used in the scenario where a request has not been received for some time and the application would like the enqueued requests to be sent to the hardware for processing.
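The batching pattern can be illustrated with a self-contained simulation. Nothing below calls the Intel® QAT API: `ex_ring`, `ex_enqueue`, `ex_perform_now`, and `ex_submit_batched` are hypothetical stand-ins that only model a ring, a performOpNow-style flag, and a count of MMIO (doorbell) writes, and the status values are chosen for the sketch.

```c
#include <stddef.h>

/* Stand-in status codes for this sketch only. */
enum { EX_STATUS_SUCCESS = 0, EX_STATUS_RETRY = -2 };

typedef struct {
    size_t capacity;    /* ring size in requests */
    size_t in_flight;   /* requests already handed to the device */
    size_t pending;     /* enqueued but not yet submitted */
    size_t mmio_writes; /* doorbell writes performed so far */
} ex_ring;

/* Stand-in for an enqueue call with a performOpNow-style flag. */
int ex_enqueue(ex_ring *ring, int perform_now)
{
    if (ring->in_flight + ring->pending >= ring->capacity) {
        return EX_STATUS_RETRY; /* ring full: poll for responses, then retry */
    }
    ring->pending++;
    if (perform_now) {
        ring->in_flight += ring->pending;
        ring->pending = 0;
        ring->mmio_writes++;
    }
    return EX_STATUS_SUCCESS;
}

/* Stand-in for a PerformOpNow-style flush of already queued requests. */
void ex_perform_now(ex_ring *ring)
{
    if (ring->pending > 0) {
        ring->in_flight += ring->pending;
        ring->pending = 0;
        ring->mmio_writes++;
    }
}

/* Submit `count` requests in batches of `batch`; returns how many were accepted. */
size_t ex_submit_batched(ex_ring *ring, size_t count, size_t batch)
{
    size_t accepted = 0;

    for (size_t i = 0; i < count; i++) {
        /* Only the last request of each batch triggers the doorbell write. */
        int last_in_batch = (batch != 0) && ((i + 1) % batch == 0);

        if (ex_enqueue(ring, last_in_batch) != EX_STATUS_SUCCESS) {
            break;
        }
        accepted++;
    }
    ex_perform_now(ring); /* flush a trailing partial batch */
    return accepted;
}
```

In this model, ten requests submitted in batches of four cost three doorbell writes instead of ten. The batch size is the tunable parameter that the text above recommends measuring.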



# 6.1.2.2 Usage Constraints on the Data Plane APIs

The following constraints apply to the use of the Data Plane APIs. If the application can handle these constraints, the Data Plane APIs can be used:

- Thread safety is not supported. Each software thread should have access to its own unique instance (CpaInstanceHandle) to avoid contention on the hardware rings.
- For performance, polling is supported, as opposed to interrupts (which are comparatively more expensive).
- Polling functions (refer to <u>Section 6.2.2</u>, <u>Polling Functions</u>) are provided to read responses from the hardware response queue and dispatch callback functions.
- Buffers and buffer lists are passed using physical addresses to avoid virtual-to-physical address translation costs.
- Alignment restrictions are placed on the operation data (that is, the CpaCySymDpOpData structure) passed to the Data Plane API. The operation data must be at least 8-byte aligned, contiguous, resident, DMA-accessible memory.
- Only asynchronous invocation is supported, that is, synchronous invocation is *not* supported.
- There is no support for cryptographic partial packets. If support for partial packets is required, the traditional Intel<sup>®</sup> QAT APIs should be used.
- Since thread safety is *not* supported, statistic counters on the Data Plane APIs are not atomic.
- The *default* instance (CPA\_INSTANCE\_HANDLE\_SINGLE) is not supported by the Data Plane APIs. The specific handle should be obtained using the instance discovery functions (cpaCyGetNumInstances(), cpaCyGetInstances()).
- The submitted requests are always placed on the high-priority ring.
- The data plane APIs are supported in both user space and polling mode in kernel space, but not supported in interrupt mode in kernel space.

# 6.1.2.3 Cryptographic and Data Compression API Descriptions

Full descriptions of the Intel<sup>®</sup> QAT APIs are contained in the Intel<sup>®</sup> QAT Cryptographic API Reference Manual and the Intel<sup>®</sup> QAT Data Compression API Reference Manual (refer to <u>Table 2</u>). In addition to the Intel<sup>®</sup> QAT Data Plane APIs, there are several Data Plane Polling APIs that are described in <u>Section 6.2.2</u>, <u>Polling Functions</u>.

# 6.1.3 Recovering from a Compress and Verify Error

The Compress and Verify and Recover (CnVnR) feature allows a compression error to be recovered in a seamless manner. It is supported in both the Traditional and the Data Plane APIs.

The CnVnR feature is an enhancement of the existing Compress and Verify (CnV) solution. When a compress and verify error is detected, the Intel<sup>®</sup> QAT software will do a correction without returning a CnV error to the application.

When a recovery occurs, <code>CpaDcRqResults.status</code> will return <code>CPA\_DC\_OK</code> or <code>CPA\_DC\_OVERFLOW</code> and the destination buffer will hold valid <code>DEFLATE</code> data.

The application can find out if CnVnR is supported by querying the instance capabilities via the cpaDcQueryCapabilities API. On completion, the compressAndVerifyAndRecover property of the CpaDcInstanceCapabilities structure will be set to CPA\_TRUE if the feature is supported.

The table below provides details on the Intel® QuickAssist APIs supporting the CnVnR feature.

| API                   | CnVnR Behavior                                                                                                  |
|-----------------------|-----------------------------------------------------------------------------------------------------------------|
| cpaDcCompressData     | Enabled by default, no option to disable it.                                                                    |
| cpaDcCompressData2    | CnVnR is enabled when the compressAndVerifyAndRecover property is set to CPA_TRUE in the CpaDcOpData structure. |
| cpaDcDecompressData   | Not applicable                                                                                                  |
| cpaDcDecompressData2  | Not applicable                                                                                                  |
| cpaDcDpEnqueueOp      | CnVnR is enabled when the compressAndVerifyAndRecover property is set to CPA_TRUE in the CpaDcOpData structure. |
| cpaDcDpEnqueueOpBatch | CnVnR is enabled when the compressAndVerifyAndRecover property is set to CPA_TRUE in the CpaDcOpData structure. |

Table 27. API Support for Compress and Verify and Recover

When a CnV recovery takes place, the Intel<sup>®</sup> QAT software creates a stored block out of the input payload that could not be compressed. The maximal size of a stored block allowed by the deflate standard is 65,535 bytes.



When a stored block is created, the DEFLATE header specifies that the data is uncompressed so that the decompressor does not attempt to decode the cleartext data that follows the header. The size of a stored block can be defined as:



Stored block size = Source buffer size + 5 Bytes (*used for the deflate header*)

If a stored block needs to be created out of a cleartext payload size greater than 65,535 bytes, the Intel<sup>®</sup> QuickAssist solution creates one stored block of 65,535 bytes and CpaDcRqResults.status returns CPA\_DC\_OVERFLOW.

**NOTE:** If the application uses the Data Plane API, it is responsible for submitting requests of 65,530 bytes or fewer so that a recovered request does not exceed the stored block limit and return an overflow.
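The size arithmetic above can be sketched as follows. The helper names and macros are hypothetical, and the sketch only computes sizes; whether a payload may be split into several independent requests depends on how the application frames its data.

```c
#include <stddef.h>

#define EX_STORED_HDR_BYTES 5u     /* deflate stored block header */
#define EX_DP_MAX_REQUEST   65530u /* request size limit from the note above */

/* Stored block size for a source buffer, per the formula above. */
size_t ex_stored_block_size(size_t src_len)
{
    return src_len + EX_STORED_HDR_BYTES;
}

/* Number of requests needed so that each one is at most 65,530 bytes. */
size_t ex_dp_request_count(size_t payload_len)
{
    return (payload_len + EX_DP_MAX_REQUEST - 1) / EX_DP_MAX_REQUEST;
}

/* Length of request `index` (0-based) when the payload is split that way. */
size_t ex_dp_request_len(size_t payload_len, size_t index)
{
    size_t offset = index * EX_DP_MAX_REQUEST;
    size_t remaining;

    if (offset >= payload_len) {
        return 0; /* index is past the end of the payload */
    }
    remaining = payload_len - offset;
    return (remaining < EX_DP_MAX_REQUEST) ? remaining : EX_DP_MAX_REQUEST;
}
```

A 65,530-byte request yields a stored block of exactly 65,535 bytes, which is why that request size is the largest one that still fits in a single stored block.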

# 6.1.4 Counting Recovered Compression Errors

The Intel® QAT API has been updated to allow the application to track recovered compression errors. The CpaDcStats data structure has a new property called numCompCnvErrorsRecovered that is incremented every time a compression recovery happens.

The compression recovery process is transparent to the application.

CpaDcRqResults.status returns CPA\_DC\_OK when a compression recovery takes place. The only way to know if a compression recovery took place on the current request is to call the cpaDcGetStats() API and to monitor CpaDcStats.numCompCnvErrorsRecovered.
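One way to picture the monitoring approach is to compare two snapshots of the counter, one taken before the request is submitted and one taken after it completes. The sketch below is an assumption about how an application might do this: `ex_dc_stats` is a hypothetical stand-in that mirrors only the numCompCnvErrorsRecovered field, and in real code both snapshots would come from cpaDcGetStats().

```c
#include <stdint.h>

/* Stand-in for the single CpaDcStats field used in this sketch. */
typedef struct {
    uint64_t numCompCnvErrorsRecovered;
} ex_dc_stats;

/* Number of compression recoveries that happened between two snapshots. */
uint64_t ex_recovered_between(const ex_dc_stats *before,
                              const ex_dc_stats *after)
{
    return after->numCompCnvErrorsRecovered - before->numCompCnvErrorsRecovered;
}
```

A non-zero result can only be attributed to a specific request when that request was the only one in flight on the instance between the two snapshots.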

# 6.1.5 Compress and Verify Error Log in Sysfs

The implementation of the Compress and Verify and Recover solution keeps a record of the CnV errors that have occurred since the driver was loaded. The error count is provided on a per Acceleration Engine basis.

The path to the CnV error log is:

cat /sys/kernel/debug/qat\_dh895xcc\_<Bus>\:<device>.<Function>/cnv\_errors

Each Acceleration Engine keeps a count of the CnV errors. The CnV error counter is reset when the driver is loaded. The tool also reports the last error type that caused a CnV error.

# 6.1.6 Supported Algorithms in LKCF

If LKCF is enabled (see section <u>Enabling Linux\* Kernel Crypto Framework (LKCF</u>)), the following algorithms and templates are supported:

- authenc(hmac(sha1),cbc(aes)) legacy
- authenc(hmac(sha256),cbc(aes))
- authenc(hmac(sha512),cbc(aes))
- cbc(aes)
- ctr(aes)
- xts(aes)
- gcm(aes) available only with CE release package R4.24.0 and newer, or QAT1.8 driver version 1.11.0, for kernels 4.3.0 and newer
- rsa
- dh legacy

For more information on the usage of these algorithms through LKCF, refer to the LKCF API documentation provided by the Linux\* kernel, including <u>https://www.kernel.org/doc/html/v4.15/crypto/architecture.html#ciphers-and-templates</u>, or <u>https://www.kernel.org/doc/html/v6.0/crypto/architecture.html#ciphers-and-templates</u>.

# 6.2 Additional APIs

There are a number of additional APIs that can serve for optimization and other uses outside of the Intel<sup>®</sup> QAT services.

NOTE: Not all additional APIs are supported with all versions of the software package.

The additional APIs are grouped into the following categories:

- IOMMU Remapping Functions
- Polling Functions
- User Space Access Configuration Functions
- Version Information Function
- <u>Thread-less APIs</u>
- <u>Compress and Verify (CnV) Related APIs</u>
- Heartbeat APIs
- <u>Device Polling APIs</u>
- Congestion Management APIs
- Service Specific Polling APIs
- Check Device Availability APIs

# 6.2.1 IOMMU Remapping Functions

These functions are intended for IOMMU remapping operations.

All IOMMU remapping function definitions are in: \$ICP\_ROOT/quickassist/lookaside/access\_layer/include/icp\_sal\_iommu.h.

The IOMMU remapping functions include:

- Section 6.2.1.1, icp\_sal\_iommu\_get\_remap\_size
- Section 6.2.1.2, icp\_sal\_iommu\_map
- <u>Section 6.2.1.3, icp\_sal\_iommu\_unmap</u>

# 6.2.1.1 icp\_sal\_iommu\_get\_remap\_size

Returns the page\_size rounded for IOMMU remapping.



# 6.2.1.1.1 Syntax

size\_t icp\_sal\_iommu\_get\_remap\_size(size\_t size);

#### 6.2.1.1.2 Parameters

size The minimum required page size.

#### 6.2.1.1.3 Return Value

The icp\_sal\_iommu\_get\_remap\_size function returns the page\_size rounded for IOMMU remapping.

#### 6.2.1.2 icp\_sal\_iommu\_map

Adds an entry to the IOMMU remapping table.

# 6.2.1.2.1 Syntax

CpaStatus icp\_sal\_iommu\_map(Cpa64U phaddr, Cpa64U iova, size\_t size);

# 6.2.1.2.2 Parameters

phaddr Host physical address.

iova Guest physical address.

size Size of the remapped region.

#### 6.2.1.2.3 Return Value

The icp\_sal\_iommu\_map function returns one of the following codes:

#### 6.2.1.2.3.1 Code Meaning

CPA\_STATUS\_SUCCESS Successful operation.

CPA\_STATUS\_FAIL Indicates a failure.

# 6.2.1.3 icp\_sal\_iommu\_unmap

Removes an entry from the IOMMU remapping table.

#### 6.2.1.3.1 Syntax

CpaStatus icp\_sal\_iommu\_unmap(Cpa64U iova, size\_t size);

# 6.2.1.3.2 Parameters

- iova Guest physical address to be removed.
- size Size of the remapped region.

# 6.2.1.3.3 Return Value

The icp\_sal\_iommu\_unmap function returns one of the following codes:

#### 6.2.1.3.3.1 Code Meaning

CPA\_STATUS\_SUCCESS Successful operation.

CPA\_STATUS\_FAIL Indicates a failure.

# 6.2.1.4 IOMMU Remapping Function Usage

These functions are required when the user wants to access an acceleration service from the Physical Function (PF) when SR-IOV is enabled in the driver. In this case, all I/O transactions from the device go through DMA remapping hardware. This hardware checks 1) if the transaction is legitimate and 2) what physical address the given I/O address needs to be translated to. If the I/O address is not in the transaction table, it fails with a DMA Read error shown as follows:

```
DRHD: handling fault status reg 3
DMAR:[DMA Read] Request device [02:01.2] fault addr <ADDR>
DMAR:[fault reason 06] PTE Read access is not set
```

To make this work, the user must add a 1:1 mapping as follows:

```
/* 1. Get the size required for a buffer: */
size_t size = icp_sal_iommu_get_remap_size(size_of_data);
/* 2. Allocate a buffer: */
char *buff = malloc(size);
/* 3. Get a physical pointer to the buffer: */
buff_phys_addr = virt_to_phys(buff);
/* 4. Add a 1:1 mapping to the IOMMU tables: */
icp_sal_iommu_map(buff_phys_addr, buff_phys_addr, size);
/* 5. Use the buffer to send data to the Intel QAT Endpoint. */
/* 6. Before freeing the buffer, remove the IOMMU table entry: */
icp_sal_iommu_unmap(buff_phys_addr, size);
/* 7. Free the buffer: */
free(buff);
```

The IOMMU remapping functions can be used in all contexts that the Intel® QAT APIs can be used, that is, kernel and user space in a Physical Function (PF) Domain 0, as well as kernel and user space in a Virtual Machine (VM). In the case of VM, the APIs will do nothing. In the PF Domain 0 case, the APIs update the hardware IOMMU tables.

# 6.2.2 Polling Functions

These functions are intended for retrieving response messages that are on the rings and dispatching the associated callbacks.

All polling function definitions are in: \$ICP\_ROOT/quickassist/lookaside/access\_layer/include/icp\_sal\_poll.h



The polling functions include:

- Section 6.2.2.1, icp\_sal\_pollBank
- Section 6.2.2.2, icp\_sal\_pollAllBanks
- <u>Section 6.2.2.3, icp\_sal\_CyPollInstance</u>
- <u>Section 6.2.2.4, icp\_sal\_DcPollInstance</u>
- <u>Section 6.2.2.5, icp\_sal\_CyPollDpInstance</u>
- <u>Section 6.2.2.6, icp\_sal\_DcPollDpInstance</u>

# 6.2.2.1 icp\_sal\_pollBank

Poll all rings on the given Intel<sup>®</sup> QAT Endpoint on a given bank number to determine if any of the rings contain response messages from the Intel<sup>®</sup> QAT Endpoint. The <u>response\_quota</u> input parameter is per ring.

#### 6.2.2.1.1 Syntax

CpaStatus icp\_sal\_pollBank(Cpa32U accelId, Cpa32U bank\_number, Cpa32U response\_quota);

### 6.2.2.1.2 Parameters

accelId the device number associated with the Intel® QAT Endpoint. The valid range is 0 to the number of Intel® QAT Endpoint devices in the system.

bank\_number the number of the memory bank on the Intel® QAT Endpoint that will be polled for response messages. The valid range is 0 to 31.

response\_quota the maximum number of responses to take from the ring in one call.

#### 6.2.2.1.3 Return Value

The icp\_sal\_pollBank function returns one of the following codes:

# 6.2.2.1.3.1 Code Meaning

CPA\_STATUS\_SUCCESS Successfully polled a ring with data.

CPA\_STATUS\_RETRY There is no data on any ring on any bank or the banks are already being polled.

CPA\_STATUS\_FAIL Indicates a failure.

#### 6.2.2.2 icp\_sal\_pollAllBanks

Poll all banks on the given Intel<sup>®</sup> QAT Endpoint to determine if any of the rings contain response messages from the Intel<sup>®</sup> QAT Endpoint. The <u>response\_quota</u> input parameter is per ring.



# 6.2.2.2.1 Syntax

CpaStatus icp\_sal\_pollAllBanks(Cpa32U accelId, Cpa32U response\_quota);

## 6.2.2.2.2 Parameters

accelId the device number associated with the Intel<sup>®</sup> QAT Endpoint. The valid range is 0 to the number of Intel<sup>®</sup> QAT Endpoints in the system.

response\_quota the maximum number of responses to take from the ring in one call.

# 6.2.2.2.3 Return Value

The icp\_sal\_pollAllBanks function returns one of the following codes:

### 6.2.2.2.3.1 Code Meaning

CPA\_STATUS\_SUCCESS Successfully polled a ring with data.

CPA\_STATUS\_RETRY There is no data on any ring on any bank or the banks are already being polled.

CPA\_STATUS\_FAIL Indicates a failure.

# 6.2.2.3 icp\_sal\_CyPollInstance

Poll the Cryptographic (CY) logical instance associated with the instanceHandle to retrieve requests that are on response rings associated with that instance and dispatch the associated callbacks. The response\_quota input parameter is the maximum number of responses to process in one call.

**NOTE:** The icp\_sal\_CyPollInstance() function is used in conjunction with the CyXIsPolled parameter in the acceleration configuration file.

#### 6.2.2.3.1 Syntax

CpaStatus icp\_sal\_CyPollInstance(CpaInstanceHandle instanceHandle, Cpa32U response\_quota);

#### 6.2.2.3.2 Parameters

instanceHandle the logical instance to poll for responses on the response ring.

response\_quota the maximum number of responses to take from the ring in one call. When set to 0, all responses are retrieved.

# 6.2.2.3.3 Return Value

The icp\_sal\_CyPollInstance function returns one of the following codes:

### 6.2.2.3.3.1 Code Meaning

CPA\_STATUS\_SUCCESS The function was successful.



CPA\_STATUS\_RETRY There are no responses on the rings associated with the specified logical instance.

NOTE: A ring is only polled if it contains data.

CPA\_STATUS\_FAIL Indicates a failure.

## 6.2.2.4 icp\_sal\_DcPollInstance

Poll the Data Compression (DC) logical instance associated with the instanceHandle to retrieve requests that are on response rings associated with that instance and dispatch the associated callbacks. The response\_quota input parameter is the maximum number of responses to process in one call.

#### 6.2.2.4.1 Syntax

CpaStatus icp\_sal\_DcPollInstance(CpaInstanceHandle instanceHandle, Cpa32U response\_quota);

#### 6.2.2.4.2 Parameters

instanceHandle the logical instance to poll for responses on the response ring.

response\_quota the maximum number of responses to take from the ring in one call. When set to 0, all responses are retrieved.

#### 6.2.2.4.3 Return Value

The icp\_sal\_DcPollInstance function returns one of the following codes:

#### 6.2.2.4.3.1 Code Meaning

CPA\_STATUS\_SUCCESS The function was successful.

CPA\_STATUS\_RETRY There are no responses on the rings associated with the specified logical instance.

NOTE: A ring is only polled if it contains data.

CPA\_STATUS\_FAIL Indicates a failure.

# 6.2.2.5 icp\_sal\_CyPollDpInstance

Poll a particular Cryptographic (CY) data path logical instance associated with the instanceHandle to retrieve requests that are on the high-priority symmetric ring associated with that instance and dispatch the associated callbacks. The response\_quota input parameter is the maximum number of responses to process in one call.

**NOTE:** The icp\_sal\_DcPollInstance() function is used in conjunction with the DcXIsPolled parameter in the acceleration configuration file.



## 6.2.2.5.1 Syntax

**NOTE:** This function is a Data Plane API function and consequently the restrictions in Section 6.1.2.2, "Usage Constraints on the Data Plane APIs" apply.

CpaStatus icp\_sal\_CyPollDpInstance(CpaInstanceHandle instanceHandle, Cpa32U response\_quota);

#### 6.2.2.5.2 Parameters

instanceHandle the logical instance to poll for responses on the response ring.

response\_quota the maximum number of responses to take from the ring in one call. When set to 0, all responses are retrieved.

#### 6.2.2.5.3 Return Value

The icp\_sal\_CyPollDpInstance() function returns one of the following codes:

#### 6.2.2.5.3.1 Code Meaning

CPA\_STATUS\_SUCCESS The function was successful.

CPA\_STATUS\_RETRY There are no responses on the rings associated with the specified logical instance.

CPA\_STATUS\_FAIL Indicates a failure.

# 6.2.2.6 icp\_sal\_DcPollDpInstance

Poll a particular Data Compression (DC) data path logical instance associated with the instanceHandle to retrieve requests that are on the response ring associated with that instance. The response\_quota input parameter is the maximum number of responses to process in one call.

#### 6.2.2.6.1 Syntax

**NOTE:** This function is a Data Plane API function and consequently the restrictions in Section 6.1.2.2 apply.

CpaStatus icp\_sal\_DcPollDpInstance(CpaInstanceHandle instanceHandle, Cpa32U response\_quota);

#### 6.2.2.6.2 Parameters

instanceHandle the logical instance to poll for responses on the response ring.

response\_quota the maximum number of responses to take from the ring in one call. When set to 0, all responses are retrieved.



## 6.2.2.6.3 Return Value

The icp\_sal\_DcPollDpInstance function returns one of the following codes:

#### 6.2.2.6.3.1 Code Meaning

CPA\_STATUS\_SUCCESS The function was successful.

CPA\_STATUS\_RETRY There are no responses on the rings associated with the specified logical instance.

CPA\_STATUS\_FAIL Indicates a failure.
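The return codes of the instance polling functions suggest a simple drain loop: keep polling while responses are returned, and stop when the function reports that nothing is left. The sketch below is self-contained and does not call the Intel® QAT API. `ex_poll_fn`, `ex_drain_responses`, and `ex_fake_poll` are hypothetical, and the status values are stand-ins for the CpaStatus codes listed above.

```c
#include <stddef.h>

/* Stand-in status codes; real code would use CpaStatus values. */
enum { EX_STATUS_SUCCESS = 0, EX_STATUS_FAIL = -1, EX_STATUS_RETRY = -2 };

/* Hypothetical callback shaped like the instance polling functions:
 * (instance handle, response_quota) -> status. */
typedef int (*ex_poll_fn)(void *instance, unsigned int response_quota);

/*
 * Poll until the instance reports no more responses (RETRY), a failure
 * occurs, or max_polls calls have been made. Returns the number of
 * successful polls, or -1 on failure.
 */
int ex_drain_responses(ex_poll_fn poll, void *instance,
                       unsigned int response_quota, int max_polls)
{
    int successful = 0;

    for (int i = 0; i < max_polls; i++) {
        int status = poll(instance, response_quota);

        if (status == EX_STATUS_RETRY) {
            break;     /* nothing on the ring right now */
        }
        if (status != EX_STATUS_SUCCESS) {
            return -1; /* treat anything else as a failure */
        }
        successful++;
    }
    return successful;
}

/* Fake poller for demonstration: `instance` points at a count of outstanding
 * responses; each call consumes up to response_quota of them (0 = all). */
int ex_fake_poll(void *instance, unsigned int response_quota)
{
    unsigned int *outstanding = instance;
    unsigned int take;

    if (*outstanding == 0) {
        return EX_STATUS_RETRY;
    }
    take = (response_quota == 0 || response_quota > *outstanding)
               ? *outstanding
               : response_quota;
    *outstanding -= take;
    return EX_STATUS_SUCCESS;
}
```

With ten outstanding responses and a quota of four, the loop makes three successful polls before the fake poller reports that the ring is empty. A quota of zero drains everything in one call, mirroring the parameter description for the instance polling functions.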

# 6.2.3 User Space Access Configuration Functions

Functions that allow the configuration of user space access to the Intel<sup>®</sup> QAT services from processes running in user space.

All user space access configuration function definitions are located in \$ICP\_ROOT/quickassist/lookaside/access\_layer/include/icp\_sal\_user.h.

The user space access configuration functions include:

- <u>Section 6.2.3.1, icp\_sal\_userStart</u>
- <u>Section 6.2.3.2, icp\_sal\_userStop</u>

# 6.2.3.1 icp\_sal\_userStart

Initializes user space access to an Intel<sup>®</sup> QAT Endpoint and starts it using the given section (pSectionName) of the configuration file. This function must be called before any call to an Intel<sup>®</sup> QAT API function from the user space process. This function is typically called only once in a user space process.

**NOTE:** The icp\_sal\_userStartMultiProcess() function is still supported, but the parameter limitDevAccess is ignored because its value is set once in the configuration file and is not allowed to be specified again in the function.

The configuration format allows the user to create a configuration for many user space processes. The driver internally generates unique process names and a valid configuration for each process based on the section name (pSectionName) and mode (limitDevAccess) provided.

For example, on a system with M devices, if all M configuration files contain:

[IPSec]
NumProcesses = N
LimitDevAccess = 0

Then, N internal sections are generated (each with instances on all devices) and N processes can be started at any given time. Each process can call icp\_sal\_userStart("IPSec") and the driver determines the unique name to use for each process.

Similarly, on an M device system, if all M configuration files contain:

```
[SSL]
NumProcesses = N
LimitDevAccess = 1
```

Then, M\*N internal sections are generated (each with instances on one device only) and M\*N processes can be started at any given time. Each process can call icp\_sal\_userStart("SSL") and the driver determines the unique name to use for each process.

Refer to <u>Section 4.5 Configuring Multiple Processes on a System with Multiple Intel® QAT</u> Endpoints for a detailed example.
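The two cases above reduce to a one-line rule, sketched here with a hypothetical helper. `ex_internal_section_count` is not part of the driver; it only restates the N and M\*N results from the two examples.

```c
#include <stddef.h>

/*
 * Number of internal sections (and so the number of processes that can run
 * at any given time) generated for one named section.
 *   num_devices       M, devices whose configuration files contain the section
 *   num_processes     N, the NumProcesses value
 *   limit_dev_access  the LimitDevAccess value (0 or 1)
 */
size_t ex_internal_section_count(size_t num_devices, size_t num_processes,
                                 int limit_dev_access)
{
    /* LimitDevAccess = 1: each section has instances on one device, so M * N. */
    /* LimitDevAccess = 0: each section has instances on all devices, so N. */
    return limit_dev_access ? num_devices * num_processes : num_processes;
}
```

For example, with two devices and NumProcesses = 4, the rule gives 4 sections when LimitDevAccess = 0 and 8 sections when LimitDevAccess = 1.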

# 6.2.3.1.1 Syntax

CpaStatus icp\_sal\_userStart(const char \*pSectionName);

#### 6.2.3.1.2 Parameters

\*pSectionName The section name described in the simplified configuration file format.

limitDevAccess Deprecated/ignored.

# 6.2.3.1.3 Return Value

The icp\_sal\_userStart function returns one of the following codes:

#### 6.2.3.1.3.1 Code Meaning

CPA\_STATUS\_SUCCESS Successfully started user space access to the Intel® QAT Endpoint as defined in the configuration file.

CPA\_STATUS\_FAIL Operation failed.

# 6.2.3.2 icp\_sal\_userStop

Closes user space access to the Intel<sup>®</sup> QAT Endpoint; stops the services that were running and frees the allocated resources. After a successful call to this function, user space access to the Intel<sup>®</sup> QAT Endpoint from a calling process is not possible. This function should be called once when the process is finished using the Intel<sup>®</sup> QAT Endpoint and does not intend to use it again.

# 6.2.3.2.1 Syntax

```
CpaStatus icp_sal_userStop(void);
```

#### 6.2.3.2.2 Parameters

None

# 6.2.3.2.3 Return Value

The icp\_sal\_userStop function returns one of the following codes:

#### 6.2.3.2.3.1 Code Meaning

CPA\_STATUS\_SUCCESS Successfully stopped user space access to the Intel® QAT Endpoint.

CPA\_STATUS\_FAIL Operation failed.

# 6.2.4 Version Information Function

A function that allows the retrieval of version information related to the software and hardware being used.

The version information function definition is located in: \$ICP\_ROOT/quickassist/lookaside/access\_layer/include/icp\_sal\_versions.h.

There is only one version information function, that is, icp\_sal\_getDevVersionInfo.

# 6.2.4.1 icp\_sal\_getDevVersionInfo

Retrieves the hardware revision and information on the version of the software components being run on a given device.

**NOTE:** The icp\_sal\_userStartMultiProcess (or icp\_sal\_userStart) function must be called before calling this function. If not, calling this function returns CPA\_STATUS\_INVALID\_PARAM, indicating an error. The icp\_sal\_userStartMultiProcess (or icp\_sal\_userStart) function is responsible for setting up the ADF user space component, which is required for this function to operate successfully.

#### 6.2.4.1.1 Syntax

CpaStatus icp\_sal\_getDevVersionInfo(Cpa32U devId, icp\_sal\_dev\_version\_info\_t \*pVerInfo);

#### 6.2.4.1.2 Parameters

devId the ID (number) of the device for which version information is to be retrieved

\*pVerInfo A pointer to a structure that holds the version information.

# 6.2.4.1.3 Return Values

The icp\_sal\_getDevVersionInfo function returns one of the following codes:

#### 6.2.4.1.3.1 Code Meaning

CPA\_STATUS\_SUCCESS Operation finished successfully; version information retrieved.

CPA\_STATUS\_INVALID\_PARAM Invalid parameter passed to the function.

CPA\_STATUS\_RESOURCE System resource problem.

CPA\_STATUS\_FAIL Operation failed.



# 6.2.5 Reset Device Function

This API can only be called in user-space.

The device can be reset using this API call. This API call schedules a reset of the device. The device can also be reset using the adf\_ctl utility, e.g., by calling adf\_ctl qat\_dev0 reset.

# 6.2.5.1 icp\_sal\_reset\_device

Resets the device.

# 6.2.5.1.1 Syntax

CpaStatus icp\_sal\_reset\_device(Cpa32U accelid);

# 6.2.5.1.2 Parameters

accelid the device number.

# 6.2.5.1.3 Return Value

The icp\_sal\_reset\_device function returns one of the following codes:

Code Meaning

CPA\_STATUS\_SUCCESS Successful operation.

CPA\_STATUS\_FAIL Indicates a failure.

# 6.2.6 Thread-Less APIs

These APIs can be used in the user space application.

The thread-less API functions include:

- <u>Section 6.2.6.1, icp\_sal\_poll\_device\_events</u>
- <u>Section 6.2.6.2, icp\_sal\_find\_new\_devices</u>

# 6.2.6.1 icp\_sal\_poll\_device\_events

This reads any pending device events from icp\_dev%d\_csr and forwards them to interested subsystems.

# 6.2.6.1.1 Syntax

CpaStatus icp\_sal\_poll\_device\_events(void);

# 6.2.6.1.2 Parameters

None



# 6.2.6.1.3 Return Value

The icp\_sal\_poll\_device\_events function returns one of the following codes:

#### 6.2.6.1.3.1 Code Meaning

CPA\_STATUS\_SUCCESS Successful operation.

CPA\_STATUS\_FAIL Indicates a failure.

# 6.2.6.2 icp\_sal\_find\_new\_devices

This tries to connect to any available devices that the kernel driver has brought up and initialized for use in user space process.

#### 6.2.6.2.1 Syntax

CpaStatus icp\_sal\_find\_new\_devices(void);

#### 6.2.6.2.2 Parameters

None

#### 6.2.6.2.3 Return Value

The icp\_sal\_find\_new\_devices function returns one of the following codes:

#### 6.2.6.2.3.1 Code Meaning

CPA\_STATUS\_SUCCESS Successful operation.

CPA\_STATUS\_FAIL Indicates a failure.

# 6.2.7 Compress and Verify (CnV) Related APIs

These APIs can be used in the user space application.

The CnV API functions include:

- Section 6.2.7.1, icp\_sal\_dc\_get\_dc\_error()
- <u>Section 6.2.7.2, icp\_sal\_dc\_simulate\_error()</u>

# 6.2.7.1 icp\_sal\_dc\_get\_dc\_error()

This API returns the number of times a particular compression error has occurred during the lifetime of a process.

#### 6.2.7.1.1 Syntax

Cpa64U icp\_sal\_get\_dc\_error(Cpa8S dcError);

# 6.2.7.1.2 Parameters

dcError Compression error code exposed by the CpaDcReqStatus enum in cpa\_dc.h.

# 6.2.7.1.3 Return Value

The icp\_sal\_get\_dc\_error() API returns a 64-bit unsigned integer representing how many times the error type specified by Cpa8S dcError occurred in the current process.

# 6.2.7.2 icp\_sal\_dc\_simulate\_error()

This API injects a simulated compression error for a defined number of compression or decompression requests. The simulated compression errors can only be applied to the traditional APIs. It must be called prior to the APIs that perform the request.

For example, to simulate a Compress and Verify error for a single request, the application calls icp\_sal\_dc\_simulate\_error(1, CPA\_DC\_VERIFY\_ERROR); and then calls cpaDcCompressData() or cpaDcCompressData2().

To use this API, the driver must be configured and compiled with the --enable-dc-error-simulation option.

# 6.2.7.2.1 Syntax

CpaStatus icp\_sal\_dc\_simulate\_error(Cpa8U numErrors, Cpa8S dcError);

# 6.2.7.2.2 Parameters

Cpa8U numErrors Number of simulated compression or decompression errors desired.

Cpa8S dcError Desired error code to be returned by the compression or decompression API.

# 6.2.7.2.3 Return Value

The icp\_sal\_dc\_simulate\_error API returns one of the following codes:

| Code | Meaning |
|---|---|
| CPA\_STATUS\_SUCCESS | Successful operation. |
| CPA\_STATUS\_FAIL | Indicates that an invalid error type was assigned to the dcError parameter. |
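The call order described in this section (arm the simulated error first, then submit requests) can be sketched with a mock. Everything below is an assumption made for illustration: the mock `icp_sal_dc_simulate_error` only records the armed error, `mock_compress_request` stands in for a call such as cpaDcCompressData(), the value of `CPA_DC_VERIFY_ERROR` is a placeholder, and treating non-negative codes as invalid is this sketch's choice rather than documented behavior.

```c
#include <stdint.h>

/* Stand-in types and values; the real ones come from the QAT headers. */
typedef int CpaStatus;
typedef uint8_t Cpa8U;
typedef int8_t Cpa8S;
#define CPA_STATUS_SUCCESS 0
#define CPA_STATUS_FAIL (-1)
#define CPA_DC_VERIFY_ERROR ((Cpa8S)-18) /* placeholder value */

static Cpa8U pending_errors; /* simulated errors still to be reported */
static Cpa8S pending_code;   /* error code to report */

/* Mock: arms numErrors simulated errors (assumption: codes >= 0 are rejected). */
static CpaStatus icp_sal_dc_simulate_error(Cpa8U numErrors, Cpa8S dcError)
{
    if (dcError >= 0)
        return CPA_STATUS_FAIL;
    pending_errors = numErrors;
    pending_code = dcError;
    return CPA_STATUS_SUCCESS;
}

/* Mock request: reports the armed error while any remain, otherwise 0 (OK). */
static Cpa8S mock_compress_request(void)
{
    if (pending_errors > 0) {
        pending_errors--;
        return pending_code;
    }
    return 0;
}

/* Hypothetical test helper: arm one verify error, submit two requests,
 * and return how many of them reported the simulated error (-1 on setup failure). */
int count_injected_failures(void)
{
    int failures = 0;

    if (icp_sal_dc_simulate_error(1, CPA_DC_VERIFY_ERROR) != CPA_STATUS_SUCCESS)
        return -1;
    for (int i = 0; i < 2; i++) {
        if (mock_compress_request() == CPA_DC_VERIFY_ERROR)
            failures++;
    }
    return failures;
}
```

With one error armed and two requests submitted, only the first request in this mock reports the error, which mirrors the "defined number of requests" wording above.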

# 6.2.8 Heartbeat APIs

These APIs check firmware/hardware status for a given device and are used as part of the Heartbeat functionality.



The Heartbeat API functions include:

- Section 6.2.8.1, icp\_sal\_check\_device()
- Section 6.2.8.2, icp\_sal\_check\_all\_devices()
- Section 6.2.8.3, icp\_sal\_heartbeat\_simulate\_failure()

## 6.2.8.1 icp\_sal\_check\_device()

This function checks the status of the firmware/hardware for a given device and is used as part of the Heartbeat functionality.

## 6.2.8.1.1 Syntax

CpaStatus icp\_sal\_check\_device(Cpa32U accelID);

#### 6.2.8.1.2 Parameters

accelID The device ID.

# 6.2.8.1.3 Return Value

The icp\_sal\_check\_device function returns one of the following codes:

| Code | Meaning |
|---|---|
| CPA\_STATUS\_SUCCESS | Successful operation. |
| CPA\_STATUS\_FAIL | Indicates a failure. |

## 6.2.8.2 icp\_sal\_check\_all\_devices()

This function checks the status of the firmware/hardware for all devices and is used as part of the Heartbeat functionality.

#### 6.2.8.2.1 Syntax

CpaStatus icp\_sal\_check\_all\_devices(void);

#### 6.2.8.2.2 Parameters

None

#### 6.2.8.2.3 Return Value

The icp\_sal\_check\_all\_devices function returns one of the following codes:

| Code | Meaning |
|---|---|
| CPA\_STATUS\_SUCCESS | Successful operation. |
| CPA\_STATUS\_FAIL | Indicates a failure. |



# 6.2.8.3 icp\_sal\_heartbeat\_simulate\_failure()

This function simulates Heartbeat failure for a specific device.

# 6.2.8.3.1 Syntax

CpaStatus icp\_sal\_heartbeat\_simulate\_failure(Cpa32U accelID);

#### 6.2.8.3.2 Parameters

accelID The device ID.

# 6.2.8.3.3 Return Value

The icp\_sal\_heartbeat\_simulate\_failure function returns one of the following codes:

| Code | Meaning |
|---|---|
| CPA\_STATUS\_SUCCESS | Successful operation. |
| CPA\_STATUS\_FAIL | Indicates a failure. |
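A heartbeat monitor built on these calls might look like the sketch below. It is illustrative only: the typedefs, `NUM_DEVICES`, and both stubbed functions are stand-ins that model a failure flag per device so the fragment compiles and runs without the QAT library, and `count_unresponsive_devices` is a hypothetical helper.

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in definitions; the real ones come from the QAT headers. */
typedef int CpaStatus;
typedef uint32_t Cpa32U;
#define CPA_STATUS_SUCCESS 0
#define CPA_STATUS_FAIL (-1)

#define NUM_DEVICES 4u /* hypothetical device count for this sketch */
static int device_failed[NUM_DEVICES];

/* Mock: marks the device so that later checks report a failure. */
static CpaStatus icp_sal_heartbeat_simulate_failure(Cpa32U accelID)
{
    if (accelID >= NUM_DEVICES)
        return CPA_STATUS_FAIL;
    device_failed[accelID] = 1;
    return CPA_STATUS_SUCCESS;
}

/* Mock: reports the state recorded above. */
static CpaStatus icp_sal_check_device(Cpa32U accelID)
{
    if (accelID >= NUM_DEVICES || device_failed[accelID])
        return CPA_STATUS_FAIL;
    return CPA_STATUS_SUCCESS;
}

/* Hypothetical helper: returns how many devices failed the heartbeat check. */
unsigned count_unresponsive_devices(Cpa32U numDevices)
{
    unsigned failed = 0;

    for (Cpa32U id = 0; id < numDevices; id++) {
        if (icp_sal_check_device(id) != CPA_STATUS_SUCCESS) {
            fprintf(stderr, "device %u failed heartbeat check\n", (unsigned)id);
            failed++;
        }
    }
    return failed;
}
```

Checking devices one at a time, rather than calling icp\_sal\_check\_all\_devices(), lets the application identify which device is unresponsive.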

# 6.2.9 Device Polling APIs

# 6.2.9.1 icp\_sal\_poll\_device\_events()

This function polls for device reset events.

### 6.2.9.1.1 Syntax

CpaStatus icp\_sal\_poll\_device\_events(void);

# 6.2.9.1.2 Parameters

None

#### 6.2.9.1.3 Return Value

The icp\_sal\_poll\_device\_events function returns one of the following codes:

| Code | Meaning |
|---|---|
| CPA\_STATUS\_SUCCESS | Successful operation. |
| CPA\_STATUS\_FAIL | Indicates a failure. |

**NOTE:** The events are sent to each instance that has registered a callback function. The callbacks are registered using cpaCyInstanceSetNotificationCb and cpaDcInstanceSetNotificationCb.



# 6.2.9.2 cpaCyInstanceSetNotificationCb

Cryptographic instances use this function to register for device event notifications.

# 6.2.9.2.1 Syntax

```
CpaStatus cpaCyInstanceSetNotificationCb(
    const CpaInstanceHandle instanceHandle,
    const CpaCyInstanceNotificationCbFunc pinstanceNotificationCb,
    void *pCallbackTag);
```

# 6.2.9.2.2 Parameters

instanceHandle Instance handle.

pinstanceNotificationCb Instance notification callback function pointer.

pCallbackTag Opaque value provided by user.

# 6.2.9.2.3 Return Values

The cpaCyInstanceSetNotificationCb() function returns one of the following codes:

| Code | Meaning |
|---|---|
| CPA\_STATUS\_SUCCESS | The function was successful. |
| CPA\_STATUS\_FAIL | Indicates a failure. |
| CPA\_STATUS\_INVALID\_PARAM | Invalid parameter passed in. |
| CPA\_STATUS\_UNSUPPORTED | Function is not supported. |

#### The signature for the callback function is:

```
typedef void (*CpaCyInstanceNotificationCbFunc)(
    const CpaInstanceHandle instanceHandle,
    void * pCallbackTag,
    const CpaInstanceEvent instanceEvent);
```

#### 6.2.9.2.4 Parameter

```
typedef enum _CpaInstanceEvent
{
    CPA_INSTANCE_EVENT_RESTARTING = 0,
    CPA_INSTANCE_EVENT_RESTARTED,
    CPA_INSTANCE_EVENT_FATAL_ERROR
} CpaInstanceEvent;
```
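A callback matching the signature above could track whether an instance is usable, as in the following sketch. The `CpaInstanceHandle` typedef and the enum are repeated as stand-ins so the fragment compiles without the QAT headers; `app_instance_state_t` and `app_instance_event_cb` are hypothetical names, and the reaction to each event is one possible policy, not a requirement of the API.

```c
#include <stddef.h>

/* Stand-in definitions; the real ones come from the QAT headers. */
typedef void *CpaInstanceHandle;
typedef enum _CpaInstanceEvent
{
    CPA_INSTANCE_EVENT_RESTARTING = 0,
    CPA_INSTANCE_EVENT_RESTARTED,
    CPA_INSTANCE_EVENT_FATAL_ERROR
} CpaInstanceEvent;

/* Hypothetical per-instance application state, passed as pCallbackTag. */
typedef struct
{
    int accepting_requests; /* 1 while requests may be submitted */
    int fatal;              /* 1 after a fatal error was reported */
} app_instance_state_t;

/* Callback with the CpaCyInstanceNotificationCbFunc signature. */
void app_instance_event_cb(const CpaInstanceHandle instanceHandle,
                           void *pCallbackTag,
                           const CpaInstanceEvent instanceEvent)
{
    app_instance_state_t *state = pCallbackTag;

    (void)instanceHandle; /* unused in this sketch */
    switch (instanceEvent) {
    case CPA_INSTANCE_EVENT_RESTARTING:
        state->accepting_requests = 0; /* stop submitting during the reset */
        break;
    case CPA_INSTANCE_EVENT_RESTARTED:
        state->accepting_requests = 1; /* device is back; resume */
        break;
    case CPA_INSTANCE_EVENT_FATAL_ERROR:
        state->accepting_requests = 0;
        state->fatal = 1;
        break;
    }
}
```

The application would register the callback with cpaCyInstanceSetNotificationCb(instanceHandle, app\_instance\_event\_cb, &state) and then call icp\_sal\_poll\_device\_events() periodically so that events are delivered.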

# 6.2.9.3 cpaDcInstanceSetNotificationCb

Data compression instances use this function to register for device event notifications.

# 6.2.9.3.1 Syntax

# 6.2.9.3.2 Parameters

instanceHandle Instance handle.

pinstanceNotificationCb Instance notification callback function pointer.

pCallbackTag Opaque value provided by user.

# 6.2.9.3.3 Return Values

The cpaDcInstanceSetNotificationCb() function returns one of the following codes:

| Code | Meaning |
|---|---|
| CPA\_STATUS\_SUCCESS | The function was successful. |
| CPA\_STATUS\_FAIL | Indicates a failure. |
| CPA\_STATUS\_INVALID\_PARAM | Invalid parameter passed in. |
| CPA\_STATUS\_UNSUPPORTED | Function is not supported. |

#### The signature for the callback function is:

# 6.2.9.3.4 Parameter

```
typedef enum _CpaInstanceEvent
{
    CPA_INSTANCE_EVENT_RESTARTING = 0,
    CPA_INSTANCE_EVENT_RESTARTED,
    CPA_INSTANCE_EVENT_FATAL_ERROR
} CpaInstanceEvent;
```

# 6.2.10 Congestion Management APIs

The Congestion Management (back-pressure) APIs are intended to handle cases where the device is busy. These APIs let the application confirm that there is enough space on the ring before submitting a request.

Applications can query the appropriate ring on each instance and select any instance with enough space without creating any OpData structures.

All these API definitions are located in: \$ICP\_ROOT/quickassist/lookaside/access\_layer/include/icp\_sal\_congestion\_mgmt.h.



The Congestion Management APIs include:

- Section 6.2.10.1, icp\_sal\_SymGetInflightRequests
- Section 6.2.10.2, icp\_sal\_AsymGetInflightRequests
- Section 6.2.10.3, icp\_sal\_dp\_SymGetInflightRequests

# 6.2.10.1 icp\_sal\_SymGetInflightRequests

This function is used to fetch in-flight and max in-flight request counts for the given symmetric instance handle.

#### 6.2.10.1.1 Syntax

```
CpaStatus icp_sal_SymGetInflightRequests(CpaInstanceHandle instanceHandle,
                                         Cpa32U *maxInflightRequests,
                                         Cpa32U *numInflightRequests);
```

#### 6.2.10.1.2 Parameters

instanceHandle Symmetric instance handle.

\*maxInflightRequests A pointer to the max in-flight request count.

\*numInflightRequests A pointer to the current in-flight request count.

# 6.2.10.1.3 Return Value

The icp\_sal\_SymGetInflightRequests function returns one of the following codes:

| Code | Meaning |
|---|---|
| CPA\_STATUS\_SUCCESS | Successfully retrieved the request counts. |
| CPA\_STATUS\_FAIL | Indicates a failure. |
| CPA\_STATUS\_INVALID\_PARAM | Invalid parameter. |

# 6.2.10.2 icp\_sal\_AsymGetInflightRequests

This function is used to fetch in-flight and max in-flight request counts for the given asymmetric instance handle.

# 6.2.10.2.1 Syntax

```
CpaStatus icp_sal_AsymGetInflightRequests(CpaInstanceHandle
instanceHandle,
Cpa32U *maxInflightRequests,
Cpa32U *numInflightRequests)
```

# 6.2.10.2.2 Parameters

instanceHandle Asymmetric instance handle.

\*maxInflightRequests A pointer to the max in-flight request count.

\*numInflightRequests A pointer to the current in-flight request count.

# 6.2.10.2.3 Return Value

The icp\_sal\_AsymGetInflightRequests function returns one of the following codes:

| Code | Meaning |
|---|---|
| CPA\_STATUS\_SUCCESS | Successfully retrieved the request counts. |
| CPA\_STATUS\_FAIL | Indicates a failure. |
| CPA\_STATUS\_INVALID\_PARAM | Invalid parameter. |

# 6.2.10.3 icp\_sal\_dp\_SymGetInflightRequests

This data plane function is used to fetch in-flight and max in-flight request counts for the given symmetric instance handle.

# 6.2.10.3.1 Syntax

```
CpaStatus icp_sal_dp_SymGetInflightRequests(CpaInstanceHandle instanceHandle,
                                            Cpa32U *maxInflightRequests,
                                            Cpa32U *numInflightRequests);
```

# 6.2.10.3.2 Parameters

instanceHandle Symmetric instance handle.

\*maxInflightRequests A pointer to the max in-flight request count.

\*numInflightRequests A pointer to the current in-flight request count.

#### 6.2.10.3.3 Return Value

The icp\_sal\_dp\_SymGetInflightRequests function returns one of the following codes:

| Code | Meaning |
|---|---|
| CPA\_STATUS\_SUCCESS | Successfully retrieved the request counts. |
| CPA\_STATUS\_FAIL | Indicates a failure. |
| CPA\_STATUS\_INVALID\_PARAM | Invalid parameter. |
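The instance-selection idea described at the start of Section 6.2.10 (query each instance and pick one with enough ring space) can be sketched as follows. This is a sketch under assumptions: the typedefs and the mocked `icp_sal_SymGetInflightRequests`, which reads counts from a `mock_ring_t` behind the handle, replace the real library, and `pick_instance_with_space` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in definitions; the real ones come from the QAT headers. */
typedef int CpaStatus;
typedef uint32_t Cpa32U;
typedef void *CpaInstanceHandle;
#define CPA_STATUS_SUCCESS 0
#define CPA_STATUS_INVALID_PARAM (-4)

/* Mock ring state; in this sketch the instance handle points at one of these. */
typedef struct { Cpa32U max; Cpa32U inflight; } mock_ring_t;

/* Mock with the documented signature: copies the counts out of the mock ring. */
static CpaStatus icp_sal_SymGetInflightRequests(CpaInstanceHandle instanceHandle,
                                                Cpa32U *maxInflightRequests,
                                                Cpa32U *numInflightRequests)
{
    const mock_ring_t *ring = instanceHandle;

    if (ring == NULL || maxInflightRequests == NULL || numInflightRequests == NULL)
        return CPA_STATUS_INVALID_PARAM;
    *maxInflightRequests = ring->max;
    *numInflightRequests = ring->inflight;
    return CPA_STATUS_SUCCESS;
}

/* Hypothetical helper: index of the first instance with at least `needed`
 * free request slots, or -1 if none qualifies. */
int pick_instance_with_space(CpaInstanceHandle *instances, int count, Cpa32U needed)
{
    for (int i = 0; i < count; i++) {
        Cpa32U max = 0, inflight = 0;

        if (icp_sal_SymGetInflightRequests(instances[i], &max, &inflight) !=
            CPA_STATUS_SUCCESS)
            continue; /* skip instances that cannot be queried */
        if (inflight <= max && max - inflight >= needed)
            return i;
    }
    return -1;
}
```

Because the check happens before any OpData structure is built, a busy instance costs only one query.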



### 6.2.11 Service Specific Polling APIs

These service specific polling APIs are intended for retrieving response messages that are on the specific ring and dispatching the associated callback.

All these API definitions are located in: \$ICP\_ROOT/quickassist/lookaside/access\_layer/include/icp\_sal\_poll.h.

The Polling APIs include:

- Section 6.2.11.1, icp\_sal\_CyPollSymRing
- Section 6.2.11.2, icp\_sal\_CyPollAsymRing

### 6.2.11.1 icp\_sal\_CyPollSymRing

Poll the symmetric logical instance associated with the instanceHandle to retrieve responses that are on the response rings associated with that instance and dispatch the associated callbacks. The response\_quota input parameter is the maximum number of responses to process in one call.

#### 6.2.11.1.1 Syntax

CpaStatus icp\_sal\_CyPollSymRing(CpaInstanceHandle instanceHandle, Cpa32U response\_quota)

#### 6.2.11.1.2 Parameters

instanceHandle Instance handle to poll for responses on the response ring.

response\_quota The maximum number of messages that will be read in one poll. Setting the response quota to zero means that all messages on the ring will be read.

#### 6.2.11.1.3 Return Value

The icp\_sal\_CyPollSymRing function returns one of the following codes:

| Code | Meaning |
|---|---|
| CPA\_STATUS\_SUCCESS | Successfully polled a ring with data. |
| CPA\_STATUS\_RETRY | There are no responses on the rings associated with the instance. |
| CPA\_STATUS\_FAIL | Indicates a failure. |
| CPA\_STATUS\_INVALID\_PARAM | Invalid parameter passed. |
| CPA\_STATUS\_RESTARTING | Device restarting. Resubmit the request. |

### 6.2.11.2 icp\_sal\_CyPollAsymRing

Poll the asymmetric logical instance associated with the instanceHandle to retrieve responses that are on the response rings associated with that instance and dispatch the associated callbacks. The response\_quota input parameter is the maximum number of responses to process in one call.

### 6.2.11.2.1 Syntax

CpaStatus icp\_sal\_CyPollAsymRing(CpaInstanceHandle instanceHandle, Cpa32U response\_quota)

### 6.2.11.2.2 Parameters

instanceHandle Instance handle.

response\_quota The maximum number of messages that will be read in one poll. Setting the response quota to zero means that all messages on the ring will be read.

#### 6.2.11.2.3 Return Value

The icp\_sal\_CyPollAsymRing function returns one of the following codes:

| Code | Meaning |
|---|---|
| CPA\_STATUS\_SUCCESS | Successfully polled a ring with data. |
| CPA\_STATUS\_RETRY | There are no responses on the rings associated with this instance. |
| CPA\_STATUS\_FAIL | Indicates a failure. |
| CPA\_STATUS\_INVALID\_PARAM | Invalid parameter passed. |
| CPA\_STATUS\_RESTARTING | Device restarting. Resubmit the request. |
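The return codes above suggest a simple drain loop: keep polling while responses are found, stop on CPA\_STATUS\_RETRY, and treat anything else as an error. The sketch below illustrates this for the symmetric ring. The typedefs, status values, and the mocked `icp_sal_CyPollSymRing` (which models a counter of pending responses and the "quota of zero reads everything" rule) are stand-ins, and `drain_sym_ring` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in definitions; the real ones come from the QAT headers. */
typedef int CpaStatus;
typedef uint32_t Cpa32U;
typedef void *CpaInstanceHandle;
#define CPA_STATUS_SUCCESS 0
#define CPA_STATUS_RETRY (-2)

static Cpa32U mock_pending; /* responses waiting on the mock ring */

/* Mock with the documented signature. A quota of zero reads all responses. */
static CpaStatus icp_sal_CyPollSymRing(CpaInstanceHandle instanceHandle,
                                       Cpa32U response_quota)
{
    Cpa32U n;

    (void)instanceHandle;
    if (mock_pending == 0)
        return CPA_STATUS_RETRY; /* nothing on the ring */
    n = (response_quota == 0 || response_quota > mock_pending) ? mock_pending
                                                               : response_quota;
    mock_pending -= n; /* the real call would dispatch n callbacks here */
    return CPA_STATUS_SUCCESS;
}

/* Hypothetical helper: poll until the ring is empty.
 * Returns the number of polls that found data, or -1 on any other status. */
int drain_sym_ring(CpaInstanceHandle instanceHandle, Cpa32U quota)
{
    int polls = 0;

    for (;;) {
        CpaStatus status = icp_sal_CyPollSymRing(instanceHandle, quota);

        if (status == CPA_STATUS_SUCCESS) {
            polls++;
            continue;
        }
        if (status == CPA_STATUS_RETRY)
            return polls;
        return -1;
    }
}
```

A non-zero quota bounds the time spent in each call, which can matter when one thread services several instances.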

### 6.2.12 Check Device Availability APIs

### 6.2.12.1 icp\_sal\_userIsQatAvailable

This API allows an application to establish whether any active QAT device is present on the system, without calling internal libadf APIs and without a dependency on icp\_sal\_userStart().

#### 6.2.12.1.1 Syntax

CpaBoolean icp\_sal\_userIsQatAvailable(void);

#### 6.2.12.1.2 Parameters

None

### 6.2.12.1.3 Return Value

The icp\_sal\_userIsQatAvailable API returns one of the following values:

| Code | Meaning |
|---|---|
| CPA\_TRUE | Indicates that there is at least one active device. |
| CPA\_FALSE | Indicates that there are no active devices. |
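One use of this check is to choose between hardware offload and a software fallback at startup. The sketch below assumes exactly that use; the `CpaBoolean` stand-in, the mocked `icp_sal_userIsQatAvailable`, and the `backend_t` / `select_backend` names are hypothetical and exist only so the fragment compiles without the QAT library.

```c
/* Stand-in definition; the real one comes from the QAT headers. */
typedef enum { CPA_FALSE = 0, CPA_TRUE = 1 } CpaBoolean;

static CpaBoolean mock_qat_present = CPA_FALSE; /* mock device state */

/* Mock with the documented signature. */
static CpaBoolean icp_sal_userIsQatAvailable(void)
{
    return mock_qat_present;
}

/* Hypothetical application-level choice of processing path. */
typedef enum { BACKEND_SOFTWARE, BACKEND_QAT } backend_t;

backend_t select_backend(void)
{
    /* Only attempt offload when at least one active device is reported. */
    return (icp_sal_userIsQatAvailable() == CPA_TRUE) ? BACKEND_QAT
                                                      : BACKEND_SOFTWARE;
}
```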




## 7 Application Usage Guidelines

This chapter provides useful guidelines and identifies some of the applications to which the platforms described in this manual are ideally suited.

## 7.1 Mapping Service Instances to Engines on the Intel<sup>®</sup> QAT Endpoint

A processor may be connected to one or more Intel<sup>®</sup> QAT Endpoints. For example, an Intel<sup>®</sup> Atom<sup>®</sup> C3000 Processor contains a single integrated Intel<sup>®</sup> QAT Endpoint, while a single Intel<sup>®</sup> C620 Series Chipset contains up to three Intel<sup>®</sup> QAT Endpoints.

Communication between software running on the processor and the Intel<sup>®</sup> QAT Endpoint is via hardware-assisted rings. Rings are used in pairs; software writes requests onto a request ring and reads responses back from a response ring. The Intel<sup>®</sup> QAT Endpoint load balances requests from all rings of a given service type across all available hardware "engines" of the corresponding type.

A set of 16 ring banks provides the communication mechanism between a processor and the acceleration complex. Each ring bank contains 16 individual rings for communication.

Intel<sup>®</sup> provides a software package that abstracts the communication between the host and the rings and presents the high-level Intel<sup>®</sup> QAT APIs.

## 7.1.1 Processor and Intel® QAT Endpoint Communication

An acceleration service uses different rings for request and response messages. Communication between the processor and Intel<sup>®</sup> QAT Endpoint is achieved using the following operations:

- The processor uses a write (PUT) operation to place a request on the request ring.
- The Intel® QAT Endpoint uses a read (GET) operation to retrieve the request from the request ring.
- Once the operation has been performed, the Intel<sup>®</sup> QAT Endpoint uses a write (PUT) operation to put the response to the response ring.
- The processor uses a read (GET) operation to retrieve the response from the response ring.
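The four PUT/GET steps above can be modeled in software with two small circular buffers. This is a conceptual model only, not how the hardware-assisted rings are implemented: `ring_t`, `RING_SLOTS`, and all function names are hypothetical, and the "operation" performed by the endpoint is reduced to adding one to the request value.

```c
#include <stdint.h>

#define RING_SLOTS 16u /* hypothetical ring depth; a power of two */

typedef struct
{
    uint32_t slots[RING_SLOTS];
    uint32_t head; /* next slot to read (GET) */
    uint32_t tail; /* next slot to write (PUT) */
} ring_t;

/* PUT: write a message; fails with -1 when the ring is full. */
int ring_put(ring_t *r, uint32_t msg)
{
    if (r->tail - r->head == RING_SLOTS)
        return -1;
    r->slots[r->tail % RING_SLOTS] = msg;
    r->tail++;
    return 0;
}

/* GET: read a message; fails with -1 when the ring is empty. */
int ring_get(ring_t *r, uint32_t *msg)
{
    if (r->head == r->tail)
        return -1;
    *msg = r->slots[r->head % RING_SLOTS];
    r->head++;
    return 0;
}

/* Models one request/response round trip through a ring pair. */
int round_trip(ring_t *request_ring, ring_t *response_ring,
               uint32_t request, uint32_t *response)
{
    uint32_t in_flight;

    if (ring_put(request_ring, request) != 0)        /* processor PUT */
        return -1;
    if (ring_get(request_ring, &in_flight) != 0)     /* endpoint GET */
        return -1;
    if (ring_put(response_ring, in_flight + 1) != 0) /* endpoint PUT */
        return -1;
    return ring_get(response_ring, response);        /* processor GET */
}
```

Keeping requests and responses on separate rings means each ring has a single writer and a single reader, which is why the rings are used in pairs.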

### 7.1.2 Service Instances and Interaction with the Hardware

A ring bank supports two crypto instances and two compression instances. A service instance can be thought of as a channel between an Intel<sup>®</sup> QAT Endpoint and a core/thread running on the processor, which uses the rings for communication. The rings are not exposed by an API but are set up using configuration files (one for each Intel<sup>®</sup> QAT Endpoint).



In general, a service instance uses a pair of rings, one for requests and one for responses. For cryptographic instances, separate request/response pairs are used.

### 7.1.3 Service Instance Configuration

The configuration of a service instance is done in the configuration file.

The following figure shows an example extract of the relevant section in the configuration file.

#### Figure 7. Service Instance Configuration

| ###################################### |
|----------------------------------------|
| ###################################### |
| NumberCyInstances = 1 |
| NumberDcInstances = 0 |
| # Crypto - user space instance #0 |
| Cy0Name = "proc0_0" |
| Cy0IsPolled = 1 |
| Cy0CoreAffinity = 0 |

In the previous figure, the meaning of each numbered item is explained as follows:

- Each named address domain (one domain for the kernel, any number of user space process domains) has its own service instances.
- Specifies a name for the instance.
- Specifies that the instance is using polling.
- Specifies the core affinity for the instance.

### 7.1.4 Cryptographic Load Balancing Using Multiple Intel® QAT Instances

The application is responsible for load balancing (spreading requests) across Intel® QAT Endpoints. Load balancing across the compute engines within an Intel® QAT Endpoint is performed by hardware.

In general, the device can be fully utilized from a single instance/ring pair. The main reasons for using multiple instances/ring pairs are:

- Separate software processes each benefit by having their own ring pair to enable the rings to be mapped into the address space of that process.
- Separate threads within a process, possibly on different cores, avoid contention.
- If using interrupts, they can be affinitized from different instances/ring pairs to different cores.
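Since spreading requests across endpoints is left to the application, one simple policy is round-robin over the available instances. The sketch below shows only that policy; `instance_rr_t` and `rr_next_instance` are hypothetical names, and a real application would index into the instance handles it obtained from the QAT API.

```c
#include <stddef.h>

/* Hypothetical round-robin cursor over `count` instances (count must be > 0). */
typedef struct
{
    size_t count; /* number of instances available */
    size_t next;  /* index to hand out on the next call */
} instance_rr_t;

/* Returns the index of the instance to use for the next request
 * and advances the cursor, wrapping at the end of the list. */
size_t rr_next_instance(instance_rr_t *rr)
{
    size_t idx = rr->next;

    rr->next = (rr->next + 1) % rr->count;
    return idx;
}
```

Giving each thread its own cursor and its own set of instances avoids the contention mentioned in the list above.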



## 7.2 Cryptography Applications

Cryptography applications supported by the platforms described in this manual include, but are not limited to:

- Virtual Private Networks (VPNs, both IPsec and SSL). Both symmetric and public key cryptography can be offloaded for bulk transfer and key exchange (IKE, SSL handshakes and so on). Refer to Section 7.2.1, IPsec and SSL VPNs for more information.
- Encrypted Storage. See Section 7.2.2, Encrypted Storage for more information.
- Web Proxy Appliances. See Section 7.2.3, Web Proxy Appliances.

### 7.2.1 IPsec and SSL VPNs

Virtual Private Networks (VPNs) allow for private networks to be established over the public Internet by providing confidentiality, integrity, and authentication using cryptography. VPN functionality can be provided by a standalone security gateway box at the boundary between the trusted and untrusted networks. It is also commonly combined with other networking and security functionality in a security appliance, or even in standard routers.

VPNs are typically based on one of two cryptographic protocols, either IPsec or Datagram Transport Layer Security (DTLS). Each has its advantages and disadvantages.

One of the most compute-intensive aspects of a VPN is the cryptographic processing required to encrypt/decrypt traffic for confidentiality, to perform cryptographic hash functionality for authentication, and to perform public key cryptography, based on modular exponentiation of large numbers or elliptic curve cryptography as part of key negotiation and exchange. The PCH provides cryptographic acceleration that can offload this computation from the CPU, thereby freeing up CPU cycles to perform other networking, encryption, or other value-add applications.

The Intel<sup>®</sup> QAT Endpoint offers its acceleration services through an API, called the Intel<sup>®</sup> QAT Cryptographic API. This can be invoked from the Linux\* kernel or from Linux\* user space as well as from other operating systems. Intel<sup>®</sup> also provides plugins to enable many of the PCH's cryptographic services to be accessed through open-source cryptographic frameworks, such as the Linux\* Kernel Crypto Framework/API (also known as the scatterlist API) and OpenSSL\* libcrypto\* (through its EVP API). This facilitates ease of integration with certain open-source implementations of protocol stacks, such as the Linux\* kernel's native IPsec stack (called NETKEY) or with OpenVPN\* (an open source SSL VPN implementation).

### 7.2.2 Encrypted Storage

In recent years, cases of lost laptops containing sensitive information have made the headlines all too frequently. Full disk encryption has become a standard procedure for many corporate PCs. Safe-guarding critical data, however, is not just a necessity in the client space, it is also a necessity in the data center.

Enterprise-class storage appliances achieve throughput rates in excess of 50 Gbps. Several high-profile cases of data theft have triggered updates to government regulations and industry standards. These regulations/standards now require protection of data-at-rest for applications involving sensitive data such as medical and financial records, typically using strong encryption. The high computational cost of adding encryption to storage appliances makes offload solutions an attractive value proposition.

Several complementary standards exist for the encryption of data-at-rest, which, when combined with traditional network security protocols, such as IPsec or SSL/Transport Layer Security (TLS), provide an end-to-end encrypted storage solution, even for data-in-flight.

The IEEE\* Security in Storage working group is developing the IEEE 1619 series of standards that deal with cipher algorithms for disk and tape storage devices (AES in CCM and GCM modes). The cryptographic acceleration services of platforms that use the Intel® QAT Endpoints are ideally suited for long-term encrypted storage solutions implementing the IEEE 1619.1 standard, by providing acceleration of the AES-256 cipher in CBC, CCM, and GCM modes and HMAC authentication using SHA-1, SHA-256 and SHA-512 hashes.

The Trusted Computing Group's (TCG) Storage Working Group does not prescribe a particular set of algorithms for the disk encryption. Instead, it defines several Storage Subsystem Classes (SSC) for various usage models, which define services such as enrollment and connection, protected storage (an extension of Trusted Platform Module (TPM)), locking, logging, cryptographic services, authorization, and firmware updates. The cryptographic acceleration services of the platform can help by providing the highest level of encryption for authenticating the host to trusted peripherals implementing the TCG storage standards.

### 7.2.3 Web Proxy Appliances

Historically, Web Proxy appliances have evolved to present a public or intermediary interface for clients seeking resources from other servers, providing services such as web page caching and load balancing. These appliances are located at the edge of the network, typically at network gateways. Due to their centralized presence in the network, Web Proxy appliances today (referred to with several different names, such as Application Delivery Controllers, Reverse Proxy, and so on) have become a collection of services that include:

- Application Load Balancing (L4-L7)
- SSL Acceleration
- Wide Area Network (WAN) Acceleration
- Caching
- Traffic Management
- Web Application Firewall

SSL and WAN acceleration, which have become commonplace capabilities of the Web Proxy appliance, require compute-intensive algorithms for cryptography (SSL) and compression (WAN acceleration). Intel® QAT devices on the platforms described in this manual provide acceleration of asymmetric cryptography (RSA is the most used key negotiation algorithm in SSL), symmetric cryptography (all algorithms defined in the TLS RFCs can be accelerated with the PCH) and compression (DEFLATE algorithm). With the prominence of Web Proxy appliances in typical networks, this use case has applications from cloud computing to small web server deployments.



## 7.3 Data Compression Applications

Data compression can be used as part of application delivery networks, data de-duplication, as well as in several crypto applications, for example, VPNs, IDS/IPS and so on.

### 7.3.1 Compression for Storage

In a time when the amount of online information is increasing dramatically, but budgets for storing that information remain static, compression technology is a powerful tool for improved information management, protection, and access.

Compression appliances can transparently compress data such that clients can keep between two- and five-times more data online and reap the benefit of other efficiencies throughout the data lifecycle. By shrinking the primary data, all subsequent copies of that data, such as backups, archives, snapshots, and replicas are also compressed. Compression is the newest advancement in storage efficiency. Storage compression appliances can shrink primary online data in real time, without performance degradation. Compression can significantly lower storage capital and operating expenses by reducing the amount of data that is stored, and the required hardware that must be powered and cooled.

Compression can help slow the growth of storage, reducing storage costs while simplifying both operations and management. It also enables organizations to keep more data available for use, as opposed to storing data offsite or on harder-to-access media (such as tape).

Compression algorithms are very compute-intensive, which is one of the reasons why the adoption of compression techniques in mainstream applications has been slow. As an example, the DEFLATE Algorithm, which is one of the most used and popular compression techniques today, involves several compute-intensive steps: string search and match, sort logic, binary tree generation, Huffman Code generation. Intel<sup>®</sup> QAT devices in the platforms described in this manual provide acceleration capabilities in hardware that allow the CPU to offload the compute-intensive DEFLATE algorithm operations, thereby freeing up CPU cycles for other networking, encryption, or other value-add operations.

### 7.3.2 Data Deduplication and WAN Acceleration

Data Deduplication and WAN Acceleration are coarse-grain data compression techniques centered around the concept of single-instance storage. Identical blocks of data (either to be stored on disk or to be transferred across a WAN link) are only stored/moved once, and any further occurrences are replaced by a reference to the first instance.

While the benefits of deduplication and WAN acceleration obviously depend on the type of data, multi-user collaborative environments are the most suitable due to the amount of naturally occurring replication caused by forwarded emails and multiple (similar) versions of documents in various stages of development.

Deduplication strategies can vary in terms of inline vs post-processing, block size granularity (file-level only, fixed block size or variable block-size chunking), duplicate identification (cryptographic hash only, simple CRC followed by byte-level comparison or hybrids) and duplicate look-up (for example, Bloom filter based index).



Cryptographic hashes are the most suitable techniques for reliably identifying matching blocks with an improbably low risk for false positives, but they also represent the most compute-intensive workload in the application. As such, the cryptographic acceleration services offered by the hardware through the Intel<sup>®</sup> QAT Cryptographic API can be used to improve the throughput of deduplication/WAN acceleration applications considerably.
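The single-instance idea (store a block once, then refer to it by fingerprint) can be sketched as below. To keep the fragment self-contained, the fingerprint is computed with 64-bit FNV-1a, which is a non-cryptographic stand-in chosen for brevity; a real deduplication system would use a cryptographic hash such as SHA-256, which is the kind of workload the text above describes offloading. `dedup_index_t`, `INDEX_CAPACITY`, and `dedup_store` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define INDEX_CAPACITY 64 /* hypothetical index size for this sketch */

typedef struct
{
    uint64_t fingerprints[INDEX_CAPACITY]; /* one entry per unique block */
    size_t count;
} dedup_index_t;

/* 64-bit FNV-1a; a stand-in for a cryptographic hash, NOT collision resistant. */
static uint64_t fnv1a64(const unsigned char *data, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL; /* FNV offset basis */

    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL; /* FNV prime */
    }
    return h;
}

/* Looks the block up by fingerprint. Returns the index of the stored instance
 * and sets *is_duplicate; returns -1 if the block is new and the index is full. */
long dedup_store(dedup_index_t *idx, const void *block, size_t len,
                 int *is_duplicate)
{
    uint64_t fp = fnv1a64(block, len);

    for (size_t i = 0; i < idx->count; i++) {
        if (idx->fingerprints[i] == fp) {
            *is_duplicate = 1; /* replace the block with a reference to i */
            return (long)i;
        }
    }
    if (idx->count == INDEX_CAPACITY)
        return -1;
    idx->fingerprints[idx->count] = fp;
    *is_duplicate = 0;
    return (long)idx->count++;
}
```

The linear search keeps the sketch short; the look-up structures mentioned earlier in this section, such as Bloom filter based indexes, address exactly this cost.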

Additionally, the compression/decompression acceleration services can be used to further compress blocks for storage on disk, while optionally encrypting the compressed contents.




## 8 Black Box Debug Tool

This chapter provides information on the configuration and use of the Intel<sup>®</sup> QuickAssist Technology Black Box Debug tool. Information contained includes usage examples, fail signature cases and sample outputs.

## 8.1 Introduction

### 8.1.1 Overview

The QAT Debug tool provides customer-usable debug facilities that gather data to help with issue diagnosis. It is intended to help the customer identify a root cause in a relatively short time, without putting significant effort into the whole debugging process.

The QAT library does not perform extensive checks or input data validation, so incorrect input can cause device hangs and other unexpected behavior. The root causes of these issues are hard to identify without advanced debugging techniques. This tool gives the customer enough information to find and fix defects caused by probable QAT API misuse, without requiring QAT-specific technical knowledge. The QAT Debug tool is released as part of a customer-deployed solution.

### 8.1.1.1 Security Considerations

The main aim of the QAT Debug tool is to store data captured from traffic generated between user space and the QAT device. This function could also be considered a potential security risk, because the written data could be sensitive in some cases.

Potentially sensitive data stored by the QAT Debug feature:

- Physical addresses of flat buffers (memory at these addresses could contain sensitive data)
- Content descriptor in SYM (symmetric) request that can contain sensitive data
- Members of the QAT group can access stored information contained in logs describing traffic generated by all users inside this group

Potential risk mitigations:

- Sensitive data cleanup function was implemented to be called right after capture
- Storage directories are restricted to QAT group only

### 8.1.1.2 Performance Considerations

Users should be aware that the QAT Debug feature is able to work in continuous sync mode, which can significantly decrease overall performance when configuration is set to collect all possible data. Types of storage also have a large impact on overall performance. It is recommended to use high-performance storage (high speed NAND memory, RAM-disk, etc.).



## 8.2 Detailed Description

### 8.2.1 Collection Data

This feature is intended to collect low-level traffic between the QAT driver and firmware. Collected data is stored in a binary file in its original form, with additional metadata such as timestamp, process ID of the sender, and basic slice configuration data (cipher, hash algorithms, data compression type, Huffman tree type).

In order to perform validation of physical addresses and buffer lengths alignments, the content of SGL is additionally captured.

Contents of OP Data can also be captured when the proper log-level is set.

Data collection is supported for:

- QAT driver 'Traditional API':
  - FW (firmware) request descriptors including SGL (Scatter-Gather List), for SYM (Symmetric) and DC (Data Compression)
  - FW response descriptors
  - API calls (OP Data content provided by caller)
- QAT driver 'Data Plane API':
  - FW request descriptors including SGL (SYM, DC)
  - FW response descriptors
  - API calls (OP Data content provided by caller)

Data is stored in kernel-managed debug buffers. This approach has been chosen to meet the following requirements:

- The integrity of data must be preserved despite e.g., user-process crash
- Performance degradation should be insignificant
  - Any additional sys-call during request preparation or responses parsing can degrade performance significantly
- Debug buffers should be available either from user-space or kernel (to support e.g., QAT kernel API)

What is not supported:

- Kernel API
- LKCF (Linux\* Kernel Cryptography Framework)

### 8.2.1.1 Data Synchronization

QAT Debug can operate in two synchronization modes:

- Continuous data synchronization
- Event-triggered synchronization (crash dump)

### 8.2.1.1.1 Continuous Synchronization

This optional feature is intended to perform an ongoing data synchronization with persistent storage. The data stored in debug buffers is dumped to continuous synchronization files immediately after receiving the following events from user-space application or kernel module:

- Debug buffer is released
- Debug buffer is full
- User-process crashes (event caught by kernel module)

*NOTE:* This mode introduces additional performance degradation that is closely related to disk performance.

### 8.2.1.1.2 Crash-Dump Mode

This mode uses only the debug buffers while handling traffic and dumps the buffer contents to persistent storage only if an error event occurs. When a buffer is full, it is replaced with an empty buffer or a buffer holding less recent data.

### 8.2.1.1.3 Data Collecting Architecture

The high-level design for data collection is presented in the figure below:



#### Figure 8. Data Collection Architecture



### 8.2.1.2 Handling QAT Error Events

The following 'error events' are supported:

- IRQ based event caught by QAT kernel driver
- AER (Advanced Error Reporting) depending on platform configuration
- Firmware error response (including Slice hang)
- Slice hang caught as an IRQ
- Process crash event connected to 'orphan ring cleaner'

The handling of error events is presented in the diagram below:

### Figure 9. Typical Crash Dump Scenario



### 8.2.2 Post-Processing

This post-processing tool provides the following utilities:

Audits:

- Physical address used in FW request and SGLs
- Return codes in FW responses
- FLAT buffers and SGL buffers lengths based on cipher algorithm



Listings:

• Lists all collected entries sorted by sent/extraction time

Triggers:

• Manual trigger to dump content of debug buffers to configured location

### 8.2.2.1 Physical Addresses Audit

NOTE: Usage example is available in Section <u>8.5.2</u>

The audit is based on:

- Memory map regions mapped to the user space process, collected at the time of the 'error event'

  NOTE: Huge pages are supported
- Buffers overlapping test
- Basic null checks

User space process memory map example:

```
[root@silpixa00400507 qat_logs]# cat proc.mmaps.dev00_0000_4d_00_0 | grep 43927
43927:0x000001cd9c00000:4194304
43927:0x000001c8f400000:2097152
43927:0x0000001c8f200000:2097152
43927:0x0000001c8f000000:2097152
43927:0x0000001c8ee00000:2097152
43927:0x0000001c8ec00000:2097152
```
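The three checks listed above can be sketched against a memory map in this format. This is an illustration only, not the code used by qat\_dbg\_report: it assumes each line is `pid:start_address:size_in_bytes`, and the function names are hypothetical.

```python
def parse_mmaps(text):
    """Parse 'pid:start:size' lines into {pid: [(start, end), ...]} (assumed format)."""
    regions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        pid, start, size = line.split(":")
        begin = int(start, 16)
        regions.setdefault(int(pid), []).append((begin, begin + int(size)))
    return regions


def audit_address(regions, pid, phy_addr, length=1):
    """Return a list of findings for one buffer; an empty list means it passed."""
    if phy_addr == 0:
        return ["null physical address"]
    for begin, end in regions.get(pid, []):
        if begin <= phy_addr and phy_addr + length <= end:
            return []
    return [f"address {phy_addr:#x} is out of process pid: {pid} range"]


def buffers_overlap(addr_a, len_a, addr_b, len_b):
    """True when two [addr, addr + len) ranges share at least one byte."""
    return addr_a < addr_b + len_b and addr_b < addr_a + len_a
```

For simplicity the sketch treats every region independently, so a buffer that spans two adjacent regions would be reported as out of range.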

Command line interface:

```
#qat_dbg_report command=audit_phy_addresses path=<path>
  [dev=<dev>]|[bdf=<bdf>]
```

- path: path to
  - Crash dump directory
  - Continuous sync directory
- dev: device ID
- bdf: domain, bus, device, and function of a device, in the hexadecimal format 0000:00:00.0

NOTE: The dev or bdf option is required only when analyzing cont-sync data

### 8.2.2.2 Cipher Lengths Audit

NOTE: Usage example is available in Section <u>8.5.3</u>

The audit is based on:



- Cipher algorithm extracted from session and stored in 'content type' field
- Lengths of input buffers (flat buffers and in SGLs)

Command line interface:

#qat\_dbg\_report command=audit\_fields\_lengths path=<path>
 [dev=<dev>]|[bdf=<bdf>]

- path: path to
  - Crash dump directory
  - Continuous sync data directory
- dev: device ID
- bdf: domain, bus, device, and function of a device, in the hexadecimal format 0000:00:00.0

NOTE: The dev or bdf option is required only when analyzing cont-sync data
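As an illustration of the kind of check such an audit could make, the sketch below compares the cipher offset and length from a request with the total length of the flat buffers. It is not the tool's actual logic: the function name, the block-size table, and the rule that block-mode lengths must be a multiple of the block size are assumptions for this example.

```python
# Block sizes in bytes for a few well-known block ciphers (general facts,
# not taken from the QAT tool).
BLOCK_SIZES = {"AES": 16, "DES": 8, "3DES": 8}


def audit_lengths(cipher, cipher_offset, cipher_length, flat_buffer_lengths):
    """Return a list of findings; an empty list means the lengths look sane."""
    findings = []
    total = sum(flat_buffer_lengths)
    if cipher_offset + cipher_length > total:
        findings.append(
            f"cipher region {cipher_offset}+{cipher_length} exceeds buffer total {total}"
        )
    block = BLOCK_SIZES.get(cipher)
    if block is not None and cipher_length % block != 0:
        findings.append(f"cipher_length {cipher_length} is not a multiple of {block}")
    return findings
```

Using the values from the listing in Section 8.2.2.4 (cipher\_offset 24, cipher\_length 64, one flat buffer of 100 bytes) and assuming an AES session, the sketch reports no findings.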

### 8.2.2.3 Return Codes Audit

NOTE: Usage example is available in Section <u>8.5.4</u>

The audit is based on:

- Return codes in FW responses (sym/asym/dc)

Command line interface:

#qat\_dbg\_report command=audit\_ret\_codes path=<path>
 [dev=<dev>]|[bdf=<bdf>]

- path: path to
  - Crash dump directory
  - Continuous sync data directory
- dev: device ID
- bdf: domain, bus, device, and function of a device, in the hexadecimal format 0000:00:00.0

NOTE: The dev or bdf option is required only when analyzing cont-sync data

*NOTE:* If any error is found, the audit tries to match the response with the corresponding request and prints both to the output.

### 8.2.2.4 Listing Collected Data in 'Human Readable Form'

This option is intended to display the collected data in human readable form. It can be useful while investigating certain issues. Entries are sorted in descending order according to timestamp.

Command line interface:

#qat\_dbg\_report command=list [path=<path>] [dev=<dev>]|[bdf=<bdf>]
[last=<last>]

- path: path to
  - Crash dump directory
  - Continuous sync data directory
- dev: device ID
- bdf: domain, bus, device, and function of a device, in the hexadecimal format 0000:00:00.0
- last: prints only the last packets, limited to <last> entries

NOTE: The dev or bdf option is required only when analyzing cont-sync data

#### Example:

```
Entry [REQUEST SYM]: Time-stamp: 2020-11-10 13:58:24.211579144
Bank: 0 Ring: 2 [2] PID: 43784

    [0.1B] Crypto command ID: ICP_QAT_FW_LA_CMD_CIPHER_HASH [2]
    [0.2B] Service type: ICP_QAT_FW_COMN_REQ_CPM_FW_LA [4]
    [1.0-1B] LA BULK (SYMMETRIC CRYPTO) COMMAND FLAGS (0x2c)
        [1.12] ZUC_3G_PROTO: 0
        [1.11] GCM_IV_LEN_FLAG: 0
        [1.10] DIGEST_IN_BUFFER: 0
        [1.7-9] PROTO: 0
        [1.6] CMP_AUTH: 0
        [1.5] RET_AUTH: 1
        [1.4] UPDATE STATE: 0
        [1.3] CIPH_AUTH_CFG_OFFSET_FLAG: 1
        [1.2] CIPH IV FLD FLAG: 1
        [1.0-1] PARTIAL FLAGS: 0 (FULL)
    [1.2B] Common Request flags: 0x1
        SGL[1] CD IN [0] BNP [0]
    [1.3B] Extended Symmetric Crypto Command Flags: 0
    [2-3] Content Descriptor (CD) Param Pointer: 0xe864e4540
    [4.2B] Content Descriptor Param Size: 8 [Quad words]
    [6-7] Opaque Data: 0x7fcd7c2e2040
    [8-9] Source phy_addr: 0xe864e4c00
    [10-11] Destination phy_addr: 0xe864e4c00
    [12] Source length: 0
    [13] Destination length: 0

    [14-19] Cipher Request Parameters:
        [14] uint32_t::cipher_offset: 24
        [15] uint32_t::cipher_length: 64
        [16-17] uint64_t::cipher_IV_ptr: 0xaddbcefabebafeca
        [18-19] uint64_t::resrvd1: 0x459113d88f8cade
    [27-28.0B] Cipher Request Control Header:
        [27.0B] uint8_t::cipher_state_sz: 2
        [27.1B] uint8_t::cipher_key_sz: 2
        [27.2B] uint8_t::cipher_cfg_offset: 18
        [27.3B] uint8_t::next_curr_id: 0x21 (curr_id: 1, next: 2)
        [28.0B] uint8_t::cipher_padding_sz: 0
    [20-26] Authentication Request Parameters:
        [20] uint32_t::auth_off: 0
        [21] uint32_t::auth_len: 88
        [22-23] uint64_t::aad_adr/APS: 0xe864e46e0
        [24-25] uint64_t::auth_res_addr: 0xe864e5058
        [26.0B] uint8_t::aad_sz/inner_prefix_sz: 0
        [26.1B] uint8_t::resrvd1: 0
        [26.2B] uint8_t::hash_state_sz: 0
        [26.3B] uint8_t::auth_res_sz: 0
    [27-31] Authentication Request Control Header:
        [27] uint32_t::resrvd1: 0x21120202
        [28.0B] uint8_t::resrvd2: 0x0
        [28.1B] uint8_t::hash_flags: 0x0
        [28.2B] uint8_t::hash_cfg_offset: 46
        [28.3B] uint8_t::next_curr_id: 0x42 (curr_id: 2, next: 4)
        [29.0B] uint8_t::resrvd3: 0x0
        [29.1B] uint8_t::outer_prefix_offset: 0
        [29.2B] uint8_t::final_sz: 12
        [29.3B] uint8_t::inner_res_sz: 20
        [30.0B] uint8_t::resrvd4: 0x0
        [30.1B] uint8_t::inner_state1_sz: 24
        [30.2B] uint8_t::inner_state2_offset: 3
        [30.3B] uint8_t::inner_state2_sz: 24
        [31.0B] uint8_t::outer_config_offset: 0
        [31.1B] uint8_t::outer_state1_sz: 0
        [31.2B] uint8_t::outer_res_sz: 0
        [31.3B] uint8_t::outer_prefix_offset: 0
    SGL Data:
        Source SGL contains 1 flat buffer(s):
            [0] Flat buffer: len: 100 phy_addr: e864e5000
        Destination SGL contains 1 flat buffer(s):
            [0] Flat buffer: len: 100 phy_addr: e864e5000
```
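When a listing is long, it can help to pull out only the entry headers and order them by time. The sketch below does that for saved `list` output; it assumes every entry starts with a line of the form `Entry [<label>]: Time-stamp: <date> <time>.<fraction>`, as in the example above, and the function name is hypothetical.

```python
import re
from datetime import datetime

# Header line as it appears in the example listing; other layouts are not handled.
ENTRY_RE = re.compile(
    r"Entry \[(?P<label>[^\]]+)\]: Time-stamp: "
    r"(?P<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.(?P<frac>\d+)"
)


def parse_entry_headers(text):
    """Return entry headers found in the text, newest first."""
    entries = []
    for match in ENTRY_RE.finditer(text):
        when = datetime.strptime(match.group("date"), "%Y-%m-%d %H:%M:%S")
        entries.append(
            {"label": match.group("label"), "time": when, "nanos": int(match.group("frac"))}
        )
    # Descending order by timestamp, matching the ordering described for 'list'.
    entries.sort(key=lambda e: (e["time"], e["nanos"]), reverse=True)
    return entries
```

The fractional part is kept as a plain integer, which assumes it is always printed as a nanosecond count.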

## 8.3 Installation

### 8.3.1 Hardware and Software Compatibility

### Hardware:

- LBG Intel<sup>®</sup> C62x Chipset
- Intel<sup>®</sup> Atom<sup>®</sup> C3000 processor product family
- Intel<sup>®</sup> QuickAssist Adapter 8960/Intel<sup>®</sup> QuickAssist Adapter 8970 (formerly known as 'Lewis Hill')
- Intel<sup>®</sup> Communications Chipset 8925 to 8955 Series
- Intel<sup>®</sup> C4xxx Series QAT

### Supported Operating System:

• As per 4.16 release of the 1.7 QAT Linux\* driver



### 8.3.2 Installing the Driver

**NOTE:** User must have root privileges to perform the following:

Step 1 - copy the package onto the system

Step 2 - extract the package:

```
# mkdir /root/QAT
# cd /root/QAT
# tar -xzomf <path_to>/QAT<hw_version>.<sw_version>.tar.gz
```

Step 3 - set up the environment to build the driver:

```
# ./configure --enable-icp-qat-dbg
```

Step 4 - build and install the driver:

```
# make install
```

**NOTE:** A successful build should end with a message similar to the following:

```
Checking status of all devices.
There is 3 QAT acceleration device(s) in the system:
 qat_dev0 - type: c6xx, inst_id: 0, node_id: 0, bsf: 0000:3d:00.0, #accel: 5 #engines: 10 state: up
 qat_dev1 - type: c6xx, inst_id: 1, node_id: 0, bsf: 0000:3f:00.0, #accel: 5 #engines: 10 state: up
 qat_dev2 - type: c6xx, inst_id: 2, node_id: 1, bsf: 0000:da:00.0, #accel: 5 #engines: 10 state: up
```

Step 5 - change access permissions to the kernel debug sysfs directory (for non-root usage):

```
# chmod o+rx /sys/kernel/debug
```

### 8.3.3 Compiling and Executing Performance Sample Code

Step 1 - build the application:

```
# make sample-all
```

Step 2 - install the sample applications:

```
# make sample-install
```

Step 3 - run the sample code sanity check:

```
# cpa_sample_code signOfLife=1
```

**NOTE:** Tool execution should end with the following message:

```
Sample code completed successfully.
```

### 8.3.4 Uninstalling the Driver

Step 1 - bring down the driver:

```
# adf_ctl down
```

Step 2 - uninstall the driver:

```
# cd /root/QAT/
# make uninstall
```

## 8.4 Configuration

### 8.4.1 Configuration via QAT Device Configuration Files

QAT Debug may be configured via a dedicated section in the QAT device configuration file. The listing below shows an example configuration of the debug feature:

```
##############################################
# QAT Debuggability Section
# Debug levels description:
#   0: no data collection
#   1: API calls data collection
#   2: FW calls data collection
#   3: combined level 1 and 2
##############################################
[DEBUG]
Enabled = 0
DebugLevel = 2
NumBuffers = 128
BufferSizeMB = 4
LogDir = "/qat_crash"
DumpOnProcessCrash = 0
LogDirMaxSizeMB = 4096
ContSyncEnabled = 1
ContSyncLogDir = "/qat_logs"
ContSyncMaxLogFiles = 10
ContSyncMaxLogSizeMB = 100
```

*NOTE:* The package is installed with the Debug section already added to the configuration files, but the feature is disabled by default.

**NOTE:** For Virtual Functions (VFs), the above [DEBUG] section must also exist in the configuration files, i.e. c4xxxvf\_dev0.conf, c4xxxvf\_dev1.conf, etc., with Enabled = 1 | 2 | 3.

Field descriptions:

- Enabled:
  - [0]: Collecting data disabled
  - [1]: Collecting data enabled
- DebugLevel:
  - [0]: No data collecting
  - [1]: Collecting API calls only
  - [2]: Collecting FW requests and responses (default)
  - [3]: Collecting all of the above
- NumBuffers:
  - [50-2000]: Number of buffers for data storage per device
- BufferSizeMB:
  - [2-4]: Size of each buffer in MB
- LogDir:
  - ["path"]: Path to the directory for crash dumps
- LogDirMaxSizeMB:
  - [1024+]: Maximum size of the crash dump directory

  NOTE: If there is no space for a new crash dump, the oldest crash dump directory present under the 'LogDir' path is removed.
- DumpOnProcessCrash:
  - [0]: Do not dump buffers when a user-space process connected to QAT crashes
  - [1]: Dump buffers when a user-space process connected to QAT crashes
- ContSyncEnabled:
  - [0]: Do not perform ongoing synchronization of collected data with persistent storage
  - [1]: Perform ongoing synchronization of collected data with persistent storage

  *NOTE:* If continuous sync mode is enabled, crash dumps are not performed while handling error events. In that case, post-processing analysis is based only on the data collected by the continuous sync option, and the number of buffers and their size must be configured to hold the generated QAT payload under high throughput.
- ContSyncLogDir:
  - ["path"]: Path to the directory for continuous sync data
- ContSyncMaxLogFiles:
  - [10-100]: Maximum number of continuous sync files
- ContSyncMaxLogSizeMB:
  - [100-1000]: Maximum size in MB of a single continuous sync file

NOTE: Reload the configuration after each change by using 'adf\_ctl restart'.
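The documented ranges lend themselves to a quick pre-flight check before reloading the driver. The sketch below is not part of the QAT package: it assumes the [DEBUG] section can be read with Python's `configparser`, the function name is hypothetical, and only the numeric fields with ranges listed above are checked.

```python
import configparser

# (low, high) bounds taken from the field descriptions; None means no upper bound.
RANGES = {
    "DebugLevel": (0, 3),
    "NumBuffers": (50, 2000),
    "BufferSizeMB": (2, 4),
    "LogDirMaxSizeMB": (1024, None),
    "ContSyncMaxLogFiles": (10, 100),
    "ContSyncMaxLogSizeMB": (100, 1000),
}


def validate_debug_section(conf_text):
    """Return a list of problems found in the [DEBUG] section (empty if none)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case as written in the file
    parser.read_string(conf_text)
    if not parser.has_section("DEBUG"):
        return ["missing [DEBUG] section"]
    problems = []
    for key, (low, high) in RANGES.items():
        raw = parser.get("DEBUG", key, fallback=None)
        if raw is None:
            problems.append(f"{key}: missing")
            continue
        try:
            value = int(raw)
        except ValueError:
            problems.append(f"{key}: '{raw}' is not an integer")
            continue
        if value < low or (high is not None and value > high):
            bound = f"{low}..{high}" if high is not None else f"{low}+"
            problems.append(f"{key}: {value} is outside {bound}")
    return problems
```

Run against the example section in Section 8.4.1, the check returns an empty list; a value such as NumBuffers = 10 would be reported as outside 50..2000.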

### 8.4.2 Configuration via sysfs

On Linux\*, configuration via *sysfs* is very similar to configuration via QAT configuration files. The main difference is that the parameters are written to files in the /sys/kernel/debug/<device>/qat\_debug/ directory. The QAT configuration files, stored by default in /etc/, may stay unmodified. Parameters are reloaded when the *enabled* file is written, so all other parameters should be set before writing to that file.

NOTE: This functionality is not available on FreeBSD.



One could define a script named sysfs\_cfg.sh to perform configuration in a clean manner:

```
#!/bin/bash
# Usage: sysfs_cfg.sh <sysfs subfolder>
[[ $# -ne 1 ]] && echo "Error: Please provide sysfs subfolder" && exit 1
dev=$1
# Set every parameter first; they are reloaded when 'enabled' is written.
echo "200" > /sys/kernel/debug/${dev}/qat_debug/buffer_pool_size
echo "4" > /sys/kernel/debug/${dev}/qat_debug/buffer_size_mb
echo "/qat_logs" > /sys/kernel/debug/${dev}/qat_debug/cont_sync_dir
echo "0" > /sys/kernel/debug/${dev}/qat_debug/cont_sync_enabled
echo "10" > /sys/kernel/debug/${dev}/qat_debug/cont_sync_max_files
echo "100" > /sys/kernel/debug/${dev}/qat_debug/cont_sync_max_file_size_mb
echo "/qat_crash" > /sys/kernel/debug/${dev}/qat_debug/dump_dir
echo "4096" > /sys/kernel/debug/${dev}/qat_debug/dump_dir_size_mb
echo "3" > /sys/kernel/debug/${dev}/qat_debug/level
echo "1" > /sys/kernel/debug/${dev}/qat_debug/dump_on_process_crash
# commit
echo "1" > /sys/kernel/debug/${dev}/qat_debug/enabled
```

Example call to modify the QAT 1.7 VF available under the 0000:4d:02.0 bdf:

```
sh sysfs_cfg.sh qat_c4xxxvf_0000:4d:02.0
```

The configuration should end with starting the Debuggability daemon with the command qat\_dbg\_sync\_daemon.
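The ordering rule (all parameters first, *enabled* last) is the part that is easy to get wrong when scripting this. Below is a minimal sketch of a helper that enforces it; the function name and the dictionary interface are assumptions for illustration, and the target directory is passed in so the logic can be tried on an ordinary directory before pointing it at /sys/kernel/debug/<device>/qat\_debug/.

```python
from pathlib import Path


def apply_debug_config(debug_dir, params, enable=True):
    """Write each parameter file, then write 'enabled' last as the commit step.

    Returns the file names in the order they were written.
    """
    debug_dir = Path(debug_dir)
    written = []
    for name, value in params.items():
        if name == "enabled":
            continue  # never written early; handled below as the commit
        (debug_dir / name).write_text(f"{value}\n")
        written.append(name)
    # Reloading follows the write to 'enabled', so it must come after the rest.
    (debug_dir / "enabled").write_text("1\n" if enable else "0\n")
    written.append("enabled")
    return written
```

A call such as `apply_debug_config(path, {"level": 3, "buffer_pool_size": 200})` writes the two parameter files and then commits by writing `enabled`.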

### 8.4.3 Checking Current Configuration Used by Driver

To check details about current configuration used by driver, the following utility can be used:

• qat\_dbg\_ctl

Tool usage looks as follows:

```
USAGE:
        # /usr/local/bin/qat_dbg_ctl start||stop||status||restart
To see QAT debuggability configuration in the system use:
        /usr/local/bin/qat_dbg_ctl status
To start QAT debuggability synchronization daemon use:
        /usr/local/bin/qat_dbg_ctl start
To terminate QAT debuggability synchronization daemon use:
        /usr/local/bin/qat_dbg_ctl stop
To restart QAT debuggability synchronization daemon use:
        /usr/local/bin/qat_dbg_ctl restart
```

Example (QAT Debug not configured):

```
# qat_dbg_ctl status
QAT debuggability configuration:
No QAT devices configured with debuggability
QAT debuggability synchronization daemon not running.
```

Example (QAT Debug configured):

```
# qat_dbg_ctl status
QAT debuggability configuration:
        Device: qat_c6xx_0000:3f:00.0
                Debug level: 3
                Buffer pool size: 100
                Buffer size in MB: 4
                Crash dump on client process: 0
                Synchronization mode: dump on crash
                Crash dump directory: /qat_crash
                Crash dump directory max size in MB: 4096
        Device: qat_c6xx_0000:3d:00.0
                Debug level: 3
                Buffer pool size: 128
                Buffer size in MB: 4
                Crash dump on client process: 0
                Synchronization mode: cont-sync
                Cont-sync directory: /qat_logs
                Max number of cont-sync log files 10
                Max size of cont-sync log file: 100
QAT debuggability synchronization daemon running. Pid: 31962
```
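Because the dev or bdf option of qat\_dbg\_report matters only for cont-sync data, it can be handy to see at a glance which devices run in which mode. The sketch below parses saved `qat_dbg_ctl status` output; it assumes the `Device:` / `key: value` layout shown in the example, and the function names are hypothetical.

```python
def parse_status(text):
    """Turn 'qat_dbg_ctl status' output into {device: {field: value}}."""
    devices = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Device:"):
            current = line.split(":", 1)[1].strip()
            devices[current] = {}
        elif line.startswith("QAT debuggability synchronization daemon"):
            current = None  # trailer line, not a device field
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            devices[current][key.strip()] = value.strip()
    return devices


def cont_sync_devices(devices):
    """Names of devices whose synchronization mode is cont-sync."""
    return [
        name for name, fields in devices.items()
        if fields.get("Synchronization mode") == "cont-sync"
    ]
```

Lines without a colon, such as the "Max number of cont-sync log files" line in the example, are skipped by this sketch.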

NOTE: If the feature is enabled, 'qat\_dbg\_sync\_daemon' should be up and running. The daemon is started automatically by adf\_ctl when the configuration is reloaded.

QAT debug synchronization daemon (qat\_dbg\_sync\_daemon) logs to the syslog – you can check daemon activities e.g., by the following command:

```
# grep qat_dbg_sync_daemon /var/log/syslog
Nov 22 12:18:42 ubuntu qat_dbg_sync_daemon[11641]: Starting daemon...
Nov 22 12:18:42 ubuntu qat_dbg_sync_daemon[11641]: Device 0 configuration:
Nov 22 12:18:42 ubuntu qat_dbg_sync_daemon[11641]: -dump_dir: /qat_crash
Nov 22 12:18:42 ubuntu qat_dbg_sync_daemon[11641]: -dump_dir_size_mb: 4096
Nov 22 12:18:42 ubuntu qat_dbg_sync_daemon[11641]: -buffer_pool_size: 128
Nov 22 12:18:42 ubuntu qat_dbg_sync_daemon[11641]: -buffer_size_mb: 4
Nov 22 12:18:42 ubuntu qat_dbg_sync_daemon[11641]: -level: 2
Nov 22 12:18:42 ubuntu qat_dbg_sync_daemon[11641]: -dump_on_process_crash:0
Nov 22 12:18:42 ubuntu qat_dbg_sync_daemon[11641]: -sync_mode: continuous synchronization
Nov 22 12:18:42 ubuntu qat_dbg_sync_daemon[11641]: -cont_sync_dir:/qat_logs
Nov 22 12:18:42 ubuntu qat_dbg_sync_daemon[11641]: -cont_sync_max_file_size_mb:100
Nov 22 12:18:42 ubuntu qat_dbg_sync_daemon[11641]: -cont_sync_max_files:10
Nov 22 12:18:42 ubuntu qat_dbg_sync_daemon[11641]: Creating cont-sync directory: /qat_logs
Nov 22 12:18:42 ubuntu qat_dbg_sync_daemon[11641]: Initialized cont-sync mode for 1 devices
Nov 22 12:18:42 ubuntu qat_dbg_sync_daemon[11641]: Daemon started
Nov 22 12:18:42 ubuntu qat_dbg_sync_daemon[11641]: QAT events listener worker started
```

## 8.5 Usage Examples

### 8.5.1 Collecting Data – Sanity Check

Various tools can be used to sanity-check data collection. One of them is cpa\_sample\_code (installed with the QAT package by the 'samples-install' target), which is used as the example here.

### 8.5.1.1 Continuous Sync Enabled

**NOTE:** Ensure that at least one device is configured with Debug feature (Enabled=1) and cont-sync mode enabled (ContSyncEnabled=1) by editing QAT device configuration file and reloading the configuration by running 'adf\_ctl restart' command

1. Run test:

```
# cpa_sample_code signOfLife=1
```

2. Check collected data by using 'qat\_dbg\_report' tool:

**NOTE:** You can also perform this test with a different 'last' value, or with one of the audit commands instead of 'list'.

### 8.5.1.2 Continuous Sync Disabled

**NOTE:** Ensure that at least one device is configured with Debug feature enabled (Enabled=1) and cont-sync mode disabled (ContSyncEnabled=0)

1. Run test:

```
# cpa_sample_code signOfLife=1
```

2. Trigger a crash dump manually to check the collected data (set the *dev* parameter to the number of the QAT device configured with cont-sync mode disabled):

```
# qat_dbg_report command=dump dev=0
```

3. [Optional] You can check the qat\_dbg\_sync\_daemon logs to see if the event has been handled:

```
# tail -F /var/log/messages | grep -i qat_dbg_sync
Nov 30 11:56:25 localhost qat_dbg_sync_daemon[38164]: Daemon started
Nov 30 12:00:39 localhost qat_dbg_sync_daemon[38164]: Received QAT event: manual_dump
Nov 30 12:00:39 localhost qat_dbg_sync_daemon[38164]: Creating crash dump directory: /qat_crash
Nov 30 12:00:39 localhost qat_dbg_sync_daemon[38164]: Crash dump in progress ...
Nov 30 12:00:40 localhost qat_dbg_sync_daemon[38164]: Dumping physical memory regions to file: /qat_crash/qat_crash_dev_<dev>_<bdf>_<timestamp>/proc.mmaps.<dev>_<bdf>
Nov 30 12:00:40 localhost qat_dbg_sync_daemon[38164]: Crash dump done - path: /qat_crash/qat_crash_dev_<dev>_<bdf>_<timestamp>/
```

4. Check collected data by using qat\_dbg\_report tool:

```
# qat_dbg_report path=/qat_crash/qat_crash_dev_<dev>_<timestamp> command=list last=0
Building index...
DONE
Overall indexed 3430983 msgs.
Requests: 158708 (Sym:11280, PKE:147232, DC:196)
Responses:158708
API calls:3113567
```

*NOTE:* You can also perform this test with a different 'last' value, or with one of the audit commands instead of 'list'.
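The index summary printed by qat\_dbg\_report can double as a quick consistency check. The sketch below parses it and verifies that the per-service request counts add up to the request total, and that requests, responses, and API calls add up to the overall count. That these sums hold is an observation from the example output above, not a documented guarantee, and the function names are hypothetical.

```python
import re

# One pattern per counter, following the wording of the example output.
PATTERNS = {
    "total": r"Overall indexed (\d+) msgs",
    "requests": r"Requests:\s*(\d+)",
    "sym": r"Sym:\s*(\d+)",
    "pke": r"PKE:\s*(\d+)",
    "dc": r"DC:\s*(\d+)",
    "responses": r"Responses:\s*(\d+)",
    "api_calls": r"API calls:\s*(\d+)",
}


def parse_index_summary(text):
    """Extract the counters from the 'Building index...' summary."""
    summary = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"missing '{name}' in index summary")
        summary[name] = int(match.group(1))
    return summary


def summary_is_consistent(s):
    """Check the two sums observed in the example output."""
    return (
        s["sym"] + s["pke"] + s["dc"] == s["requests"]
        and s["requests"] + s["responses"] + s["api_calls"] == s["total"]
    )
```

A mismatch between requests and responses, for example, would show up immediately in the parsed counters.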

### 8.5.2 Audit Physical Addresses – Sanity Check

### 8.5.2.1 Emulate Uncorrectable Error

You can force the QAT FW to crash by using a modified version of a tool provided in the QAT package. To make the tool send incorrect data to the QAT FW, modify the following file:

\$ICP\_ROOT/quickassist/lookaside/access\_layer/src/sample\_code/functional/sym/symdp\_sample/cpa\_sym\_dp\_sample.c

**NOTE:** Set ICP\_ROOT to where you have your package extracted (e.g., export ICP\_ROOT=/root/QAT)

The following change can be applied to force an uncorrectable error:

```
196,199c196,199
<     pOpData->srcBuffer = sampleVirtToPhys(pSrcBuffer);
<     pOpData->srcBufferLen = bufferSize;
<     pOpData->dstBuffer = sampleVirtToPhys(pSrcBuffer);
<     pOpData->dstBufferLen = bufferSize;
---
>     pOpData->srcBuffer = pSrcBuffer;
>     pOpData->srcBufferLen = CPA_DP_BUFLIST;
>     pOpData->dstBuffer = pSrcBuffer;
>     pOpData->dstBufferLen = CPA_DP_BUFLIST;
```

Compile the modified tool by using the following commands:

1. cd \$ICP\_ROOT/quickassist/lookaside/access\_layer/src/sample\_code/functional
2. make all

To significantly improve recovery time after an uncorrectable error event, ensure that the AutoResetOnError configuration option (AutoResetOnError = 1) is set in the QAT configuration file.

### 8.5.2.2 Continuous Sync Enabled

Ensure that at least one device is configured with Debug feature enabled (Enabled = 1) and cont-sync mode enabled (ContSyncEnabled = 1)

1. Execute modified tool:

```
# cd $ICP_ROOT/quickassist/lookaside/access_layer/src/sample_code/functional/build
# ./sym_dp_sample
```

*NOTE:* Press 'ctrl+c' almost immediately after starting the tool. Keeping the tool running can cause a timeout on the driver side while it waits for client processes to detach from the device before the restart routine.

2. [Optional] Check if the event has been caught and handled properly:

```
# tail -F /var/log/messages | grep -i qat_dbg_sync
Nov 3 15:14:38 localhost qat_dbg_sync_daemon[12940]: Received QAT event: error
Nov 3 15:14:38 localhost qat_dbg_sync_daemon[12940]: Dumping physical memory regions to file: /qat_logs/proc.mmaps.dev00_0000_4d_00_0
Nov 3 15:14:38 localhost qat_dbg_sync_daemon[12940]: Received QAT event: restarting
Nov 3 15:14:41 localhost qat_dbg_sync_daemon[12940]: Received QAT event: restarted
```

3. Execute audit:

```
DONE
       Overall indexed 2 msgs.
                Requests: 1 (Sym:1, PKE:0, DC:0)
                Responses:0
                API calls:1
       _____
QAT Physical addresses - audit in progress ...
ERROR: Missing SGL source in log entry.
ERROR: Missing SGL destination in log entry.
ERROR: SGL audit failed - check entry below.
ERROR: address overlapping audit failed - check entry below.
ERROR: Physical address (0x7f460961a800) used in request is out of process pid: 9423 range.
       Check /qat_logs//proc.mmaps.dev00_0000_4d_00_0 to see process physical addresses ranges.
ERROR: User process memory regions audit failed - check entry below.
       Entry [REQUEST SYM]: Time-stamp: 2021-11-03 15:14:37.849936202
       Bank: 1 Ring: 2 PID: 9423
                [0.1B] Crypto command ID: ICP_QAT_FW_LA_CMD_CIPHER_HASH [2]
                [0.2B] Service type: ICP_QAT_FW_COMN_REQ_CPM_FW_LA [4]
                [1.0-1B] LA BULK (SYMMETRIC CRYPTO) COMMAND FLAGS (0x24)
                       [1.12] ZUC_3G_PROTO: 0
                       [1.11] GCM_IV_LEN_FLAG: 0
                       [1.10] DIGEST_IN_BUFFER: 0
                       [1.7-9] PROTO: 0
                       [1.6] CMP_AUTH: 0
                       [1.5] RET_AUTH: 1
                       [1.4] UPDATE STATE: 0
                       [1.3] CIPH_AUTH_CFG_OFFSET_FLAG: 0
                       [1.2] CIPH IV FLD FLAG: 1
                       [1.0-1] PARTIAL FLAGS: 0 (FULL)
                [1.2B] Common Request flags: 0x1
                        SGL[1] CD IN [0] BNP [0]
                [1.3B] Extended Symmetric Crypto Command Flags: 0
                [2-3] Content Descriptor (CD) Param Pointer: 0x192752c40
                [4.2B] Content Descriptor Param Size: 15 [Quad words]
                [6-7] Opaque Data: 0x7f460961b000
                [8-9] Source phy_addr: 0x7f460961a800
                [10-11] Destination phy_addr: 0x7f460961a800
                [12] Source length: 0
                [13] Destination length: 0
                [14-19] Cipher Request Parameters:
                       [14] uint32_t::cipher_offset: 0
                       [15] uint32_t::cipher_length: 96
                       [16-17] uint64_t::cipher_IV_ptr: 0xdfc54a821d4c9b7e
                       [18-19] uint64_t::resrvd1: 0x27378daa44a14c99
                [27-28.0B] Cipher Request Control Header:
                       [27.0B] uint8_t::cipher_state_sz: 2
                       [27.1B] uint8_t::cipher_key_sz: 4
                       [27.2B] uint8_t::cipher_cfg_offset: 0
                       [27.3B] uint8_t::next_curr_id: 0x21 (curr_id: 1, next: 2)
                       [28.0B] uint8_t::cipher_padding_sz: 0
                [20-26] Authentication Request Parameters:
                       [20] uint32_t::auth_off: 0
                       [21] uint32_t::auth_len: 96
                       [22-23] uint64_t::aad_adr/APS: 0
                       [24-25] uint64_t::auth_res_addr: 0x192753860
                       [26.0B] uint8_t::aad_sz/inner_prefix_sz: 0
                       [26.1B] uint8_t::resrvd1: 0
                       [26.2B] uint8_t::hash_state_sz: 0
                       [26.3B] uint8_t::auth_res_sz: 0
                [27-31] Authentication Request Control Header:
                       [27] uint32_t::resrvd1: 0x21000402
                       [28.0B] uint8_t::resrvd2: 0x0
                       [28.1B] uint8_t::hash_flags: 0x0
                       [28.2B] uint8_t::hash_cfg_offset: 5
                       [28.3B] uint8_t::next_curr_id: 0x42 (curr_id: 2, next: 4)
                       [29.0B] uint8_t::resrvd3: 0x0
                       [29.1B] uint8_t::outer_prefix_offset: 0
                       [29.2B] uint8_t::final_sz: 32
                       [29.3B] uint8_t::inner_res_sz: 32
                       [30.0B] uint8_t::resrvd4: 0x0
                       [30.1B] uint8_t::inner_state1_sz: 32
                       [30.2B] uint8_t::inner_state2_offset: 11
                       [30.3B] uint8_t::inner_state2_sz: 32
                       [31.0B] uint8_t::outer_config_offset: 0
                       [31.1B] uint8_t::outer_state1_sz: 0
                       [31.2B] uint8_t::outer_res_sz: 0
                       [31.3B] uint8_t::outer_prefix_offset: 0
                SGL Data:
       _____
Checked 2 records. Found 1 issue(s).
```
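When audit output is saved to a file, the error lines and the closing summary are usually what matter first. The sketch below extracts them; it assumes error lines start with `ERROR:` and that the summary reads `Checked N records. Found M issue(s).`, as in the output above, and the function name is hypothetical.

```python
import re

SUMMARY_RE = re.compile(r"Checked (\d+) records\. Found (\d+) issue\(s\)\.")


def summarize_audit(text):
    """Collect ERROR messages and the final record/issue counts from audit output."""
    errors = [
        line.strip()[len("ERROR:"):].strip()
        for line in text.splitlines()
        if line.strip().startswith("ERROR:")
    ]
    match = SUMMARY_RE.search(text)
    checked = int(match.group(1)) if match else None
    issues = int(match.group(2)) if match else None
    return {"errors": errors, "checked": checked, "issues": issues}
```

In the example above, several ERROR lines belong to a single faulty request, so the number of error lines can exceed the issue count.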

### 8.5.2.3 Continuous Sync Disabled (Crash Dump Based)

**NOTE:** Ensure that at least one device is configured with Debug feature enabled (Enabled = 1) and cont-sync mode disabled (ContSyncEnabled = 0)

1. Execute modified tool:

```
# cd $ICP_ROOT/quickassist/lookaside/access_layer/src/sample_code/functional/build
# ./sym_dp_sample
```

*NOTE:* Press 'ctrl+c' almost immediately after starting the tool. Keeping the tool running can cause a timeout on the driver side while it waits for client processes to detach from the device before the restart routine.

2. [Optional] Check if event has been caught and handled properly:

```
# tail -F /var/log/messages | grep -i qat_dbg_sync
Nov 3 15:50:03 localhost qat_dbg_sync_daemon[14822]: Received QAT event: error
Nov 3 15:50:03 localhost qat_dbg_sync_daemon[14822]: Creating crash dump directory: /qat_crash
Nov 3 15:50:03 localhost qat_dbg_sync_daemon[14822]: Crash dump in progress ...
Nov 3 15:50:03 localhost qat_dbg_sync_daemon[14822]: Dumping physical memory regions to file: /qat_crash/qat_crash_dev_00_2021-11-03_155003//proc.mmaps.dev00_0000_4d_00_0
Nov 3 15:50:03 localhost qat_dbg_sync_daemon[14822]: Crash dump done - path: /qat_crash/qat_crash_dev_00_2021-11-03_155003/
Nov 3 15:50:03 localhost qat_dbg_sync_daemon[14822]: Received QAT event: restarting
Nov 3 15:50:06 localhost qat_dbg_sync_daemon[14822]: Received QAT event: restarted
```

3. Execute audit:

```
# qat_dbg_report path=/qat_crash/qat_crash_dev_<dev>_<bdf>_<timestamp>/ command=audit_phy_addresses
====
Building index...
DONE
      Overall indexed 2 msgs.
             Requests: 1 (Sym:1, PKE:0, DC:0)
             Responses:0
             API calls:1
====
====
QAT Physical addresses - audit in progress ...
ERROR: Missing SGL source in log entry.
ERROR: Missing SGL destination in log entry.
ERROR: SGL audit failed - check entry below.
ERROR: address overlapping audit failed - check entry below.
ERROR: Physical address (0x7f950ff52c00) used in request is out of process pid: <PID> range.
      Check /qat_crash/qat_crash_dev_<dev>_<bdf>_<timestamp>/proc.mmaps.dev<dev>_<bdf> to see process physical addresses ranges.
ERROR: User process memory regions audit failed - check entry below.
      Entry [REQUEST SYM]: Time-stamp: 2020-11-30 13:04:10.59952422
      Bank: 1 Ring: 2 PID: <PID>
             [0.1B] Crypto command ID: ICP_QAT_FW_LA_CMD_CIPHER_HASH [2]
             [0.2B] Service type: ICP_QAT_FW_COMN_REQ_CPM_FW_LA [4]
             [1.0-1B] LA BULK (SYMMETRIC CRYPTO) COMMAND FLAGS (0x24)
                    [1.12] ZUC_3G_PROTO: 0
                    [1.11] GCM_IV_LEN_FLAG: 0
                    [1.10] DIGEST_IN_BUFFER: 0
                    [1.7-9] PROTO: 0
                    [1.6] CMP_AUTH: 0
                    [1.5] RET_AUTH: 1
                    [1.4] UPDATE_STATE: 0
                    [1.3] CIPH_AUTH_CFG_OFFSET_FLAG: 0
                    [1.2] CIPH_IV_FLD_FLAG: 1
                    [1.0-1] PARTIAL FLAGS: 0 (FULL)
             [1.2B] Common Request flags: 0x1
                    SGL[1] CD_IN [0] BNP [0]
             [1.3B] Extended Symmetric Crypto Command Flags: 0
             [2-3] Content Descriptor (CD) Param Pointer: 0xeb20e4440
             [4.2B] Content Descriptor Param Size: 15 [Quad words]
             [6-7] Opaque Data: 0x7f950ff53400
             [8-9] Source phy_addr: 0x7f950ff52c00
             [10-11] Destination phy_addr: 0x7f950ff52c00
             [12] Source length: 0
             [13] Destination length: 0
             [14-19] Cipher Request Parameters:
                    [14] uint32_t::cipher_offset: 0
                    [15] uint32_t::cipher_length: 96
                    [16-17] uint64_t::cipher_IV_ptr: 0xdfc54a821d4c9b7e
                    [18-19] uint64_t::resrvd1: 0x27378daa44a14c99
             [27-28.0B] Cipher Request Control Header:
                    [27.0B] uint8_t::cipher_state_sz: 2
                    [27.1B] uint8_t::cipher_key_sz: 4
                    [27.2B] uint8_t::cipher_cfg_offset: 0
                    [27.3B] uint8_t::next_curr_id: 0x21 (curr_id: 1, next: 2)
                    [28.0B] uint8_t::cipher_padding_sz: 0
             [20-26] Authentication Request Parameters:
                    [20] uint32_t::auth_off: 0
                    [21] uint32_t::auth_len: 96
                    [22-23] uint64_t::aad_adr/APS: 0
                    [24-25] uint64_t::auth_res_addr: 0xeb20e4c60
                    [26.0B] uint8_t::aad_sz/inner_prefix_sz: 0
                    [26.1B] uint8_t::resrvd1: 0
                    [26.2B] uint8_t::hash_state_sz: 0
                    [26.3B] uint8_t::auth_res_sz: 0
             [27-31] Authentication Request Control Header:
                    [27] uint32_t::resrvd1: 0x21000402
                    [28.0B] uint8_t::resrvd2: 0x0
                    [28.1B] uint8_t::hash_flags: 0x0
                    [28.2B] uint8_t::hash_cfg_offset: 5
                    [28.3B] uint8_t::next_curr_id: 0x42 (curr_id: 2, next: 4)
                    [29.0B] uint8_t::resrvd3: 0x0
                    [29.1B] uint8_t::outer_prefix_offset: 0
                    [29.2B] uint8_t::final_sz: 32
                    [29.3B] uint8_t::inner_res_sz: 32
                    [30.0B] uint8_t::resrvd4: 0x0
                    [30.1B] uint8_t::inner_state1_sz: 32
                    [30.2B] uint8_t::inner_state2_offset: 11
                    [30.3B] uint8_t::inner_state2_sz: 32
                    [31.0B] uint8_t::outer_config_offset: 0
                    [31.1B] uint8_t::outer_state1_sz: 0
                    [31.2B] uint8_t::outer_res_sz: 0
                    [31.3B] uint8_t::outer_prefix_offset: 0
             SGL Data:
====
Checked 2 records. Found 1 issue(s).
```
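The audit above reports a physical address that falls outside the ranges recorded for the process in the proc.mmaps file. The following is a minimal sketch of such a range check, not the tool's implementation: the `phys_region` structure and the `phys_range_is_mapped` helper are hypothetical names introduced for illustration, and the on-disk format of proc.mmaps is not assumed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one region listed in a proc.mmaps file. */
struct phys_region {
    uint64_t start; /* first physical address of the region */
    uint64_t size;  /* region length in bytes */
};

/* Returns true when [addr, addr + len) lies entirely inside one region. */
bool phys_range_is_mapped(const struct phys_region *regions, size_t count,
                          uint64_t addr, uint64_t len)
{
    for (size_t i = 0; i < count; i++) {
        uint64_t start = regions[i].start;
        uint64_t size = regions[i].size;

        if (addr < start)
            continue;

        uint64_t offset = addr - start;
        /* Compare by subtraction so that addr + len cannot overflow. */
        if (offset < size && len <= size - offset)
            return true;
    }
    return false;
}
```

A request whose source or destination address fails this kind of check for every recorded region would be flagged in the same way as the entry shown in the audit output.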

### 8.5.3 Audit Cipher Buffers Alignment – Sanity Check

### 8.5.3.1 Emulate Slice Hang Caused by Incorrect Buffers Alignments

**NOTE:** To execute this test, the package must be compiled with the '--disable-param-check' option. To do this, uninstall the existing package and install it again with this extra configuration option.

You can force a QAT slice to hang by using a modified version of a tool provided by the QAT package. To prepare the tool to send incorrect data to the QAT FW, modify the following file: \$ICP\_ROOT/quickassist/lookaside/access\_layer/src/sample\_code/functional/sym/ipsec\_sample/cpa\_ipsec\_sample.c

The following change can be applied to cause a buffer length alignment error and a slice hang:

```
308c308
- pOpData->messageLenToCipherInBytes = sizeof(samplePayload);
+ pOpData->messageLenToCipherInBytes = 2;
```
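The change above sets the cipher length to 2 bytes. The audit output later in this section reports a block size of 16 bytes for CPA_CY_SYM_CIPHER_AES_CBC, so 2 is not a whole number of blocks. The check can be sketched as follows; the helper and macro names are hypothetical and are not part of the QAT API or the audit tool.

```c
#include <stdbool.h>
#include <stdint.h>

/* AES block size in bytes; matches "block size:16" in the audit output. */
#define AES_BLOCK_SIZE_BYTES 16u

/*
 * Hypothetical helper (not a QAT API): returns true when the cipher
 * length is a whole number of blocks, as a block-mode cipher requires.
 */
bool cipher_len_is_block_multiple(uint32_t cipher_len_bytes,
                                  uint32_t block_size_bytes)
{
    if (block_size_bytes == 0u)
        return false; /* avoid division by zero */
    return (cipher_len_bytes % block_size_bytes) == 0u;
}
```

With the modified value, `cipher_len_is_block_multiple(2u, AES_BLOCK_SIZE_BYTES)` is false, which corresponds to the condition the fields-length audit reports.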

Compile the modified tool using the following commands:

```
# cd $ICP_ROOT/quickassist/lookaside/access_layer/src/sample_code/functional
# make all
```



### 8.5.3.2 Slice Hang Handling with Continuous Sync Enabled

Ensure that at least one device is configured with the Debug feature enabled (Enabled = 1) and cont-sync mode enabled (ContSyncEnabled = 1).

1. Execute the modified tool:

```
# cd $ICP_ROOT/quickassist/lookaside/access_layer/src/sample_code/functional/build
# ./ipsec_sample
main(): Starting IPSec Sample Code App ...
algChainSample(): cpaCyStartInstance
algChainSample(): Encrypt-Generate ICV
algChainPerformOp(): cpaCySymPerformOp
[error] LacSymQat_SymLogSliceHangError() - : slice hang detected on CPM cipher or auth slice.
[error] LacSymCb_ProcessCallbackInternal() - : Response status value not as expected
symCallback(): Callback called with status = -1.
symCallback(): Callback verify result error
algChainPerformOp(): Output does not match expected output encrypt generate
algChainSample(): cpaCyStopInstance
algChainSample(): Sample code failed with status of -1
main(): IPSec Sample Code App failed
```

2. [Optional] Check if event has been caught and handled properly:

```
# tail -F /var/log/messages | grep -i qat_dbg_sync
Nov 30 16:09:34 localhost qat_dbg_sync_daemon[22391]: Received QAT event: err_resp
Nov 30 16:09:34 localhost qat_dbg_sync_daemon[22391]: Dumping physical memory regions to file: /qat_logs/proc.mmaps.dev00_0000_4d_00_0
```

3. Execute audit:

```
QAT request fields length - audit in progress...
ERROR: Cipher data size must be block multiple (Cipher len:2, block size:16) for alg: CPA_CY_SYM_CIPHER_AES_CBC
      Entry [REQUEST SYM]: Time-stamp: 2021-11-23 07:37:32.625265272
      Bank: 0 Ring: 1 PID: 23612
             [0.1B] Crypto command ID: ICP_QAT_FW_LA_CMD_CIPHER_HASH [2]
             [0.2B] Service type: ICP_QAT_FW_COMN_REQ_CPM_FW_LA [4]
             [1.0-1B] LA BULK (SYMMETRIC CRYPTO) COMMAND FLAGS (0x2c)
                    [1.12] ZUC_3G_PROTO: 0
                    [1.11] GCM_IV_LEN_FLAG: 0
                    [1.10] DIGEST_IN_BUFFER: 0
                    [1.7-9] PROTO: 0
                    [1.6] CMP_AUTH: 0
                    [1.5] RET_AUTH: 1
                    [1.4] UPDATE_STATE: 0
                    [1.3] CIPH_AUTH_CFG_OFFSET_FLAG: 1
                    [1.2] CIPH_IV_FLD_FLAG: 1
                    [1.0-1] PARTIAL FLAGS: 0 (FULL)
             [1.2B] Common Request flags: 0x1
                    SGL[1] CD_IN [0] BNP [0]
             [1.3B] Extended Symmetric Crypto Command Flags: 0
             [2-3] Content Descriptor (CD) Param Pointer: 0x1af801640
             [4.2B] Content Descriptor Param Size: 8 [Quad words]
             [6-7] Opaque Data: 0x7f9dd0ebbc40
             [8-9] Source phy_addr: 0x1af802000
             [10-11] Destination phy_addr: 0x1af802000
             [12] Source length: 0
             [13] Destination length: 0
             [14-19] Cipher Request Parameters:
                    [14] uint32_t::cipher_offset: 24
                    [15] uint32_t::cipher_length: 2
                    [16-17] uint64_t::cipher_IV_ptr: 0xaddbcefabebafeca
                    [18-19] uint64_t::resrvd1: 0x459113d88f8cade
             [27-28.0B] Cipher Request Control Header:
                    [27.0B] uint8_t::cipher_state_sz: 2
                    [27.1B] uint8_t::cipher_key_sz: 2
                    [27.2B] uint8_t::cipher_cfg_offset: 18
                    [27.3B] uint8_t::next_curr_id: 0x21 (curr_id: 1, next: 2)
                    [28.0B] uint8_t::cipher_padding_sz: 0
             [20-26] Authentication Request Parameters:
                    [20] uint32_t::auth_off: 0
                    [21] uint32_t::auth_len: 88
                    [22-23] uint64_t::aad_adr/APS: 0x1af801820
                    [24-25] uint64_t::auth_res_addr: 0x1af802458
                    [26.0B] uint8_t::aad_sz/inner_prefix_sz: 0
                    [26.1B] uint8_t::resrvd1: 0
                    [26.2B] uint8_t::hash_state_sz: 0
                    [26.3B] uint8_t::auth_res_sz: 0
             [27-31] Authentication Request Control Header:
                    [27] uint32_t::resrvd1: 0x21120202
                    [28.0B] uint8_t::resrvd2: 0x0
                    [28.1B] uint8_t::hash_flags: 0x0
                    [28.2B] uint8_t::hash_cfg_offset: 46
                    [28.3B] uint8_t::next_curr_id: 0x42 (curr_id: 2, next: 4)
                    [29.0B] uint8_t::resrvd3: 0x0
                    [29.1B] uint8_t::outer_prefix_offset: 0
                    [29.2B] uint8_t::final_sz: 12
                    [29.3B] uint8_t::inner_res_sz: 20
                    [30.0B] uint8_t::resrvd4: 0x0
                    [30.1B] uint8_t::inner_state1_sz: 24
                    [30.2B] uint8_t::inner_state2_offset: 3
                    [30.3B] uint8_t::inner_state2_sz: 24
                    [31.0B] uint8_t::outer_config_offset: 0
                    [31.1B] uint8_t::outer_state1_sz: 0
                    [31.2B] uint8_t::outer_res_sz: 0
                    [31.3B] uint8_t::outer_prefix_offset: 0
             SGL Data:
                    Source SGL contains 1 flat buffer(s):
                           [0] Flat buffer: len: 100 phy_addr: 0x1af802400
                    Destination SGL contains 1 flat buffer(s):
                           [0] Flat buffer: len: 100 phy_addr: 0x1af802400
====
Checked 3 records. Found 1 issue(s).
```
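The next_curr_id fields in the dump above are printed both as a raw byte and as a decoded pair: 0x21 is shown as current ID 1 and next ID 2, and 0x42 as current ID 2 and next ID 4. Judging only from these two values, the low nibble appears to hold the current ID and the high nibble the next ID. The sketch below decodes a byte under that assumption; the structure and function names are illustrative and are not taken from the QAT headers.

```c
#include <stdint.h>

/* Decoded form of a next_curr_id byte (illustrative names). */
struct slice_ids {
    uint8_t curr_id; /* assumed to be the low nibble */
    uint8_t next_id; /* assumed to be the high nibble */
};

/* Splits the byte into the two IDs, following the layout inferred above. */
struct slice_ids decode_next_curr_id(uint8_t next_curr_id)
{
    struct slice_ids ids;

    ids.curr_id = (uint8_t)(next_curr_id & 0x0Fu);
    ids.next_id = (uint8_t)((next_curr_id >> 4) & 0x0Fu);
    return ids;
}
```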

### 8.5.3.3 Slice Hang Handling with Continuous Sync Disabled

Ensure that at least one device is configured with the Debug feature enabled (Enabled = 1) and cont-sync mode disabled (ContSyncEnabled = 0).

1. Execute the modified tool:

```
# cd $ICP_ROOT/quickassist/lookaside/access_layer/src/sample_code/functional/build
# ./ipsec_sample
main(): Starting IPSec Sample Code App ...
algChainSample(): cpaCyStartInstance
algChainSample(): Encrypt-Generate ICV
algChainPerformOp(): cpaCySymPerformOp
[error] LacSymQat_SymLogSliceHangError() - : slice hang detected on CPM cipher or auth slice.
[error] LacSymCb_ProcessCallbackInternal() - : Response status value not as expected
symCallback(): Callback called with status = -1.
symCallback(): Callback verify result error
algChainPerformOp(): Output does not match expected output encrypt generate
algChainSample(): cpaCyStopInstance
algChainSample(): Sample code failed with status of -1
main(): IPSec Sample Code App failed
```

2. [Optional] Check if the event has been caught and handled properly:

```
# tail -F /var/log/messages | grep -i qat_dbg_sync
Nov 30 16:17:02 localhost qat_dbg_sync_daemon[23780]: Received QAT event: err_resp
Nov 30 16:17:02 localhost qat_dbg_sync_daemon[23780]: Creating crash dump directory: /qat_crash
Nov 30 16:17:02 localhost qat_dbg_sync_daemon[23780]: Crash dump in progress ...
Nov 30 16:17:02 localhost qat_dbg_sync_daemon[23780]: Dumping physical memory regions to file: /qat_crash/qat_crash_dev_00_0000_4d_00_0_2020-11-30_161702//proc.mmaps.dev00_0000_4d_00_0
Nov 30 16:17:02 localhost qat_dbg_sync_daemon[23780]: Crash dump done - path: /qat_crash/qat_crash_dev_00_0000_4d_00_0_2020-11-30_161702/
```

3. Execute audit:

```
# qat_dbg_report path=/qat_crash/qat_crash_dev_<dev>_<bdf>_<timestamp> command=audit_fields_lengths
====
Building index...
DONE
      Overall indexed 3 msgs.
             Requests: 1 (Sym:1, PKE:0, DC:0)
             Responses:1
             API calls:1
====
====
QAT request fields length - audit in progress...
ERROR: Cipher data size must be block multiple (Cipher len:2, block size:16) for alg: CPA_CY_SYM_CIPHER_AES_CBC
      Entry [REQUEST SYM]: Time-stamp: 2021-11-23 07:39:01.357421282
      Bank: 0 Ring: 1 PID: 23661
             [0.1B] Crypto command ID: ICP_QAT_FW_LA_CMD_CIPHER_HASH [2]
             [0.2B] Service type: ICP_QAT_FW_COMN_REQ_CPM_FW_LA [4]
             [1.0-1B] LA BULK (SYMMETRIC CRYPTO) COMMAND FLAGS (0x2c)
                    [1.12] ZUC_3G_PROTO: 0
                    [1.11] GCM_IV_LEN_FLAG: 0
                    [1.10] DIGEST_IN_BUFFER: 0
                    [1.7-9] PROTO: 0
                    [1.6] CMP_AUTH: 0
                    [1.5] RET_AUTH: 1
                    [1.4] UPDATE_STATE: 0
                    [1.3] CIPH_AUTH_CFG_OFFSET_FLAG: 1
                    [1.2] CIPH_IV_FLD_FLAG: 1
                    [1.0-1] PARTIAL FLAGS: 0 (FULL)
             [1.2B] Common Request flags: 0x1
                    SGL[1] CD_IN [0] BNP [0]
             [1.3B] Extended Symmetric Crypto Command Flags: 0
             [2-3] Content Descriptor (CD) Param Pointer: 0x18e001640
             [4.2B] Content Descriptor Param Size: 8 [Quad words]
             [6-7] Opaque Data: 0x7f4681b98c40
             [8-9] Source phy_addr: 0x18e002000
             [10-11] Destination phy_addr: 0x18e002000
             [12] Source length: 0
             [13] Destination length: 0
             [14-19] Cipher Request Parameters:
                    [14] uint32_t::cipher_offset: 24
                    [15] uint32_t::cipher_length: 2
                    [16-17] uint64_t::cipher_IV_ptr: 0xaddbcefabebafeca
                    [18-19] uint64_t::resrvd1: 0x459113d88f8cade
             [27-28.0B] Cipher Request Control Header:
                    [27.0B] uint8_t::cipher_state_sz: 2
                    [27.1B] uint8_t::cipher_key_sz: 2
                    [27.2B] uint8_t::cipher_cfg_offset: 18
                    [27.3B] uint8_t::next_curr_id: 0x21 (curr_id: 1, next: 2)
                    [28.0B] uint8_t::cipher_padding_sz: 0
             [20-26] Authentication Request Parameters:
                    [20] uint32_t::auth_off: 0
                    [21] uint32_t::auth_len: 88
                    [22-23] uint64_t::aad_adr/APS: 0x18e001820
                    [24-25] uint64_t::auth_res_addr: 0x18e002458
                    [26.0B] uint8_t::aad_sz/inner_prefix_sz: 0
                    [26.1B] uint8_t::resrvd1: 0
                    [26.2B] uint8_t::hash_state_sz: 0
                    [26.3B] uint8_t::auth_res_sz: 0
             [27-31] Authentication Request Control Header:
                    [27] uint32_t::resrvd1: 0x21120202
                    [28.0B] uint8_t::resrvd2: 0x0
                    [28.1B] uint8_t::hash_flags: 0x0
                    [28.2B] uint8_t::hash_cfg_offset: 46
                    [28.3B] uint8_t::next_curr_id: 0x42 (curr_id: 2, next: 4)
                    [29.0B] uint8_t::resrvd3: 0x0
                    [29.1B] uint8_t::outer_prefix_offset: 0
                    [29.2B] uint8_t::final_sz: 12
                    [29.3B] uint8_t::inner_res_sz: 20
                    [30.0B] uint8_t::resrvd4: 0x0
                    [30.1B] uint8_t::inner_state1_sz: 24
                    [30.2B] uint8_t::inner_state2_offset: 3
                    [30.3B] uint8_t::inner_state2_sz: 24
                    [31.0B] uint8_t::outer_config_offset: 0
                    [31.1B] uint8_t::outer_state1_sz: 0
                    [31.2B] uint8_t::outer_res_sz: 0
                    [31.3B] uint8_t::outer_prefix_offset: 0
             SGL Data:
                    Source SGL contains 1 flat buffer(s):
                           [0] Flat buffer: len: 100 phy_addr: 0x18e002400
                    Destination SGL contains 1 flat buffer(s):
                           [0] Flat buffer: len: 100 phy_addr: 0x18e002400
====
```

### 8.5.4 Audit Return Codes

To collect data with incorrect return codes, use the tool with the same modifications as described in Section 8.5.3.1.

### 8.5.4.1 Audit Return Codes - Continuous Sync Option

An example audit of return codes for the data collected in Section 8.5.3.2 looks as follows:

```
# qat_dbg_report path=/qat_logs dev=0 command=audit_ret_codes
====
Building index...
DONE
      Overall indexed 3 msgs.
             Requests: 1 (Sym:1, PKE:0, DC:0)
             Responses:1
             API calls:1
====
====
QAT Response return codes audit in progress ...
WARNING: Incorrect response RCs. Status: 128 error code: 0xf0
      Entry [RESPONSE SYM]: Time-stamp: 2021-11-23 07:37:32.646255054
      Bank: 0 Ring: 5 PID: 23612
             [0.1B] Service ID: ICP_QAT_FW_COMN_RESP_SERV_CPM_FW [1]
             [0.2B] Response type: ICP_QAT_FW_COMN_REQ_CPM_FW_LA [4]
             [1.3B] Command ID: ICP_QAT_FW_LA_CMD_CIPHER_HASH [2]
             [1.1B] Common error code: 240
             [1.2B] Common status flags: 0x80
                    CRYPTO STAT FLAG: 1
                    PKE STAT FLAG: 0
                    CMP STAT FLAG: 0
                    XLAT STAT FLAG: 0
                    XLAT APPLIED STAT FLAG: 0
                    CMP EOF LAST BLK FLAG: 0
                    UNSUPPORTED RQ STAT FLAG: 0
             [2-3] Opaque data: 0x7f9dd0ebbc40
      Entry [REQUEST SYM]: Time-stamp: 2021-11-23 07:37:32.625265272
      Bank: 0 Ring: 1 PID: 23612
             [0.1B] Crypto command ID: ICP_QAT_FW_LA_CMD_CIPHER_HASH [2]
             [0.2B] Service type: ICP_QAT_FW_COMN_REQ_CPM_FW_LA [4]
             [1.0-1B] LA BULK (SYMMETRIC CRYPTO) COMMAND FLAGS (0x2c)
                    [1.12] ZUC_3G_PROTO: 0
                    [1.11] GCM_IV_LEN_FLAG: 0
                    [1.10] DIGEST_IN_BUFFER: 0
                    [1.7-9] PROTO: 0
                    [1.6] CMP_AUTH: 0
                    [1.5] RET_AUTH: 1
                    [1.4] UPDATE_STATE: 0
                    [1.3] CIPH_AUTH_CFG_OFFSET_FLAG: 1
                    [1.2] CIPH_IV_FLD_FLAG: 1
                    [1.0-1] PARTIAL FLAGS: 0 (FULL)
             [1.2B] Common Request flags: 0x1
                    SGL[1] CD_IN [0] BNP [0]
             [1.3B] Extended Symmetric Crypto Command Flags: 0
             [2-3] Content Descriptor (CD) Param Pointer: 0x1af801640
             [4.2B] Content Descriptor Param Size: 8 [Quad words]
             [6-7] Opaque Data: 0x7f9dd0ebbc40
             [8-9] Source phy_addr: 0x1af802000
             [10-11] Destination phy_addr: 0x1af802000
             [12] Source length: 0
             [13] Destination length: 0
             [14-19] Cipher Request Parameters:
                    [14] uint32_t::cipher_offset: 24
                    [15] uint32_t::cipher_length: 2
                    [16-17] uint64_t::cipher_IV_ptr: 0xaddbcefabebafeca
                    [18-19] uint64_t::resrvd1: 0x459113d88f8cade
             [27-28.0B] Cipher Request Control Header:
                    [27.0B] uint8_t::cipher_state_sz: 2
                    [27.1B] uint8_t::cipher_key_sz: 2
                    [27.2B] uint8_t::cipher_cfg_offset: 18
                    [27.3B] uint8_t::next_curr_id: 0x21 (curr_id: 1, next: 2)
                    [28.0B] uint8_t::cipher_padding_sz: 0
             [20-26] Authentication Request Parameters:
                    [20] uint32_t::auth_off: 0
                    [21] uint32_t::auth_len: 88
                    [22-23] uint64_t::aad_adr/APS: 0x1af801820
                    [24-25] uint64_t::auth_res_addr: 0x1af802458
                    [26.0B] uint8_t::aad_sz/inner_prefix_sz: 0
                    [26.1B] uint8_t::resrvd1: 0
                    [26.2B] uint8_t::hash_state_sz: 0
                    [26.3B] uint8_t::auth_res_sz: 0
             [27-31] Authentication Request Control Header:
                    [27] uint32_t::resrvd1: 0x21120202
                    [28.0B] uint8_t::resrvd2: 0x0
                    [28.1B] uint8_t::hash_flags: 0x0
                    [28.2B] uint8_t::hash_cfg_offset: 46
                    [28.3B] uint8_t::next_curr_id: 0x42 (curr_id: 2, next: 4)
                    [29.0B] uint8_t::resrvd3: 0x0
                    [29.1B] uint8_t::outer_prefix_offset: 0
                    [29.2B] uint8_t::final_sz: 12
                    [29.3B] uint8_t::inner_res_sz: 20
                    [30.0B] uint8_t::resrvd4: 0x0
                    [30.1B] uint8_t::inner_state1_sz: 24
                    [30.2B] uint8_t::inner_state2_offset: 3
                    [30.3B] uint8_t::inner_state2_sz: 24
                    [31.0B] uint8_t::outer_config_offset: 0
                    [31.1B] uint8_t::outer_state1_sz: 0
                    [31.2B] uint8_t::outer_res_sz: 0
                    [31.3B] uint8_t::outer_prefix_offset: 0
             SGL Data:
                    Source SGL contains 1 flat buffer(s):
                           [0] Flat buffer: len: 100 phy_addr: 0x1af802400
                    Destination SGL contains 1 flat buffer(s):
                           [0] Flat buffer: len: 100 phy_addr: 0x1af802400
====
Checked 3 records. Found 1 issue(s).
```

The audit prints a warning for every issue found. It attempts to match each response considered unsuccessful to its corresponding request.

**NOTE:** In some cases this operation may not be possible because the searched request could already be overwritten by new data.
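In the example output above, the unsuccessful response and the request printed after it carry the same Opaque data value (0x7f9dd0ebbc40) and the same PID (23612). The sketch below assumes, for illustration only, that these two fields form the matching key; the criteria actually used by the audit tool are not documented here, and the `log_request` structure and `find_request_for_response` function are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a logged request entry. */
struct log_request {
    uint64_t opaque_data; /* "Opaque Data" field of the request */
    uint32_t pid;         /* PID printed with the entry */
};

/*
 * Returns the index of the request matching the response, or -1 when
 * no such request is present in the log any more.
 */
long find_request_for_response(const struct log_request *requests,
                               size_t count, uint64_t response_opaque,
                               uint32_t response_pid)
{
    for (size_t i = 0; i < count; i++) {
        if (requests[i].opaque_data == response_opaque &&
            requests[i].pid == response_pid)
            return (long)i;
    }
    return -1; /* the request may have been overwritten by newer data */
}
```

A return value of -1 corresponds to the case described in the note above, where the request can no longer be found.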

## 8.6 SR-IOV

The Debuggability feature can be used with Single Root I/O Virtualization (SR-IOV).

**NOTE:** Because errors occurring on a VF utilized on the host are forwarded, unnecessary logs might be generated if the feature is enabled on a guest utilizing the same QAT device. Such logs might not contain error information.

### 8.6.1 Build instructions

The usage flow might be as follows:

| Step No. | Host                                                               | Guest                                                              |
|----------|--------------------------------------------------------------------|--------------------------------------------------------------------|
| 1        | Set up BIOS and OS                                                 |                                                                    |
| 2        | Install the driver with debuggability flags with configure options |                                                                    |
| 3        | Start the desired KVM                                              |                                                                    |
| 4        | Stop/start QAT PF service                                          |                                                                    |
| 5        | Start QAT VFs service                                              |                                                                    |
| 6        | Attach desired VFs to Guest                                        |                                                                    |
| 7        |                                                                    | Install the driver with debuggability flags with configure options |
| 8        |                                                                    | Set up configuration                                               |

Example steps to build the environment might be as follows:

1. Set up BIOS and OS:

Options regarding VT-d, VT-x, and SR-IOV should be enabled in BIOS. Host OS needs to enable IOMMU. Details can be found in 2.1 *Updating the BIOS Settings* and 2.2 *Installing and Configuring the Host Operating System* sections of Using Intel® Virtualization Technology (Intel® VT) with Intel® QuickAssist Technology App Notes.

2. Install on host:

```
cd /QAT && \
tar -xzomf QAT.L.4.17.0-00002.tar.gz && \
export ICP_ROOT=$PWD && \
./configure --enable-icp-sriov=host --enable-icp-qat-dbg && \
make install -j
```

3. Install and start the desired KVM:

```
virt-install --name Ubuntu_20.04_64bit \
--memory 8096 \
--cpu host \
--vcpus 8 \
--os-type linux \
--os-variant ubuntu20.04 \
--import \
--graphics none \
--disk /PathToImage/Ubuntu_20.04_64_k5.4.0-90_bl_v000.img,format=img,bus=virtio \
--network direct,source=enp4s0,source_mode=bridge,model=virtio \
--network bridge=virbr0,model=virtio && \
virsh start Ubuntu_20.04_64bit
```

4. Change configuration: set Enabled equal to 1 in the [DEBUG] section and reload with the adf\_ctl utility for the desired VF. Restart the QAT service with: service qat\_service stop && service qat\_service start
5. Start the VFs service with: /etc/init.d/qat\_service\_vfs start
6. Attach the desired VFs to the KVM. This might be done with virsh attach-device Ubuntu\_20.04\_64bit attachdev.xml, assuming that the attachdev.xml file has the following body and that its address (domain, bus, slot, and function) corresponds to the appropriate VF:

7. On guest: Ensure that the VFs were appropriately attached using: lspci|grep Co-

8. Install with the guest-indicating flag:
   ./configure --enable-icp-sriov=guest --enable-icp-qat-dbg && \
   make install -j
9. Set the configuration (for example, by modifying the configuration file in the /etc/ directory and using the *adf\_ctl* utility).

### 8.6.2 Usage

After successful installation, the tool can be used in multiple ways:

- Usage on an SR-IOV guest only is the same as without SR-IOV enabled. The difference is that VFs are utilized, so a different module is loaded and the QAT configuration file names contain the 'vf' string.
- For SR-IOV host only, the appropriate QAT VF kernel module must be inserted. For example, for the QAT Gen 2 Intel® C62x Chipset, it would be qat\_c62xvf. In this case, the feature can be used as in a PF-only scenario.
- To use detached VFs, the qat\_\*vfs kernel module should not be added, and sysfs should be used to configure the device.
- To utilize both host-attached and detached devices, the QAT VF kernel module should be added, the required VFs detached (for example, with virsh nodedev-detach), and sysfs used to configure the detached VFs with Debuggability.

## 8.7 Programming Guide

### 8.7.1 Physical to Virtual Translation Callback

By default, QAT Debug uses a common, USDM-based (User Space DMA-able Memory) routine to perform physical-to-virtual address translation. This routine works with the USDM memory allocator only. To use a different memory allocator, the user must provide a custom implementation of the physical-to-virtual address translation routine for Debug purposes.

The callback is called in the same context as the payload thread, immediately before the QAT request is placed in DMA-able memory.
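As an illustration of what a custom translation routine might do, the sketch below looks up a physical address in a table of chunks maintained by a user-selected allocator. Everything in it is hypothetical: the `dma_chunk` structure, the `example_phys_to_virt` function, and its parameter list are introduced here for illustration. The real callback must match the icp\_sal\_dbg\_phys2virt\_callback type from icp\_sal\_user.h, whose signature is not reproduced in this section.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record kept by a custom allocator for each DMA-able chunk. */
struct dma_chunk {
    uint64_t phys_base; /* physical address of the chunk */
    void *virt_base;    /* virtual address used by the process */
    size_t size;        /* chunk length in bytes */
};

/*
 * Translates a physical address to a virtual one by searching the
 * chunk table. Returns NULL when the address belongs to no chunk.
 */
void *example_phys_to_virt(const struct dma_chunk *chunks, size_t count,
                           uint64_t phys_addr)
{
    for (size_t i = 0; i < count; i++) {
        if (phys_addr >= chunks[i].phys_base &&
            phys_addr - chunks[i].phys_base < chunks[i].size) {
            size_t offset = (size_t)(phys_addr - chunks[i].phys_base);

            return (char *)chunks[i].virt_base + offset;
        }
    }
    return NULL;
}
```

A function with the signature required by the callback type could wrap such a lookup. Because the callback runs in the payload thread context, a lookup of this kind should stay short and avoid blocking.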

The callback setter definition is available in the following header file: \$ICP\_ROOT/quickassist/lookaside/access\_layer/include/icp\_sal\_user.h

and looks as follows:

```
 * @sideEffects
 *      None
 * @reentrant
 *      Yes
 * @threadSafe
 *      Yes
 * @param[in] instanceHandle      Instance handle
 * @param[in] user_dbg_phys2virt  Function which will be translating
 *                                physical addresses to virtual ones
 * @retval CPA_STATUS_FAIL        Failed to extract transport handle
 * @retval CPA_STATUS_SUCCESS     User callback set successfully
 */
CpaStatus icp_sal_userSetDbgPhysToVirtCallback(
    CpaInstanceHandle instanceHandle,
    icp_sal_dbg_phys2virt_callback user_dbg_phys2virt);
```

The QAT Debug handle can be extracted after transport initialization by the following function:

\$ICP\_ROOT/quickassist/lookaside/access\_layer/include/icp\_adf\_transport.h

§