

# **Programmer's Guide**

# Intel® QuickAssist Technology

Hardware Version 2.0

December 2024

Document Number: 743912-007

Performance varies by use, configuration and other factors. Learn more on the Intel's Performance Index site.

Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.

Your costs and results may vary.

Intel technologies may require enabled hardware, software or service activation.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Code names are used by Intel to identify products, technologies, or services that are in development and not publicly available. These are not "commercial" names and not intended to function as trademarks.

See Intel's Legal Notices and Disclaimers.

© Intel Corporation. Intel, the Intel logo, Atom, Xeon, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

# Contents

| 1 | <b>Abc</b><br>1.1 | out this Document<br>Conventions and Terminology         | <b>1</b><br>1 |
|---|-------------------|----------------------------------------------------------|---------------|
| 2 | Arc               | hitecture                                                | 3             |
| 3 | Infra             | astructure                                               | 5             |
|   | 3.1               | Queues and Queue Pairs                                   | 5             |
|   |                   | 3.1.1 Queues Pairs                                       | 5             |
|   |                   | 3.1.2 Queue Bundles                                      | 6             |
|   | 3.2               |                                                          | 7             |
|   |                   | 3.2.1 Configurable Items (via config file)               | 7             |
|   | 3.3               | Memory Management                                        | 7             |
|   |                   | 3.3.1 Shared Virtual Memory                              | 7             |
|   |                   | 3.3.1.1 SVM Kernel Requirements                          | 8             |
|   |                   | 3.3.2 DMA-able Memory                                    | 8             |
|   |                   | 3.3.3 Memory Type Determination                          | 9             |
|   |                   | 3.3.4 Buffer Formats                                     | 9             |
|   |                   | 3.3.4.1 Flat Buffers                                     | 9             |
|   |                   | 3.3.4.2 Scatter-Gather List (SGL) Buffers                | 9             |
|   |                   | 3.3.5 Huge Pages                                         |               |
|   | 3.4               | Modes of Operation                                       |               |
|   |                   | 3.4.1 Calling Semantics                                  | 12            |
|   |                   | 3.4.1.1 Asynchronous (Polled)                            | 12            |
|   |                   | 3.4.1.2 Asynchronous (Interrupts)                        | 12            |
|   |                   | 3.4.1.3 Synchronous                                      | 12            |
|   |                   | 3.4.1.4 Pros And Cons                                    | 12            |
|   | 3.5               | Load Balancing                                           |               |
|   |                   | 3.5.1 Per Endpoint                                       | 16            |
|   |                   | 3.5.2 Across Endpoints                                   | 18            |
|   |                   | 3.5.2.1 Load Sharing Criteria                            | 18            |
|   |                   | 3.5.3 Dimensions                                         | 19            |
|   | 3.6               | Debugability                                             | 19            |
|   |                   | 3.6.1 Overview of Intel <sup>®</sup> QAT debugfs entries | 19            |
|   |                   | 3.6.1.1 Entries in /sys/kernel/debug/qat_*               |               |
|   |                   | 3.6.1.2 Memory driver queries (qae_mem_slabs)            |               |
|   | 3.7               | Heartbeat                                                |               |

|   |      | 3.7.1  | Heartbeat Operation                                                      | 21 |
|---|------|--------|--------------------------------------------------------------------------|----|
|   |      |        | 3.7.1.1 Initialization                                                   |    |
|   |      |        | 3.7.1.2 Heartbeat Monitoring                                             |    |
|   |      |        | 3.7.1.3 Resetting a Failed Device                                        |    |
|   |      | 3.7.2  | Incorporating Heartbeat into Intel <sup>®</sup> QAT Applications         |    |
|   |      | 3.7.3  | Restart Sequence                                                         |    |
|   |      | 3.7.4  | Status of Packets in Flight (Crypto Applications Only)                   |    |
|   |      | 3.7.5  | Determining Device ID                                                    |    |
|   |      | 3.7.6  | Testing Heartbeat                                                        |    |
|   |      | 0.7.0  | 3.7.6.1 Simulated Heartbeat Failure Configuration                        |    |
|   |      |        | 3.7.6.2 Simulating Heartbeat Failure                                     |    |
|   |      | 3.7.7  | Handling Device Failures in a Virtualized Environment                    |    |
|   |      | 3.7.8  |                                                                          |    |
|   | 20   |        | Incorporating Dummy Responses into an Intel <sup>®</sup> QAT Application |    |
|   | 3.8  |        | netry                                                                    |    |
|   |      | 3.8.1  | Telemetry Usage                                                          |    |
|   |      |        | 3.8.1.1 Out-Of-Tree                                                      |    |
|   |      |        | 3.8.1.2 In-Tree                                                          |    |
|   |      | 3.8.2  | Telemetry Control                                                        |    |
|   |      |        | 3.8.2.1 Telemetry Commands                                               |    |
|   |      |        | 3.8.2.2 Device Level Telemetry Values                                    |    |
|   |      |        | 3.8.2.3 Ring Pair Level Telemetry Values                                 |    |
|   |      | 3.8.3  | Monitoring Telemetry - Text Based                                        |    |
|   |      |        | 3.8.3.1 Out-Of-Tree                                                      |    |
|   |      |        | 3.8.3.2 In-Tree                                                          |    |
|   | 3.9  |        | _imiting                                                                 |    |
|   |      | 3.9.1  | Service Level Agreement (SLA)                                            |    |
|   |      | 3.9.2  |                                                                          |    |
|   |      | 3.9.3  | SLA Manager Application                                                  |    |
|   |      |        | 3.9.3.1 SLA Commands                                                     |    |
|   | 3.10 |        | r Management                                                             |    |
|   |      | 3.10.1 | Configuration                                                            |    |
|   |      |        | 3.10.1.1 Out-of-Tree                                                     |    |
|   |      |        | 3.10.1.2 In-Tree                                                         | 37 |
|   |      | 3.10.2 | Usage                                                                    | 37 |
|   |      | 3.10.3 | Considerations                                                           | 38 |
|   | 3.11 | Reliab | pility, Availability, and Stability (RAS)                                | 38 |
|   |      | 3.11.1 | RAS Usage                                                                | 38 |
|   |      | 3.11.2 | AER Errors                                                               | 39 |
| _ | _    |        |                                                                          |    |
| 4 |      |        |                                                                          | 0  |
|   | 4.1  |        | olling the Driver                                                        |    |
|   |      | 4.1.1  | qat_service                                                              |    |
|   |      |        | 4.1.1.1 qat_service Usage                                                |    |
|   |      | 4.1.2  | adf_ctl                                                                  |    |
|   |      |        | 4.1.2.1 adf_ctlUsage                                                     |    |
|   |      |        | 4.1.2.2 Examples                                                         |    |
|   | 4.2  | • •    | cation Payload Memory Allocation                                         |    |
|   |      | 4.2.1  | Services                                                                 | łЗ |

|   |            | 4.2.2  | Thread S   | Specific USDM                                                               | <br> | 43 |
|---|------------|--------|------------|-----------------------------------------------------------------------------|------|----|
|   | 4.3        | Retur  | n Codes    | ·<br>· · · · · · · · · · · · · · · · · · ·                                  | <br> | 44 |
|   | 4.4        | Linux' | * Device D | river Operations Return Codes                                               | <br> | 45 |
| _ | _          |        |            |                                                                             |      |    |
| 5 |            |        | tion Files |                                                                             |      | 47 |
|   | 5.1        |        | 2          | ile Overview                                                                |      |    |
|   | 5.2        |        |            | 1                                                                           |      |    |
|   |            | 5.2.1  |            | Enabled                                                                     |      |    |
|   |            |        |            | Performance Considerations                                                  |      |    |
|   |            | 5.2.2  | Services   | Profile                                                                     | <br> | 50 |
|   |            |        | 5.2.2.1    | General Default Configuration Parameters                                    | <br> | 51 |
|   |            | 5.2.3  | Concurre   | ent Requests                                                                | <br> | 51 |
|   |            | 5.2.4  | Power M    | anagement Parameters                                                        | <br> | 52 |
|   |            | 5.2.5  | Shared V   | /irtual Memory (SVM) Parameters                                             | <br> | 52 |
|   |            |        | 5.2.5.1    | SVMEnabled                                                                  | <br> | 52 |
|   |            |        | 5.2.5.2    | ATEnabled                                                                   | <br> | 53 |
|   | 5.3        | Logic  |            | es Section                                                                  |      |    |
|   |            | 5.3.1  |            | L] Section                                                                  |      |    |
|   |            | 5.3.2  | -          | cess [xxxxx] Sections                                                       |      |    |
|   |            | 5.3.3  |            | aphic Logical Instance Parameters                                           |      |    |
|   |            | 5.3.4  |            | npression Logical Instance Parameters                                       |      |    |
|   |            | 5.3.5  |            | he Core Affinity Parameter for a Logical Instance                           |      |    |
|   | 5.4        |        |            | per of Process Calculations                                                 |      |    |
|   | 5.4        | 5.4.1  |            | g the Maximum Number of Processes/Instances                                 |      |    |
|   |            | 0.4.1  |            | Invalid Configurations                                                      |      |    |
|   |            |        |            | Configuring Instances for Virtual Functions                                 |      |    |
|   | 5.5        | Confi  |            | Itiple Intel <sup>®</sup> QuickAssist Technology Endpoints in a System      |      |    |
|   | 5.6        |        |            | Itiple Processes on a System with Multiple Intel <sup>®</sup> QAT Endpoints |      |    |
|   | 5.0<br>5.7 |        |            |                                                                             |      |    |
|   | 5.7        | Samp   | le coningt |                                                                             | <br> | 05 |
| 6 | Ser        | vices  |            |                                                                             |      | 66 |
| - | 6.1        |        | Compress   | ion                                                                         | <br> |    |
|   | • • •      |        | •          | ssion Features                                                              |      |    |
|   |            |        |            |                                                                             |      |    |
|   |            | 6.1.3  | •          | ssion Session Setup                                                         |      |    |
|   |            | 6.1.4  |            | ression Session Setup                                                       |      |    |
|   |            | 0.1. 1 |            | Deflate Decompression                                                       |      |    |
|   |            |        |            | LZ4 Decompression                                                           |      |    |
|   |            |        |            | LZ4 Decompression Limitations                                               |      |    |
|   |            |        |            | Multi-frame decompression support                                           |      |    |
|   |            | 6.1.5  |            | ance Considerations                                                         |      |    |
|   |            |        |            |                                                                             |      |    |
|   |            | 6.1.6  | Flush Fla  |                                                                             |      |    |
|   |            | 6.1.7  |            | ms                                                                          |      |    |
|   |            | 6.1.8  |            | mpressed Data Block format                                                  |      |    |
|   |            | (10    |            | LZ4 Compression Support                                                     |      |    |
|   |            | 6.1.9  | •          | ss-and-Verify                                                               |      |    |
|   |            |        |            | Compress and Verify Error log in Sysfs                                      |      |    |
|   |            |        | 6.1.9.2    | Compress and Verify and Recover (CnVnR)                                     | <br> | 73 |

|     | 6.1.10     | Dynami               | c Compression                                                 | 74 |
|-----|------------|----------------------|---------------------------------------------------------------|----|
|     | 6.1.11     |                      | Im Expansion with Auto Select Best Feature (ASB)              |    |
|     | 6.1.12     |                      | Im Compression Expansion                                      |    |
|     | 6.1.13     | No Sess              | sion API                                                      | 76 |
|     | 6.1.14     | Compre               | ession Levels                                                 |    |
|     | 6.1.15     | Compre               | ession Status Codes                                           |    |
|     | 6.1.16     | Intel <sup>®</sup> Q | uickAssist Technology Compression API Errors                  | 78 |
|     |            | 6.1.16.1             | Compression API Errors                                        | 78 |
|     | 6.1.17     | Overflov             | ws Errors                                                     | 83 |
|     |            | 6.1.17.1             | Traditional API Overflow Exception                            | 84 |
|     |            | 6.1.17.2             | Data Plane API Overflow Error                                 | 85 |
|     |            | 6.1.17.3             | Handling Overflow Errors                                      |    |
|     |            | 6.1.17.4             | Compression Overflows in a Virtual Environment                |    |
|     |            | 6.1.17.5             | Avoiding Compression Overflow Exceptions                      |    |
|     | 6.1.18     |                      | y Checksums                                                   |    |
|     |            | 6.1.18.1             | Verify HW Integrity CRC's                                     |    |
|     | 6.1.19     |                      | ompression Applications                                       |    |
|     |            | 6.1.19.1             | Compression for Storage                                       |    |
| ( ) | <b>•</b> • |                      | Data Deduplication and WAN Acceleration                       |    |
| 6.2 | <i>,</i> , |                      | Services                                                      |    |
|     | 6.2.1      | 6.2.1.1              |                                                               |    |
|     |            | 6.2.1.1<br>6.2.1.2   | Supported Cipher Algorithms                                   |    |
|     |            | 6.2.1.2<br>6.2.1.3   | Supported Hash/Authenticate Algorithms                        |    |
|     | 6.2.2      |                      | graphy Applications                                           |    |
|     | 0.2.2      | 6.2.2.1              | IPsec and SSL VPNs                                            |    |
|     |            | 6.2.2.2              | Encrypted Storage                                             |    |
|     |            | 6.2.2.3              | Web Proxy Appliances                                          |    |
|     |            | 0121210              |                                                               |    |
| Sup | portec     |                      |                                                               | 94 |
| 7.1 |            | -                    | ist Technology APIs                                           |    |
|     | 7.1.1      |                      | graphic and Data Compression API Descriptions                 |    |
|     |            | 7.1.1.1              | Data Plane APIs Overview                                      |    |
|     |            | 7.1.1.2              | IA Cycle Count Reduction When Using Data Plane APIs $\ . \ .$ |    |
|     |            | 7.1.1.3              | Usage Constraints on the Data Plane APIs                      |    |
|     | 7.1.2      | -                    | AT API Limitations                                            |    |
| 7.2 |            | ional API:           |                                                               |    |
|     | 7.2.1      | ,                    | c Instance Allocation Functions                               |    |
|     |            | 7.2.1.1              | icp_sal_userCyGetAvailableNumDynInstances                     |    |
|     |            | 7.2.1.2              | icp_sal_userDcGetAvailableNumDynInstances                     |    |
|     |            | 7.2.1.3              |                                                               |    |
|     |            | 7.2.1.4<br>7.2.1.5   | <pre>icp_sal_userDcInstancesAlloc</pre>                       |    |
|     |            | 7.2.1.5<br>7.2.1.6   | icp_sal_userCyFreeInstances                                   |    |
|     |            | 7.2.1.0              | icp_sal_userCyGetAvailableNumDynInstancesByDevPkg             |    |
|     |            | 7.2.1.7              | icp_sal_userCyGetAvailableNumDynInstancesByDevPkg             |    |
|     |            | 7.2.1.8              | icp_sal_userCyInstancesAllocByDevPkg                          |    |
|     |            | 7.2.1.7              | icp_sal_userDcInstancesAllocByDevPkg                          |    |
|     |            |                      |                                                               |    |

|   |            | 7.2.1.11 icp_sal_userCyGetAvailableNumDynInstancesByPkgAccel | 106        |
|---|------------|--------------------------------------------------------------|------------|
|   |            | 7.2.1.12 icp_sal_userCyInstancesAllocByPkgAccel              | 106        |
|   | 7.2.2      | IOMMU Remapping Functions                                    | 107        |
|   |            | 7.2.2.1 icp_sal_iommu_get_remap_size                         |            |
|   |            | 7.2.2.2 icp_sal_iommu_map                                    |            |
|   |            | 7.2.2.3 icp_sal_iommu_unmap                                  |            |
|   |            | 7.2.2.4 IOMMU Remapping Function Usage                       |            |
|   | 7.2.3      | Polling Functions                                            |            |
|   |            | 7.2.3.1 icp_sal_pollBank                                     | 110        |
|   |            | 7.2.3.2 icp_sal_pollAllBanks                                 |            |
|   |            | 7.2.3.3 icp_sal_CyPollInstance                               |            |
|   |            | 7.2.3.4 icp_sal_DcPollInstance                               |            |
|   |            | 7.2.3.5 icp_sal_CyPollDpInstance                             |            |
|   |            | 7.2.3.6 icp_sal_DcPollDpInstance                             |            |
|   | 7.2.4      | User Space Access Configuration Functions                    |            |
|   | , <u> </u> | 7.2.4.1 icp_sal_userStart                                    |            |
|   |            | 7.2.4.2 icp_sal_userStop                                     |            |
|   | 7.2.5      | Version Information Function                                 |            |
|   | 7.2.0      | 7.2.5.1 icp_sal_getDevVersionInfo                            |            |
|   | 7.2.6      | Reset Device Function                                        |            |
|   | 7.2.0      | 7.2.6.1 icp_sal_reset_device                                 |            |
|   | 7.2.7      | Thread-Less APIs                                             |            |
|   | 7.2.7      | 7.2.7.1 icp_sal_poll_device_events                           |            |
|   |            | 7.2.7.2 icp_sal_find_new_devices                             |            |
|   | 7.2.8      | Compress and Verify (CnV) Related APIs                       |            |
|   | 7.2.0      | 7.2.8.1 icp_sal_get_dc_error                                 |            |
|   |            | 7.2.8.2 icp_sal_dc_simulate_error                            |            |
|   | 7.2.9      | Heartbeat APIs                                               |            |
|   | 1.2.7      | 7.2.9.1 icp_sal_check_device                                 |            |
|   |            | 7.2.9.1 icp_sal_check_all_devices                            |            |
|   |            | 7.2.9.3 icp_sal_heartbeat_simulate_failure                   |            |
|   | 7.2.10     |                                                              |            |
|   | 7.2.10     |                                                              |            |
|   |            | 7.2.10.1 icp_sal_poll_device_events                          |            |
|   |            | 7.2.10.2 cpaCyInstanceSetNotificationCb                      |            |
|   | 7011       | 7.2.10.3 cpaDcInstanceSetNotificationCb                      |            |
|   | 7.2.11     |                                                              |            |
|   |            | 7.2.11.1 icp_sal_SymGetInflightRequests                      |            |
|   |            | 7.2.11.2 icp_sal_AsymGetInflightRequests                     |            |
|   | 7010       | 7.2.11.3 icp_sal_dp_SymGetInflightRequests                   |            |
|   | 7.2.12     | Service Specific Polling APIs                                |            |
|   |            | 7.2.12.1 icp_sal_CyPollSymRing                               |            |
|   |            | 7.2.12.2 icp_sal_CyPollAsymRing                              |            |
|   | 7.2.13     | Check Device Availability APIs                               |            |
|   |            | 7.2.13.1 icp_sal_userlsQatAvailable                          | 126        |
| 0 | \/:        | ion                                                          | 100        |
| ð |            |                                                              | <b>128</b> |
|   |            | alization Deployment Model for Intel® QAT 2.0                |            |
|   | 8.2 Physic | cal Device Direct Assignment                                 |            |

|   | 8.3<br>8.4 |         | e Root IOV (SR-IOV)                                |    |
|---|------------|---------|----------------------------------------------------|----|
| 9 | Sec        | ure Arc | chitecture Considerations 13                       | 32 |
|   | 9.1        | Termi   | nology                                             | 32 |
|   |            | 9.1.1   | Threat Categories                                  |    |
|   |            | 9.1.2   | Attack Mechanism                                   |    |
|   |            | 9.1.3   | Attacker Privilege                                 | 33 |
|   |            | 9.1.4   | Deployment Models                                  |    |
|   | 9.2        | Threa   | t/Attack Vectors                                   |    |
|   |            | 9.2.1   | General Mitigation                                 |    |
|   |            | 9.2.2   | General Threats                                    |    |
|   |            |         | 9.2.2.1 DMA                                        |    |
|   |            |         | 9.2.2.2 Intentional Modification of IA Driver      |    |
|   |            |         | 9.2.2.3 Modification of the QAT Configuration File |    |
|   |            |         | 9.2.2.4 Malicious Application Code                 |    |
|   |            |         | 9.2.2.5 Denial of Service                          |    |
|   |            | 9.2.3   | Threats Specific to Cryptographic Service          |    |
|   |            |         | 9.2.3.1 Reading Cryptographic Keys                 |    |
|   |            |         |                                                    |    |

#### 10 Revision History

138

# **List of Tables**

| 1                                                                                    |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | 2                                                              |
|--------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------|
| 2<br>3<br>4<br>5<br>6<br>7<br>8<br>9<br>10<br>11<br>12<br>13<br>14<br>15<br>16<br>17 | Pros and cons of modes of operation                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | 19<br>19<br>20<br>22<br>31<br>32<br>35<br>35<br>36<br>37<br>38 |
| 18<br>19<br>20                                                                       | Acceleration Driver Services                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | 44                                                             |
| 21<br>22<br>23<br>24<br>25<br>26<br>27<br>28                                         | General Section ParametersGeneral Default Configuration Parameters[KERNEL] Section Parameters[User Process] Section Parameters[User Process] Section ParametersGeneral Default Logical Instance ParametersData Compression Logical Instance ParametersGeneral Default Logical Instance ParametersConfiguration VariationsGeneral Default Configuring Physical Functions and Virtual Functions                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | 51<br>54<br>55<br>56<br>56<br>58                               |
| 29<br>30<br>31<br>32<br>33                                                           | Compression CpaDcSessionSetupData Properties       Compression CpaDcSessionSetupData Properties       Compression CpaDcSessionSetupData Properties         Flush Flags       Compression CpaDcSessionSetupData Properties       Compression CpaDcSessionSetupData Properties         Checksums       Compression CpaDcSessionSetupData Properties       Compression CpaDcSessionSetupData Properties         Differences between LZ4 and LZ4s block format       CompressionSetupData Properties       CompressionSetupData Properties | 67<br>69<br>70                                                 |

| 34 | Compress and Verify and Recover (CnVnR) Behaviors | 73  |
|----|---------------------------------------------------|-----|
| 35 | ASB Settings                                      | 75  |
| 36 | Compression Levels                                | 77  |
| 37 | Compression API Errors                            | 78  |
| 38 | Overflows Errors                                  | 84  |
| 39 | Integrity Checksums                               | 86  |
| 40 | Supported Cipher Algorithms                       | 89  |
| 41 | Supported Hash/Authenticate Algorithms            | 90  |
| 42 | Supported Public Key Algorithms                   | 91  |
| 43 | Key Generation Cryptographic API Limitations      | 97  |
| 44 | Threat Categories                                 | 132 |
| 45 | Attack Mechanism                                  |     |
| 46 | Attacker Privilege                                |     |
| 47 | Deployment Models                                 | 134 |

# **1 About this Document**

This programmer's guide provides information on the architecture of the software and usage guidelines.

Information on the use of Intel<sup>®</sup> QuickAssist Technology (Intel<sup>®</sup> QAT) APIs, which provide the interface to the acceleration services (cryptographic and data compression), is documented in the related Intel<sup>®</sup> QAT software library documentation referenced in the Release Notes.

In this document, for convenience:

- *Software package* is used as a generic term for the Intel® QAT Software Package for Hardware Version 2.0.
- Acceleration driver is used as a generic term for the software that allows the Intel® QAT Software Library APIs to access the Intel® QAT Endpoint(s).

Note: Refer to the Release Notes for a list of supported platforms.

**Note:** Current version of this document covers the out-of-tree acceleration driver. Future version of this document will be updated to cover in-tree driver as well.

For additional details on in-tree driver refer to https://github.com/intel/qatlib.

# 1.1 Conventions and Terminology

The following conventions are used in this manual:

- Code text code examples, command line entries, Application Programming Interface (API) names, parameters, filenames, directory paths, and executables.
- Bold text graphical user interface entries, buttons, and actions in instructions.
- Italic text key terms and publication titles.

The following terms and acronyms are used in this manual.

#### Table 1: Terminology

| Term                   | Description                               |
|------------------------|-------------------------------------------|
| API                    | Application Programming Interface         |
| asym                   | Asymmetric Cryptography                   |
| BDF                    | Bus Device Function                       |
| BOM                    | Bill of Materials                         |
| CBC                    | Cipher Block Chaining                     |
| су                     | Cryptography                              |
| dc                     | Data Compression                          |
| GRUB                   | Grand Unified Bootloader                  |
| OS                     | Operating System                          |
| PCI                    | Peripheral Component Interconnect         |
| PF                     | PCIe Physical Function                    |
| Intel <sup>®</sup> QAT | Intel <sup>®</sup> QuickAssist Technology |
| SKU                    | Stock Keeping Unit                        |
| SR-IOV                 | Single Root-I/O Virtualization            |
| VF                     | Virtual Function                          |

# 2 Architecture

Because the hardware is accessed via the Intel® QAT APIs, it is not necessary to know all of the hardware and software architecture details, but some knowledge of the underlying hardware and software is helpful for performance optimization and debug purposes.





The flow can be broken down as:

- 1. Application submits payloads via the Intel® QuickAssist API as part of the request. The user space library converts these requests into descriptors and places these in the Transmit (Tx) *hardware-assisted queues* (aka ring).
- 2. Firmware parses the descriptor and configures the accelerators accordingly. Upon a job completion firmware returns the processed payload (either encrypted or compressed or both) and generates a response message. This response message is inserted in the response ring.
- 3. A polling thread owned by the application queries the response ring via the Intel<sup>®</sup> Quick-Assist Library. If the application chooses non-blocking calls the user space library will issue a callback to the application to inform that the operation is complete.

**Note:** The UIO (to be replaced with VFIO) layer is a framework present in both Linux Kernel and user space library libudev. This framework enables exchanging data between Kernel and user space. It offers better latency performance than IOCTL.

# **3** Infrastructure

The following sections describe the building blocks of the Intel® QAT Endpoints' architecture.

# 3.1 Queues and Queue Pairs

Communication between CPU and Intel<sup>®</sup> QuickAssist Technology hardware is via hardware-assisted queues (aka rings):

- Queues are circular buffers.
- Memory is in System DRAM.
- Device is configured with base address, entry size and number of entries via device CSRs.
- Head and Tail pointers are in device CSRs (MMIO space).

### 3.1.1 Queues Pairs

- To send a request, software writes request descriptor to next available entry in the request queue, and updates the tail pointer.
- Device firmware reads request descriptor from request queue, updating the head pointer. It then
  processes the request, writes response descriptor onto response queue, and updates the tail
  pointer.
- Response queues can be configured to generate an interrupt when device firmware updates the tail pointer, or can be polled.



## 3.1.2 Queue Bundles

Queues are grouped into bundles of 8 queues (4 Queue Pairs (QPs)).

• When SR-IOV is enabled, each bundle shows up as a separate Virtual Function.

Within each bundle, by default, a separate QP is used for each of the three possible services:

- 1. Public Key Crypto
- 2. Symmetric Crypto
- 3. Data Compression
- Public Key Crypto
- Symmetric Crypto
- Data Compression

|--|

Max of 2 service types per QAT device at a time. Each QP can be allocated to a specific service, in a bare metal environment.

# 3.2 Service Instances

At the Intel<sup>®</sup> QuickAssist Technology API, we abstract queue pairs using the concept of service instances.

- To use a service, an application must first get a handle to a service instance.
- Corresponds to one or more queue pairs:
  - Data compression instance contains I queue pair.
  - Cryptographic instance:
    - \* QAT Gen2: contains 2 queue pairs, one for each sub-service of crypto (symmetric crypto, public key crypto).
    - \* QAT Gen4: crypto instances can be specified as either sym (symmetric) or asym (asymmetric) cryptography and contain I queue pair.

# 3.2.1 Configurable Items (via config file)

- Queue depth (for each queue).
- Number of service instances per process for a given device (limited by available rings), for example:
  - One per address space (e.g. user space processes).
  - One per software or hardware thread (logical core), to avoid contention.
- Number of queue pairs per service, per bundle/VF, will be configurable in future.

# 3.3 Memory Management

This section describes memory management requirements for submitting buffers to the QAT hardware.

### 3.3.1 Shared Virtual Memory

Shared Virtual Memory (SVM) is a new feature in QAT 2.0 hardware. In QAT 1.x hardware, memory needs to be submitted to the hardware as pinned and physically contiguous memory. In QAT 2.0, SVM allows direct submission of an applications buffer, thus removing the memcpy cycle cost, cache thrashing, and memory bandwidth. The SVM feature enables passing virtual addresses to the QAT hardware for processing acceleration requests.

With SVM:

- Virtually contiguous (can also deal with Scatter Gather Lists of virtually addressed buffers).
- Virtually addressed.

• Can tolerate page faults but Pinning (i.e. locked, guaranteed resident in physical memory) is recommended for performance.

#### 3.3.1.1 SVM Kernel Requirements

In order to use SVM, ensure that kernel version v6.1 or higher is used. Alternatively verify the following kernel patches are applied.

- 81c95fbaebfa5990c3c786c8c3e87426a33106fe
- e65a6897be5e4939d477c4969a05e12d90b08409

Verification can be done with the following steps:

git tag --contains 81c95fbaebfa5990c3c786c8c3e87426a33106fe git tag --contains e65a6897be5e4939d477c4969a05e12d90b08409

This requirement provides mitigation for the issue QAT20-23616 described in the Release Notes.

The following kernel boot parameters need to be defined in order to utilize SVM.

```
intel_iommu=on,sm_on
```

Refer to *Shared Virtual Memory Parameters* for details on QAT configuration files updates required to support SVM.

### 3.3.2 DMA-able Memory

If SVM is not enabled, Memory passed to Intel<sup>®</sup> QuickAssist Technology hardware must be DMA'able.

- Physically contiguous (can also deal with Scatter Gather Lists).
- Physically addressed.
  - If VT-d is enabled (e.g. in virtualized system), then Intel IOMMU will translate to host physical addresses as needed.
- Pinned (i.e. locked, guaranteed resident in physical memory).

Intel provides a User Space DMA-able Memory (USDM) component (kernel driver and corresponding user space library) which allocates/frees DMA-able memory, mapped to user space, performs virtual to physical address translation on memory allocated by this library

This component is used by the sample code supplied with the user space library.

## 3.3.3 Memory Type Determination

QAT 2.0 hardware offers the application to use virtual memory directly to sending the acceleration requests and saving the memory copy overhead. However, different SVM configurations will result in different memory types. The QAT package offers memory management library called User Space DMAable Memory(USDM) to help user space applications using the pinned memory.

| SVMEnabled | ATEnabled | Memory Type                            |
|------------|-----------|----------------------------------------|
| FALSE(0)   | FALSE(0)  | Pinned Memory (USDM)                   |
| TRUE(1)    | FALSE(0)  | Pinned Memory (USDM)                   |
| FALSE(0)   | TRUE(1)   | Invalid configuration                  |
| TRUE(1)    | TRUE(1)   | Pinned Memory (USDM) or Dynamic Memory |
|            |           | (malloc/ zalloc/mmap)                  |

## 3.3.4 Buffer Formats

Data buffers are passed across the API interface in one of the following formats:

- Flat Buffers represent a single region of physically contiguous memory.
- Scatter-Gather Lists (SGL) are essentially an array of flat buffers, for cases where the memory is not all physically contiguous.

### 3.3.4.1 Flat Buffers

Flat buffers are represented by the type CpaFlatBuffer, defined in the file cpa.h. It consists of two fields:

- Data pointer **pData**: points to the start address of the data or payload. The data pointer is a virtual address; however, the actual data pointed to is required to be in contiguous and DMAable physical memory. This buffer type is typically used when simple, unchained buffers are needed.
- Length of this buffer: dataLenInBytes specified in bytes.

For data plane APIs (cpa\_sym\_dp.h and cpa\_dc\_dp.h), a flat buffer is represented by the type CpaPhys-FlatBuffer, also defined in cpa.h. This is similar to the CpaFlatBuffer structure; the difference is that, in this case, the data pointer, bufferPhysAddr, is a physical address rather than a virtual address.

#### 3.3.4.2 Scatter-Gather List (SGL) Buffers

A scatter-gather list is defined by the type **CpaBufferList**, also defined in the file **cpa.h**. This buffer structure is typically used where more than one flat buffer can be provided to a particular API. The buffer list contains four fields, as follows:

- The number of buffers in the list.
- pBuffers: pointer to an unbounded array of flat buffers.

# intel



- UserData: an opaque field; is not read or modified internally by the API. This field could be used to provide a pointer back into an application data structure, providing the context of the call.
- pMetaData: pointer to metadata required by the API:
  - The metadata is required for internal use by the API. The memory for this buffer needs to be allocated by the client as contiguous data. The size of this metadata buffer is obtained by calling cpaCyBufferListGetMetaSize for crypto, cpaBufferLists, and cpaDcBufferListGetMeta-Size for data compression.
  - The memory required to hold the **CpaBufferList** structure and the array of flat buffers is not required to be physically contiguous. However, the flat buffer data pointers and the metadata pointer are required to reference physically contiguous DMAable memory.
  - There is a performance impact when using scatter-gather lists instead of flat buffers. Refer to the Performance Optimization Guide for additional information.
  - Scatter-Gather list (SGL) buffers should not have more than 256 entries.

For data plane APIs (cpa\_sym\_dp.h and cpa\_dc\_dp.h) a region of memory that is not physically contiguous is described using the CpaPhysBufferList structure. This is similar to the CpaBufferList structure; the difference, in this case, the individual flat buffers are represented using physical rather than virtual addresses.

# 3.3.5 Huge Pages

The included User space DMAable Memory driver **usdm\_drv.ko** supports 2MB pages. This allows direct access to main memory by devices other than the CPU and the actual supported maximum memory size in one individual allocation when huge pages is enabled is 2MB - 5KB. Where the 5KB is used for memory management for the memory driver. The use of 2MB pages provides benefits, but also requires additional configuration. Use of this capability assumes that a sufficient number of huge pages are allocated in the operating system for the particular use case and configuration.

Here are some example use cases:

• Default settings applied:



modprobe usdm\_drv.ko

• Set maximum amount of Non-uniform Memory Access (NUMA) type memory that the User Space DMAable Memory (USDM) driver can allocate to 32MB for all processes. Huge pages are disabled:

modprobe usdm\_drv.ko max\_mem\_numa=32768

• Set maximum number of huge pages that the USDM can allocate to 50 in total and 5 per process:

modprobe usdm\_drv.ko max\_huge\_pages=50 max\_huge\_pages\_per\_process=5

Note: This configuration works for up to the first 10 processes.

Here are examples of invalid use cases to avoid:

• This is erroneous configuration, maximum number of huge pages that USDM can allocate is 3 totals: 3 for a first process, 0 for the next processes:

insmod ./usdm\_drv.ko max\_huge\_pages=3 max\_huge\_pages\_per\_process=5

• This command results in huge pages being disabled because max\_huge\_pages is 0 by default:

insmod ./usdm\_drv.ko max\_huge\_pages\_per\_process=5

 This command results in huge pages being disabled because max\_huge\_pages\_per\_process is 0 by default: insmod ./usdm\_drv.ko max\_huge\_pages=5

**Note:** The use of huge pages may not be supported for all use cases. For instance, depending on the driver version, some limitations may exist for an Input/Output Memory Management Unit (IOMMU).

# 3.4 Modes of Operation

## 3.4.1 Calling Semantics

#### 3.4.1.1 Asynchronous (Polled)

Hardware "request/response" interface is inherently asynchronous (non-blocking).

- Calling function returns once request submitted.
- Callback invoked when response available (polled).

#### 3.4.1.2 Asynchronous (Interrupts)

Hardware "request/response" interface is inherently asynchronous (non-blocking).

- Calling function returns once request submitted.
- Callback invoked when response available (interrupt-driven).

#### 3.4.1.3 Synchronous

Software interface is traditionally synchronous (blocking).

- Calling function blocks until response available.
- Can be implemented "on top of" asynchronous hardware semantics.

#### 3.4.1.4 Pros And Cons







intel.

|                             | Asynchronous                                                                                                       | Synchronous                                                                                                                  |
|-----------------------------|--------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------|
| CPU Utilization             | Software thread can do other things<br>while hardware processes job, without<br>need for expensive context switch. | Software thread blocked or idle<br>awaiting response. Can use<br>multi-threading, but context switching<br>can be expensive. |
| Acceleration<br>Utilization | A single software thread can have<br>multiple requests outstanding,<br>keeping multiple accelerator engines.       | Hardware has at most one request<br>outstanding per CPU/software thread,<br>remaining threads are idle.                      |
| Ease of Use                 | Can be difficult if application is designed to use synchronous APIs.                                               | Easier to integrate if application is designed to use synchronous APIs.                                                      |

Table 2: Pros and cons of modes of operation

Note: Asynchronous API tends to be optimal for performance, but harder to integrate.

# 3.5 Load Balancing

# 3.5.1 Per Endpoint

There are four arbiters, which by default are used for the different services (with one spare/unused). Each partition:

- Arbitrates across two request queues per bundle/VF, to pick a request.
- Load balances all of these requests across all available "engines".

Within a partition, arbitration uses round robin.

• Ensures fairness (in terms of number of requests) across queue pairs and guests



# 3.5.2 Across Endpoints

In a platform or CPU with multiple Intel® QAT devices, software is responsible for load sharing across devices/endpoints. Sapphire Rapids has up to four Intel® QAT devices/endpoints in a single CPU package (depending on SKU) PCIe card may have multiple (QAT 1.x) devices across one or more chipsets.

Software-based Load Sharing can be implemented at various layers:

- For applications using the Intel<sup>®</sup> QuickAssist Technology API, the application must implement load balancing.
- For applications using a framework (e.g. OpenSSL), the framework implements load balancing.

#### 3.5.2.1 Load Sharing Criteria

- Simple round-robin scheme recommended.
- May want to consider "locality" in a multi-socket (NUMA) platform.



## 3.5.3 Dimensions

Table 3: Dimensions Gen 1 & Gen 2

|                                        | Gen 1                                                  | Gen 2               |                                |
|----------------------------------------|--------------------------------------------------------|---------------------|--------------------------------|
|                                        | Intel® Communication<br>Chipset 8925 to 8955<br>Series | Intel® C62x Chipset | Intel Atom® Processor<br>C3000 |
| Number PCIe End-<br>points             | 1                                                      | 3                   | 1                              |
| Number of Bundles<br>/VFs per Endpoint | 32                                                     | 16                  | 16                             |
| Number of Queue Pairs<br>per Bundle    | 8                                                      |                     |                                |

Table 4: Dimensions Gen 3 & Gen 4

|                       | Gen 3                            | Gen 4                                                                 |
|-----------------------|----------------------------------|-----------------------------------------------------------------------|
|                       | Intel® Atom P5000 Processor/ Ice | Intel <sup>®</sup> 4th Gen Intel <sup>®</sup> Xeon <sup>®</sup> Scal- |
|                       | Lake D                           | able Processor (per socket)                                           |
| Number PCIe End-      | 1                                | 4                                                                     |
| points                |                                  |                                                                       |
| Number of VFs per     | 128                              | 16                                                                    |
| Endpoint              |                                  |                                                                       |
| Number of Queue Pairs | 8                                | 4                                                                     |
| per Bundle            |                                  |                                                                       |

# 3.6 Debugability

## 3.6.1 Overview of Intel<sup>®</sup> QAT debugfs entries

Some useful debugging information for the driver and configuration is available via the Linux debugfs file system, with the entries /sys/kernel/debug/qat\_\* and /sys/kernel/debug/qae\_mem\_dbg/ qae\_mem\_slabs.

### 3.6.1.1 Entries in /sys/kernel/debug/qat\_\*

This includes:

| Table 5: Intel® QuickAssist Technology /sys/kernel/debug |
|----------------------------------------------------------|
| Entries                                                  |

| Entry        |        | Description                                          | Supported | Plat- |
|--------------|--------|------------------------------------------------------|-----------|-------|
|              |        |                                                      | forms     |       |
| cnv_errors   |        | Indicates number of <i>compressAndVerify</i> errors. | All       |       |
|              |        | Refer to Compress and Verify Error log in Sysfs.     |           |       |
| dev_cfg      |        | Displays internal device configuration informa-      | All       |       |
|              |        | tion.                                                |           |       |
| frequency    |        | Displays frequency of Acceleration Engines.          | All       |       |
| fw_counters  |        | Displays Acceleration Engine firmware re-            | All       |       |
|              |        | quests/responses.                                    |           |       |
| heartbeat,   | heart- | Refer to System Virtual Files.                       | All       |       |
| beat_failed, | heart- |                                                      |           |       |
| beat_sent    |        |                                                      |           |       |
| pm_status    |        | Displays power management status. Refer to           | QAT 2.0   |       |
|              |        | Power Management for additional information.         |           |       |
| transport    |        | Contains firmware request/response data. Avail-      | All       |       |
|              |        | able only for kernel space instances.                |           |       |
| version      |        | Includes package version information.                | All       |       |

#### 3.6.1.2 Memory driver queries (qae\_mem\_slabs)

Debug features are also available by reading and writing the file /sys/kernel/debug/qae\_mem\_dbg/ qae\_mem\_slabs. When *reading* the virtual/physical address, size and slab id together with the pid of the allocating process are shown. *Writing* a string to the file will start executing debug commands.

For example:

```
# cat /sys/kernel/debug/qae_mem_dbg/qae_mem_slabs
Pid 78854, Slab Id 10550771712
Virtual address 00000000b39412d, Physical Address 274e00000, Size 2097152
Pid 78854, Slab Id 10309599232
Virtual address 000000003670dd45, Physical Address 266800000, Size 2097152
...
```

There are three commands supported, and the below table shows their output:

| Writing these strings                                          | will output this when the file is read           |  |
|----------------------------------------------------------------|--------------------------------------------------|--|
| d <pid> <virtual address="" or="" physical=""></virtual></pid> | The 256 byte in hex and ascii from the start ad- |  |
|                                                                | dress                                            |  |

Table 6: Read/Write to /sys/kernel/debug/qae\_mem\_dbg/qae\_mem\_slabs

continues on next page

| Table 6 – continued from previous page                            |                                                      |  |  |
|-------------------------------------------------------------------|------------------------------------------------------|--|--|
| Writing these strings                                             | will output this when the file is read               |  |  |
| "c <pid> <slab id="">" (pid should be the process id</slab></pid> | The allocation bit map for the given slab identifier |  |  |
| that can be obtained by a previous read)                          |                                                      |  |  |
| "t"                                                               | Total size of NUMA memory allocated in kernel        |  |  |
|                                                                   | space                                                |  |  |

For example, by combining a write to the file and a subsequent read, you can see the total allocated NUMA memory, e.g.:

```
# echo "t" > /sys/kernel/debug/qae_mem_dbg/qae_mem_slabs ; cat /sys/kernel/debug/qae_mem_dbg/
→qae_mem_slabs
Total allocated NUMA memory: 142606336 bytes
```

As above, the "d" and "c" commands will output their respective information.

# 3.7 Heartbeat

Under some circumstances, firmware in the Intel® QAT devices could become unresponsive, requiring a device reset to recover. The Intel® QAT Heartbeat feature provides a mechanism for the customer application to detect and reset unresponsive devices. It also notifies the application processes of the start and end of the reset operation and suspends all Intel® QAT instances between the events.

### 3.7.1 Heartbeat Operation

A Heartbeat-enabled Intel<sup>®</sup> QAT device firmware periodically writes counters to a specified physical memory location. A pair of counters per thread is incremented at the start and end of the main processing loop within the firmware. Checking for Heartbeat consists of checking the validity of the pair of counter values for each thread. Stagnant counters indicate a firmware hang.

#### 3.7.1.1 Initialization

At startup, the Intel<sup>®</sup> QAT device driver allocates memory for the counter pairs to be written by the firmware and then sends a message to the firmware to start the Heartbeat functionality.

#### 3.7.1.2 Heartbeat Monitoring

Heartbeat check/monitoring refers to invocation of one of the two API calls that checks if the device is responsive. Heartbeat failure refers the API returning failure.

The Intel<sup>®</sup> QAT driver does not monitor for Heartbeat. It should be initiated by a Heartbeat management thread calling one of the following APIs periodically:

- icp\_sal\_check\_device(Cpa32U accelId)
- icp\_sal\_check\_all\_devices(void)

A failure return code implies the device has failed or hung.

The Heartbeat management thread should satisfy the following conditions:

- For any given device, only one such process/thread should monitor.
- One process can monitor one or more devices.
- It can be a user application that uses Intel<sup>®</sup> QAT services, or a separate management/control plane process.
- In virtualized environment, monitoring process(es)/thread(s) must run in the context of the host or hypervisor.

#### 3.7.1.3 Resetting a Failed Device

A device can be configured for automatic reset by the Intel<sup>®</sup> QAT framework or manually reset by the application by using the AutoResetOnError field in the device configuration file /etc/<device>.conf, as shown below.

| AutoResetOnError Value | Action on Heartbeat Failure    |
|------------------------|--------------------------------|
| O (default)            | Do not reset the device        |
| 1                      | Reset the device automatically |

If an Intel<sup>®</sup> QAT device is not configured for automatic reset, the management thread should reset it using the **icp\_sal\_reset\_device(Cpa32U accelId)** API.

The icp\_sal\_reset\_device() function starts an asynchronous reset sequence and returns immediately. The reset function should not be called again until the device has completed the reset to avoid a reset storm. The icp\_sal\_check\_device(<device id>) function could be called in a loop to check if the device reset is still in progress.

If the application devices are all configured for automatic reset then the icp\_sal\_check\_all\_devices() function could be used; otherwise, the function should not be used because it does not return the identity of the failed device, which is a required parameter for the icp\_sal\_reset\_device() function.

#### **Function Signatures**

The details of the above functions, parameters, and return values can be found in *Supported APIs > Additional APIs*.

## 3.7.2 Incorporating Heartbeat into Intel<sup>®</sup> QAT Applications

A typical Intel<sup>®</sup> QAT user application consists of two tasks:

- The first task is typically an application thread that initializes Intel<sup>®</sup> QAT instances and sessions, and then submits service requests for Intel<sup>®</sup> QAT crypto or compression.
- If an application employs polling to receive Intel<sup>®</sup> QAT service responses, then this task is also an application thread. Alternatively, responses are received as an interrupt handler.

Two more tasks are required to support Heartbeat:

- The first is a management task to monitor the devices for failure or hang and then resets them, when required. As discussed earlier, this could be an application thread of an independent management process.
- The second task is an application thread that polls for device reset events:
  - Device is restarting: CPA\_INSTANCE\_EVENT\_RESTARTING
  - Device restart is complete: CPA\_INSTANCE\_EVENT\_RESTARTED

If the application employs polling to receive Intel<sup>®</sup> QAT service responses, then this task could be included in the same polling loop.

The polling for device events is done using the API: icp\_sal\_poll\_device\_events().

The two callback functions for crypto and compression are registered using the following APIs:

- cpaCyInstanceSetNotificationCb
- cpaDcInstanceSetNotificationCb

The details of the above functions, parameters, and return values can be found in *Supported APIs > Additional APIs*.

### 3.7.3 Restart Sequence

During the restart sequence, the user space library releases the memory used for rings and other data structures as part of the shutdown and reallocates them when the restart is completed. This is transparent to the user application, so it can continue to use the same logical instance after reset to submit Intel<sup>®</sup> QAT service requests. Any memory allocated by the user application for the Intel<sup>®</sup> QAT service is untouched during device reset.

A typical Heartbeat error use-case is as follows:

1. The driver and the firmware is loaded, initialized and started.



- 2. The user-space application registers to receive instance notifications by calling **cpaCyInstance-SetNotificationCb** and **cpaDcInstanceSetNotificationCb**.
- 3. The management thread monitors for the device's heartbeat. When a device is unresponsive, a device reset is initiated by this thread or by the Intel<sup>®</sup> QAT framework depending on the device configuration.
- 4. The kernel-space process sends the restarting event to the user-space process.
- 5. The user-space driver passes the device restarting event to all the registered application instances. It also frees memory and rings associated with the registered instances.
- 6. The kernel-space driver triggers the device reset.
- 7. During reset, the Intel<sup>®</sup> QAT service request made by the user application returns one of:
  - CPA\_STATUS\_FAIL
  - CPA\_STATUS\_RETRY
  - CPA\_STATUS\_RESTARTING
- 8. When the device reset is complete, the kernel-space driver sends a device restarted event to the user space driver.
- 9. The user space driver allocates the memory and rings and then forwards the device Restarted event to each of the registered instances.

# 3.7.4 Status of Packets in Flight (Crypto Applications Only)

When a device has fatal errors, the application ordinarily cannot determine whether or not inflight requests have been processed successfully.

The current Intel<sup>®</sup> QAT release includes a dummy response feature that creates mock responses to all requests submitted during a fatal error condition, so the application can detect them and, therefore, know which requests need to be resubmitted to the available devices or to the software.

**Note:** The sequence of dummy responses will match the sending request sequence for all requests submitted during a fatal error.

Since the dummy response feature only supports Public Key Encryption (PKE), dummy responses may be generated only when the **icp\_sal\_cyPollInstance()** function is called, since it is the function for crypto services.

The icp\_sal\_poll\_device\_events() function should also be called by the application, so that the application get a notification when the device encounters a failure and dummy responses are generated when calling icp\_sal\_CyPollInstance() for the inflight requests.

## 3.7.5 Determining Device ID

The *<device id>* that is passed as a parameter to several Heartbeat API is the numeric suffix of the device name displayed by the following command. (device name: *qat\_dev0*):

service qat\_service status

The output will look like:

```
There is 1 QAT acceleration device(s) in the system: qat_dev0 - type: c3xxx, inst_id: 0, node_

id: 0, bsf: 01:00.0, #accel: 3 #engines: 6 state: up
```

The Intel<sup>®</sup> QAT library has no API to discover the device number easily. However, an application can use the IOCTLs **IOCTL\_GET\_NUM\_DEVICES** and **IOCTL\_STATUS\_ACCEL\_DEV** to find the device\_id of a particular device if they know the Bus Device Function (BDF). Refer to **perform\_query\_dev()** in **./adf\_ct1.cpp**.

### 3.7.6 Testing Heartbeat

Two debug capabilities are available to assist the developers incorporating Heartbeat into their applications:

- Simulation of Heartbeat failure.
- System virtual files under /sys/kernel/debug/.

#### 3.7.6.1 Simulated Heartbeat Failure Configuration

The Heartbeat feature is always enabled in the package. However, a debug capability that simulates device failure can be enabled during the configure step as follows:

./configure --enable-icp-hb-fail-sim

#### 3.7.6.2 Simulating Heartbeat Failure

Simulating Heartbeat failure can be accomplished using two methods:

- Using the APlicp\_sal\_heartbeat\_simulate\_failure(<device id>).
- Executing the command:

cat /sys/kernel/debug/<device>/heartbeat\_sim\_fail

#### **System Virtual Files**

**Note:** The heartbeat /sys/kernel/debug files are associated with the QAT Physical Function (PF).

The Heartbeat feature implements the following system virtual files under the /sys/kernel/debug/qat\_<device>\_<your\_device\_BDF>/ directory. For additional details refer to debugfs-driver-gat documentation.

Table 8: Heartbeat System Virtual Files

| File             | Content                                                                  |
|------------------|--------------------------------------------------------------------------|
| heartbeat        | 0: Device is responsive1: Device is NOT responsive.                      |
| heartbeat_failed | Number of times the device became unresponsive.                          |
| heartbeat_sent   | Number of times the control process checked if the device is responsive. |

A developer could simulate the Heartbeat management process by running the following script in the background:

| #!/bin/bash                                                       |         |
|-------------------------------------------------------------------|---------|
| while : do                                                        |         |
| cat /sys/kernel/debug/ <device>/heartbeat &gt; /dev/null</device> | sleep 1 |
| done                                                              |         |

#### **Heartbeat Polling Frequencies**

The application developer should decide on the following two Heartbeat polling frequencies:

- Device Heartbeat monitoring.
- Checking for device reset events.

#### **Device Heartbeat Monitoring**

Consider the following points when determining the frequency of Heartbeat monitoring:

- Increasing Heartbeat monitoring frequency will minimize the customer's system downtime.
- However, since device unresponsiveness should be an infrequent event, high frequency Heartbeat monitoring wastes CPU cycles.
- Also, if there are large Intel<sup>®</sup> QAT service requests that take some time to complete, high frequency Heartbeat monitoring could result in false reports of unresponsiveness.
- With QAT Gen4 devices, heartbeat update timer in firmware is a constant value of 200ms (unconfigurable). With QAT Gen2 devices this value is configurable with configuration item **Heartbeat-Timer** (the default value is 500ms and the minimal allowed value is 200ms)

#### **Programmer's Guide**

• For both QAT Gen2 and Gen4 monitoring interval should be larger or equal than the Heartbeat update interval. (e.g. if user configure HeartbeatTimer=300, polling interval should be >=300ms)

#### **Checking for Device Reset Events**

If the application uses polling for reading Intel<sup>®</sup> QAT service responses, there is no value in checking for resets more frequently. Since device unresponsiveness is an infrequent occurrence, frequency of checking for reset events could be a fraction of the frequency of polling for Intel<sup>®</sup> QAT service responses.

## 3.7.7 Handling Device Failures in a Virtualized Environment

The Heartbeat feature in the acceleration software can be used in a virtualized environment. Refer to the Using Intel® Virtualization Technology (Intel® VT) with Intel® QuickAssist Technology Application Note for more details on enabling SR-IOV and the creation of Virtual Functions (VFs) from a single Intel® QuickAssist Technology acceleration device to support acceleration for multiple Virtual Machines (VMs).

The following sequence describes a possible use case for using the Heartbeat feature in a virtualized environment.

- 1. The Intel<sup>®</sup> QAT Physical Function driver (PF driver) isloaded, initialized and started.
- 2. The Intel<sup>®</sup> QAT Virtual Function driver (VF driver) is loaded, initialized and started in the Guest OS in the VM.
- 3. The PF driver detects that the firmware is unresponsive (using either of the following methods: User Proc Entry Read (not Enabled by Default) on page 47 or User Application Heartbeat APIs (not Enabled by Default) on page 48).
- 4. The PF driver sends the "Restarting" event message to the VF via the internal PF-to-VF communication messaging mechanism.
- 5. The VF driver sends the "Restarting" event to the application's registered callback. The callback is registered using either of the Intel<sup>®</sup> QAT API functions **cpaDcInstanceSetNotificationCb()** or **cpaCyInstanceSetNotificationCb()** in the Guest OS. (The application's callback function may perform any application-level cleanup.)
- 6. The PF driver starts the reset sequence (save state, initiate reset, and restore state).
- 7. The user restarts the Guest OS and loads the VF driver and application in the Guest OS.

#### Note:

- If the Heartbeat feature in the acceleration software is not enabled, the PF driver will not notify the VF driver that the firmware is unresponsive.
- The error detection mechanisms are not available on the VF driver in the VM, but device errors caused by any of the software running on the VM will be detected by the PF driver using the above mechanisms.

# 3.7.8 Incorporating Dummy Responses into an Intel<sup>®</sup> QAT Application

The dummy response feature has been incorporated in a scenario with the Intel<sup>®</sup> QAT engine and Nginx\*. Figure below illustrates how it works. This can be used as a reference to so-called "software fallback."

The Intel<sup>®</sup> QAT engine is a shim layer between OpenSSL\* libcrypto\* and Intel<sup>®</sup> QAT Library. The Intel<sup>®</sup> QAT Library will generate failover responses.

The Heartbeat Monitoring Daemon, a single process, is a daemon which is used to check the device status periodically and trigger the driver the reset the device when heartbeat failure happens. Its only activity is calling icp\_sal\_check\_device() or icp\_sal\_check\_all\_devices() periodically.

The Intel<sup>®</sup> QAT Engine polls for and handles "device error" and "device ok" events (via udev). It keeps track of the number of devices which are active.

- If some, but not all, Intel<sup>®</sup> QAT devices encounter errors, switch to remaining available devices by resubmitting the inflight requests, which are responded to with dummy responses and new requests to the available devices.
- If the number of active Intel<sup>®</sup> QAT devices goes to zero, switch to software and resubmit the inflight requests which are responded to with dummy responses and new requests to the software.
- If the number of active Intel<sup>®</sup> QAT devices goes positive again, switch back to hardware.

# 3.8 Telemetry

The telemetry feature is a tool to view the performance and utilization of an acceleration device. Telemetry data can be viewed on a per device and a per ring pair (also known as *queue pair*) basis.

**Note:** There are differences between the implementations of Telemetry with the in-tree acceleration driver and the out-of-tree acceleration driver.

Details for each are included in the sections below.

# 3.8.1 Telemetry Usage

### 3.8.1.1 Out-Of-Tree

The telemetry feature is configured and queried using **sysfs** files in the Linux filesystem.

The telemetry **sysfs** folder is located at /**sys/devices/pciAAAA:BB/AAAA:BB:CC.D/telemetry** 

Where:

• AAAA:BB:CC.D is the *Domain:BDF* of the target Intel® QAT Endpoint.





#### Example:

ls /sys/devices/pciAAAA:BB/0000:6b:00.0/telemetry

### 3.8.1.2 In-Tree

Important: Refer to Release Notes -In-Tree for kernel version requirements for enabling this feature.

The telemetry feature is configured and queried using **debugfs** files in the Linux filesystem.

The telemetry **debugfs** folder is located at /sys/kernel/debug/qat\_4xxx\_\*/telemetry

Where:

• qat\_4xxx\_\* is the *Domain:BDF* of the target Intel<sup>®</sup> QAT Endpoint.

Example:

sudo ls /sys/kernel/debug/qat\_4xxx\_0000:76:00.0/telemetry/telemetry

**Note:** Update the *Domain:BDF* above as needed.

The telemetry feature is controlled with standard linux file commands into the control file as outlined below. The telemetry data is accessed through the **device\_data** or **rp\_<x>\_data** file depending on what data is required.

The telemetry data for device level and ring pair level is updated each second.

## 3.8.2 Telemetry Control

Device level telemetry is enabled by echoing 1 into the control file and disabled by echoing 0. Reading the control file will tell whether the feature is currently enabled or disabled.

Ring Pair level telemetry is enabled when device level telemetry is enabled. However the ring pairs need to be selected. Only 4 ring pairs can be shown at any given time. By echoing the number of the ring pair (0-63) into a **rp\_<x>\_data** file it can be selected. Where X is A,B,C or D.

### 3.8.2.1 Telemetry Commands

| Table 9: Te | lemetry Commands |
|-------------|------------------|

| Operation            | Command                                                          |
|----------------------|------------------------------------------------------------------|
| Enable Telemetry     | echo 1 > control                                                 |
| Disable Telemetry    | echo 0 > control                                                 |
| Query Telemetry data | cat device_data                                                  |
| Select Ring Pairs    | echo Num > rp_ <x>_data, Num is the ring pair to be selected</x> |
| Query Ring Pair data | cat rp_ <x>_data</x>                                             |

### **Selecting Ring Pairs**

#### Out-Of-Tree

This section provides guidance on the mapping of ring pairs to the VFs for the PF when using the Out-Of-Tree acceleration driver.

There are 4 Ring Pairs per VF. The Ring Pairs for a PF looks like the following:

| VF | Ring Pairs |    |    |    |
|----|------------|----|----|----|
| 1  | 0          | 1  | 2  | 3  |
| 2  | 4          | 5  | 6  | 7  |
| 3  | 8          | 9  | 10 | 11 |
| 4  | 12         | 13 | 14 | 15 |
| 5  | 16         | 17 | 18 | 19 |
| 6  | 20         | 21 | 22 | 23 |
| 7  | 24         | 25 | 26 | 27 |
| 8  | 28         | 29 | 30 | 31 |
| 9  | 32         | 33 | 34 | 35 |
| 10 | 36         | 37 | 38 | 39 |
| 11 | 40         | 41 | 42 | 43 |
| 12 | 44         | 45 | 46 | 47 |
| 13 | 48         | 49 | 50 | 51 |
| 14 | 52         | 53 | 54 | 55 |
| 15 | 56         | 57 | 58 | 59 |
| 16 | 60         | 61 | 62 | 63 |

Table 10: Ring Pairs

The **ServicesEnabled** defined for the PF control the mapping of the Ring Pairs:

- If only one workload is enabled (dc/sym/asym), the first two columns are used for this service.
- If dc and sym or asym is enabled, the first two columns are for sym or asym and the second two columns are for dc

• If sym and asym is enabled, the first and third columns are for asym and second and fourth columns are for sym.

## 3.8.2.2 Device Level Telemetry Values

| Value                   | Meaning                                                                   |
|-------------------------|---------------------------------------------------------------------------|
| sample_cnt              | Message count, counter.                                                   |
| pci_trans_cnt           | PCIe Partial Transactions, counter.                                       |
| max_rd_lat              | Max Read Latency, nanoseconds.                                            |
| rd_lat_acc_avg          | Average Read Latency, nanoseconds.                                        |
| max_lat                 | Max Get To Put latency, nanoseconds.                                      |
| lat_acc_avg             | Average Get To Put latency, nanoseconds.                                  |
| bw_in                   | PCIe write bandwidth, Mbps.                                               |
| bw_out                  | PCIe read bandwidth, Mbps.                                                |
| at_page_req_lat_acc_avg | Average Page Request Latency, nanoseconds.                                |
| at_trans_lat_acc_avg    | Average Translation Latency, nanoseconds.                                 |
| at_max_tlb_used         | Maximum uTLB Consumed, counter.                                           |
| util_cpr <x></x>        | Compression Slice Utilization On Slice X, percentage execution cycles.    |
| util_dcpr <x></x>       | Decompression Slice Utilization On Slice X, percentage execution cycles.  |
| util_xlt <x></x>        | Translator Slice Utilization On Slice X, percentage execution cycles.     |
| util_cph <x></x>        | Cipher Slice Utilization On Slice X, percentage execution cycles.         |
| util_ath <x></x>        | Authentication Slice Utilization On Slice X, percentage execution cycles. |
| util_ucs <x></x>        | UCS Slice Utilization On Slice X, percentage execution cycles.            |
| util_pke <x></x>        | PKE Slice Utilization On Slice X, percentage execution cycles.            |

#### Table 11: Device Level Telemetry Values

## 3.8.2.3 Ring Pair Level Telemetry Values

|                           | · · · · · · · ·  |
|---------------------------|------------------|
| Table 12: Ring Pair Level | Lelemetry Values |
|                           |                  |

|                         | ,                                              |
|-------------------------|------------------------------------------------|
| Value                   | Meaning                                        |
| sample_cnt              | Message count, counter.                        |
| rp_num                  | Number of the ring pair returning data.        |
| pci_trans_cnt           | PCIe Partial Transactions, counter.            |
| lat_acc_avg             | Average Get To Put latency, nanoseconds.       |
| bw_in                   | PCIe write bandwidth, Mbps.                    |
| bw_out                  | PCIe read bandwidth, Mbps.                     |
| at_glob_devtlb_hit      | Descriptor DevTLB hit rate per ring, counter.  |
| at_glob_devtlb_miss     | Descriptor DevTLB miss rate per ring, counter. |
| tl_at_payld_devtlb_hit  | Payload DevTLB hit rate per ring, counter.     |
| tl_at_payld_devtlb_miss | Payload DevTLB miss rate per ring, counter.    |

## 3.8.3 Monitoring Telemetry - Text Based

The following example Python scripts highlight how telemetry data can be monitored at the command line. The script first enables telemetry service for each QAT endpoint that supports telemetry and is in the **up** state. It then queries the telemetry data on a periodic basis collecting the data and formatting the display.

| Intel(R) QuickAssist Device Utilization |       |         |      |         |       |      |             |
|-----------------------------------------|-------|---------|------|---------|-------|------|-------------|
| Device                                  | %Comp | %Decomp | %PKE | %Cipher | %Auth | %UCS | Latency(ns) |
| qat_dev0                                | 100   | 45      | 0    | 0       | 0     | 0    | 174332      |
| qat_dev1                                | 100   | 45      | 0    | 0       | 0     | 0    | 174276      |
| qat_dev2                                | 100   | 45      | 0    | 0       | 0     | 0    | 174266      |
| qat_dev3                                | 100   | 45      | 0    | 0       | 0     | 0    | 174264      |
| qat_dev4                                | 100   | 42      | 0    | 0       | 0     | 0    | 182140      |
| qat_dev5                                | 100   | 43      | 0    | 0       | 0     | 0    | 181970      |
| qat_dev6                                | 100   | 42      | 0    | 0       | 0     | 0    | 181984      |
| qat_dev7                                | 100   | 43      | 0    | 0       | 0     | 0    | 181892      |
| ===========                             |       |         |      |         |       |      |             |

### 3.8.3.1 Out-Of-Tree

Device utilization script for Out-Of-Tree driver is included here.

Important: When running script as non-root User, ensure **adf\_ctl** is added to **qat** group.

sudo chgrp qat /usr/local/bin/adf\_ctl

Script can be downloaded from here

Running the script looks like:

sudo monitor-qat-oot-utilization

# intel.

## 3.8.3.2 In-Tree

Device utilization script for In-Tree driver can be downloaded from here

Running the script looks like:

sudo monitor-qat-utilization

# 3.9 Rate Limiting

**Note:** The instructions in this section apply to the Out-of-Tree QAT package. For details on Rate Limiting with In-tree solution, refer to sysfs-driver-qat\_rl documentation.

Rate Limiting is implemented by monitoring the utilization of the device on a per-VF, per-service basis and comparing that to the SLA allocated to that VF and service.

Resources are shared across guests and the resource utilization of each guest is measured relative to the capacity of the physical function.

The feature is supported for SYM, ASYM, and DC services.

To enable the Rate Limiting feature:

- 1. **Install** the driver package on the host with Single-Root Input/Output Virtualization (SR-IOV) enabled.
- 2. **Set** ServicesEnabled to asym or sym or dc.
- 3. **Perform** qat\_service shutdown and qat\_service start.

# 3.9.1 Service Level Agreement (SLA)

Service Level Agreement enforcement allocates a specified amount of capacity for a specified service to a specified VF: *max SLA enforced = (number of VFs) X (number of services)* where:

- Number of VFs varies based on device type
- Number of services = 2 (asymmetric or symmetric or compression)

## 3.9.2 SLA Units

SLA units are measured as follows:

- Symmetric Crypto 1Mbps of throughput.
- Asymmetric Crypto 1 unit is equal to 0.1 percent of available utilization.
- Compression 1Mbps of throughput.

#### Note:

- 1. In Gen4 devices, for Asymmetric Crypto services, it is more accurate to use metrics such as slice utilization and PCI bandwidth instead of Operations/second for SLA measurements.
- 2. Slice utilization and PCI bandwidth are more fair metrics as they show exactly how much HW resources each user is consuming, regardless of a particular algorithm processing speed.
- 3. Gen4 devices use a Hardware-assisted Rate limiting approach whereas legacy devices use a firmware-only Rate limiting approach.
- 4. For asymmetric service, SLAs shall be allocated at a granularity of 1 unit of device utilization percentage for RSA2K.
- 5. Below is a sample mapping table for the 5th Gen Intel<sup>®</sup> Xeon<sup>®</sup> Scalable Processer MCC SKU platform that translates the SLA units to equivalent ops/sec.
- 6. Users can run tests with the required algorithm to determine the mapping for other SKUs with different performance. Gen4 asymmetric performance numbers can differ based on the SKU.

Table 13: Sample Mapping Table for 5th Gen Intel® Xeon® Scalable Processer - MCC SKU

| Unit | RSA2K decrypt with CRT (Ops/sec) | RSA4K decrypt with CRT (Ops/sec) |
|------|----------------------------------|----------------------------------|
| 1    | 60                               | -                                |
| 5    | 300                              | -                                |
| 10   | 600                              | 60                               |
| 50   | 3000                             | 300                              |
| 100  | 6000                             | 600                              |
| 300  | 18000                            | 1800                             |
| 500  | 30000                            | 3000                             |
| 750  | 45000                            | 4500                             |
| 1000 | 60000                            | 6000                             |

# 3.9.3 SLA Manager Application

The **sla\_mgr** tool is used to create, update, delete, list and get SLA capabilities. The SLA Manager executable is available in **\$ICP\_ROOT/build/sla\_mgr** after the package is built and installed using ./ configure; make install commands.

## 3.9.3.1 SLA Commands

| Operation                      | Command                                                                                          |
|--------------------------------|--------------------------------------------------------------------------------------------------|
| Rate Limiting V1 (Legacy)      |                                                                                                  |
| Create SLA                     | ./sla_mgr create <vf_addr> <rate_in_sla_units> <service></service></rate_in_sla_units></vf_addr> |
| Update SLA                     | ./sla_mgr update <pf_addr> <sla_id> <rate_in_sla_units></rate_in_sla_units></sla_id></pf_addr>   |
| Rate Limiting V2               |                                                                                                  |
| Create SLA                     | ./sla_mgr create <vf_addr> <cir> <pir> <service></service></pir></cir></vf_addr>                 |
| Update SLA                     | ./sla_mgr update <pf_addr> <sla_id> <cir> <pir></pir></cir></sla_id></pf_addr>                   |
| For Legacy and Rate Limiting V | 2                                                                                                |
| Delete SLA                     | ./sla_mgr delete <pf_addr> <sla_id></sla_id></pf_addr>                                           |
| Delete all SLAs                | ./sla_mgr delete_all <pf_addr></pf_addr>                                                         |
| Query SLA capabilities         | ./sla_mgr caps <pf_addr></pf_addr>                                                               |
| Query list of SLAs             | ./sla_mgr list <pf_addr></pf_addr>                                                               |

#### Table 14: Rate Limiting SLA Commands

Options:

- **pf\_addr** Physical address in *domain:bus:device.function(xxxx:xx:xx)* format.
- vf\_addr Virtual address in *domain:bus:device.function(xxxx:xx:xx)* format.
- Service Asym(=0) or Sym(=1) or DC(=2).
- rate\_in\_sla\_units [0-MAX]. MAX is found by querying the capabilities.
- cir/pir committed/peak information rate [0-MAX]. MAX is found by querying the capabilities.
- **sla\_id** Value returned by **create** command.

In Legacy mode, to create/update SLA we use *rate\_in\_sla\_units*. With Rate Limiting V2, we use *cir/pir*. These units are equal to:

- 1 operation per second for asymmetric service (Legacy) or 0.1 percent of available utilization (Rate Limiting V2).
- 1 Megabits per second for symmetric service/compression service.

**Note:** To use Legacy Rate limiting *sla\_mgr* application, user needs to configure with option *–enable-legacy-sla-mgr*.

# 3.10 Power Management

The goal of power management is to manage and save power consumed by the device in the following states:

- idle => whenever no request is sent, power state is minimum.
- initialized or reset.
- active => whenever there are requests to be handled, power state is max.

# 3.10.1 Configuration

### 3.10.1.1 Out-of-Tree

When using the Out-of-Tree QAT package, Power management configuration is included in the device configuration file (i.e. /etc/4xxx\_devx.conf where x is the 0-based index of the device.)

Power management configurations parameters include:

| Parameter     | Description                                                       |
|---------------|-------------------------------------------------------------------|
| PmIdleSupport | Configure the device to enable/disable power management idle sup- |
|               | porting.                                                          |
|               | Power management idle support is enabled by default.              |

Table 15: Power Management Configuration

### 3.10.1.2 In-Tree

When using the in-tree solution, Power Management configuration is handled via sysfs-driver-qat. For details refer to sysfs-driver-qat documentation.

# 3.10.2 Usage

The information of power management status are exposed in debug sysfs file /sys/kernel/debug/ qat\_4xxx\_AAAA:BB:CC.D/pm\_status where:

• AAAA:BB:CC.D is the *Domain:BDF* of the target Intel<sup>®</sup> QAT Endpoint.

Example:

cat /sys/kernel/debug/qat\_4xxx\_0000:6b:00.0/pm\_status

The QAT device is statically configured, so any change in device configuration file will only be effective after the device is rebooted.

# 3.10.3 Considerations

While power management is an important feature in reducing power consumed, it can affect the internal components' clocks of QAT devices, and that can affect example telemetry work. It also can impact latency numbers.

**Important:** It is recommended to disable the power management feature if either of the following is true:

- Using feature dependent on clock speed, such as telemetry, or
- Supporting latency-sensitive workload.

# 3.11 Reliability, Availability, and Stability (RAS)

The RAS feature goal is to support the acceleration devices Reliability, Availability and Stability by handling the error interrupts initiated by the device.

Additionally the types of errors are counted and the counters made available via sysfs. For additional details refer to sysfs-driver-qat\_ras documentation.

# 3.11.1 RAS Usage

Following PCIe specifications, errors are categorized as follows:

| Error Type    | Description                                                                       |
|---------------|-----------------------------------------------------------------------------------|
| Correctable   | Device can recover on its own, no software involvement.                           |
|               | The <b>ras_correctable</b> counter is incremented in sysfs.                       |
| Uncorrectable | Software intervention is needed to resolve the error. This may require the appli- |
|               | cation to reset the session or resend the request to the device.                  |
|               | The <b>ras_uncorrectable</b> counter is incremented in sysfs.                     |
| Fatal         | Device unable to recover on its own even with software help. Restarting the de-   |
|               | vice is required.                                                                 |
|               | The <b>ras_fata1</b> counter is incremented in sysfs.                             |

Table 16: RAS Error Types

The RAS sysfs files are located at /sys/devices/pciAAAA:BB/AAAA:BB:CC.D/ras\_X where:

- AAAA:BB:CC.D is the *Domain:BDF* of the target Intel® QAT Endpoint.
- ras\_X is the error type (ras\_correctable/ras\_uncorrectable/ras\_fatal).

Example:

cat /sys/bus/pci/devices/0000\:6b\:00.0/ras\_fatal

**Note:** RAS is enabled by default when the device is initialised.

## 3.11.2 AER Errors

The Linux kernel implements an AER driver for each PCIe device to handle errors reported through the AER mechanism.

AER error counters for each device are exposed through sysfs files categorized as follows:

| Error Type             | Description                                                                                                                                                                                                                                                                                                                                                    |
|------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| AER Correctable        | Device can recover on its own, no software involvement.                                                                                                                                                                                                                                                                                                        |
|                        | The <b>aer_dev_correctable</b> counter is incremented in sysfs.                                                                                                                                                                                                                                                                                                |
| AER Uncor-<br>rectable | Software intervention is needed to resolve the error. In the case of an error caused<br>by a transaction failure or for instance a packet memory buffer that can't be re-<br>stored by ECC, then the device will need to reset in order to retry the transaction<br>and attempt recovery.<br>The <b>aer_dev_uncorrectable</b> counter is incremented in sysfs. |
| AER Fatal              | In the case of a fatal error, the AER driver will additionally reset the PCIe link in an                                                                                                                                                                                                                                                                       |
|                        | attempt to recover.                                                                                                                                                                                                                                                                                                                                            |
|                        | The <b>aer_dev_fata1</b> counter is incremented in sysfs.                                                                                                                                                                                                                                                                                                      |

#### Table 17: RAS AER Errors

AER errors counters are exposed at /sys/bus/pci/devices/AAAA:BB:CC.D/aer\_dev\_X where:

- AAAA:BB:CC.D is the *Domain:BDF* of the target Intel<sup>®</sup> QAT Endpoint.
- aer\_dev\_X is the error type (aer\_dev\_correctable/aer\_dev\_uncorrectable/aer\_dev\_fatal).

Example:

cat /sys/bus/pci/devices/0000\:6b\:00.0/aer\_dev\_correctable

**Important:** AER reporting must be enabled in the BIOS to have errors reported through AER.

# **4** Acceleration Driver

Intel<sup>®</sup> QAT can accelerate the following services:

- Symmetric cryptography
- Public key cryptography
- Data compression/decompression

The Intel<sup>®</sup> QAT Endpoints are exposed as PCI devices. Applications running in user space typically access these services via the Intel<sup>®</sup> QAT APIs. Applications that run in the Linux\* kernel can also access some services via the Linux\* Kernel Cryptographic Framework (LKCF) API.

# 4.1 Controlling the Driver

Two methods are provided to manage the acceleration driver. They include:

- qat\_service: script to manage the Intel<sup>®</sup> QAT Endpoints.
- adf-ct1: Utility for loading configuration files and sending events to the driver.

## 4.1.1 qat\_service

The **qat\_service** script is installed with the software package in the **/etc/init.d/** directory. The script allows a user to start, stop, or query the status (up or down) of a single Intel<sup>®</sup> QAT Endpoint or all Intel<sup>®</sup> QAT Endpoints in the system.

### 4.1.1.1 qat\_service Usage

```
To view all Intel<sup>®</sup> QAT Endpoints in the system, use:
```

```
service qat_service status
```

If for example, there are two Intel<sup>®</sup> QAT Endpoints in the system, the output will be similar to the following:

```
qat_dev0 - type: c6xx, inst_id: 0, bsf: 06:00:0, #accel: 5 #engines: 10 state: up
qat_dev1 - type: c6xx, inst_id: 1, bsf: 83:00:0, #accel: 5 #engines: 10 state: up
```

Other options are also available:

service qat\_service start||stop||status||restart||shutdown

For a system with multiple Intel<sup>®</sup> QAT Endpoints, you can start, stop or restart each individual device by passing the Intel<sup>®</sup> QAT Endpoint to be restarted or stopped as a parameter **qat\_dev<N>**, for example:

```
service qat_service stop qat_dev0
service qat_service stop qat_dev1
```

The shutdown qualifier enables the user to bring down all Intel<sup>®</sup> QAT Endpoints and unload driver modules from the kernel. This contrasts with the stop qualifier, which brings down one or more Intel<sup>®</sup> QAT Endpoints, but does not unload kernel modules, so other Intel<sup>®</sup> QAT Endpoints can still run.

## 4.1.2 adf\_ctl

The adf\_ctl user space utility is separate to the driver and provides a mechanism for:

- Loading configuration file data to the kernel driver. The kernel space driver uses the data and also provides the data to the user space driver.
- Sending events to the driver to bring devices up and down.

The **adf\_ct1** provided with the Intel<sup>®</sup> QAT 2.0 driver can also be used to interface with Intel<sup>®</sup> QAT 1.6 and 1.7 devices.

### 4.1.2.1 adf\_ctlUsage

To bring up, down, restart or reset device(s):

```
adf_ctl [-c|--config] [qat_dev] [up|down|restart|reset]
```

To print device(s) status:

```
adf_ctl [qat_dev] status
```

To use the specified configuration file:

-c (--config) [config/file/path]

Note: If no device (physical or virtual) is selected, this file is used against all existing devices.

### 4.1.2.2 Examples

To bring device 0 down:

adf\_ctl qat\_dev0 down

To load device configuration from default path (e.g. /etc/4xxx\_dev1.conf), then bring device l up:

adf\_ctl qat\_dev1 up

To load device configuration from specified path /etc/4xxx\_dev1.conf and bring device l up:

adf\_ctl -c /etc/user\_4xxx\_dev1.conf qat\_dev1 up

To restart all devices with default configuration files:

adf\_ctl restart

To restart all devices with specified configuration file /etc/user\_c4xxx\_dev1.conf:

adf\_ctl -c /etc/user\_4xxx\_dev1.conf restart

To restart device 0 with specified configuration file ~/user\_4xxx\_dev1.conf:

adf\_ctl -c ~/user\_c4xxx\_dev1.conf qat\_dev0 restart

To restart device 0:

adf\_ctl qat\_dev0 reset

# 4.2 Application Payload Memory Allocation

When performing offload operations through the Intel<sup>®</sup> QAT API, it is required that the payload data be placed in a buffer that is resident, physically contiguous, and DMA accessible from the acceleration hardware. It is the application's responsibility to provide buffers with these constraints.

Buffers are passed to the API with virtual addresses. The API translates these addresses to the address information required by the hardware.

## 4.2.1 Services

| Table 18: Acceleration | Driver Services |
|------------------------|-----------------|
|                        |                 |

| Service               | API                                     | Reference                              |
|-----------------------|-----------------------------------------|----------------------------------------|
| Cryptographic service | <pre>cpaCySetAddressTranslation()</pre> | See the Intel® QuickAssist Technology  |
|                       |                                         | Cryptographic API Reference Manual     |
|                       |                                         | (refer to Table 2) for details.        |
| Data Compression ser- | <pre>cpaDcSetAddressTranslation()</pre> | See the Intel® QuickAssist Technol-    |
| vice                  |                                         | ogy Data Compression API Reference     |
|                       |                                         | Manual (refer to Table 2) for details. |

When the software requires the physical address, it calls the registered function.

**Note:** This address translation function is called at least once per request. Consequently, for optimal performance, the implementation of this function should be optimized.

If using the Intel<sup>®</sup> QAT Data Plane API, buffers are passed to the Intel<sup>®</sup> QAT API as physical addresses. The library passes this directly to the hardware, without the need for translation.

# 4.2.2 Thread Specific USDM

By default, memory allocation uses the USDM slab allocator, which gives 2MB contiguous memory. The allocation has locks in the library to prevent a race condition in getting the memory from the slab.

This lock has an impact on some multi-threaded applications and use cases, like HAProxy, causing a drop in performance.

To mitigate this issue, thread specific USDM is implemented which allocates and handles memory specific to threads. (For multi-thread apps, allocated memory information will be maintained separately for each thread).

This feature can be enabled by configuring with the configure flag:

#### --enable-icp-thread-specific-usdm

In some use cases with thread specific USDM, using a 128K slab allocator instead of the default 2MB allocator could improve performance and reduce memory consumption for a large number of threads. This can be enabled by configuring with the configure flag

```
--enable-128k-slab
```

**Note:** There is a limitation with thread specific USDM: memory allocated in one thread should be freed only by the thread which allocates it.

Incorrect cleanup can lead to a segmentation fault (segfault).

Also, memory allocated in a thread is freed automatically when the thread exits/terminates, even if the user does not explicitly free the memory.

See the ./configure flags section of the Getting Started Guide for more information on these flags.

**Important:** We have observed poor multithreaded performance with QAT\_Engine using OpenSSL\* at higher thread counts.

Unfortunately, these issues appear to stem from the way OpenSSL\* implements its **en-gine\_table\_select** and locks. For relevant issues on the OpenSSL\* github pages, see the two issues below:

- OpenSSL\* 1.1.1.x: Performance bottleneck with locks in **engine\_table\_select()** function #18509, https://github.com/openssl/openssl/issues/18509
- OpenSSL\* 3.0: 3.0 performance degraded due to locking #20286, https://github.com/openssl/ openssl/issues/20286

# 4.3 Return Codes

This table shows the return codes used by various components of the acceleration driver, defined in **\$ICP\_ROOT/quickassist/include/cpa.h**.

| Return Type              | Return Code | Description                                        |
|--------------------------|-------------|----------------------------------------------------|
| CPA_STATUS_SUCCESS       | 0           | Requested operation was successful.                |
| CPA_STATUS_FAIL          | -]          | General or unspecified error occurred. Refer to    |
|                          |             | the console log user space application or to /var/ |
|                          |             | log/messages in kernel space for more details of   |
|                          |             | the failure.                                       |
| CPA_STATUS_RETRY         | -2          | Recoverable error occurred. Refer to relevant      |
|                          |             | sections of the API for specifics on what the sug- |
|                          |             | gested course of action.                           |
| CPA_STATUS_RESOURCE      | -3          | Required resource is unavailable. The resource     |
|                          |             | that has been requested is unavailable. Refer to   |
|                          |             | relevant sections of the API for specifics on what |
|                          |             | the suggested course of action.                    |
| CPA_STATUS_INVALID_PARAM | -4          | Invalid parameter has been passed in.              |
| CPA_STATUS_FATAL         | -5          | Fatal error has occurred. A serious error has oc-  |
|                          |             | curred. Recommended course of action is to shut    |
|                          |             | down and restart the component.                    |

#### Table 19: Return Codes

continues on next page

| Table 19 – continued from previous page |             |                                                                                                                                                                                           |  |
|-----------------------------------------|-------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|
| Return Type                             | Return Code | Description                                                                                                                                                                               |  |
| CPA_STATUS_UNSUPPORTED                  | -6          | The function is not supported, at least not with<br>the specific parameters supplied. This may be be-<br>cause a particular capability is not supported by<br>the current implementation. |  |
| CPA_STATUS_RESTARTING                   | -7          | The API implementation is restarting. This may<br>be reported if, for example, a hardware implemen-<br>tation is undergoing a reset.                                                      |  |

# 4.4 Linux\* Device Driver Operations Return Codes

This table shows the return codes used by the driver to handle Linux\* device driver operations.

| Return Type | Return Code | Description                                                  |  |
|-------------|-------------|--------------------------------------------------------------|--|
| SUCCESS     | 0           | Operation was successful.                                    |  |
| FAIL        | 1           | General error occurred. Refer to the console log user space  |  |
|             |             | application or to /var/log/messages in kernel space for      |  |
|             |             | more details of the failure.                                 |  |
| -EPERM      | -1          | Operation is not permitted. Used during ioctl operations.    |  |
| -ENOENT     | -2          | No such file or directory.                                   |  |
| -EINTR      | -4          | Interrupted system call.                                     |  |
| -EIO        | -5          | Input/Output error occurred. Used when copying configu-      |  |
|             |             | ration data to and from user space.                          |  |
| -EBADF      | -9          | Bad File Number. Used when an invalid file descriptor is de- |  |
|             |             | tected.                                                      |  |
| -EAGAIN     | -11         | Try Again. Used when a recoverable operation occurred.       |  |
| -ENOMEM     | -12         | Out of Memory. Memory resource that has been requested       |  |
|             |             | is not available.                                            |  |
| -EACCES     | -13         | Permission Denied. Used when the operation failed to con-    |  |
|             |             | nect to a process or open a device.                          |  |
| -EFAULT     | -14         | Bad Address. Used when an operation detects invalid pa-      |  |
|             |             | rameter data.                                                |  |
| -EBUSY      | -16         | Device or resource is busy.                                  |  |
| -EEXIST     | -17         | File exists.                                                 |  |
| -ENODEV     | -19         | No Such Device. Used when an operation detects invalid       |  |
|             |             | device id.                                                   |  |
| -EINVAL     | -22         | Invalid argument.                                            |  |
| -ENOTTY     | -25         | Invalid Command Type. Used when an ioctl operation de-       |  |
|             |             | tects an invalid command type.                               |  |
| -ENOSPC     | -28         | No space left on device.                                     |  |
| -ERANGE     | -34         | Math result not representable.                               |  |

### Table 20: Linux\* Device Driver Operations Return Codes

continues on next page

| Table 20 – continued from previous page |             |                                                |
|-----------------------------------------|-------------|------------------------------------------------|
| Return Type                             | Return Code | Description                                    |
| -ENOSYS                                 | -38         | Function not implemented.                      |
| -el3hlt                                 | -46         | Level 3 Halted.                                |
| -ETIME                                  | -62         | Timer expired.                                 |
| -EBADMSG                                | -74         | Not a data message.                            |
| -EOVERFLOW                              | -75         | Value too large for defined data type.         |
| -EOPNOTSUPP                             | -95         | Operation not supported on transport endpoint. |
| -EINPROGRESS                            | -115        | Operation now in progress.                     |

# **5** Configuration Files

This section describes the configuration file(s) that allows customization of runtime operation. The configuration file(s) must be tuned to meet the performance needs of the target application. There is a single configuration file for each Intel<sup>®</sup> QAT Endpoint in the system.

If Single-Root Input/Output Virtualization (SR-IOV) is enabled, a separate configuration file is used for each virtual function.

**Note:** The software package includes default configuration file(s), which may not provide optimal performance on all platforms. Consider performance implications as well as the configuration details provided in this section if your system requires modifications to the default configuration file.

# 5.1 Configuration File Overview

There is a single configuration file for each Intel<sup>®</sup> QAT Endpoint and there may be multiple Intel<sup>®</sup> QAT Endpoints.

Note: Depending on the model number, a device may also contain no Intel® QAT Endpoints.

The configuration file is split into a number of different sections: a General section and one or more Logical Instance sections.

The General section includes parameters that allow the user to specify:

- Which services are enabled.
- Concurrent requests default configuration.
- Interrupt coalescing configuration (optional).
- Statistics gathering configuration.

Additional details are included in General Section.

*Logical Instances* sections (there may be one or more) include parameters that allow the user to set:

• The number of cryptography or data compression instances being managed.

• For each instance, the name of the instance, whether or not polling is enabled, and the core to which an instance is affinitized.

Additional details are included in *Logical Instances Section*.

Sample configuration files are included in the package in the **quickassist/utilities/adf\_ctl/ conf\_files** directory.

# 5.2 General Section

The general section of the configuration file contains general parameters and statistics parameters.

**Note:** *Default* denotes the value in the configuration file when shipped or the value used if not specified in the configuration file.

This table describes the other parameters that can be included in the General section.

| Parameter                   | Description                                                                                                                                                                                  | Default           | Range                                                                                         |
|-----------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------|-----------------------------------------------------------------------------------------------|
| ServicesEnabled             | Defines the service(s) available (cryp-<br>tographic [cy], symmetric [sym],<br>asymmetric [asym], data compression<br>[dc]).<br>Refer to <i>ServicesEnabled</i> for addi-<br>tional details. | <varies></varies> | sym, asym,<br>cy, dc<br>Note: Mul-<br>tiple values<br>permitted,<br>use; as the<br>delimiter. |
| ServicesProfile             | Specifies the services that are available when the driver loads.                                                                                                                             | Default           | See <i>Service-sProfile</i> for additional details.                                           |
| CyNumConcurrentSymRequests  | Specifies the number of cryptographic<br>concurrent symmetric requests for<br>cryptographic instances in general.<br>Refer to <i>Concurrent Requests</i> for addi-<br>tional details.        | 512               | 64, 128, 256,<br>512, 1024,<br>2048, 4096,<br>8192, 16384,<br>32768, or<br>65536              |
| CyNumConcurrentAsymRequests | Specifies the number of cryptographic<br>concurrent asymmetric requests for<br>cryptographic instances in general.<br>Refer to <i>Concurrent Requests</i> for addi-<br>tional details.       | 64                | 64, 128, 256,<br>512, 1024,<br>2048, 4096,<br>8192, 16384,<br>32768, or<br>65536              |

|  | Table 21: | General | Section | Parameters |
|--|-----------|---------|---------|------------|
|--|-----------|---------|---------|------------|

continues on next page

| Table 21 – continued from previous page |                                                                                                                                                                                       |         |                                                                                  |  |
|-----------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------|----------------------------------------------------------------------------------|--|
| Parameter                               | Description                                                                                                                                                                           | Default | Range                                                                            |  |
| DcNumConcurrentRequests                 | Specifies the number of data compres-<br>sion concurrent requests for data com-<br>pression instances in general.<br>Refer to <i>Concurrent Requests</i> for addi-<br>tional details. | 512     | 64, 128, 256,<br>512, 1024,<br>2048, 4096,<br>8192, 16384,<br>32768, or<br>65536 |  |
| DcIntermediateBufferSizeInKB            | Specifies the size in KB of each inter-<br>mediate buffer in on-chip memory for<br>dynamic compression.                                                                               | 64      | 32 or 64                                                                         |  |
| ServiceChainingEnabled                  | Flag enables dc chaining.<br>NOTE: <b>ServicesEnabled</b> must be set<br>to <b>dc</b> .                                                                                               | 0       | 0 or 1                                                                           |  |
| HeartbeatTimer                          | Default heartbeat timer.                                                                                                                                                              | 1000    | >200                                                                             |  |
| AutoResetOnError                        | Automatically resets the device in case of fatal error or heartbeat failure.                                                                                                          | 0       | 0 or 1                                                                           |  |
| PmIdleSupport                           | This flag is to enable power manage-<br>ment idle support.<br>Refer to <i>Power Management Parame-</i><br><i>ters</i> for additional details.                                         | 1       | 0 or 1                                                                           |  |
| SVMEnabled                              | This flag is to enable SVM support.<br>Refer to <i>Shared Virtual Memory Pa-</i><br><i>rameters</i> for additional details.                                                           | 0       | 0 or 1                                                                           |  |
| ATEnabled                               | This flag is to enable Address Transla-<br>tion Service(ATS).<br>Refer to <i>Shared Virtual Memory Pa-</i><br><i>rameters</i> for additional details.                                 | 0       | 0 or 1                                                                           |  |

# 5.2.1 ServicesEnabled

Additional details on the **ServicesEnabled** parameter:

- This parameter is valid for all QAT devices.
- Default value varies depending on the underlying QAT Endpoint.
- cy is not valid value for QAT2.0 devices. asym and sym are used.
- Only two of the three services (asym, sym, and dc) may be enabled on an individual QAT2.0 Endpoints.

### 5.2.1.1 Performance Considerations

Important: The following is applicable to QAT2.0 devices only.

In order to maximize QAT throughput performance for a given service type, one should specify ONLY that service type parameter for **ServicesEnabled**.

By design, two Acceleration Engine clusters are available, each containing four Acceleration Engines. Each of these two clusters are limited to using a single service. Therefore, the possible split options are 4|4 or 8 for a given service type.

Configuration examples:

- While using VFs on a system configured for ServicesEnabled = sym;dc, 4 acceleration engines will be dedicated to SYM and 4 acceleration engines will be dedicated to DC, so only 2 resource providers per child VF can be used for SYM. Here, we should expect some performance degradation for SYM (even if DC is not used).
- While using VFs on a system configured for ServicesEnabled = sym, all 8 acceleration engines will be dedicated to SYM, so all 4 resource providers of a child VF can be used for SYM only. Here, we will see the best SYM performance.

**Note:** Packet size will also modulate the impact of the above configuration settings.

## 5.2.2 ServicesProfile

**Important:** This parameter is valid for QAT1.7x devices.

The **ServicesProfile** parameter defines the services that are available when the driver loads. For example, if **ServicesProfile** = **COMPRESSION** is in the General section, the compression and decompression are available, along with service chaining, but not cryptography.

**Note:** When a **ServicesProfiles** parameter value is used that supports rate limiting is defined, internal resources are reallocated to administrating Rate Limiting/Device Utilization. This reduces performance by roughly 5%.

### 5.2.2.1 General Default Configuration Parameters

| Service                   | DEFAULT | CRYPTO | COMPRESSION | CUSTOMI |
|---------------------------|---------|--------|-------------|---------|
| Asymmetric Crypto         | YES     | YES    |             | YES     |
| Symmetric Crypto          | YES     | YES    |             | YES     |
| Hash                      | YES     | YES    | YES         | YES     |
| Cipher                    | YES     | YES    |             | YES     |
| MGF KeyGen                | YES     | YES    |             |         |
| SSL/TLS KeyGen            | YES     | YES    |             | YES     |
| HKDF                      |         | YES    |             | YES     |
| Compression               | YES     |        | YES         | YES     |
| Decompression (stateless) | YES     |        | YES         | YES     |
| Decompression (stateful)  | YES     |        | YES         |         |
| Service Chaining          |         |        | YES         |         |
| Device Utilization        |         | YES    | YES         | YES     |
| Rate Limiting             |         | YES    | YES         | YES     |

Table 22: General Default Configuration Parameters

Note: Set the service profile to determine available features.

# 5.2.3 Concurrent Requests

Additional details on the concurrent request parameters:

- This parameter is valid for all QAT devices.
- The concurrent request parameters include both Transmit (Tx) and Receive (Rx) requests.
- For each service enabled, NumConcurrentRequests must be set to value from the range.
- The number of concurrent requests registered by the Intel<sup>®</sup> QuickAssist driver is set to NumConcurrentRequests 2.

This implementation guarantees that the request ring will never be full and avoids the need for a Memory Mapped IO (MMIO) read. This implementation maximizes throughput performance.

# 5.2.4 Power Management Parameters

**Important:** This parameter is valid for QAT2.0 devices.

Power management configuration is included in the device configuration file (i.e. /etc/4xxx\_devx.conf where x is the O-based index of the device.)

Power management configurations parameters include:

• **PmIdleSupport** - Configure the device to enable/disable power management idle supporting. Power management idle support is enabled by default.

Refer to the *Power Management* section for additional details.

# 5.2.5 Shared Virtual Memory (SVM) Parameters

**Important:** This parameter is valid for QAT2.0 devices.

SVM configuration parameters are included in the device configuration file (i.e.  $/etc/4xxx_devx.conf$  where x is the 0-based index of the device.)

### 5.2.5.1 SVMEnabled

When this flag is set in the driver configuration, it indicates that the guest virtual address (GVA) to host physical address (HPA) translation will use IOMMU hardware based translation table instead of using the software based address translation. With SVMEnabled set, it is not required to submit buffers that are physically contiguous.

#### Details

- This parameter is disabled by default. Refer to *SVM Kernel Requirements* section for additional details.
- The parameter is valid for both PF and VF configuration files.
- It **is** possible for the VF to enable this parameter even if the parameter is disabled in the corresponding PF configuration file.

### 5.2.5.2 ATEnabled

When this flag is set in the driver configuration, the Address Translation Service (ATS) is enabled. IOMMU and QAT have the ability to handle page faults using Page Request Service (PRS) when using dynamic virtual memory allocated by systemcall such as malloc.

#### Details

- This parameter is disabled by default. Refer to *SVM Kernel Requirements* section for additional details.
- SVMEnabled **must** be enabled in order to enable ATEnabled.
- The parameter is valid for both PF and VF configuration files.
- It is **not** possible for the VF to enable the service if the parameter is disabled in the corresponding PF configuration file.

# 5.3 Logical Instances Section

This section allows the configuration of logical instances in each address domain (kernel space and individual user space processes).

The address domains are in the following format:

- For the kernel address domain: [KERNEL] targeted to Linux\* Kernel Crypto Framework (LKCF).
- For user process address domains: [xxxxx], where xxxxx may be any ASCII value that uniquely identifies the user mode process.

In user space, to allow the driver to configure the logical instances associated with a user process correctly, the process must call the function **icp\_sal\_userStart** passing the **xxxxx** string during process initialization. When the user space process is finished, it must call the function **icp\_sal\_userStop** to free resources. Refer to *User Space Access Configuration Functions* for more information.

A single VF configured for the SR-IOV use case cannot have both user space instances and kernel space instances. Separate VFs must be created for user space and kernel space.

The **NumProcesses** parameter (in the User Process section) indicates the max number of user space processes within that section name with access to instances on this device. Refer to *icp\_sal\_userStart* for more information.

The items that can be configured for a logical instance are:

- The name of the logical instance.
- The polling mode.
- The core to which the instance is affinitized (optional).

# 5.3.1 [KERNEL] Section

In the [KERNEL] section of the configuration file, information about the number and type of kernel instances can be defined. This table describes the parameters that determine the number of kernel instances for each service.

**Note:** The maximum number of cryptographic and data compression instances supported per Intel<sup>®</sup> QAT Endpoint is 32. For exceptions refer to *Increasing the Maximum Number of Processes/Instances.* 

| Parameter         | Description                                         | Default | Range   |
|-------------------|-----------------------------------------------------|---------|---------|
| NumberCyInstances | Specifies the number of cryptographic instances.    | 0       | 0 to 32 |
|                   | Note: Depends on the number of allocations to other |         |         |
|                   | services.                                           |         |         |
| NumberDcInstances | Specifies the number of data compression instances. | 0       | 0 to 32 |
|                   | Note: Depends on the number of allocations to other |         |         |
|                   | services.                                           |         |         |

Table 23: [KERNEL] Section Parameters

# 5.3.2 User Process [xxxxx] Sections

There is one [xxxxx] section of the configuration file for each Intel<sup>®</sup> QAT Endpoint to be configured. In each [xxxxx] section of the configuration file, user space access to the Intel<sup>®</sup> QAT Endpoint can be configured. Parameters for each user process instance can also be defined. Common names for this section are [SSL] or [SHIM]

**Note:** Check the SKU information for your specific device to determine how many Intel<sup>®</sup> QAT Endpoints the device contains.

This table shows the parameters in the configuration file that can be set for user process [xxxxx] sections.

| Parameter         | Description                                                                                                                                                                                                                                                                                                                                                                                    | Default | Range                                                                                                                                                                                                                                                                                                                   |
|-------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| NumProcesses      | The number of user space pro-<br>cesses with section name [xxxxx]<br>that have access to this device.<br>The maximum number of<br>processes that can call<br>icp_sal_userStart and be<br>active at any one time. See<br>icp_sal_userStart for additional<br>information.<br>Caution: Resources are pre-<br>allocated. If this parameter value<br>is set too high, the driver fails to<br>load. | 1       | For constraints, see Max-<br>imum Number of Process<br>Calculations.<br>For exceptions, see Increas-<br>ing the Maximum Number of<br>Processes/Instances.                                                                                                                                                               |
| LimitDevAccess    | Indicates if the user space pro-<br>cesses in this section are limited<br>to only access instances on this<br>Intel® QAT Endpoint.                                                                                                                                                                                                                                                             | 0       | O (disabled, processes in<br>this section can access mul-<br>tiple Intel® QAT Endpoints),<br>or<br>1 (enabled, processes in this<br>section can only access this<br>Intel® QAT Endpoint).<br>For additional information,<br>see Configuring Multiple<br>Processes on a System<br>with Multiple Intel® QAT<br>Endpoints. |
| NumberCyInstances | Specifies the number of crypto-<br>graphic instances.<br>Note: Depends on the number of<br>allocations to other services.                                                                                                                                                                                                                                                                      | 6       | 0 to 32. For exceptions,<br>see Increasing the Max-<br>imum Number of Pro-<br>cesses/Instances.                                                                                                                                                                                                                         |
| NumberDcInstances | Specifies the number of data<br>compression instances.<br>Note: Depends on the number of<br>allocations to other services.                                                                                                                                                                                                                                                                     | 2       | 0 to 32. For exceptions,<br>see Increasing the Max-<br>imum Number of Pro-<br>cesses/Instances.                                                                                                                                                                                                                         |

#### Table 24: [User Process] Section Parameters

## 5.3.3 Cryptographic Logical Instance Parameters

The following table shows the parameters that can be set for cryptographic logical instances.

**Note:** *Default* denotes the value in the configuration file when shipped.

| Parameter       | Description                         | Default              | Range                   |  |  |
|-----------------|-------------------------------------|----------------------|-------------------------|--|--|
| CyXName         | Specifies the name of crypto-       | IPSec0 for KER-      | String (max. 64 char-   |  |  |
|                 | graphic instance number X.          | NEL section.         | acters)                 |  |  |
|                 |                                     | SSL0 for user        |                         |  |  |
|                 |                                     | section              |                         |  |  |
| CyXIsPolled     | Specifies if cryptographic in-      | 0 for kernel space   | 0 (interrupt mode) for  |  |  |
|                 | stance number x works in poll       | instances            | instances in the KERNEL |  |  |
|                 | mode, interrupt mode or epoll       | l for user space in- | section.                |  |  |
|                 | mode.                               | stance               | 1 (poll mode) for in-   |  |  |
|                 |                                     |                      | stances in user space   |  |  |
|                 |                                     |                      | sections.               |  |  |
|                 |                                     |                      | 2 (epoll mode event     |  |  |
|                 |                                     |                      | based polling mode)     |  |  |
|                 |                                     |                      | for instances in user   |  |  |
|                 |                                     |                      | space section.          |  |  |
| CyXCoreAffinity | Specifies the core to which the in- | Varies depending     | 0 to max. number of     |  |  |
|                 | stance should be affinitized.       | on the value of X.   | cores in the system.    |  |  |

Table 25: Cryptographic Logical Instance Parameters

## 5.3.4 Data Compression Logical Instance Parameters

This table shows the parameters in the configuration file that can be set for data compression logical instances.

| Parameter       | Description                                                                                                  | Default                                                            | Range                                                                                                                                                                                                                                       |
|-----------------|--------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| DCXName         | Specifies the name of data com-<br>pression instance number X.                                               | IPComp0                                                            | String (max. 64 char-<br>acters)                                                                                                                                                                                                            |
| DCXISPolled     | Specifies if data compression in-<br>stance number x works in poll<br>mode, interrupt mode or epoll<br>mode. | O for kernel space<br>instances<br>I for user space in-<br>stances | O (interrupt mode) for<br>instances in the KERNEL<br>section.<br>1 (poll mode) for<br>instances in the KER-<br>NEL_QAT and user<br>space sections.<br>2 (epoll mode event<br>based polling mode)<br>for instances in user<br>space section. |
| DcXCoreAffinity | Specifies the core to which<br>the data compression instance<br>should be affinitized.                       | Varies depending on the value of X.                                | 0 to max. number of cores in the system.                                                                                                                                                                                                    |

#### Table 26: Data Compression Logical Instance Parameters

#### Note:

- The maximum number of data compression instances supported is 64.
- *Default* denotes the value in the configuration file when shipped.

## 5.3.5 Setting the Core Affinity Parameter for a Logical Instance

When instances are configured with Ispolled = 1 (Polling mode), the parameter **CoreAffinity** does not have any impact.

Although not used, it is a valid parameter and applications can query the value using **cpaCyInstanceGet-Info2** (see **coreAffinity** bitmask in **CpaInstanceInfo2**). For example, the sample code affinitizes the thread that uses an instance to the core indicated in **CoreAffinity** the config file for that instance.

For instances configured in Interrupt Mode (IsPolled = 2 in user space (epoll) and IsPolled = 0 in kernel space), the value of CoreAffinity is used to affinitize the interrupt handler to that core.

# 5.4 Maximum Number of Process Calculations

The **NumProcesses** parameter is the number of user space processes per service within the [xxxx] section domain with access to this Intel<sup>®</sup> QAT Endpoint.

The value to which this parameter can be set is determined by a number of factors including the number of cryptography instances and/or data compression instances in the process section along with **Service-sEnabled** and potentially **ServicesProfile**. The total number of processes, per service, created by the driver is given by the expression (e.g., for cryptography):

(NumProcesses) x (NumberCyInstances)

The maximum number of processes that can be supported is dependant upon the underlying hardware.

## 5.4.1 Increasing the Maximum Number of Processes/Instances

#### Note:

- One bank is used per Intel<sup>®</sup> QAT virtual function (VF).
- This section only applies when the instances that make use of polled mode.

The maximum number of instances can be increased with the careful selection of the **ServiceEnabled** parameter.

Compression, symmetric cryptography, and asymmetric cryptography each require two rings out of the 16 possible rings for a ring bank. By selecting only, the services needed, the number of instances can be increased.

Here are the variations including the maximum number of processes that can be supported for given configuration:

**Note:** The **ServicesProfile** parameter value may also need to be changed. See *Services Profile* for additional information.

| ServicesEnabled         | QAT 1.7x | QAT 2.0 | Notes                                                    |
|-------------------------|----------|---------|----------------------------------------------------------|
| sym                     | 128      | 64      | Compression and asymmetric crypto service not available. |
| asym                    | 128      | 64      | Compression and symmetric crypto service not available.  |
| су                      | 128      | Invalid | Compression and symmetric crypto service not available.  |
| dc                      | 128      | 64      | Asymmetric and symmetric crypto service not available.   |
| dc;sym                  | 64       | 32      | Asymmetric crypto services will not be available.        |
| dc;asym                 | 64       | 32      | Symmetric crypto services will not be available.         |
| sym;asym Invalid 32 Com |          | 32      | Compression services will not be available.              |

### 5.4.1.1 Invalid Configurations

If maximum number of processes is exceeded, the acceleration software will fail to load. The error message will be similar to:

```
service qat_service restart qat_dev0
```

Stopping device qat\_dev0
Starting device qat\_dev0
Processing /etc/4xxx\_dev0.conf
Ioctl failed
QAT Error: Failed to load config data to device

And dmesg output will look similar to:

```
[116378.383041] Don't have enough rings for instance SSL0 in process SHIM_DEV0_INT_32
[116378.391976] 4xxx 0000:6b:00.0: Failed to create rings for cy
[116378.398881] 4xxx 0000:6b:00.0: Failed to process user section SHIM
[116378.406484] 4xxx 0000:6b:00.0: Failed to config device
```

### 5.4.1.2 Configuring Instances for Virtual Functions

To configure the number of instances for a virtual function:

- 1. Install the driver package on the host with SR-IOV enabled.
- 2. **Update** the physical function configuration file to set **ServicesEnabled** (refer to *Increasing the Maximum Number of Processes/Instances*).
- 3. **Perform** qat\_service shutdown and qat\_service start.
- 4. **Update** the virtual function configuration file to set **ServicesEnabled** (refer to *Increasing the Max-imum Number of Processes/Instances*).
- 5. **Restart** the **qat\_service**.

The value of **ServicesEnabled** in the VF configuration file should be the same as the value of **Service-sEnabled** in the PF configuration file, or a subset of that value as shown in this table. For instance, if a PF is configured as **cy**, allowable VF configurations related to that PF can only be **cy**, **asym**, or **sym**. VF device restart will fail if a VF configuration is not allowed for that related PF.

If a VF service is configured to a subset of PF service, the number of VF instances is limited to the number allowed for that PF service as described in *Increasing the Maximum Number of Processes/Instances*. For example, if the PF configuration file has **ServicesEnabled=dc; asym**, only four (not eight) dc instances are enabled if the VF is configured for dc only.

**Note:** Valid Physical Function for each supported platform is described in the *Configuration Variations* table.

| Configured PF Service | Available VF Services |  |  |
|-----------------------|-----------------------|--|--|
| dc;asym               | dc;asym               |  |  |
|                       | asym                  |  |  |
|                       | dc                    |  |  |
| dc;sym                | dc;sym                |  |  |
|                       | sym                   |  |  |
|                       | dc                    |  |  |
| asym;sym              | asym;sym              |  |  |
|                       | sym                   |  |  |
|                       | asym                  |  |  |
| asym                  | asym                  |  |  |
| sym                   | sym                   |  |  |
| dc                    | dc                    |  |  |

Table 28: Configuring Physical Functions and Virtual Functions

# 5.5 Configuring Multiple Intel<sup>®</sup> QuickAssist Technology Endpoints in a System

A platform may include more than one Intel<sup>®</sup> QAT Endpoint. Each device must have its own configuration file. When the acceleration software is installed, default configuration files are installed to the **/etc** folder. The format and structure of the configuration file is exactly the same for all devices.

**Warning:** If a configuration file does not exist for an Intel<sup>®</sup> QAT Endpoint, that endpoint will not start, and an error is displayed indicating that a configuration file was not found.

To determine the number of Intel<sup>®</sup> QAT Endpoints in a system, use the **1spci** utility:

```
lspci -nn | egrep -e

→ '8086:37c8|8086:19e2|8086:0435|8086:6f54|8086:4940|8086:4942|8086:4944|8086:4946'
```

The output from a high-end 4th Gen Intel<sup>®</sup> Xeon<sup>®</sup> Scalable Processor is similar to the following:

|         | Co-processor                 |         |       |             |        |             |      |     |
|---------|------------------------------|---------|-------|-------------|--------|-------------|------|-----|
|         | Co-processor<br>Co-processor |         |       | •           |        |             |      |     |
| 7a:00.0 | Co-processor                 | [0b40]: | Intel | Corporation | Device | [8086:4940] | (rev | 40) |
|         | Co-processor                 |         |       | •           |        |             |      |     |
|         | Co-processor<br>Co-processor |         |       | •           |        |             |      |     |
|         | Co-processor                 |         |       | •           |        |             |      |     |

The output from a system with a high-end Intel<sup>®</sup> C62x Chipset SKU is similar to the following:

88:00.0 Co-processor [0b40]: Intel Corporation Device [8086:37c8] (rev 03) 8a:00.0 Co-processor [0b40]: Intel Corporation Device [8086:37c8] (rev 03) 8c:00.0 Co-processor [0b40]: Intel Corporation Device [8086:37c8] (rev 03)

Then, after the driver is loaded, the user can use the **qat\_service** script to determine the name of each Intel<sup>®</sup> QAT Endpoint and its status. For example:

service qat\_service status

The output from a high-end 4th Gen Intel<sup>®</sup> Xeon<sup>®</sup> Scalable Processor is similar to the following:

(continued from previous page)

| qat_dev2 - type: 4xxx, | inst_id: 2, | node_id: 0, | bsf: 0000:75:00.0, | #accel: 1 |
|------------------------|-------------|-------------|--------------------|-----------|
| ⊶#engines: 9 state: up |             |             |                    |           |
| qat_dev3 - type: 4xxx, | inst_id: 3, | node_id: 0, | bsf: 0000:7a:00.0, | #accel: 1 |
| ⊶#engines: 9 state: up |             |             |                    |           |
| qat_dev4 - type: 4xxx, | inst_id: 4, | node_id: 1, | bsf: 0000:e8:00.0, | #accel: 1 |
| →#engines: 9 state: up |             |             |                    |           |
| qat_dev5 - type: 4xxx, | inst_id: 5, | node_id: 1, | bsf: 0000:ed:00.0, | #accel: 1 |
| →#engines: 9 state: up |             |             |                    |           |
| qat_dev6 - type: 4xxx, | inst_id: 6, | node_id: 1, | bsf: 0000:f2:00.0, | #accel: 1 |
| →#engines: 9 state: up |             |             |                    |           |
| qat_dev7 - type: 4xxx, | inst_id: 7, | node_id: 1, | bsf: 0000:f7:00.0, | #accel: 1 |
| →#engines: 9 state: up | ,           | ,           | ,                  |           |
|                        |             |             |                    |           |

The output from a system with a high-end Intel<sup>®</sup> C62x Chipset SKU is similar to the following:

qat\_dev0 - type: c6xx, inst\_id: 0, bsf: 06:00:0, #accel: 5 #engines: 10 state: up qat\_dev1 - type: c6xx, inst\_id: 1, bsf: 85:00:0, #accel: 5 #engines: 10 state: up qat\_dev2 - type: c6xx, inst\_id: 2, bsf: 87:00:0, #accel: 5 #engines: 10 state: up

The **qat\_service** can start, stop, restart and shutdown each device separately or all Intel<sup>®</sup> QAT Endpoints together. Refer to *Managing Intel QuickAssist Technology Endpoints Using qat\_service* for more information.

Some important configuration file information when using multiple Intel<sup>®</sup> QAT Endpoints:

- When specifying kernel and user space instances in the configuration file, the Cy<Number>Name and Dc<Number>Name parameters must be unique in the context of the section name only.
  - For example, it is valid to have a parameter called Cy0Name in both a kernel instance section (if supported) and a user instance section in the same configuration file without issue. Also, the parameter names do not need to be unique at a system-wide level. For example, it is valid to have a parameter called Cy0Name in both the configuration file for dev0 and the configuration file for dev1 without issue.
- For Intel<sup>®</sup> QAT Endpoints with configuration files that have the same section name (for example, [SSL] and the same data in that section), it is necessary to use the cpaCyInstanceGetInfo2() function to distinguish between Intel<sup>®</sup> QAT Endpoints. The cpaCyInstanceGetInfo2() allows the user of the API to query which Intel<sup>®</sup> QAT Endpoint a cryptography instance handle belongs to. In addition, for any application domain defined in the configuration files (e.g. [SSL]), a call to cpaCyGet-NumInstances() returns the number of cryptography instances defined for that domain across all configuration files. A subsequent call to cpaCyGetInstances() obtains these instance handles.

# 5.6 Configuring Multiple Processes on a System with Multiple Intel<sup>®</sup> QAT Endpoints

As an example, consider a system with two Intel<sup>®</sup> QAT Endpoints where it is necessary to configure two user space sections. One section is identified as SSL and the other is identified as Internet Protocol Security (IPSec).

- For the SSL section, configure eight processes, where each process has access to one acceleration instance.
- For the IPSec section, configure one process, with access to eight acceleration instances, four per Intel<sup>®</sup> QAT Endpoint.

In this scenario, the user space section of the configuration file would look like the following for the first Intel® QAT Endpoint.

```
[SSL] #User space section name
NumProcesses=4 # There are 4 user space process with section name SSL with access
\rightarrowto this device
LimitDevAccess=1 # These 4 SSL user space processes only use this device
NumCyInstances=1 # Each process has access to 1 Cy instance on this device
NumDcInstances=0 # Each process has access to 0 Dc instances on this device
# Crypto - User instance #0
CyOName = "SSLO"
CyOIsPolled = 1
CyOCoreAffinity = 0 \# Core affinity not used for polled instance
[IPsec] #User space section name
NumProcesses=1 # There is 1 user space process with section name IPSec with access
\rightarrowto this device
LimitDevAccess=0 # This IPSec user space process may have access to other devices
NumCyInstances=4 # The IPSec process has access to 4 Cy instances on this device
NumDcInstances=0 # The IPSec process has access to 0 Dc instances on this device
# Crypto - User instance #0
CyOName = "IPSec0"
CyOIsPolled = 1
CyOCoreAffinity = 0 # Core affinity not used for polled instance
# Crypto - User instance #1
Cy1Name = "IPSec1"
CyllsPolled = 1
CylCoreAffinity = 0 # Core affinity not used for polled instance
# Crypto - User instance #2
Cy2Name = "IPSec2"
Cy2IsPolled = 1
Cy2CoreAffinity = 0 # Core affinity not used for polled instance
```

(continues on next page)

(continued from previous page)

```
# Crypto - User instance #3
Cy3Name = "IPSec3"
Cy3IsPolled = 1
Cy3CoreAffinity = 0 # Core affinity not used for polled instance
```

The second Intel<sup>®</sup> QAT Endpoint configuration looks like:

```
[SSL] #User space section name
NumProcesses=4 # There are 4 user space process with section name SSL with access
\rightarrowto this device
LimitDevAccess=1 # These 4 SSL user space processes only use this device
NumCyInstances=1 # Each process has access to 1 Cy instance on this device
NumDcInstances=0 # Each process has access to 0 Dc instances on this device
# Crypto - User instance #0
CyOName = "SSLO"
CyOIsPolled = 1
CyOCoreAffinity = 0 # Core affinity not used for polled instance
[IPsec] #User space section name
NumProcesses=1 # There is 1 user space process with section name IPSec with access
\rightarrowto this device
LimitDevAccess=0 # This IPSec user space process may have access to other devices
NumCyInstances=4 # The IPSec process has access to 4 Cy instances on this device
NumDcInstances=0 # The IPSec process has access to 0 Dc instances on this device
# Crypto - User instance #0
CvOName = "IPSec0"
CyOIsPolled = 1
CyOCoreAffinity = 0 # Core affinity not used for polled instance
# Crypto - User instance #1
Cy1Name = "IPSec1"
CyllsPolled = 1
CylCoreAffinity = 0 # Core affinity not used for polled instance
# Crypto - User instance #2
Cy2Name = "IPSec2"
Cv2IsPolled = 1
Cy2CoreAffinity = 0 # Core affinity not used for polled instance
# Crypto - User instance #3
Cy3Name = "IPSec3"
Cy3IsPolled = 1
Cy3CoreAffinity = 0 # Core affinity not used for polled instance
```

Eight processes (with section name SSL) can call the **icp\_sal\_userStart("SSL")** function to get access to one crypto instance each. One process (with section name IPSec) can call the **icp\_sal\_userStart("IPSec")** function to get access to eight crypto instances.

Internally in the driver, this works as follows:



- 1. When the driver is configured (that is, the **service** qat\_service is called), the driver reads the configuration file for the device and populates an internal configuration table.
- 2. Reading the configuration file for dev0:
  - a. For the section named [SSL], the driver determines that four processes are required and that these processes limit access to this device only. In this case, the driver creates four internal sections that it labels SSL\_DEV0\_INT\_0, SSL\_DEV0\_INT\_1, SSL\_DEV0\_INT\_2 and SSL\_DEV0\_INT\_3. Each section is given access to one crypto instance as described.
  - b. For section name [IPSec], the driver determines that one process is required and that this process does not limit access to this device only (that is, it may access instances on other devices). In this case, the driver creates one internal section that it labels IPSec\_INT\_0 and gives this access to four crypto instances on this device.
- 3. Reading the configuration file for dev1:
  - a. For the section named [SSL], the driver determines that four processes are required and that these processes are limited to access this device only. In this case, the driver creates four internal sections that it labels SSL\_DEV1\_INT\_0, SSL\_DEV1\_INT\_1, SSL\_DEV1\_INT\_2 and SSL\_DEV1\_INT\_3. Each section is given access to one crypto instance as described.
  - b. For the section named [IPSec], the driver determines that one process is required and that this process may have access to instances on other devices. In this case, the driver creates one internal section that it labels IPSec\_INT\_0 and gives this access to four crypto instances on this device.

**Note:** This section name now appears in both devices' internal configuration and, therefore, the process that gets assigned this section name will have access to instances on both devices.

4. In total, there are nine separate sections (SSL\_DEV0\_INT\_0, SL\_DEV0\_INT\_1, SSL\_DEV0\_INT\_2, SSL\_DEV0\_INT\_3, SSL\_DEV1\_INT\_0, SSL\_DEV1\_INT\_1, SSL\_DEV1\_INT\_2, SSL\_DEV1\_INT\_3 and IPSec\_INT\_0) with access to crypto instances.

When a process calls the icp\_sal\_userStart ("SSL") function, the driver locates the next available section of the form SSL\_DEV<m>\_INT<...> (of which there are eight in total in this example) and assigns this section to the process. This gives the process access to corresponding crypto instances.

When a process calls the icp\_sal\_userStart ("IPSec") function, the driver locates the next available section of the form IPSec\_INT\_<...> (of which there is only one in total for this example) and assigns this section to the process. This gives the process access to the corresponding crypto instances.

**Note:** The icp\_sal\_userStartMultiProcess() function has been deprecated. The API still exists, but it simply calls icp\_sal\_userStart().

# 5.7 Sample Configuration Files

Sample configuration files are available in **quickassist/utilities/adf\_ctl/conf\_files**. Depending on the product and configuration, one or more of these will be copied to **/etc** during the package installation.

# **6** Services

Intel® QAT can accelerate the following services:

- Data compression/decompression.
- Symmetric cryptography.
- Public key cryptography.

Details on the services are included in the following sections.

# 6.1 Data Compression

## 6.1.1 Compression Features

- Deflate Compression Algorithm.
- LZ77 and Huffman Encoding.
- LZ4/LZ4s Compression Algorithm.
- Compress and Verify.
- Checksums.
- Programmable CRC64.
- accumulateXXHash.

## 6.1.2 Compression Limitations

- Stateful compression is not supported.
- Stateful decompression is not supported on QAT2.0 hardware devices.
- Batch and Pack (BnP) compression is not supported.
- accumulateXXHash is not supported when combined with autoSelectBestHuffmanTree
- accumulateXXHash is not supported in Decompression or Combined sessions

# 6.1.3 Compression Session Setup

The following table lists the properties that should be configured in the CpaDcSessionSetupData structure depending on the compression algorithm requested.

| Table 29: Compression CpaE<br>Properties      | DcSessionSetupData                  |
|-----------------------------------------------|-------------------------------------|
| Property                                      | Details                             |
| CpaDcCompLvl compLevel                        | Properties common to all algorithms |
| CpaDcCompType compType                        |                                     |
| CpaDcAutoSelectBest autoSelectBestHuffmanTree |                                     |
| CpaDcSessionDir sessDirection                 |                                     |
| CpaDcSessionState sessState                   |                                     |
| CpaDcCompWindowSize windowSize                |                                     |
| CpaDcChecksum checksum                        |                                     |
| CpaDcHuffType huffType                        | Deflate specific                    |
| CpaDcCompLZ4BlockMaxSizelz4BlockMaxSize       | LZ4 specific                        |
| CpaBoolean lz4BlockChecksum                   |                                     |
| CpaBoolean lz4BlockIndependence               | ]                                   |
| CpaBoolean accumulateXXHash                   |                                     |
| CpaDcCompMinMatch minMatch                    | LZ4s specific                       |

**Note:** Should the application use no-session API cpaDcNsCompressData(), the properties listed above are available in the CpaDcNsSetupData data structure.

# 6.1.4 Decompression Session Setup

The following table lists the properties that should be configured in the CpaDcSessionSetupData structure depending on the format of the payload to decompression.

| reperties                      |         |     |
|--------------------------------|---------|-----|
| Property                       | Deflate | LZ4 |
| CpaDcCompType compType         | Yes     | Yes |
| CpaDcSessionDir sessDirection  | Yes     | Yes |
| CpaDcSessionState sessState    | Yes     | Yes |
| CpaDcCompWindowSize windowSize | Yes     | Yes |
| CpaDcChecksum checksum         | Yes     | Yes |

| Table 30:  | Decompression | CpaDcSessionSetupData |
|------------|---------------|-----------------------|
| Properties |               |                       |

### 6.1.4.1 Deflate Decompression

With deflate based format such as Gzip the application is required to skip the GZip header and present to Intel<sup>®</sup> QAT the first byte of the deflate block.



When the *Deflate Block #3* is processed, the property **endOfLastBlock** in the **CpaDcRqResults** structure will be set to **CPA\_TRUE**. This notifies the application that no more data can be decompressed. At this point, the session must be either removed and re-initialized, or reset. With an application handling multi-gzip data, the both the Gzip footer and Gzip header must be skipped.

## 6.1.4.2 LZ4 Decompression

When decompressing LZ4 frames, the application is required to parse the frame header to extract **B**. **Checksum** flag. This flag is used to set the configuration parameter **lz4BlockChecksum** in **CpaDcSession-SetupData**. When decompressing LZ4 frames, the application should not include the frame header nor the frame footer in the source buffer to be processed by Intel<sup>®</sup> QAT hardware.



**Note:** In decompression direction, the property **endOfLastBlock** removes the need for the application to know where the last block ends. The QAT hardware will stop after processing the last block. This applies to both GZIP and LZ4 formats.

### 6.1.4.3 LZ4 Decompression Limitations

When decompressing LZ4 data compressed without Intel<sup>®</sup> QAT hardware, it is important to ensure that the compressor limits the history buffer to 32KB. Data compressed using a history buffer larger than 32KB will result with a CPA\_DC\_INVALID\_DIST (-10) error code.

### 6.1.4.4 Multi-frame decompression support

Intel<sup>®</sup> QAT hardware can decompress a payload that includes multiple frames. This applies to both Gzip and LZ4 formats. The figure below shows how the application must behave to decompress LZ4 multi frame payloads.



# 6.1.5 Performance Considerations

To enable the application benefiting from the QAT2.0 HW maximal performance, it is recommended to populate all the DIMMs around the CPU sockets in use.

# 6.1.6 Flush Flags

The table below shows the flush flags that should be used depending on the application's use case. The application programming model should follow this table.

| Algorithm | Use Case                                                      | Intermediate Request | Last Request                     |
|-----------|---------------------------------------------------------------|----------------------|----------------------------------|
| Deflate   | Stateless compression                                         | CPA_DC_FLUSH_FULL    | CPA_DC_FLUSH_FINAL to set BFINAL |
|           | Stateless decompression                                       | CPA_DC_FLUSH_FINAL   |                                  |
|           | Stateful decompression<br>(QAT1.X devices only)               | CPA_DC_FLUSH_SYNC    | CPA_DC_FLUSH_FINAL               |
| LZ4       | Stateless compression<br>with accumulateXxHash =<br>CPA_FALSE | CPA_DC_FLUSH_FINAL   | CPA_DC_FLUSH_FINAL               |

Table 31: Flush Flags

| Table 31 – continued from previous page |                         |                      |                    |  |
|-----------------------------------------|-------------------------|----------------------|--------------------|--|
| Algorithm                               | Use Case                | Intermediate Request | Last Request       |  |
|                                         | Stateless compression   | CPA_DC_FLUSH_FULL    | CPA_DC_FLUSH_FINAL |  |
|                                         | with accumulateXxHash = |                      |                    |  |
|                                         | CPA_TRUE                |                      |                    |  |
|                                         | Stateless decompression | CPA_DC_FLUSH_FINAL   | CPA_DC_FLUSH_FINAL |  |
| LZ4s                                    | Stateless compression   | CPA_DC_FLUSH_FINAL   | CPA_DC_FLUSH_FINAL |  |

**Note:** QAT1.X hardware devices still support stateful decompression operations. QAT2.0 hardware device only supports stateless operations.

# 6.1.7 Checksums

With the addition of LZ4 algorithm support on QAT2.0, the compression hardware accelerators are now capable to generate XXHash32 checksums. QAT2.0 device is now supporting the following checksums:

Table 32: Checksums

| Checksum Type | Usage                     |
|---------------|---------------------------|
| CRC32         | Required for GZip support |
| Adler32       | Required for Zlib support |
| XXHash32      | Required for LZ4 support  |

**Note:** In the event the *XXHash32* checksum should be reset, it must be done calling the API cpaDcRe-setXXHashState().

# 6.1.8 LZ4s Compressed Data Block format

LZ4s is a variant of LZ4 block format. LZ4s should be considered as an intermediate compressed block format. The LZ4s format is selected when the application sets the **compType** to **CPA\_DC\_LZ4S** in **CpaD-cSessionSetupData**. The LZ4s block returned by the Intel<sup>®</sup> QAT hardware can be used by an external software post-processing to generate other compressed data formats.

The following table lists the differences between LZ4 and LZ4s block format. LZ4s block format uses the same high-level formatting as LZ4 block format with the following encoding changes:

#### **Programmer's Guide**

| Feature            | LZ4                                                                                             | LZ4s                                                                                                                                                                                                                                                                                                                     |
|--------------------|-------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Sequence<br>Header | 4-bit copy length 4-bit literal length                                                          | Same as LZ4                                                                                                                                                                                                                                                                                                              |
| Copy Length        | Length 4-19 bytes (encoding<br>values 0-15),<br>19 bytes adds an extra byte<br>with value 0x00. | Copy length value of 0 means no copy with this<br>sequence. For Min Match of 3 bytes, Copy length<br>value 1-15 means length 3-17 with 17 bytes adding<br>an extra byte with value 0x00.<br>For Min Match of 4 bytes, Copy length value 1-15<br>means length 4-18 with 18 bytes adding an extra<br>byte with value 0x00. |

#### Table 33: Differences between LZ4 and LZ4s block format

#### Note:

- LZ4 requires a copy/token in every sequence, except the last sequence.
- The last sequence in block does not contain a copy in both LZ4 and LZ4s.

LZ4s encoding creates a single block of compressed data per request. This is different from LZ4 which creates multiple blocks defined by the LZ4 max block size setting. An LZ4s block is only made of LZ4s sequences.

A sequence in LZ4s can contain:

- Only a token.
- Only literals.
- A token and literals.

Any of the sequence types can exist anywhere in the LZ4s block. The last LZ4s sequence in the LZ4s block shall satisfy the end-of-block restrictions outlined in the LZ4 specification.

### 6.1.8.1 LZ4 Compression Support

With the QAT 2.0 API, the application can create LZ4 frames. This is achieved using APIs **cpaDcGener-ateHeader()** and **cpaDcGenerateFooter()**. These APIs are also able to generate GZip and Zlib formats. More information is available in the API reference manual.

The cpaDcGenerateHeader () creates a 7 byte LZ4 frame header which includes:

- Magic number 0x184D2204.
- The LZ4 max block size defined in the CpaDcSessionSetupData.
- Flag byte as:
  - Version = 1
  - Block independence = 0
  - Block checksum = 0

- Content size present = 0
- Content checksum present = 1
- Dictionary ID present = 0
- Content size = 0
- Dictionary ID = 0
- Header checksum = 1 byte representing the second byte of the XXH32 of the frame descriptor field.

The **cpaDcGenerateFooter()** API must be used after processing all the requests. This API must be called last to append the frame footer. The **cpaDcGenerateFooter()** API creates an 8 byte frame footer adding both the end marker (4 bytes set to 0x00) and the XXHash32 checksum computed by Intel<sup>®</sup> QAT hardware.

# 6.1.9 Compress-and-Verify

The Compress and Verify (CnV) feature checks and ensures data integrity in the compression operation of the Data Compression API. This feature introduces an independent capability to verify the compression transformation.

Refer to Intel<sup>®</sup> QuickAssist Technology Data Compression API Reference Manual.

#### Note:

- CnV is always enabled via the compression APIs (cpaDcCompressData(), cpaDcCompressData2(), cpaDcNsCompressData(), cpaDcDpEnqueueOp()).
- CnV supports compression operations only.
- The compressAndVerify flag in the CpaDcDpOpData structure should be set to CPA\_TRUE when using the cpaDcDpEnqueueOp() or cpaDcDpEnqueueOpBatch() API.

### 6.1.9.1 Compress and Verify Error log in Sysfs

The implementation of the Compress and Verify solution keeps a record of the CnV errors that have occurred since the driver was loaded. The error count is provided on a per Acceleration Engine basis.

The path to the CnV error log is: /sys/kernel/debug/qat\_<qat\_device>\_<Bus>\:<device>.<function>/cnv\_errors. For additional details refer to debugfs-driver-qat documentation.

Each Acceleration Engine keeps a count of the CnV errors. The CnV error counter is reset when the driver is loaded. The tool also reports the last error type that caused a CnV error.

### 6.1.9.2 Compress and Verify and Recover (CnVnR)

The Compress and Verify and Recover (CnVnR) feature allows a compression error to be recovered in a seamless manner. It is supported in both the Traditional and in the Data Plane APIs.

The CnVnR feature is an enhancement of the Compress and Verify (CnV) solution. When a compress and verify error is detected, the Intel<sup>®</sup> QAT software will do a correction without returning a CnV error to the application.

When a recovery occurs, **CpaDcRqResults.status** will return **CPA\_DC\_OK** or **CPA\_DC\_OVERFLOW** and the destination buffer will hold valid deflate data.

The application can find out if CnVnR is supported by querying the instance capabilities via the cpaDc-QueryCapabilities API. On completion, the compressAndVerifyAndRecover property of the CpaDcInstanceCapabilities structure will be set to CPA\_TRUE if the feature is supported.

| Denavors              |                                                                      |  |
|-----------------------|----------------------------------------------------------------------|--|
| API                   | CnVnR Behavior                                                       |  |
| cpaDcCompressData     | Enabled by default, no option to disable.                            |  |
| cpaDcCompressData2    | CnVnR is enabled when <b>compressAndVerifyAndRecover</b> property is |  |
|                       | set to CPA_TRUE in CpaDcDpOpData structure.                          |  |
| cpaDcNsCompressData   | CnVnR is enabled by default.                                         |  |
| cpaDcDecompressData   | Not applicable                                                       |  |
| cpaDcDecompressData2  | Not applicable                                                       |  |
| cpaDcNsDecompressData | Not applicable                                                       |  |
| cpaDcDpEnqueueOp      | CnVnR is enabled when <b>compressAndVerifyAndRecover</b> property is |  |
|                       | set to CPA_TRUE in CpaDcDpOpData structure.                          |  |
| cpaDcDpEnqueueOpBatch | CnVnR is enabled when <b>compressAndVerifyAndRecover</b> property is |  |
|                       | set to CPA_TRUE in CpaDcDpOpData structure.                          |  |

Table 34: Compress and Verify and Recover (CnVnR) Behaviors

When a CnV recovery takes place, the Intel<sup>®</sup> QAT software creates a stored block out of the input payload that could not be compressed. The maximal size of a stored block allowed by the deflate standard is 65,535 bytes.



When a stored block is created, the DEFLATE header specifies that the data is uncompressed so that the decompressor does not attempt to decode the cleartext data that follows the header. The size of a stored block can be defined as: *Stored block size = Source buffer size + 5 Bytes* (used for the deflate header).

The recovery behaves differently on QAT2.0 than on QAT1.X devices. With QAT1.X devices, the recovery creates only one stored block, If the stored block size exceeds 65,535 bytes, the Intel<sup>®</sup> QAT solution creates one stored block of 65,535 bytes and CpaDcRqResults.status returns CPA\_DC\_OVERFLOW. On QAT2.0 device, when the recovery takes place, multiple stored blocks are created. This improvement was added to avoid the application having to handle the overflow.

#### CnV Recovery with LZ4 compression

When LZ4 compression is used, QAT software will generate an uncompressed LZ4 block in the event of a recovery. The LZ4 uncompressed block will have bit <31> set in the block header followed by the cleartext in the data section of the block.

#### CnV Recovery with LZ4s compression

LZ4s algorithm is an Intel<sup>®</sup> specific format. LZ4s payloads do not have a block header like LZ4. When a CnV recovery occurs, the source data will be copied to the destination and **dataUncompressed** property will be set to **CPA\_TRUE** in **CpaDcRqResults** structure.

#### **Counting Recovered Compression Errors**

The Intel<sup>®</sup> QAT API has been updated to allow the application to track recovered compression errors. The **CpaDcStats** data structure has a new property called **numCompCnvErrorsRecovered** that is incremented every time a compression recovery happens.

The compression recovery process is agnostic to the application.

**CpaDcRqResults.status** returns **CPA\_DC\_OK** when a compression recovery takes place. The only way to know if a compression recovery took place on the current request is to call the **cpaDcGetStats()** API and to monitor **CpaDcStats.numCompCnvErrorsRecovered**.

# 6.1.10 Dynamic Compression

Dynamic compression involves feeding the data produced by the compression hardware block to the translator hardware block.



When the application selects the Huffman type to CPA\_DC\_HT\_FULL\_DYNAMIC in the session and autoselect best feature is set to CPA\_DC\_ASB\_DISABLED, the compression service may not always produce a deflate stream with dynamic Huffman trees.

When using QAT2.0 device, it is no longer required to allocate intermediate buffers. The API cpaDcGet-NumIntermediateBuffers() returns 0. As a good programming practice, the application should still call cpaDcGetNumIntermediateBuffers() and ensure that the number of intermediate buffers returned is 0. Compressing some input data may lead to a lower-than-expected compression ratio. This is because the input data may not be very compressible. To achieve a maximum compression ratio, the acceleration unit provides an auto select best (ASB) feature.

With QAT1.X devices, the Intel<sup>®</sup> QuickAssist Technology hardware will first execute static compression followed by dynamic compression and then select the output that yields the best compression ratio.

With QAT2.0 devices, the is different. ASB features chooses between a static and a stored block. ASB features with choose the block type that offers the best compression ratio.

Regardless of the ASB setting selected, dynamic compression will only be attempted if the session is configured for dynamic compression.

| Setting             | Description           |  |  |
|---------------------|-----------------------|--|--|
| CPA_DC_ASB_DISABLED | ASB mode is disabled. |  |  |
| CPA_DC_ASB_ENABLED  | ASB mode is enabled.  |  |  |

#### Table 35: ASB Settings

#### Note:

- Setting ASB mode to CPA\_DC\_ASB\_ENABLED, corresponds to the setting CPA\_DC\_ASB\_UNCOMP\_STATIC\_DYNAMIC\_WITH\_STORED\_HDRS.
- These ASB modes have been deprecated:
  - CPA\_DC\_ASB\_STATIC\_DYNAMIC
  - CPA\_DC\_ASB\_UNCOMP\_STATIC\_DYNAMIC\_WITH\_STORED\_HDRS
  - CPA\_DC\_ASB\_UNCOMP\_STATIC\_DYNAMIC\_WITH\_NO\_HDRS
- Based on the ASB settings, the produced data returned in the **CpaDcRqResults** structure will vary.

# 6.1.12 Maximum Compression Expansion

To facilitate the programming model of the application, Intel<sup>®</sup> added a new set of APIs that return the size of the destination according to the algorithm used.

These APIs are:

- cpaDcDeflateCompressBound()
- cpaDcLZ4CompressBound()
- cpaDcLZ4SCompressBound()

intel

This new set of APIs will return to the application the size of the destination buffer that must be allocated to avoid an overflow exception. Each function initializes the **outputSize** parameter. The **output-Size** parameter takes into account the maximal expansion that the compressed data size can reach. The returned **outputSize** value must be used both to allocate the size of the **pData** and to initialize the **dataLenInBytes** in the **CpaBufferList** structure.

**Note:** Each one of the **compressBound()** APIs accepts as a parameter the instance handle. The instance handle is used internally by the library to determine on which hardware version the instance lives on. The size of the destination buffer to be allocated depends on the hardware generation the instance lives on.

# 6.1.13 No Session API

The no session API is a simplification of the existing compression and decompression APIs that does not require the application to create/remove a session. Instead the parameters that would normally be set when creating a session are passed into the compress/decompress APIs via a **CpaDcNsSetupData** structure. The no session API is especially useful to simplify existing applications as sessions no longer need to be created/tracked/removed. The no session API can be thought of as a "one shot" API and is intended for use cases where all the data to be compressed or decompressed for the current job is being submitted as one request. In addition to the simpler protocol, the API has a smaller memory footprint.

The no session API consists of the following API calls:

- cpaDcNsCompressData()
- cpaDcNsDecompressData()
- cpaDcNsGenerateHeader()
- cpaDcNsGenerateFooter()

The cpaDcNsCompressData() and cpaDcNsDecompressData() functions are very similar to the cpaDcCompressData2() and cpaDcDecompressData2() functions. Instead of passing in a CpaDcSessionHandle, a CpaDcNsSetupData structure and a CpaDcCallbackFn are passed in. The CpaDcCallbackFn is the user callback to be called on request completion when running asynchronously. For synchronous operation CpaDcCallbackFn must be set to NULL.

The no session API will work with all versions of QAT hardware but does not support stateful operation as without a session no state can be maintained between requests.

It is still possible to seed checksums for CRC32 and Adler32 by setting the **checksum** field of the **CpaD-cRqResults** to the seed checksum value before submission. This will allow an overall checksum to be maintained across multiple submissions. For LZ4, checksum seeding is not supported. If checksums need to be maintained between LZ4 requests then the session based API must be used. The no session API supports data integrity checksums but as stateful operation is not supported the integrity checksums will always be for the current request only.

The no session API does support stateless overflow in the compression direction only like the session based API. In that case **consumed**, **produced** and **checksum** fields within the **CpaDcRqResults** structure will be valid when a status of **CPA\_DC\_OVERFLOW** is returned. It is the application's responsibility to arrange data

buffers for the next submission, ensure the checksum is seeded and maintain an overall count of the bytes consumed if footer generation is required. For performance reasons it is recommended that the compressBound API is used to size the destination buffer correctly to avoid overflow. It is necessary for an application to seed the checksum whether it wishes to continue the series of requests or to start a new one, only in the latter case, the seed is 0 for CRC32 and 1 for Adler32.

The no session API does not support setting the **sessDirection** field of the **CpaDcNsSetupData** structure to **CPA\_DC\_DIR\_COMBINED**. In addition, it does not support setting the **sessState** field to **CPA\_DC\_STATEFUL**, or the **flushFlag** field of the **CpaDcOpData** structure to **CPA\_DC\_FLUSH\_NONE** or **CPA\_DC\_FLUSH\_SYNC**.

The cpaDcNsGenerateHeader() and cpaDcNsGenerateFooter() functions are also very similar to the session based equivalents but take a CpaDcNsSetupData instead of a CpaDcSessionHandle. For cpaD-cNsGenerateFooter() an additional parameter is required called count that contains the overall length of the uncompressed input data. For most cases this will be the consumed value from the single submission contained in the CpaDcRqResults structure but in cases where multiple submissions represent the overall file then it is the application's responsibility to maintain the overall count of consumed bytes.

# 6.1.14 Compression Levels

| Level | lvl_enum   | QAT 2.0 (Deflate, iLZ77, LZ4, LZ4s)       | QAT 1.7/1.8 (Deflate) |
|-------|------------|-------------------------------------------|-----------------------|
| 1     | CPA_DC_L1  | 2(HW_L1)                                  | DEPTH_1               |
| 2     | CPA_DC_L2  |                                           | DEPTH_4               |
| 3     | CPA_DC_L3  |                                           | DEPTH_8               |
| 4     | CPA_DC_L4  |                                           | DEPTH_16              |
| 5     | CPA_DC_L5  |                                           |                       |
| 6     | CPA_DC_L6  | 8(Hw_L6)                                  |                       |
| 7     | CPA_DC_L7  |                                           |                       |
| 8     | CPA_DC_L8  |                                           |                       |
| 9     | CPA_DC_L9  | 16 (Hw_L9)                                |                       |
| 10    | CPA_DC_L10 |                                           | Unsupported. Will be  |
| 11    | CPA_DC_L11 |                                           | rejected at the API.  |
| 12    | CPA_DC_L12 |                                           |                       |
| >12   |            | Unsupported. Will be rejected at the API. |                       |

Table 36: Compression Levels

# 6.1.15 Compression Status Codes

The CpaDcRqResults structure should be checked for compression status codes in the CpaDcReqStatus data field. The mapping of the error codes to the enums is included in the quickassist/include/dc/ cpa\_dc.h file.

# 6.1.16 Intel<sup>®</sup> QuickAssist Technology Compression API Errors

The Intel<sup>®</sup> QuickAssist Technology Compression APIs that send requests to the compression hardware can return the error codes shown in *Compression API Errors*.

These APIs are:

- cpaDcCompressData()
- cpaDcCompressData2()
- cpaDcNsCompressData()
- cpaDcDecompressData()
- cpaDcDecompressData2()
- cpaDcNsDecompressData()
- cpaDcDpEnqueueOp()
- cpaDcDpEnqueueOpBatch()

**Note:** Decompression issues in may also apply to the compression use case due to potential issues encountered during a Compress-and-Verify operation. In this case, the file(s) /sys/kernel/debug/qat\_\*/ cnv\_errors may show these nested errors. In some cases, the suggested corrective action may need to be to store the block uncompressed or to compress the block with software.

### 6.1.16.1 Compression API Errors

| Error<br>Code | Error Type                | Description                                                                               | Suggested Corrective<br>Action(s)                                                                                                                                                       |
|---------------|---------------------------|-------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 0             | CPA_DC_OK                 | No error detected by compression hard-<br>ware.                                           | None.                                                                                                                                                                                   |
| -1            | CPA_DC_INVALID_BLOCK_TYPE | Invalid block type<br>(type = 3); invalid in-<br>put stream detected<br>for decompression | Decompression error.<br>Discard output. For<br>a stateless session,<br>resubmit affected<br>request. For a stateful<br>session, abort the<br>session calling CpaD-<br>cRemoveSession(). |

Table 37: Compression API Errors

|       | Table 37 – continued from previous page |                       |                         |  |
|-------|-----------------------------------------|-----------------------|-------------------------|--|
| Error | Error Type                              | Description           | Suggested Corrective    |  |
| Code  |                                         |                       | Action(s)               |  |
| -2    | CPA_DC_BAD_STORED_BLOCK_LEN             | Stored block length   | Decompression error.    |  |
|       |                                         | did not match one's   | Discard output. For     |  |
|       |                                         | complement; invalid   | a stateless session,    |  |
|       |                                         | input stream detected | resubmit affected       |  |
|       |                                         |                       | request. For a stateful |  |
|       |                                         |                       | session, abort the      |  |
|       |                                         |                       | session calling CpaD-   |  |
|       |                                         |                       | cRemoveSession().       |  |
| -3    | CPA_DC_TOO_MANY_CODES                   | Too many length       | Decompression error.    |  |
|       |                                         | or distance codes;    | Discard output. For     |  |
|       |                                         | invalid input stream  | a stateless session,    |  |
|       |                                         | detected              | resubmit affected       |  |
|       |                                         |                       | request. For a stateful |  |
|       |                                         |                       | session, abort the      |  |
|       |                                         |                       | session calling CpaD-   |  |
|       |                                         |                       | cRemoveSession().       |  |
| -4    | CPA_DC_INCOMPLETE_CODE_LENS             | Code length codes in- | Decompression error.    |  |
|       |                                         | complete: invalid in- | Discard output. For     |  |
|       |                                         | put stream detected   | a stateless session,    |  |
|       |                                         |                       | resubmit affected       |  |
|       |                                         |                       | request. For a stateful |  |
|       |                                         |                       | session, abort the      |  |
|       |                                         |                       | session calling CpaD-   |  |
|       |                                         |                       | cRemoveSession().       |  |
| -5    | CPA_DC_REPEATED_LENS                    | Repeated lengths      | Decompression error.    |  |
|       |                                         | with no first length; | Discard output. For     |  |
|       |                                         | invalid input stream  | a stateless session,    |  |
|       |                                         | detected              | resubmit affected       |  |
|       |                                         |                       | request. For a stateful |  |
|       |                                         |                       | session, abort the      |  |
|       |                                         |                       | session calling CpaD-   |  |
|       |                                         |                       | cRemoveSession().       |  |
| -6    | CPA_DC_MORE_REPEAT                      | Repeat more than      | Decompression error.    |  |
|       |                                         | specified lengths;    | Discard output. For     |  |
|       |                                         | invalid input stream  | a stateless session,    |  |
|       |                                         | detected              | resubmit affected       |  |
|       |                                         |                       | request. For a stateful |  |
|       |                                         |                       | session, abort the      |  |
|       |                                         |                       | session calling CpaD-   |  |
|       |                                         |                       | cRemoveSession().       |  |

Table 37 – continued from previous page

# intel.

| Error | Error Type              | Description                                                                                                      | Suggested Corrective                                                                                                                                                                                                                                                                                                          |
|-------|-------------------------|------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Code  |                         | •                                                                                                                | Action(s)                                                                                                                                                                                                                                                                                                                     |
| -7    | CPA_DC_BAD_LITLEN_CODES | Invalid literal/length<br>code lengths; invalid<br>input stream detected                                         | Decompression error.<br>Discard output. For<br>a stateless session,<br>resubmit affected<br>request. For a stateful<br>session, abort the<br>session calling CpaD-<br>cRemoveSession().                                                                                                                                       |
| -8    | CPA_DC_BAD_DIST_CODES   | Invalid distance code<br>lengths; invalid input<br>stream detected                                               | Decompression error.<br>Discard output. For<br>a stateless session,<br>resubmit affected<br>request. For a stateful<br>session, abort the<br>session calling CpaD-<br>cRemoveSession().                                                                                                                                       |
| -9    | CPA_DC_INVALID_CODE     | Invalid literal/length or<br>distance code in fixed<br>or dynamic block; in-<br>valid input stream de-<br>tected | Decompression error.<br>Discard output. For<br>a stateless session,<br>resubmit affected<br>request. For a stateful<br>session, abort the<br>session calling CpaD-<br>cRemoveSession().                                                                                                                                       |
| -10   | CPA_DC_INVALID_DIST     | Distance is too far<br>back in fixed or dy-<br>namic block; invalid<br>input stream detected                     | Decompression error.<br>Discard output. For<br>a stateless session,<br>resubmit affected<br>request. For a stateful<br>session, abort the<br>session calling CpaD-<br>cRemoveSession(). If<br>the error is observed<br>with LZ4 decompres-<br>sion, ensure that the<br>compressor has a<br>history buffer limited<br>to 32KB. |

Table 37 – continued from previous page

| Error | Error Type             | nued from previous page Description                                                                                                                                                                                                                                                                                                       | Suggested Corrective                                                                                                                                                                                                     |
|-------|------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Code  | Endrigpe               | Description                                                                                                                                                                                                                                                                                                                               | Action(s)                                                                                                                                                                                                                |
| -11   | CPA_DC_OVERFLOW        | Overflow detected.<br>This is not an error,<br>but an exception.<br>Overflow is supported<br>and can be handled.                                                                                                                                                                                                                          | Resubmit with a larger<br>output buffer when<br>appropriate.<br>With decompression<br>executed on QAT2.0,<br>the application is re-<br>quired to resubmit the<br>compressed data with<br>a larger destination<br>buffer. |
| -12   | CPA_DC_SOFTERR         | Other non-fatal de-<br>tected.                                                                                                                                                                                                                                                                                                            | Discard output. For<br>a stateless session,<br>resubmit affected<br>request. For a stateful<br>session, abort the<br>session calling CpaD-<br>CRemoveSession().                                                          |
| -13   | CPA_DC_FATALERR        | Fatal error detected.                                                                                                                                                                                                                                                                                                                     | Discard output and<br>abort the session<br>calling CpaDcRe-<br>moveSession().                                                                                                                                            |
| -14   | CPA_DC_MAX_RESUBMITERR | On an error being de-<br>tected, the firmware<br>attempted to cor-<br>rect and resubmitted<br>the request, how-<br>ever, the maximum<br>resubmit value was<br>exceeded. Maximal<br>value is internally set<br>in the firmware to 10<br>attempts. This is a<br>QAT1.6 error only.<br>This error code is<br>considered as a fatal<br>error. | Discard output and<br>abort the session<br>calling CpaDcRe-<br>moveSession().                                                                                                                                            |

| Error | Error Type                 | Description                                     | Suggested Corrective           |
|-------|----------------------------|-------------------------------------------------|--------------------------------|
| Code  |                            |                                                 | Action(s)                      |
| -15   | CPA_DC_INCOMPLETE_FILE_ERR | QAT1.X device can re-                           | No corrective action is        |
|       |                            | port this error with De-                        | required as it is not ex-      |
|       |                            | flate decompression.                            | posed to the applica-          |
|       |                            | However, it is not                              | tion.                          |
|       |                            | exposed to the appli-                           |                                |
|       |                            | cation. The input file is                       |                                |
|       |                            | incomplete. This indi-                          |                                |
|       |                            | cates that the request                          |                                |
|       |                            | was submitted with a                            |                                |
|       |                            | CPA_DC_FLUSH_FIN                                | AL.                            |
|       |                            | However, a BFINAL                               |                                |
|       |                            | bit was not found in                            |                                |
|       |                            | the request.                                    |                                |
|       |                            | QAT2.0 can return                               |                                |
|       |                            | this error code to the                          |                                |
|       |                            | application during                              |                                |
|       |                            | LZ4 decompression.                              |                                |
|       |                            | This error is returned                          |                                |
|       |                            | when a LZ4 block is                             |                                |
|       |                            | incomplete.                                     |                                |
| -16   | CPA_DC_WDOG_TIMER_ERR      | The request was                                 | Discard output and             |
|       |                            | not completed as a                              | resubmit the affected          |
|       |                            | watchdog timer hard-                            | request.                       |
|       |                            | ware event occurred.                            |                                |
|       |                            | With QAT2.0 this                                |                                |
|       |                            | error can be triggered                          |                                |
|       |                            | by an internal parity                           |                                |
| 17    |                            | error.                                          | Discourse and the state of the |
| -17   | CPA_DC_EP_HARDWARE         | This is a recoverable                           |                                |
|       |                            | error. Request was                              |                                |
|       |                            | not completed as an                             | calling CpaDcRe-               |
|       |                            | end point hardware er-<br>ror occurred (for ex- | <pre>moveSession().</pre>      |
|       |                            |                                                 |                                |
| -18   |                            | ample, a parity error).                         | Discord outputs in             |
| -10   | CPA_DC_VERIFY_ERR          | Compress and Verify<br>(CnV). This is a com-    | Discard output; re-            |
|       |                            | pression direction er-                          | submit affected                |
|       |                            |                                                 | request.                       |
|       |                            | ror only. During the                            |                                |
|       |                            | decompression of the compressed payload,        |                                |
|       |                            | an error was detected                           |                                |
|       |                            | and the deflate block                           |                                |
|       |                            |                                                 |                                |
|       |                            | produced is invalid.                            |                                |

Table 37 – continued from previous page

|       | Table 37 – continued from previous page |                         |                      |  |  |  |  |
|-------|-----------------------------------------|-------------------------|----------------------|--|--|--|--|
| Error | Error Type                              | Description             | Suggested Corrective |  |  |  |  |
| Code  |                                         |                         | Action(s)            |  |  |  |  |
| -19   | CPA_DC_EMPTY_DYM_BLK                    | Decompression re-       | Discard output.      |  |  |  |  |
|       |                                         | quest contained         |                      |  |  |  |  |
|       |                                         | an empty dynamic        |                      |  |  |  |  |
|       |                                         | stored block (not       |                      |  |  |  |  |
|       |                                         | supported).             |                      |  |  |  |  |
| -20   | CPA_DC_CRC_INTEG_ERR                    | A data integrity CRC    | Discard output.      |  |  |  |  |
|       |                                         | error was detected.     |                      |  |  |  |  |
| -93   | CPA_DC_LZ4_MAX_BLOCK_SIZE_EXCEEDED      | LZ4 max block size ex-  | Discard output.      |  |  |  |  |
|       |                                         | ceeded.                 |                      |  |  |  |  |
| -95   | CPA_DC_LZ4_BLOCK_OVERFLOW_ERR           | LZ4 block overflow.     | Discard output.      |  |  |  |  |
| -98   | CPA_DC_LZ4_TOKEN_IS_ZERO_ERR            | LZ4 Decoded token       | Discard output.      |  |  |  |  |
|       |                                         | offset or token length  |                      |  |  |  |  |
|       |                                         | is zero.                |                      |  |  |  |  |
| -100  | CPA_DC_LZ4_DISTANCE_OUT_OF_RANGE_ERR    | LZ4 distance out of     | Discard output.      |  |  |  |  |
|       |                                         | range for the len/ dis- |                      |  |  |  |  |
|       |                                         | tance pair.             |                      |  |  |  |  |

#### Table 37 – continued from previous page

#### Note:

- Except for the errors CPA\_DC\_OK, CPA\_DC\_OVERFLOW, CPA\_DC\_FATALERR, CPA\_DC\_MAX\_RESUBMITERR, CPA\_DC\_WDOG\_TIMER\_ERR, CPA\_DC\_VERIFY\_ERR, and CPA\_DC\_EP\_HARDWARE\_ERR, the rest of the error codes can be considered as invalid input stream errors.
- When the suggested corrective action is to discard the output, it implies that the application must also ignore the consumed data, the produced data, and the checksum values.

# 6.1.17 Overflows Errors

This table describes the behavior of the Intel<sup>®</sup> QAT compression service when an overflow occurs during a compression or decompression operation. It also describes the expected behavior of an application when an overflow occurs.

|                    | Operation                       | Overflow<br>Supported                                    | Input Data<br>Consumed                                     | Valid Data<br>Produced?                                    | Status Re-<br>turned in<br>Results | Note                                                                                                                                           |
|--------------------|---------------------------------|----------------------------------------------------------|------------------------------------------------------------|------------------------------------------------------------|------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------|
| Traditional<br>API | Stateless<br>compres-<br>sion   | YES                                                      | Possible -<br>indicated<br>in results<br>consumed<br>field | Possible -<br>indicated<br>in results<br>produced<br>field | -11                                | Overflow<br>is consid-<br>ered as an<br>exception                                                                                              |
|                    | Stateless<br>decom-<br>pression | NO                                                       | NO                                                         | NO                                                         | -11                                | Overflow is considered as an error                                                                                                             |
|                    | Stateful<br>decom-<br>pression  | YES on<br>QAT1.x de-<br>vices NO<br>on QAT2.x<br>devices | Possible -<br>indicated<br>in results<br>consumed<br>field | Possible -<br>indicated<br>in results<br>produced<br>field | -11                                | Overflow<br>is consid-<br>ered as an<br>exception<br>on QAT1.x<br>devices.<br>QAT2.x<br>does not<br>support<br>stateful<br>decom-<br>pression. |
| Data Plane<br>API  | Stateless<br>compres-<br>sion   | NO                                                       | NO                                                         | NO                                                         | -11                                | Overflow is<br>considered<br>as an error                                                                                                       |
|                    | Stateful<br>decom-<br>pression  | NO                                                       | NO                                                         | NO                                                         | -11                                | Overflow is considered as an error                                                                                                             |

Table 38: Overflows Errors

## 6.1.17.1 Traditional API Overflow Exception

Stateless sessions support overflow as an exception for traditional API in the compression direction only. This means that the application can rely on the **cpaDcRqResults.consumed** to resubmit from where the overflow occurred. An overflow in the decompression direction must be treated as an error.

In this case, the application must resubmit the request with a larger buffer as described in the procedure for handling overflow errors. For stateful sessions, overflow is supported only in the decompression direction.

### 6.1.17.2 Data Plane API Overflow Error

The Data Plane API considers overflow status as an error. If an overflow occurs with the data plane API, the driver will output the following error message to the user:

Unrecoverable error: stateless overflow. You may need to increase the size of your destination buffer

In this case, **cpaDcRqResults.consumed**, **.produced** and **.checksum** should be ignored. If length and checksum are required, they must be tracked in the application, because they are not maintained in the session.

### 6.1.17.3 Handling Overflow Errors

Resubmit the request with the following data:

- Use the same source buffer.
- Allocate a bigger destination buffer. It is recommended to use the **compressBound()** APIs in compression direction.
- If the overall checksum needs to be maintained, insert the checksum from the previous successful request into the cpaDcRqResults struct.

### 6.1.17.4 Compression Overflows in a Virtual Environment

In a virtual environment, the guest does not download the firmware. Only the host downloads the firmware. As a consequence, if the guest runs a newer Intel<sup>®</sup> QAT driver than the host, the guest application might experience false CNV errors. The correct course of action would be to update the host with the latest Intel<sup>®</sup> QAT driver.

### 6.1.17.5 Avoiding Compression Overflow Exceptions

Overflow exceptions happen for 2 reasons:

- 1. The application allocated a destination buffer that was too small to receive the compressed data.
- 2. A recovery occurred after a compress and verify error with an input payload greater than 65,535 bytes if the instance lives on a QAT1.X device.

To minimize the impact of resubmitting data after and overflow exception, the API **cpaDcDeflateCompressBound()** has been added to the Intel<sup>®</sup> QAT driver. A detailed explanation of *compressBound* APIs is provided in the *Maximum Compression Expansion* section.

# 6.1.18 Integrity Checksums

Integrity checksums are an additional method for payload verification throughout the compression/decompression lifecycle. They may be used to verify corruption has not happened when sending data to and from the Intel<sup>®</sup> QuickAssist HW, or for example the integrity checksums may be stored by an application along with the compressed data and used to detect corruption in the future without needing to decompress the data.

They should not be confused with the Gzip/Zlib/LZ4 footer checksums of CRC32, Adler32 and XXHash32 that are calculated over the uncompressed input data only.

Integrity checksums use an additional structure that is the application's responsibility to allocate, maintain, and free. The structure must be allocated by the user, and the QAT Library will populate it based on values returned by the hardware. The structure is **cpaCrcData** and contains the following fields:

| cpaCrcData Fields   | Description                                                                   |
|---------------------|-------------------------------------------------------------------------------|
| Cpa32U crc32        | This is the existing CRC32 for the footer calculated across the uncom-        |
|                     | pressed data in either the source or dest buffer depending whether it is a    |
|                     | compress or decompress operation. This is the same as the value returned      |
|                     | in the cpaDcRqResults checksum field.                                         |
| Cpu32U adler32      | This is the existing Adler32/XXHash32 for the footer calculated across the    |
|                     | uncompressed data in either the source or dest buffer depending whether       |
|                     | it is a compress or decompress operation. This is the same as the value re-   |
|                     | turned in the cpaDcRqResults checksum field.                                  |
| CpaIntegrityCrc in- | This field contains the QAT 1.8 integrity checksums that consist of two 32bit |
| tegrityCrc          | CRC32's. These are calculated on the input data to the request within the     |
|                     | HW and on the output data from the request within the HW.                     |
| CpaIntegrityCrc64b  | This field contains the QAT 2.0 integrity checksums that consist of two 64    |
| integrityCrc64b     | bit checksums. CPM 2.0 uses CRC64 by default for these checksums. The         |
|                     | checksums are calculated on the input data to the request within the HW       |
|                     | and on the output data from the request within the HW.                        |

#### Table 39: Integrity Checksums

- QAT will generate both CRC32 and CRC64. The CRC32 generated is the standard CRC32 as defined in the GZIP RFC (RFC 1952). All compression algorithms support CRC generation in 4th Gen Intel® Xeon® Scalable processor and later CPUs.
- The CRC attributes of integrityCrc64b can be modified by the cpaDcSetCrcControlData() API call if sessions are used, meaning it's mapped to QAT's programmable CRC feature. If the cpaD-cSetCrcControlData API is not used, the QAT driver uses default CpaCrcControlData. The CRC32/Adler32 for deflate's CRC attributes are fixed.
- Once allocated, a pointer to the **cpaCrcData** structure must be assigned to the **pCrcData** field of the requests **CpaDcOpData** structure.
- The cpaCrcData structure assigned to the pCrcData pointer should be treated in the same way as the destination buffer, not freed until the request has completed, and not shared across requests if running asynchronously.

- The integrity checksum feature itself is enabled on a per request basis by setting the **integrityCr-cCheck** field contained in the **CpaDcOpData** structure to **CPA\_TRUE**.
- For decompression jobs, the application can verify the CRC32/Adler32 returned by QAT against the CRC32/Adler32 in the GZIP/ZLIB frame as an integrity verification.
- Integrity checksums are available on the Traditional API including No Session requests, but are not available on the Data Plane API.
- Integrity checksums are calculated across only the current request in QAT 2.0. With QAT 1.8 it is possible to seed the integrity checksums on stateful decompression requests by reusing the same **cpaCrcData** structure on the subsequent request without resetting the contents. For QAT 1.8 stateful decompression requests it is the application's responsibility to allocate the **cpaCrcData** structure and keep it allocated for the lifetime of the session. Integrity checksums are not available on devices prior to QAT 1.8.

### 6.1.18.1 Verify HW Integrity CRC's

There is an additional feature to integrity checksums that can be enabled to automatically check that no corruption to data buffers has occurred during transport to and from the Intel<sup>®</sup> QAT HW. This works by calculating integrity checksums across the source and destination buffers within the Intel<sup>®</sup> QAT API, and comparing the checksums with those generated within the Intel<sup>®</sup> QAT HW. Any discrepancies will result in a status of CPA\_DC\_INTEG\_ERR being returned within the cpaDcRqResults structure. These additional checksums are calculated in SW using the CPU and have a cost in terms of performance. In order to enable the *Verify HW Integrity CRC* feature on a per request basis the verifyHwIntegrityCrcs field contained in the cpaDcOpdata structure needs to be set to CPA\_TRUE. Additionally the integrityCrcCheck field must be enabled and a cpaCrcData structure allocated and a pointer to it must be assigned to the pCrcData field.

# 6.1.19 Data Compression Applications

Data compression can be used as part of application delivery networks, data de-duplication, as well as in a number of crypto applications, for example, VPNs, IDS/IPS and so on.

### 6.1.19.1 Compression for Storage

In a time when the amount of online information is increasing dramatically, but budgets for storing that information remain static, compression technology is a powerful tool for improved information management, protection and access.

Compression appliances can transparently compress data such that clients can keep between two- and five-times more data online and reap the benefit of other efficiencies throughout the data lifecycle. By shrinking the primary data, all subsequent copies of that data, such as backups, archives, snapshots, and replicas are also compressed. Compression is the newest advancement in storage efficiency.

Storage compression appliances can shrink primary online data in real time, without performance degradation. This can significantly lower storage capital and operating expenses by reducing the amount of data that is stored, and the required hardware that must be powered and cooled.

Compression can help slow the growth of storage, reducing storage costs while simplifying both operations and management. It also enables organizations to keep more data available for use, as opposed to storing data offsite or on harder-to-access media (such as tape). Compression algorithms are very compute-intensive, which is one of the reasons why the adoption of compression techniques in mainstream applications has been slow.

As an example, the DEFLATE Algorithm, which is one of the most used and popular compression techniques today, involves several compute-intensive steps: string search and match, sort logic, binary tree generation, Huffman Code generation. Intel<sup>®</sup> QAT devices in the platforms described in this manual provide acceleration capabilities in hardware that allow the CPU to offload the compute-intensive DEFLATE algorithm operations, thereby freeing up CPU cycles for other networking, encryption, or other value-add operations.

### 6.1.19.2 Data Deduplication and WAN Acceleration

Data Deduplication and WAN Acceleration are coarse-grain data compression techniques centered around the concept of single-instance storage. Identical blocks of data (either to be stored on disk or to be transferred across a WAN link) are only stored/moved once, and any further occurrences are replaced by a reference to the first instance.

While the benefits of deduplication and WAN acceleration obviously depend on the type of data, multiuser collaborative environments are the most suitable due to the amount of naturally occurring replication caused by forwarded emails and multiple (similar) versions of documents in various stages of development.

Deduplication strategies can vary in terms of inline vs post-processing, block size granularity (file-level only, fixed block size or variable block-size chunking), duplicate identification (cryptographic hash only, simple CRC followed by byte-level comparison or hybrids) and duplicate look-up (for example, Bloom filter based index).

Cryptographic hashes are the most suitable techniques for reliably identifying matching blocks with an improbably low risk for false positives, but they also represent the most compute-intensive workload in the application. As such, the cryptographic acceleration services offered by the hardware through the Intel<sup>®</sup> QAT Cryptographic API can be used to considerably improve the throughput of deduplication/WAN acceleration applications. Additionally, the compression/decompression acceleration services can be used to further compress blocks for storage on disk, while optionally encrypting the compressed contents.

# 6.2 Cryptographic Services

# 6.2.1 Introduction

Intel<sup>®</sup> QuickAssist Technology (Intel<sup>®</sup> QAT) accelerates cryptographic workloads by offloading the data to hardware capable of optimizing those functions. This makes it easier for developers to integrate built-in cryptographic accelerators into network and security applications.

Symmetric cryptography algorithms include:

- Cipher operations (AES, DES, 3DES, ARC4, CHA-CHA, SM4).
- Wireless (Kasumi, Snow, 3G).
- Hash/Authenticate operations (SHA-1, MD5, SHA-2, SHA-3, SHAKE).
- Authentication (HMAC, AES-XCBC, AES-CCM).

Public key algorithms include:

- RSA operation.
- Diffie-Hellman operation.
- Digital signature standard operation.
- Key derivation operation.
- Elliptic curve cryptography (ECDSA and ECDH).
- Prime number testing.

### 6.2.1.1 Supported Cipher Algorithms

The following table provides details on supported cipher algorithms for each platform.

#### Note:

- **cpaCySymInitSession()** returns error status of **CPA\_STATUS\_UNSUPPORTED** if cipher algorithm is not supported.
- The QAT2.0 driver has not been updated to enable the Opt-In functionality. This will be added in a future release.

| Tuble To. Supported Sipiler / ligoritinins |          |         |         |  |
|--------------------------------------------|----------|---------|---------|--|
| Algorithm                                  | QAT 1.7x | QAT 1.8 | QAT 2.0 |  |
| NULL                                       | Yes      | Yes     | Yes     |  |
| ARC4                                       | Opt-in   | Opt-in  | No      |  |
| AES-ECB                                    | Opt-in   | Opt-in  | Opt-in  |  |
| continues on next page                     |          |         |         |  |

| Algorithm   | QAT 1.7x | QAT 1.8 | QAT 2.0 |
|-------------|----------|---------|---------|
| AES-CBC     | Yes      | Yes     | Yes     |
| AES-CTR     | Yes      | Yes     | Yes     |
| AES-CCM     | Yes      | Yes     | Yes     |
| AES-GCM     | Yes      | Yes     | Yes     |
| AES-F8      | Opt-in   | Opt-in  | Opt-in  |
| AES-XTS     | Yes      | Yes     | Yes     |
| DES-ECB     | Opt-in   | Opt-in  | No      |
| DES-CBC     | Opt-in   | Opt-in  | No      |
| 3DES-ECB    | Opt-in   | Opt-in  | No      |
| 3DES-CBC    | Opt-in   | Opt-in  | No      |
| 3DES-CTR    | Opt-in   | Opt-in  | No      |
| KASUMI-F8   | Yes      | Yes     | No      |
| SNOW3G-UEA2 | Yes      | Yes     | No      |
| ZUC-EEA3    | Yes      | Yes     | No      |
| СНАСНА      | No       | Yes     | Yes     |
| SM4-ECB     | No       | Opt-in  | Opt-in  |
| SM4-CBC     | No       | Yes     | Yes     |
| SM4-CTR     | No       | Yes     | Yes     |

Table 40 – continued from previous page

### 6.2.1.2 Supported Hash/Authenticate Algorithms

The following table provides details on supported hash algorithms for each platform.

#### Note:

 cpaCySymInitSession() returns error status of CPA\_STATUS\_UNSUPPORTED if hash algorithm is not supported.

| Algorithm | QAT 1.7x | QAT 1.8 | QAT 2.0 |  |  |
|-----------|----------|---------|---------|--|--|
| MD5       | Opt-in   | Opt-in  | No      |  |  |
| SHA1      | Opt-in   | Opt-in  | Opt-in  |  |  |
| SHA224    | Opt-in   | Opt-in  | Opt-in  |  |  |
| SHA256    | Yes      | Yes     | Yes     |  |  |
| SHA384    | Yes      | Yes     | Yes     |  |  |
| SHA512    | Yes      | Yes     | Yes     |  |  |
| SHA3-224  | No       | Opt-in  | Opt-in  |  |  |
| SHA3-256  | Yes      | Yes     | Yes     |  |  |
| SHA3-384  | No       | Yes     | Yes     |  |  |
| SHA3-512  | No       | Yes     | Yes     |  |  |
| AES-XCBC  | Yes      | Yes     | Yes     |  |  |
|           |          |         |         |  |  |

Table 41: Supported Hash/Authenticate Algorithms

| Algorithm   | QAT 1.7x | QAT 1.8 | QAT 2.0 |  |  |  |
|-------------|----------|---------|---------|--|--|--|
| AES-CBC_MAC | Yes      | Yes     | Yes     |  |  |  |
| AES-CCM     | Yes      | Yes     | Yes     |  |  |  |
| AES-GCM     | Yes      | Yes     | Yes     |  |  |  |
| AES-GMAC    | Yes      | Yes     | Yes     |  |  |  |
| AES-CMAC    | Yes      | Yes     | Yes     |  |  |  |
| KASUMI-F9   | Yes      | Yes     | No      |  |  |  |
| SNOW3G-UIA2 | Yes      | Yes     | No      |  |  |  |
| ZUC-EIA3    | Yes      | Yes     | No      |  |  |  |
| POLY        | No       | Yes     | Yes     |  |  |  |
| SM3         | No       | Yes     | Yes     |  |  |  |

Table 41 – continued from previous page

### 6.2.1.3 Supported Public Key Algorithms

The following table provides details on supported asymmetric algorithms for each platform.

**Note:** QAT Public Key functions will return error status of **CPA\_STATUS\_UNSUPPORTED** if algorithm is not supported.

|                     |          | , ,     |         |
|---------------------|----------|---------|---------|
| Algorithm           | QAT 1.7x | QAT 1.8 | QAT 2.0 |
| RSA-1024            | Opt-in   | Opt-in  | Opt-in  |
| RSA-2048            | Yes      | Yes     | Yes     |
| RSA-3072            | Yes      | Yes     | Yes     |
| RSA-4096            | Yes      | Yes     | Yes     |
| RSA-8192            | No       | No      | Yes     |
| SM2                 | No       | Yes     | Yes     |
| ECDH Point Multiply | Yes      | Yes     | Yes     |
| ECDSA Sign          | Yes      | Yes     | Yes     |
| ECDSA Verify        | Yes      | Yes     | Yes     |
| x25519              | Yes      | Yes     | Yes     |
| x448                | Yes      | Yes     | Yes     |

Table 42: Supported Public Key Algorithms

# 6.2.2 Cryptography Applications

Cryptography applications supported by the platforms described in this manual include, but are not limited to:

- Virtual Private Networks
- Encrypted Storage

Web Proxy Appliances

### 6.2.2.1 IPsec and SSL VPNs

Virtual Private Networks (VPNs) allow for private networks to be established over the public Internet by providing confidentiality, integrity and authentication using cryptography. VPN functionality can be provided by a standalone security gateway box at the boundary between the trusted and untrusted networks. It is also commonly combined with other networking and security functionality in a security appliance, or even in standard routers.

VPNs are typically based on one of two cryptographic protocols, either IPsec or Datagram Transport Layer Security (DTLS). Each has its advantages and disadvantages.

One of the most compute-intensive aspects of a VPN is the cryptographic processing required to encrypt/decrypt traffic for confidentiality, to perform cryptographic hash functionality for authentication and to perform public key cryptography, based on modular exponentiation of large numbers or elliptic curve cryptography as part of key negotiation and exchange. The accelerator provides cryptographic acceleration that can offload this computation from the CPU, thereby freeing up CPU cycles to perform other networking, encryption, or other value-add applications.

The Intel<sup>®</sup> QAT Endpoint offers its acceleration services through an API, called the Intel<sup>®</sup> QAT Cryptographic API. This can be invoked from the Linux\* kernel or from Linux\* user space as well as from other operating systems. Intel<sup>®</sup> also provides plugins to enable many of the PCH's cryptographic services to be accessed through open source cryptographic frameworks, such as the Linux\* kernel crypto framework/API (also known as the **scatterlist** API) and OpenSSL\* libcrypto\* (through its EVP API). This facilitates ease of integration with certain open source implementations of protocol stacks, such as the Linux\* kernel's native IPsec stack (called **NETKEY**) or with OpenVPN\* (an open source SSL VPN implementation).

### 6.2.2.2 Encrypted Storage

In recent years, cases of lost laptops containing sensitive information have made the headlines all too frequently. Full disk encryption has become a standard procedure for many corporate PCs. Safe-guarding critical data however is not just a necessity in the client space, it is also a necessity in the data center.

Enterprise-class storage appliances achieve throughput rates in excess of 50 Gbps. Several high-profile cases of data theft have triggered updates to government regulations and industry standards. These regulations/standards now require protection of data-at-rest for applications involving sensitive data such as medical and financial records, typically using strong encryption. The high computational cost of adding encryption to storage appliances makes offload solutions an attractive value proposition.

Several complimentary standards exist for the encryption of data-at-rest, which, when combined with traditional network security protocols such as IPsec or SSL/TLS, provide an end-to-end encrypted storage solution, even for data-in-flight.

The IEEE\* Security in Storage working group is developing the IEEE 1619 series of standards that deal with cipher algorithms for disk and tape storage devices (AES in CCM and GCM modes). The cryptographic acceleration services of platforms that use the Intel<sup>®</sup> QAT Endpoints are ideally suited for long-

term encrypted storage solutions implementing the IEEE 1619.1 standard, by providing acceleration of the AES-256 cipher in CBC, CCM, and GCM modes and HMAC authentication using SHA-1, SHA-256 and SHA-512 hashes.

The Trusted Computing Group's (TCG) Storage Working Group does not prescribe a particular set of algorithms for the disk encryption. Instead, it defines several Storage Subsystem Classes (SSC) for various usage models, which define services such as enrollment and connection, protected storage (an extension of Trusted Platform Module (TPM)), locking, logging, cryptographic services, authorization, and firmware updates. The cryptographic acceleration services of the platform can help by providing the highest level of encryption for authenticating the host to trusted peripherals implementing the TCG storage standards.

### 6.2.2.3 Web Proxy Appliances

Historically, Web Proxy appliances have evolved to present a public or intermediary interface for clients seeking resources from other servers, providing services such as web page caching and load balancing. These appliances are located at the edge of the network, typically at network gateways. Due to their centralized presence in the network, Web Proxy appliances today (referred to with several different names, such as Application Delivery Controllers, Reverse Proxy, and so on) have become a collection of services that include:

- Application Load Balancing (L4-L7)
- SSL Acceleration
- WAN Acceleration
- Caching
- Traffic Management
- Web Application Firewall

SSL and WAN acceleration have become common place capabilities of the Web Proxy appliance, requiring compute intensive algorithms for cryptography (SSL) and compression (WAN acceleration). Intel<sup>®</sup> QAT devices on the platforms described in this manual provide acceleration of asymmetric cryptography (RSA is the most commonly used key negotiation algorithm in SSL), symmetric cryptography (all algorithms defined in the TLS RFCs can be accelerated with the PCH) and compression (DEFLATE algorithm). With the prominence of Web Proxy appliances in typical networks, this use case has applications from cloud computing to small web server deployments.

# 7 Supported APIs

The supported APIs are classified in two categories:

- Intel QuickAssist Technology APIs
- Additional APIs

Details on the APIs are included in the following sections.

# 7.1 Intel QuickAssist Technology APIs

The platforms described in this manual support the following Intel<sup>®</sup> QAT API libraries:

- **Cryptographic**: API definitions are located in: **\$ICP\_ROOT/quickassist/include/lac**, where **\$ICP\_ROOT** is the directory where the Acceleration software is unpacked. See the Intel QuickAssist Technology Cryptographic API Reference Manual for details.
- Data Compression: API definitions are located in: **\$ICP\_ROOT/quickassist/include/dc**. See the Intel QuickAssist Technology Data Compression API Reference Manual for details.

# 7.1.1 Cryptographic and Data Compression API Descriptions

Full descriptions of the Intel<sup>®</sup> QAT APIs are contained in the *Intel QuickAssist Technology Cryptographic* API Reference Manual and the *Intel QuickAssist Technology Data Compression API Reference Manual*.

In addition to the Intel<sup>®</sup> QAT Data Plane APIs, there are a number of Data Plane Polling APIs that are described in the *Polling Functions* section.

### 7.1.1.1 Data Plane APIs Overview

The Intel QuickAssist Technology Cryptographic API Reference Manual and the Intel QuickAssist Technology Data Compression API Reference Manual contain information on the APIs that are specific to data plane applications.

The APIs are recommended for applications that are executing in a data plane environment where the cost of offload (that is, the cycles consumed by the driver sending requests to the hardware) needs to be minimized. To minimize the cost of offload, several constraints have been placed on the APIs. If these constraints are too restrictive for your application, the traditional APIs can be used instead (at a cost of additional IA cycles).

The definition of the Cryptographic Data Plane APIs are contained in: **\$ICP\_ROOT/quickassist/** include/lac/cpa\_cy\_sym\_dp.h

The definition of the Data Compression Data Plane APIs are contained in: **\$ICP\_ROOT/quickassist/include/dc/cpa\_dc\_dp.h** 

### 7.1.1.2 IA Cycle Count Reduction When Using Data Plane APIs

From an IA cycle count perspective, the Data Plane APIs are more performant than the traditional APIs. The majority of the cycle count reduction is realized by the reduction of supported functionality in the Data Plane APIs and the application of constraints on the calling application.

In addition, to further improve performance, the Data Plane APIs attempt to amortize the cost of an MMIO access when sending requests to, and receiving responses from, the hardware.

A typical usage is to call the cpaCySymDpEnqueueOp() or the cpaDcDpEnqueueOp() function multiple times with requests to process and the performOpNow flag set to CPA\_FALSE. Once multiple requests have been enqueued, cpaCySymDpEnqueueOp() or cpaDcDpEnqueueOp() may be called with the performOpNow flag set to CPA\_TRUE. This sends the requests to the Intel<sup>®</sup> QAT Endpoint for processing.

The Intel<sup>®</sup> QAT API returns a CPA\_STATUS\_RETRY when the ring becomes full.

The number of requests to place on the ring is application dependent and it is recommended that performance testing be conducted with tunable parameter values.

Two functions, cpaCySymDpPerformOpNow() and cpaDCDpPerformOpNow(), are also provided that allow queued requests to be sent to the hardware without the need for queuing an additional request. This is typically used in the scenario where a request has not been received for some time and the application would like the enqueued requests to be sent to the hardware for processing.

# intel



### 7.1.1.3 Usage Constraints on the Data Plane APIs

The following constraints apply to the use of the Data Plane APIs. If the application can handle these constraints, the Data Plane APIs can be used:

- Thread safety is not supported. Each software thread should have access to its own unique instance (**CpaInstanceHandle**) to avoid contention on the hardware rings.
- For performance, polling is supported, as opposed to interrupts (which are comparatively more expensive).
- *Polling functions* are provided to read responses from the hardware response queue and dispatch callback functions.
- Buffers and buffer lists are passed using physical addresses to avoid virtual-to- physical address translation costs.
- Alignment restrictions are placed on the operation data (that is, the **CpaCySymDpOpData** structure) passed to the Data Plane API. The operation data must be at least 8-byte aligned, contiguous, resident, DMA-accessible memory.
- Only asynchronous invocation is supported, that is, synchronous invocation is not supported.
- There is no support for cryptographic partial packets. If support for partial packets is required, the traditional Intel<sup>®</sup> QAT APIs should be used.
- Since thread safety is not supported, statistic counters on the Data Plane APIs are not atomic.
- The default instance (CPA\_INSTANCE\_HANDLE\_SINGLE) is not supported by the Data Plane APIs. The specific handle should be obtained using the instance discovery func-

tions (cpaCyGetNumInstances(), cpaCyGetInstances(), cpaDcGetNumInstances(), cpaDcGetInstances()).

- The submitted requests are always placed on the high-priority ring.
- The data plane APIs are supported in both user space and polling mode in kernel space, but not supported in interrupt mode in kernel space.

# 7.1.2 Intel<sup>®</sup> QAT API Limitations

The following limitations apply when using the Intel® QAT APIs on the platforms described in this manual:

- For all services, the maximum size of a single perform request is 4 GB.
- For all services, data structures that contain data required by the Intel<sup>®</sup> QAT Endpoint should be on a 64-byte-aligned address to maximize performance. This alignment helps minimize latency when transferring data from DRAM to an Intel<sup>®</sup> QAT Endpoint integrated in the PCH device.
- For the key generation cryptographic API, the following limitations apply:

| Secure Sockets Layer (SSL) key genera-  | Maximum secret length is 512 bytes                 |
|-----------------------------------------|----------------------------------------------------|
| tion op-data:                           | Maximum userLabel length is 136 bytes              |
|                                         | Maximum generatedKeyLenInBytes is 248              |
| Transport Layer Security (TLS) key gen- | Secret length must be <128 bytes for TLS v1.0/1.1; |
| eration op-data:                        | Secret length must be <512 bytes for TLS v1.2      |
|                                         | Secret length must be <512 bytes for TLS v1.3      |
|                                         | userLabel length must be <256 bytes                |
|                                         | Maximum seed size is 64 bytes                      |
|                                         | Maximum generatedKeyLenInBytes is 248 bytes        |
| Mask Generation Function (MGF) op-      | Maximum seed length is 255 bytes                   |
| data:                                   | Maximum maskLenInBytes is 65528                    |

Table 43: Key Generation Cryptographic API Limitations

- For the cryptographic service, SNOW 3G and KASUMI\* operations are not supported when **Cpa-CySymPacketType** is set to **CPA\_CY\_SYM\_PACKET\_TYPE\_PARTIAL**. The error returned in this case is **CPA\_STATUS\_INVALID\_PARAM**.
- For the cryptographic service, when using the asymmetric crypto APIs, the buffer size passed to the API should be rounded to the next power of 2, or the next 3- times a power of 2, for optimum performance.
- For the data compression service, the size of all stateful decompression requests have to be a multiple of two with the exception of the last request.
- For the data compression service, the CpaDcFileType field in the CpaDcSessionSetup-Data data structure is ignored (previously this was considered for semi-dynamic compression/decompression).



• For static compression, the maximum expansion during compression is ceiling (9xTotal\_Input\_Byte/8)+7 bytes. If CPA\_DC\_ASB\_UNCOMP\_STATIC\_DYNAMIC\_WITH\_STORED\_HDRS or CPA\_DC\_ASB\_UNCOMP\_STATIC\_DYNAMIC\_WITH\_NO\_HDRS is selected, the maximum expansion during compression is the input buffer size plus up to ceiling (Total\_Input\_Byte/65535)x5 bytes, depending on whether the stored headers are selected.

**Note:** Due to the need for a skid pad and the way the checksum is calculated in the stored block case to prevent compression overflow, an output buffer size of ceiling (9\*To-tal\_Input\_Byte/8) + 55 bytes needs to be supplied (even though the stored block output size might be less).

The decompression service can report various error conditions, most of which arise from processing dynamic Huffman code trees that are ill-formed. These soft error conditions are reported at the Intel<sup>®</sup> QuickAssist Technology API using the CpaDcReqStatus enumeration. At the point of soft error, the hardware state will not be accurate to allow recovery. Therefore, in this case, the Intel<sup>®</sup> QuickAssist Technology software rolls back to the previous known good state and reports that no input has been processed and no output produced. This allows an application to correct the source of the error and resubmit the request.

For example, if the following source and destination buffers were submitted to the Intel<sup>®</sup> QuickAssist Technology:



Behavior when build flag ICP\_DC\_RETURN\_COUNTERS\_ON\_ERROR is defined. In some specialized applications, when a decompression soft error occurs, the application has no way of correcting the source of the error and resubmitting the request. The session will need to be invalidated and terminated. In this case it is more useful to the application to output the uncompressed data

up to the point of soft error before terminating the session. There is a compile time build flag (**ICP\_DC\_RETURN\_COUNTERS\_ON\_ERROR**) to select this mode of operation. This is the behavior of decompression in case of soft error when this build flag is used.

If the following source and destination buffers were submitted to the Intel<sup>®</sup> QuickAssist Technology API:



# 7.2 Additional APIs

**Note:** Not all *Additional APIs* are supported with all versions of the software package/hardware configuration.

There are a number of additional APIs that can serve for optimization and other uses outside of the Intel<sup>®</sup> QuickAssist Technology services.

# 7.2.1 Dynamic Instance Allocation Functions

These functions are intended for the dynamic allocation of instances in user space. The user can use these functions to allocate/free instances defined in the [DYN] section of the configuration file.

These functions are useful if the user needs to dynamically allocate/free cryptographic (CY) or Data Compression (DC) instances at runtime. This is in contrast to statically specifying the number of CY or DC instances at configuration time, where the number of instances cannot be changed unless the user modifies the **.conf** file and restarts the acceleration service.

The advantage of using these functions is that the number of CY/DC instances can be changed ondemand at runtime. The disadvantage is that runtime performance is impacted if the number of CY/DC instances is changed frequently.

If the user space application knows the number of instances to be used before starting, then the user can define Number<Service>Instances in the [User Process] section of the configuration file.

If the user space application can only know the number of instances at runtime, or wants to change the number at runtime, then the user can call the Dynamic Instance Allocation functions to allocate/free instances dynamically. The **Number<Service>Instances** in the **[DYN]** section of the .conf file(s) defines the maximum number of instances that can be allocated by user processes.

This can be useful when sharing instances among multiple applications at runtime. The maximum number of instances in a system is known in advance and it is possible to distribute them statically between applications using the configuration files. Once the driver is started, however, this cannot be changed. If, for example, there are 32 CY instances and we need to provision 16 processes, we can statically assign two CY instances per process. This can be a problem when a process needs more instances at any given time. With dynamic instance allocation, we can create a pool of instances that can be "shared" between the processes.

Continuing the example above with 32 CY instances and 16 processes, we can assign statically one CY instance to each process and create a pool of 16 **[DYN]** instances from the remainder. If at runtime one process needs more acceleration power, it can allocate some more instances from the pool, say, for example, eight, use them as appropriate and free them back to the pool when the work has been completed. Thereafter, other processes can use these instances as needed.

All dynamic instance allocation function definitions are located in: **\$ICP\_ROOT/quickassist/lookaside/** access\_layer/include/icp\_sal\_user.h

Important: Dynamic Instance Allocation Functions are not currently supported with the QAT2.0 driver.

## 7.2.1.1 icp\_sal\_userCyGetAvailableNumDynInstances

Get the number of cryptographic instances that can be dynamically allocated using the **icp\_sal\_userCyInstancesAlloc** function.

#### Syntax

CpaStatus icp\_sal\_userCyGetAvailableNumDynInstances (Cpa32U \*pNumCyInstances);

#### Parameters

| *pNumCyInstances | A pointer to the number of cryptographic instances available for dynamic al- |
|------------------|------------------------------------------------------------------------------|
|                  | location.                                                                    |

#### **Return Value**

The icp\_sal\_userCyGetAvailableNumDynInstances function returns one of the following codes:

| CPA_STATUS_SUCCESS | A pointer to the number of cryptographic instances available for dynamic al- |
|--------------------|------------------------------------------------------------------------------|
|                    | location.                                                                    |
| CPA_STATUS_FAIL    | Indicates a failure.                                                         |

## 7.2.1.2 icp\_sal\_userDcGetAvailableNumDynInstances

Get the number of data compression instances that can be dynamically allocated using the **icp\_sal\_userDcInstancesAlloc** function.

#### Syntax

CpaStatus icp\_sal\_userDcGetAvailableNumDynInstances (Cpa32U \*pNumDcInstances);

#### Parameters

| *pNumDcInstances | A pointer to the number of data compression instances available for dy- |
|------------------|-------------------------------------------------------------------------|
|                  | namic allocation.                                                       |

#### **Return Value**

The icp\_sal\_userDcGetAvailableNumDynInstances function returns one of the following codes:

| CPA_STATUS_SUCCESS | A pointer to the number of data compression instances available for dy- |
|--------------------|-------------------------------------------------------------------------|
|                    | namic allocation.                                                       |
| CPA_STATUS_FAIL    | Indicates a failure.                                                    |

# 7.2.1.3 icp\_sal\_userCyInstancesAlloc

Allocate the specified number of Cryptographic (CY) instances from the amount specified in the [DYN] section of the configuration file. The numCyInstances parameter specifies the number of CY instances to allocate and must be less than or equal to the value of the NumberCyInstances parameter in the [DYN] section of the configuration file.

#### Syntax

CpaStatus icp\_sal\_userCyInstancesAlloc(Cpa32U numCyInstances, CpaInstanceHandle
\*pCyInstances);

#### Parameters

| numCyInstances | The number of CY instances to allocate. |
|----------------|-----------------------------------------|
| *pCyInstances  | A pointer to the CY instances.          |

#### **Return Value**

The **icp\_sal\_userCyInstancesAlloc** function returns one of the following codes:

| CPA_STATUS_SUCCESS | Successfully allocated the specified number of CY instances. |
|--------------------|--------------------------------------------------------------|
| CPA_STATUS_FAIL    | Indicates a failure.                                         |

# 7.2.1.4 icp\_sal\_userDcInstancesAlloc

Allocate the specified number of Data Compression (DC) instances from the amount specified in the **[DYN]** section of the configuration file. The **numDcInstances** parameter specifies the number of dc instances to allocate and must be less than or equal to the value of the **NumberDcInstances** parameter in the **[DYN]** section of the configuration file.

#### Syntax

CpaStatus icp\_sal\_userDcInstancesAlloc(Cpa32U numDcInstances, CpaInstanceHandle \*pDcInstances);

#### Parameters

| numDcInstances | The number of DC instances to allocate. |
|----------------|-----------------------------------------|
| *pDcInstances  | A pointer to the DC instances.          |

#### **Return Value**

The icp\_sal\_userDcInstancesAlloc function returns one of the following codes:

| CPA_STATUS_SUCCESS | Successfully allocated the specified number of DC instances. |
|--------------------|--------------------------------------------------------------|
| CPA_STATUS_FAIL    | Indicates a failure.                                         |

## 7.2.1.5 icp\_sal\_userCyFreeInstances

Free the specified number of Cryptographic (CY) instances from the amount specified in the **[DYN]** section of the configuration file. The **numCyInstances** parameter specifies the number of CY instances to free.

#### Syntax

CpaStatus icp\_sal\_userCyFreeInstances(Cpa32U numCyInstances, CpaInstanceHandle \*pCyInstances);

#### Parameters

| numCyInstances | The number of CY instances to free. |
|----------------|-------------------------------------|
| *pCyInstances  | A pointer to the CY instances.      |

#### **Return Value**

The **icp\_sal\_userCyFreeInstances** function returns one of the following codes:

| CPA_STATUS_SUCCESS | Successfully freed the specified number of CY instances. |
|--------------------|----------------------------------------------------------|
| CPA_STATUS_FAIL    | Indicates a failure.                                     |

## 7.2.1.6 icp\_sal\_userDcFreeInstances

Free the specified number of Data Compression (DC) instances from the amount specified in the [DYN] section of the configuration file. The numDcInstances parameter specifies the number of DC instances to free.

#### Syntax

CpaStatus icp\_sal\_userDcFreeInstances(Cpa32U numDcInstances, CpaInstanceHandle \*pDcInstances);

#### Parameters

| numDcInstances | The number of DC instances to free.    |
|----------------|----------------------------------------|
| *pDcInstances  | A pointer to the DC instances to free. |

#### **Return Value**

The icp\_sal\_userDcFreeInstances function returns one of the following codes:

| CPA_STATUS_SUCCESS | Successfully freed the specified number of DC instances. |
|--------------------|----------------------------------------------------------|
| CPA_STATUS_FAIL    | Indicates a failure.                                     |



## 7.2.1.7 icp\_sal\_userCyGetAvailableNumDynInstancesByDevPkg

Get the number of cryptographic instances that can be dynamically allocated.

#### Syntax

CpaStatus icp\_sal\_userCyGetAvailableNumDynInstancesByDevPkg(Cpa32U \*pNumCyInstances, Cpa32U devPkgID);

#### Parameters

| pNumCyInstances | A pointer to the number of cryptographic instances available for dynamic al-  |
|-----------------|-------------------------------------------------------------------------------|
|                 | location.                                                                     |
| devPkgID        | The device ID of the device of interest (same as accelID in other APIs) If -1 |
|                 | then selects from all devices.                                                |

#### **Return Value**

The icp\_sal\_userCyGetAvailableNumDynInstancesByDevPkg function returns one of the following codes:

| CPA_STATUS_SUCCESS | Successfully retrieved the number of cryptographic instances available for |
|--------------------|----------------------------------------------------------------------------|
|                    | dynamic allocation.                                                        |
| CPA_STATUS_FAIL    | Indicates a failure.                                                       |

## 7.2.1.8 icp\_sal\_userDcGetAvailableNumDynInstancesByDevPkg

Get the number of data compression instances that can be dynamically allocated.

#### Syntax

CpaStatus icp\_sal\_userDcGetAvailableNumDynInstancesByDevPkg(Cpa32U \*pNumDcInstances, Cpa32U devPkgID);

#### Parameters

| *pNumDcInstances | A pointer to the number of data compression instances available for dy-<br>namic allocation.                 |
|------------------|--------------------------------------------------------------------------------------------------------------|
| devPkgID         | The device ID of the device of interest (same as accelID in other APIs) If -1 then selects from all devices. |

#### **Return Value**

The icp\_sal\_userDcGetAvailableNumDynInstancesByDevPkg function returns one of the following codes:

| CPA_STATUS_SUCCESS | Successfully freed the specified number of DC instances. |
|--------------------|----------------------------------------------------------|
| CPA_STATUS_FAIL    | Indicates a failure.                                     |

# 7.2.1.9 icp\_sal\_userCyInstancesAllocByDevPkg

Allocate the specified number of Cryptographic (CY) instances from the amount specified in the [DYN] section of the configuration file. The numCyInstances parameter specifies the number of CY instances to allocate and must be less than or equal to the value of the NumberCyInstances parameter in the [DYN] section of the configuration file.

#### Syntax

CpaStatus icp\_sal\_userCyInstancesAllocByDevPkg(Cpa32U numCyInstances, CpaInstanceHandle \*pCyInstances, devPkgID);

#### Parameters

| numCyInstances | The number of CY instances to allocate.                                       |
|----------------|-------------------------------------------------------------------------------|
| *pCyInstances  | A pointer to the CY instances.                                                |
| devPkgID       | The device ID of the device of interest (same as accelID in other APIs) If -1 |
|                | then selects from all devices.                                                |

#### **Return Value**

The icp\_sal\_userCyInstancesAllocByDevPkg function returns one of the following codes:

| CPA_STATUS_SUCCESS | Successfully allocated the specified number of CY instances. |
|--------------------|--------------------------------------------------------------|
| CPA_STATUS_FAIL    | Indicates a failure.                                         |

### 7.2.1.10 icp\_sal\_userDcInstancesAllocByDevPkg

Allocate the specified number of Data Compression (DC) instances from the amount specified in the **[DYN]** section of the configuration file. The **numDcInstances** parameter specifies the number of DC instances to allocate and must be less than or equal to the value of the **NumberDcInstances** parameter in the **[DYN]** section of the configuration file.

#### Syntax

CpaStatus icp\_sal\_userDcInstancesAllocByDevPkg(Cpa32U numDcInstances, CpaInstanceHandle \*pDcInstances, Cpa32U devPkgID;

#### Parameters

| numDcInstances | The number of DC instances to allocate.                                                                      |
|----------------|--------------------------------------------------------------------------------------------------------------|
| *pDcInstances  | A pointer to the DC instances.                                                                               |
| devPkgID       | The device ID of the device of interest (same as accelID in other APIs) If -1 then selects from all devices. |

#### **Return Value**

The icp\_sal\_userDcInstancesAllocByDevPkg function returns one of the following codes:

| CPA_STATUS_SUCCESS | Successfully allocated the specified number of DC instances. |
|--------------------|--------------------------------------------------------------|
| CPA_STATUS_FAIL    | Indicates a failure.                                         |

# 7.2.1.11 icp\_sal\_userCyGetAvailableNumDynInstancesByPkgAccel

Get the number of cryptographic instances that can be dynamically allocated.

#### Syntax

CpaStatus icp\_sal\_userCyGetAvailableNumDynInstancesByPkgAccel(Cpa32U \*pNumCyInstances, Cpa32U devPkgID, Cpa32U accelerator\_number);

#### Parameters

| pNumCyInstances    | A pointer to the number of cryptographic instances available for dynamic al-  |
|--------------------|-------------------------------------------------------------------------------|
|                    | location.                                                                     |
| devPkgID           | The device ID of the device of interest (same as accelID in other APIs) If -1 |
|                    | then selects from all devices.                                                |
| accelerator_number | Accelerator Engine to use. As 0 is the only valid value on C62x device, this  |
|                    | API is same as as icp_sal_userCyInstancesAllocByDevPkg                        |

#### **Return Value**

The icp\_sal\_userCyGetAvailableNumDynInstancesByPkgAccel function returns one of the following codes:

| CPA_STATUS_SUCCESS | Successfully retrieved the number of cryptographic instances available for |
|--------------------|----------------------------------------------------------------------------|
|                    | dynamic allocation.                                                        |
| CPA_STATUS_FAIL    | Indicates a failure.                                                       |

## 7.2.1.12 icp\_sal\_userCyInstancesAllocByPkgAccel

Allocates the specified number of Cryptographic (CY) instances from the amount specified in the **[DYN]** section of the configuration file. The **numCyInstances** parameter specifies the number of CY instances to allocate and must be less than or equal to the value of the **NumberCyInstances** parameter returned by a call to the **icp\_sal\_userCyInstancesAllocByPkgAccel** function.

#### Syntax

CpaStatus icp\_sal\_userCyInstancesAllocByPkgAccel(Cpa32U numCyInstances, CpaInstanceHandle \*pCyInstances, Cpa32U devPkgID, Cpa32U accelerator\_number);

#### Parameters

| numCyInstances     | The number of CY instances to allocate.                                       |
|--------------------|-------------------------------------------------------------------------------|
| *pCyInstances      | A pointer to the CY instances.                                                |
| devPkgID           | The device ID of the device of interest (same as accelID in other APIs) If -1 |
|                    | then selects from all devices                                                 |
| accelerator_number | Accelerator Engine to use. As 0 is the only valid value on C62x device, this  |
|                    | API is same as as icp_sal_userCyInstancesAllocByDevPkg                        |

#### **Return Value**

The icp\_sal\_userCyInstancesAllocByPkgAccel function returns one of the following codes:

| CPA_STATUS_SUCCESS | Successfully allocated the specified number of CY instances. |
|--------------------|--------------------------------------------------------------|
| CPA_STATUS_FAIL    | Indicates a failure.                                         |

# 7.2.2 IOMMU Remapping Functions

These functions are intended for IOMMU remapping operations.

All IOMMU remapping function definitions are located in: **\$ICP\_ROOT/quickassist/lookaside/** access\_layer/include/icp\_sal\_iommu.h

### 7.2.2.1 icp\_sal\_iommu\_get\_remap\_size

Returns the page\_size rounded for IOMMU remapping.

#### Syntax

size\_t icp\_sal\_iommu\_get\_remap\_size(size\_t size);

#### Parameters

| size | The minimum required page size. |
|------|---------------------------------|
|------|---------------------------------|

#### **Return Value**

The icp\_sal\_iommu\_get\_remap\_size function returns the page\_size rounded for IOMMU remapping.

## 7.2.2.2 icp\_sal\_iommu\_map

Adds an entry to the IOMMU remapping table.

#### Syntax

CpaStatus icp\_sal\_iommu\_map(Cpa64U phaddr, Cpa64U iova, size\_t size);

#### Parameters

| phaddr | Host physical address.       |
|--------|------------------------------|
| iova   | Guest physical address.      |
| size   | Size of the remapped region. |

#### **Return Value**

The icp\_sal\_iommu\_map function returns one of the following codes:

| CPA_STATUS_SUCCESS | Successful operation. |
|--------------------|-----------------------|
| CPA_STATUS_FAIL    | Indicates a failure.  |

# 7.2.2.3 icp\_sal\_iommu\_unmap

Removes an entry from the IOMMU remapping table.

#### Syntax

CpaStatus icp\_sal\_iommu\_unmap(Cpa64U iova, size\_t size);

#### Parameters

| iova | Guest physical address.      |
|------|------------------------------|
| size | Size of the remapped region. |

#### **Return Value**

The icp\_sal\_iommu\_unmap function returns one of the following codes:

| CPA_STATUS_SUCCESS | Successful operation. |
|--------------------|-----------------------|
| CPA_STATUS_FAIL    | Indicates a failure.  |

# 7.2.2.4 IOMMU Remapping Function Usage

These functions are required when the user wants to access an acceleration service from the Physical Function (PF) when SR-IOV is enabled in the driver. In this case, all I/O transactions from the device go through DMA remapping hardware.

This hardware checks:

- 1. If the transaction is legitimate
- 2. What physical address the given I/O address needs to be translated to.

If the I/O address is not in the transaction table, it fails with a DMA Read error shown as follows:

- DRHD: Handling fault status reg 3.
- **DMAR**:[DMA Read] Request device [02:01.2] fault addr <ADDR> DMAR:[fault reason 06] PTE Read access is not set.

To make this work, the user must add a 1:1 mapping as follows:

1. Get the size required for a buffer.

```
int size = icp_sal_iommu_get_remap_size(size_of_data);
```

2. Allocate a buffer.

```
char *buff = malloc(size);
```

3. Get a physical pointer to the buffer.

```
buff_phys_addr = virt_to_phys(buff);
```

4. Add a 1:1 mapping to the IOMMU tables.

icp\_sal\_iommu\_map(buff\_phys\_addr, buff\_phys\_addr, size;

- 5. **Use** the buffer to send data to the Intel<sup>®</sup> QAT Endpoint.
- 6. Before freeing the buffer, **remove** the IOMMU table entry.

```
icp_sal_iommu_unmap(buff_phys_addr, size);
```

7. **Free** the buffer.

free(buff);

The IOMMU remapping functions can be used in all contexts that the Intel<sup>®</sup> QAT APIs can be used, that is, kernel and user space in a Physical Function (PF) Domain 0, as well as kernel and user space in a Virtual Machine (VM).

In the case of VM, the APIs will do nothing. In the PF Domain O case, the APIs will update the hardware IOMMU tables.

# 7.2.3 Polling Functions

These functions are intended for retrieving response messages that are on the rings and dispatching the associated callbacks.

All polling function definitions are located in: **\$ICP\_ROOT/quickassist/lookaside/access\_layer/** include/icp\_sal\_poll.h

# 7.2.3.1 icp\_sal\_pollBank

Poll all rings on the given Intel<sup>®</sup> QAT Endpoint on a given bank number to determine if any of the rings contain response messages from the Intel<sup>®</sup> QAT Endpoint. The **response\_quota** input parameter is per ring.

#### Syntax

CpaStatus icp\_sal\_pollBank(Cpa32U accelId, Cpa32U bank\_number, Cpa32U response\_quota);

#### Parameters

| accelId        | The device number associated with the Intel® QAT Endpoint. valid range is |
|----------------|---------------------------------------------------------------------------|
|                | O to number of Intel $^{\circ}$ QAT Endpoints in the system.              |
| bank_number    | The number of the memory bank on the Intel® QAT Endpoint that will be     |
|                | polled for response messages. The valid range is 0 to 31.                 |
| response_quota | The maximum number of responses to take from the ring in one call.        |

#### **Return Value**

The icp\_sal\_pollBank function returns one of the following codes:

| CPA_STATUS_SUCCESS | Successfully polled a ring with data.                                           |
|--------------------|---------------------------------------------------------------------------------|
| CPA_STATUS_RETRY   | There is no data on any ring on any bank or the banks are already being polled. |
| CPA_STATUS_FAIL    | Indicates a failure.                                                            |

# 7.2.3.2 icp\_sal\_pollAllBanks

Poll all banks on the given Intel<sup>®</sup> QAT Endpoint to determine if any of the rings contain response messages from the Intel<sup>®</sup> QAT Endpoint. The **response\_quota** input parameter is per ring.

#### Syntax

CpaStatus icp\_sal\_pollAllBanks(Cpa32U accelId, Cpa32U response\_quota);

#### Parameters

| accelId        | The device number associated with the Intel® QAT Endpoint. valid range is |
|----------------|---------------------------------------------------------------------------|
|                | O to number of Intel $^{\circ}$ QAT Endpoints in the system.              |
| response_quota | The maximum number of responses to take from the ring in one call.        |

#### **Programmer's Guide**

#### **Return Value**

The icp\_sal\_pollAllBanks function returns one of the following codes:

| CPA_STATUS_SUCCESS | Successfully polled a ring with data.                                           |
|--------------------|---------------------------------------------------------------------------------|
| CPA_STATUS_RETRY   | There is no data on any ring on any bank or the banks are already being polled. |
| CPA_STATUS_FAIL    | Indicates a failure.                                                            |

## 7.2.3.3 icp\_sal\_CyPollInstance

Poll the Cryptographic (CY) logical instance associated with the **instanceHandle** to retrieve requests that are on response rings associated with that instance and dispatch the associated callbacks. The **re-sponse\_quota** input parameter is the maximum number of responses to process in one call.

**Note:** The icp\_sal\_CyPollInstance() function is used in conjunction with the CyXIsPolled parameter in the acceleration configuration file.

#### Syntax

CpaStatus icp\_sal\_CyPollInstance(CpaInstanceHandle instanceHandle, Cpa32U response\_quota);

#### Parameters

| instanceHandle | The logical instance to poll for responses on the response ring.        |
|----------------|-------------------------------------------------------------------------|
| response_quota | The maximum number of responses to take from the ring in one call. When |
|                | set to 0, all responses are retrieved.                                  |

#### **Return Value**

The **icp\_sal\_CyPollInstance** function returns one of the following codes:

| CPA_STATUS_SUCCESS | The function was successful.                                                  |
|--------------------|-------------------------------------------------------------------------------|
| CPA_STATUS_RETRY   | There are no responses on the rings associated with the specified logical in- |
|                    | stance.                                                                       |
| CPA_STATUS_FAIL    | Indicates a failure.                                                          |

**Note:** A ring is only polled if it contains data.

# 7.2.3.4 icp\_sal\_DcPollInstance

Poll the Data Compression (DC) logical instance associated with the **instanceHandle** to retrieve requests that are on response rings associated with that instance and dispatch the associated callbacks. The **response\_quota** input parameter is the maximum number of responses to process in one call.

**Note:** The icp\_sal\_DcPollInstance() function is used in conjunction with the DcXIsPolled parameter in the acceleration configuration file.

#### Syntax

CpaStatus icp\_sal\_DcPollInstance(CpaInstanceHandle instanceHandle, Cpa32U response\_quota);

#### Parameters

| instanceHandle | The logical instance to poll for responses on the response ring.        |
|----------------|-------------------------------------------------------------------------|
| response_quota | The maximum number of responses to take from the ring in one call. When |
|                | set to 0, all responses are retrieved.                                  |

#### **Return Value**

The **icp\_sal\_DcPollInstance** function returns one of the following codes:

| CPA_STATUS_SUCCESS | The function was successful.                                                  |  |
|--------------------|-------------------------------------------------------------------------------|--|
| CPA_STATUS_RETRY   | There are no responses on the rings associated with the specified logical in- |  |
|                    | stance.                                                                       |  |
| CPA_STATUS_FAIL    | Indicates a failure.                                                          |  |

**Note:** A ring is only polled if it contains data.

### 7.2.3.5 icp\_sal\_CyPolIDpInstance

Poll a particular Cryptographic (CY) data path logical instance associated with the **instanceHandle** to retrieve requests that are on the high-priority symmetric ring associated with that instance and dispatch the associated callbacks. The **response\_quota** input parameter is the maximum number of responses to process in one call.

**Note:** This function is a Data Plane API function and consequently the restrictions in *Usage Constraints on the Data Plane APIs* apply.

#### Syntax



CpaStatus icp\_sal\_CyPollDpInstance(CpaInstanceHandle instanceHandle, Cpa32U response\_quota);

#### Parameters

| instanceHandle | The logical instance to poll for responses on the response ring.        |  |
|----------------|-------------------------------------------------------------------------|--|
| response_quota | The maximum number of responses to take from the ring in one call. When |  |
|                | set to 0, all responses are retrieved.                                  |  |

#### **Return Value**

The **icp\_sal\_CyPollDpInstance** function returns one of the following codes:

| CPA_STATUS_SUCCESS | The function was successful.                                                  |  |
|--------------------|-------------------------------------------------------------------------------|--|
| CPA_STATUS_RETRY   | There are no responses on the rings associated with the specified logical in- |  |
|                    | stance.                                                                       |  |
| CPA_STATUS_FAIL    | Indicates a failure.                                                          |  |

### 7.2.3.6 icp\_sal\_DcPolIDpInstance

Poll a particular Data Compression (DC) data path logical instance associated with the **instanceHandle** to retrieve requests that are on the response ring associated with that instance. The **response\_quota** input parameter is the maximum number of responses to process in one call.

**Note:** This function is a Data Plane API function and consequently the restrictions in *Usage Constraints* on the Data Plane APIs apply.

#### Syntax

CpaStatus icp\_sal\_DcPollDpInstance(CpaInstanceHandle instanceHandle, Cpa32U response\_quota);

#### Parameters

| instanceHandle | The logical instance to poll for responses on the response ring.        |  |
|----------------|-------------------------------------------------------------------------|--|
| response_quota | The maximum number of responses to take from the ring in one call. When |  |
|                | set to 0, all responses are retrieved.                                  |  |

#### **Return Value**

The icp\_sal\_DcPollDpInstance function returns one of the following codes:

| CPA_STATUS_SUCCESS | The function was successful.                                                  |  |
|--------------------|-------------------------------------------------------------------------------|--|
| CPA_STATUS_RETRY   | There are no responses on the rings associated with the specified logical in- |  |
|                    | stance.                                                                       |  |
| CPA_STATUS_FAIL    | Indicates a failure.                                                          |  |

# 7.2.4 User Space Access Configuration Functions

Functions that allow the configuration of user space access to the Intel<sup>®</sup> QAT services from processes running in user space.

All user space access configuration function definitions are located in **\$ICP\_ROOT/quickassist/**lookaside/access\_layer/include/icp\_sal\_user.h

# 7.2.4.1 icp\_sal\_userStart

Initializes user space access to an Intel<sup>®</sup> QAT Endpoint and starts the **ProcessName** section in the given section of the configuration file. This function needs to be called prior to any call to Intel<sup>®</sup> QAT API function from the user space process. This function is typically called only once in a user space process.

**Note:** The icp\_sal\_userStartMultiProcess() function is still supported, but the parameter limitDevAccess is ignored because its value is set once in the configuration file and is not allowed to be specified again in the function.

The configuration format allows the user to easily create a configuration for many user spaces processes. The driver internally generates unique process names and a valid configuration for each process based on the section name (psectionName) and mode (limitDevAccess) provided.

For example, on a system with *M* number of devices, if all *M* configuration files contain:

```
[IPSec]
NumProcesses = N
LimitDevAccess = 0
```

Then *N* internal sections are generated (each with instances on all devices) and *N* processes can be started at any given time. Each process can call **icp\_sal\_userStart("IPSec")** and the driver determines the unique name to use for each process.

Similarly, on an *M* device system, if all *M* configuration files contain:

| SSL]              |  |
|-------------------|--|
| lumProcesses = N  |  |
| imitDevAccess = 1 |  |
|                   |  |

Then *MxN* internal sections are generated (each with instances on one device only) and *MxN* processes can be started at any given time. Each process can call **icp\_sal\_userStart("SSL")** and the driver determines the unique name to use for each process.

Refer to *Configuring Multiple Processes on a System with Multiple Intel® QAT Endpoints* for detailed example.

#### Syntax

CpaStatus icp\_sal\_userStart(const char \*pSectionName);

#### Parameters

| pSectionName | The section name described in the configuration file. |
|--------------|-------------------------------------------------------|
|--------------|-------------------------------------------------------|

#### **Return Value**

The **icp\_sal\_userStart** function returns one of the following codes:

| CPA_STATUS_SUCCESS | Successfully started user space access to the Intel® QAT Endpoint as de- |
|--------------------|--------------------------------------------------------------------------|
|                    | fined in the configuration file.                                         |
| CPA_STATUS_FAIL    | Operation failed.                                                        |

## 7.2.4.2 icp\_sal\_userStop

Closes user space access to the Intel<sup>®</sup> QAT Endpoint; stops the services that were running and frees the allocated resources. After a successful call to this function, user space access to the Intel<sup>®</sup> QAT Endpoint from a calling process is not possible. This function should be called once when the process is finished using the Intel<sup>®</sup> QAT Endpoint and does not intend to use it again.

#### Syntax

CpaStatus icp\_sal\_userStop( void);

#### Parameters

None

#### **Return Value**

The icp\_sal\_userStop function returns one of the following codes:

| CPA_STATUS_SUCCESS | Successfully stopped user space access to the Intel® QAT Endpoint. |
|--------------------|--------------------------------------------------------------------|
| CPA_STATUS_FAIL    | Operation failed.                                                  |

# 7.2.5 Version Information Function

A function that allows the retrieval of version information related to the software and hardware being used.

The version information function definition is located in: **\$ICP\_ROOT/quickassist/lookaside/** access\_layer/include/icp\_sal\_versions.h.

# 7.2.5.1 icp\_sal\_getDevVersionInfo

Retrieves the hardware revision and information on the version of the software components being run on a given device.

**Note:** The **icp\_sal\_userStart** function must be called before calling this function. If not, calling this function returns **CPA\_STATUS\_INVALID\_PARAM** indicating an error. The **icp\_sal\_userStart** function is responsible for setting up the ADF user space component, which is required for this function to operate successfully.

#### Syntax

| CpaStatus              | icp_sal_getDevVersionInfo(Cpa32U | devīd, | <pre>icp_sal_dev_version_info_t</pre> |
|------------------------|----------------------------------|--------|---------------------------------------|
| <pre>*pVerInfo);</pre> |                                  |        |                                       |

#### Parameters

| devId     | The ID (number) of the device for which version information is to be re- |
|-----------|--------------------------------------------------------------------------|
|           | trieved.                                                                 |
| *pVerInfo | A pointer to a structure that holds the version information.             |

#### **Return Value**

The icp\_sal\_getDevVersionInfo function returns one of the following codes:

| CPA_STATUS_SUCCESS       | Operation finished successfully; version information retrieved. |
|--------------------------|-----------------------------------------------------------------|
| CPA_STATUS_INVALID_PARAM | Invalid parameter passed to the function.                       |
| CPA_STATUS_RESOURCE      | System resource problem.                                        |
| CPA_STATUS_FAIL          | Operation failed.                                               |

# 7.2.6 Reset Device Function

This API can only be called in user-space.

The device can be reset using this API call. This will schedule a reset of the device. The device can also be reset using the adf\_ctl utility, e.g., by calling adf\_ctl qat\_dev0 reset.

## 7.2.6.1 icp\_sal\_reset\_device

Resets the device.

#### Syntax

CpaStatus icp\_sal\_reset\_device(Cpa32U accelId);

#### Parameters

| accelId | The device number. |
|---------|--------------------|

#### **Return Value**

The icp\_sal\_reset\_device function returns one of the following codes:

| CPA_STATUS_SUCCESS | Successful operation. |
|--------------------|-----------------------|
| CPA_STATUS_FAIL    | Indicates a failure.  |

# 7.2.7 Thread-Less APIs

These APIs can be used when the QAT acceleration driver has been configured not to spawn threads.

These APIs can be used in the user space application.

## 7.2.7.1 icp\_sal\_poll\_device\_events

This reads any pending device events from icp\_dev%d\_csr and forwards to interested subsystems.

Syntax

CpaStatus icp\_sal\_poll\_device\_events(void);

#### Parameters

None

#### **Return Value**

The icp\_sal\_poll\_device\_events function returns one of the following codes:

| CPA_STATUS_SUCCESS | Successful operation. |
|--------------------|-----------------------|
| CPA_STATUS_FAIL    | Indicates a failure.  |



# 7.2.7.2 icp\_sal\_find\_new\_devices

This tries to connect to any available devices that the kernel driver has brought up and initialized for use in user space process.

#### Syntax

CpaStatus icp\_sal\_find\_new\_devices(void);

#### Parameters

None

#### **Return Value**

The icp\_sal\_find\_new\_devices function returns one of the following codes:

| CPA_STATUS_SUCCESS | Successful operation. |
|--------------------|-----------------------|
| CPA_STATUS_FAIL    | Indicates a failure.  |

# 7.2.8 Compress and Verify (CnV) Related APIs

APIs documented in this section are used for Compress and Verify. These APIs can be used in the user space application.

# 7.2.8.1 icp\_sal\_get\_dc\_error

This API allows the application to return the number of errors that occurred a particular number of times during the lifetime of a process.

#### Syntax

Cpa64U icp\_sal\_get\_dc\_error(Cpa8S dcError);

#### Parameters

| dcError Compressio | n Error code exposed by CpaDcReqStatus enum in cpa_dc.h |
|--------------------|---------------------------------------------------------|
|--------------------|---------------------------------------------------------|

#### **Return Value**

The icp\_sal\_get\_dc\_error function returns a 64 bit unsigned integer representing how many times the error type specified by Cpa8S dcError occurred in the current process.

### 7.2.8.2 icp\_sal\_dc\_simulate\_error

This API injects a simulated compression error for a defined number of compression or decompression requests. The simulated compression errors can only be applied to the traditional APIs. It must be called prior the APIs that perform the request. In the case of a simulated Compress and Verify error for a single request, the application would call icp\_sal\_dc\_simulate\_error() API as such: icp\_sal\_dc\_simulate\_error(1, CPA\_DC\_VERIFY\_ERROR);

#### Syntax

CpaStatus icp\_sal\_dc\_simulate\_error(Cpa8U numErrors, Cpa8S dcError);

#### Parameters

| numErrors | Number of simulated compression or decompression errors desired.      |
|-----------|-----------------------------------------------------------------------|
| dcError   | Desired error code to be returned by the compression or decompression |
|           | API.                                                                  |

#### **Return Value**

The icp\_sal\_dc\_simulate\_error function returns one of the following codes:

| CPA_STATUS_SUCCESS | Successful operation.                                                   |
|--------------------|-------------------------------------------------------------------------|
| CPA_STATUS_FAIL    | Indicates that an invalid error type was assigned to dcError parameter. |

# 7.2.9 Heartbeat APIs

These APIs check firmware/hardware status for a given device and are used as part of the Heartbeat functionality.

### 7.2.9.1 icp\_sal\_check\_device

This function checks the status of the firmware/hardware for a given device and is used as part of the Heartbeat functionality.

#### Syntax

CpaStatus icp\_sal\_check\_device(Cpa32U accelID);

#### Parameters

| accelID | The device ID. |  |
|---------|----------------|--|
|---------|----------------|--|

#### **Return Value**

The icp\_sal\_check\_device function returns one of the following codes:

| CPA_STATUS_SUCCESS | Successful operation. |
|--------------------|-----------------------|
| CPA_STATUS_FAIL    | Indicates a failure.  |

## 7.2.9.2 icp\_sal\_check\_all\_devices

This function checks the status of the firmware/hardware for all devices and is used as part of the Heartbeat functionality.

Syntax

CpaStatus icp\_sal\_check\_all\_devices(void);

#### Parameters

None

#### **Return Value**

The icp\_sal\_check\_all\_devices function returns one of the following codes:

| CPA_STATUS_SUCCESS | Successful operation. |
|--------------------|-----------------------|
| CPA_STATUS_FAIL    | Indicates a failure.  |

## 7.2.9.3 icp\_sal\_heartbeat\_simulate\_failure

This function simulates heartbeat failure for a specific device.

#### Syntax

CpaStatus icp\_sal\_heartbeat\_simulate\_failure(Cpa32U accelID);

#### Parameters

| accelID | The device ID. |
|---------|----------------|
|         |                |

#### **Return Value**

The icp\_sal\_heartbeat\_simulate\_failure function returns one of the following codes:

| CPA_STATUS_SUCCESS | Successful operation. |
|--------------------|-----------------------|
| CPA_STATUS_FAIL    | Indicates a failure.  |

# 7.2.10 Device Polling APIs

APIs documented in this section are used for polling devices.

# 7.2.10.1 icp\_sal\_poll\_device\_events

This function polls for device reset events.

#### Syntax

CpaStatus icp\_sal\_poll\_device\_events(void);

#### Parameters

None

#### **Return Value**

The icp\_sal\_poll\_device\_events function returns one of the following codes:

| CPA_STATUS_SUCCESS | Successful operation. |
|--------------------|-----------------------|
| CPA_STATUS_FAIL    | Indicates a failure.  |

**Note:** The events are sent to each instance that has registered a callback function. The callbacks are registered using **cpaCyInstanceSetNotificationCb** and **cpaDcInstanceSetNotificationCb**.

## 7.2.10.2 cpaCyInstanceSetNotificationCb

Cryptographic instances use this function to register for device event notifications.

#### Syntax

CpaStatus cpaCyInstanceSetNotificationCb(const CpaInstanceHandle instanceHandle, const CpaCyInstanceNotificationCbFunc pinstanceNotificationCb, void \*pCallbackTag);

#### Parameters

| instanceHandle          | Instance handle.                                 |
|-------------------------|--------------------------------------------------|
| pinstanceNotificationCb | Instance notification callback function pointer. |
| *pCallbackTag           | Opaque value provided by user.                   |

#### **Return Value**

The cpaCyInstanceSetNotificationCb function returns one of the following codes:

| CPA_STATUS_SUCCESS       | The function was successful. |
|--------------------------|------------------------------|
| CPA_STATUS_FAIL          | Indicates a failure.         |
| CPA_STATUS_INVALID_PARAM | Invalid parameter passed in. |
| CPA_STATUS_UNSUPPORTED   | Function is not supported.   |

The signature for the callback function is:

```
typedef void (*CpaCyInstanceNotificationCbFunc)(
    const CpaInstanceHandle instanceHandle,
    void * pCallbackTag,
    const CpaInstanceEvent instanceEvent);
```

Parameter:

```
typedef enum _CpaInstanceEvent
{
    CPA_INSTANCE_EVENT_RESTARTING = 0,
    CPA_INSTANCE_EVENT_RESTARTED,
    CPA_INSTANCE_EVENT_FATAL_ERROR
} CpaInstanceEvent;
```

# 7.2.10.3 cpaDcInstanceSetNotificationCb

Data compression instances use this function to register for device event notifications.

#### Syntax

CpaStatus cpaDcInstanceSetNotificationCb(const CpaInstanceHandle instanceHandle, const CpaDcInstanceNotificationCbFunc pinstanceNotificationCb, void \*pCallbackTag);

#### Parameters

| instanceHandle          | Instance handle.                                 |
|-------------------------|--------------------------------------------------|
| pinstanceNotificationCb | Instance notification callback function pointer. |
| *pCallbackTag           | Opaque value provided by user.                   |

#### **Return Value**

The **cpaDcInstanceSetNotificationCb** function returns one of the following codes:

| CPA_STATUS_SUCCESS       | The function was successful. |
|--------------------------|------------------------------|
| CPA_STATUS_FAIL          | Indicates a failure.         |
| CPA_STATUS_INVALID_PARAM | Invalid parameter passed in. |
| CPA_STATUS_UNSUPPORTED   | Function is not supported.   |

The signature for the callback function is:

```
intel.
```

```
typedef void (*CpaDcInstanceNotificationCbFunc)(
    const CpaInstanceHandle instanceHandle,
    void * pCallbackTag,
    const CpaInstanceEvent instanceEvent);
```

Parameter:

```
typedef enum _CpaInstanceEvent
{
    CPA_INSTANCE_EVENT_RESTARTING = 0,
    CPA_INSTANCE_EVENT_RESTARTED,
    CPA_INSTANCE_EVENT_FATAL_ERROR
} CpaInstanceEvent;
```

# 7.2.11 Congestion Management APIs

Congestion Management or Back-pressure mechanism APIs are intended to handle the cases when the device is busy. These APIs ensures there is enough space on the ring before submitting a request.

Applications can query the appropriate ring on each instance and select any instance with enough space without creating any **OpData** structures.

All these API definitions are located in: **\$ICP\_ROOT/quickassist/lookaside/access\_layer/include/** icp\_sal\_congestion\_mgmt.h.

Important: Congestion Management APIs are not currently supported with the QAT2.0 driver.

## 7.2.11.1 icp\_sal\_SymGetInflightRequests

This function is used to fetch in-flight and max in-flight request counts for the given symmetric instance handle.

#### Syntax

CpaStatus icp\_sal\_SymGetInflightRequests(CpaInstanceHandle instanceHandle, Cpa32U \*maxInflightRequests, Cpa32U \*numInflightRequests);

#### Parameters

| instanceHandle  | Symmetric instance handle.                        |
|-----------------|---------------------------------------------------|
| *maxInflightRe- | A pointer to the max in-flight request count.     |
| quests          |                                                   |
| *numInflightRe- | A pointer to the current in-flight request count. |
| quests          |                                                   |

#### **Return Value**

The icp\_sal\_SymGetInflightRequests function returns one of the following codes:

| CPA_STATUS_SUCCESS       | Successfully retrieved the request counts. |
|--------------------------|--------------------------------------------|
| CPA_STATUS_INVALID_PARAM | Invalid parameter passed to the function.  |
| CPA_STATUS_FAIL          | Indicates a failure.                       |

# 7.2.11.2 icp\_sal\_AsymGetInflightRequests

This function is used to fetch in-flight and max in-flight request counts for the given asymmetric instance handle.

#### Syntax

CpaStatus icp\_sal\_AsymGetInflightRequests(CpaInstanceHandle instanceHandle, Cpa32U \*maxInflightRequests, Cpa32U \*numInflightRequests);

#### Parameters

| instanceHandle  | Asymmetric instance handle.                       |
|-----------------|---------------------------------------------------|
| *maxInflightRe- | A pointer to the max in-flight request count.     |
| quests          |                                                   |
| *numInflightRe- | A pointer to the current in-flight request count. |
| quests          |                                                   |

#### **Return Value**

The icp\_sal\_AsymGetInflightRequests function returns one of the following codes:

| CPA_STATUS_SUCCESS       | Successfully retrieved the request counts. |
|--------------------------|--------------------------------------------|
| CPA_STATUS_INVALID_PARAM | Invalid parameter passed to the function.  |
| CPA_STATUS_FAIL          | Indicates a failure.                       |

# 7.2.11.3 icp\_sal\_dp\_SymGetInflightRequests

This data plane function is used to fetch in-flight and max in-flight request counts for the given symmetric instance handle.

#### Syntax

CpaStatus icp\_sal\_dp\_SymGetInflightRequests(CpaInstanceHandle instanceHandle, Cpa32U \*maxInflightRequests, Cpa32U \*numInflightRequests);

#### Parameters



| instanceHandle  | Symmetric instance handle.                        |
|-----------------|---------------------------------------------------|
| *maxInflightRe- | A pointer to the max in-flight request count.     |
| quests          |                                                   |
| *numInflightRe- | A pointer to the current in-flight request count. |
| quests          |                                                   |

#### **Return Value**

The icp\_sal\_dp\_SymGetInflightRequests function returns one of the following codes:

| CPA_STATUS_SUCCESS       | Successfully retrieved the request counts. |
|--------------------------|--------------------------------------------|
| CPA_STATUS_INVALID_PARAM | Invalid parameter passed to the function.  |
| CPA_STATUS_FAIL          | Indicates a failure.                       |

# 7.2.12 Service Specific Polling APIs

These service specific polling APIs are intended for retrieving response messages that are on the specific ring and dispatching the associated callback.

All these API definitions are located in: **\$ICP\_ROOT/quickassist/lookaside/access\_layer/include/** icp\_sal\_poll.h

### 7.2.12.1 icp\_sal\_CyPollSymRing

Poll the symmetric logical instance associated with the **instanceHandle** to retrieve requests that are on the response rings associated with that instance and dispatch the associated callbacks. The **re-sponse\_quota** input parameter is the maximum number of responses to process in one call.

#### Syntax

```
CpaStatus icp_sal_CyPollSymRing(CpaInstanceHandle instanceHandle, Cpa32U response_quota);
```

#### Parameters

| instanceHandle | Instance handle to poll for responses on the response ring.                  |
|----------------|------------------------------------------------------------------------------|
| response_quota | The maximum number of messages that will be read in one polling. Setting     |
|                | the response quota to zero means that all messages on the ring will be read. |

#### **Return Value**

The icp\_sal\_CyPollSymRing function returns one of the following codes:

| CPA_STATUS_SUCCESS       | Successfully polled a ring with data.                             |  |
|--------------------------|-------------------------------------------------------------------|--|
| CPA_STATUS_INVALID_PARAM | Invalid parameter passed to the function.                         |  |
| CPA_STATUS_RETRY         | There are no responses on the rings associated with the instance. |  |
| CPA_STATUS_FAIL          | Indicates a failure.                                              |  |
| CPA_STATUS_RESTARTING    | Device restarting. Resubmit the request.                          |  |

# 7.2.12.2 icp\_sal\_CyPollAsymRing

Poll the asymmetric logical instance associated with the **instanceHandle** to retrieve requests that are on the response rings associated with that instance and dispatch the associated callbacks. The **re-sponse\_quota** input parameter is the maximum number of responses to process in one call.

#### Syntax

CpaStatus icp\_sal\_CyPollAsymRing(CpaInstanceHandle instanceHandle, Cpa32U response\_quota);

#### Parameters

| instanceHandle | Instance handle to poll for responses on the response ring.                  |  |
|----------------|------------------------------------------------------------------------------|--|
| response_quota | The maximum number of messages that will be read in one polling. Setting     |  |
|                | the response quota to zero means that all messages on the ring will be read. |  |

#### **Return Value**

The icp\_sal\_CyPollAsymRing function returns one of the following codes:

| CPA_STATUS_SUCCESS       | Successfully polled a ring with data.                             |  |
|--------------------------|-------------------------------------------------------------------|--|
| CPA_STATUS_INVALID_PARAM | Invalid parameter passed to the function.                         |  |
| CPA_STATUS_RETRY         | There are no responses on the rings associated with the instance. |  |
| CPA_STATUS_FAIL          | Indicates a failure.                                              |  |
| CPA_STATUS_RESTARTING    | Device restarting. Resubmit the request.                          |  |

# 7.2.13 Check Device Availability APIs

### 7.2.13.1 icp\_sal\_userlsQatAvailable

This API allows an application to establish if there is any active QAT device present on system, without calling internal libadf APIs or without a dependency on **icp\_sal\_userStart()** 

#### Syntax

CpaBoolean icp\_sal\_userIsQatAvailable(void);

#### Parameters

None

#### **Return Value**

The icp\_sal\_userIsQatAvailable function returns one of the following codes:

| CPA_TRUE  | Indicates that there is at least one active device. |
|-----------|-----------------------------------------------------|
| CPA_FALSE | Indicates that there are no active devices.         |

# 8 Virtualization

# 8.1 Virtualization Deployment Model for Intel<sup>®</sup> QAT 2.0



Two different methods of virtualization are supported as shown in the below image.

Physical Device direct assignment

Single Root IOV (SR-IOV)

# 8.2 Physical Device Direct Assignment

The hardware exposes one Physical Function (PF) per QAT Endpoint to the host. Number of QAT Endpoints per platform is included in the *Dimensions*.

One or more PFs may be passed to a single virtual machine.

There is no sharing of the PF.

**Note:** Hot plugging of Physical Functions (PFs) is not supported. To ensure the device functions correctly after being added, please restart the virtual machine.

# 8.3 Single Root IOV (SR-IOV)

When SR-IOV is enabled, the hardware exposes one Physical Function (PF) and **n** Virtual Functions(VFs) per QAT Endpoint to the host, where **n** is defined in *Dimensions*. Number of QAT Endpoints per platform is also included in the *Dimensions*.

One or more VFs can be passed through to different guests/VMs

For details on enabling SR-IOV refer to the Virtualization Deployment Guide.

# 8.4 Reducing Number of VFs per Endpoint

Note: Reducing number of VFs per endpoint is supported starting with QAT Gen 4.

When the acceleration software is installed for SR-IOV use case, all VFs are enabled. In some instances, it is not desirable to enable all VFs.

The following commands can be used to limit VFs exposed per PF.

1. **Disable** all VFs on a specific device (76:00.0 in this example):

echo 0 | sudo tee /sys/bus/pci/devices/0000:76:00.0/sriov\_numvfs

2. Enable number of desired VFs for specific device (4 VFs on 76:00.0 in this example):

echo 4 | sudo tee /sys/bus/pci/devices/0000:76:00.0/sriov\_numvfs

#### Important:

• This change is not persistent. After a reboot, all VFs are exposed per PF.

• Restarting the QAT service will also result in all VFs being exposed per PF.

After reducing the number of VFs per PF, it is possible that the mapping of QAT VF to configuration file has changed. This mapping is very important especially when there are different services enabled with each PF/VF configuration, remembering that to enable a service in a VF requires the same service to be enabled in the PF. The configuration file for the VF can be determined by examining the qat\_service output.

For example, in the following output:

| Checking status of all devices.                                                            |
|--------------------------------------------------------------------------------------------|
| There is 100 QAT acceleration device(s) in the system:                                     |
| <pre>qat_dev0 - type: 4xxx, inst_id: 0, node_id: 0, bsf: 0000:6b:00.0, #accel: 1</pre>     |
| ⇔#engines: 9 state: up                                                                     |
| <pre>qat_dev1 - type: 4xxx, inst_id: 1, node_id: 0, bsf: 0000:70:00.0, #accel: 1</pre>     |
| →#engines: 9 state: up                                                                     |
| <pre>qat_dev2 - type: 4xxx, inst_id: 2, node_id: 0, bsf: 0000:75:00.0, #accel: 1</pre>     |
| →#engines: 9 state: up                                                                     |
| <pre>qat_dev3 - type: 4xxx, inst_id: 3, node_id: 0, bsf: 0000:7a:00.0, #accel: 1</pre>     |
| →#engines: 9 state: up                                                                     |
| <pre>qat_dev4 - type: 4xxx, inst_id: 4, node_id: 1, bsf: 0000:e8:00.0, #accel: 1</pre>     |
| →#engines: 9 state: up                                                                     |
| <pre>qat_dev5 - type: 4xxx, inst_id: 5, node_id: 1, bsf: 0000:ed:00.0, #accel: 1</pre>     |
| →#engines: 9 state: up                                                                     |
| <pre>qat_dev6 - type: 4xxx, inst_id: 6, node_id: 1, bsf: 0000:f2:00.0, #accel: 1</pre>     |
| →#engines: 9 state: up                                                                     |
| <pre>qat_dev7 - type: 4xxx, inst_id: 7, node_id: 1, bsf: 0000:f7:00.0, #accel: 1</pre>     |
| →#engines: 9 state: up                                                                     |
| <pre>qat_dev8 - type: 4xxxvf, inst_id: 80, node_id: 0, bsf: 0000:6b:00.1, #accel: 1</pre>  |
| →#engines: 1 state: up                                                                     |
| <pre>qat_dev9 - type: 4xxxvf, inst_id: 81, node_id: 0, bsf: 0000:6b:00.2, #accel: 1</pre>  |
| →#engines: 1 state: up                                                                     |
| <pre>qat_dev10 - type: 4xxxvf, inst_id: 82, node_id: 0, bsf: 0000:6b:00.3, #accel: 1</pre> |
| →#engines: 1 state: up                                                                     |
| <pre>qat_dev11 - type: 4xxxvf, inst_id: 83, node_id: 0, bsf: 0000:6b:00.4, #accel: 1</pre> |
| →#engines: 1 state: up                                                                     |
| <pre>qat_dev12 - type: 4xxxvf, inst_id: 84, node_id: 0, bsf: 0000:70:00.1, #accel: 1</pre> |
| →#engines: 1 state: up                                                                     |
| <pre>qat_dev13 - type: 4xxxvf, inst_id: 85, node_id: 0, bsf: 0000:70:00.2, #accel: 1</pre> |
| →#engines: 1 state: up                                                                     |
| <pre>qat_dev14 - type: 4xxxvf, inst_id: 86, node_id: 0, bsf: 0000:70:00.3, #accel: 1</pre> |
| →#engines: 1 state: up                                                                     |
| <pre>qat_dev15 - type: 4xxxvf, inst_id: 87, node_id: 0, bsf: 0000:70:00.4, #accel: 1</pre> |
| →#engines: 1 state: up                                                                     |
| <pre>qat_dev16 - type: 4xxxvf, inst_id: 88, node_id: 0, bsf: 0000:75:00.1, #accel: 1</pre> |
| →#engines: 1 state: up                                                                     |
| <pre>qat_dev17 - type: 4xxxvf, inst_id: 89, node_id: 0, bsf: 0000:75:00.2, #accel: 1</pre> |
| →#engines: 1 state: up                                                                     |
| <pre>qat_dev18 - type: 4xxxvf, inst_id: 90, node_id: 0, bsf: 0000:75:00.3, #accel: 1</pre> |
| →#engines: 1 state: up                                                                     |
| <pre>qat_dev19 - type: 4xxxvf, inst_id: 91, node_id: 0, bsf: 0000:75:00.4, #accel: 1</pre> |
| →#engines: 1 state: up                                                                     |
|                                                                                            |

The configuration file name will be /etc/4xxxvf\_dev<x>.conf where x is inst\_id.

For *qat\_dev9*, the configuration file is **/etc/4xxxvf\_dev81.conf** 

# **9** Secure Architecture Considerations

This section describes the potential threats identified as part of the secure architecture analysis of the Intel<sup>®</sup> Quick Assist Technology acceleration complex and the actions that can be taken to protect against these threats. This chapter concentrates on the acceleration complex. First, the terminology covering the main threat categories and mechanisms, attacker privilege and deployment models are presented. Then, some common mitigation actions that can be applied to many of these threat categories and mechanisms are discussed. Finally, more specific threat/attack vectors, including attacks against specific services of the PCH device are described.

# 9.1 Terminology

Each of the potential threat/attack vectors discussed may be described in terms of the following:

- Threat Categories
- Attack Mechanism
- Attacker Privilege
- Deployment Models

# 9.1.1 Threat Categories

System threats can be classified into the categories in the following table.

| Category             | Nature of Threat and Examples                                                     |
|----------------------|-----------------------------------------------------------------------------------|
| Exposure of Data     | Attacker reads data to which they should not have read access.                    |
|                      | Attacker reads cryptographic keys.                                                |
| Modification of Data | Attacker overwrites data to which they should not have write access.              |
|                      | Attacker overwrites cryptographic keys.                                           |
| Denial of Service    | Attacker causes application or driver software (running on an IA core)            |
|                      | to fail or terminate.                                                             |
|                      | Attacker causes Intel <sup>®</sup> QuickAssist Accelerator firmware to hang, tem- |
|                      | porarily impeding service.                                                        |

#### Table 44: Threat Categories

continues on next page

| Table 44 – continued from previous page |                                                                                                                                                                                                                          |
|-----------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Category Nature of Threat and Examples  |                                                                                                                                                                                                                          |
|                                         | Attacker causes excessive use of resource (IA core, Intel® QuickAssist<br>Accelerator firmware thread, silicon slice, PCIe* bandwidth, and so on),<br>thereby reducing availability of the service to legitimate client. |

# 9.1.2 Attack Mechanism

Attack Mechanisms and Examples

Some of the mechanisms by which an attacker can carry out an attack are listed in the following table.

| Mechanism                   | Examples                                                                          |  |
|-----------------------------|-----------------------------------------------------------------------------------|--|
| Contrived Packet Stream     | Attacker crafts a packet stream that exploits known vulnerabilities in            |  |
|                             | the software, firmware, or hardware. This could include vulnerabilities           |  |
|                             | such as buffer overflow bugs, lack of parameter validation, and so on.            |  |
| Compromised Application     | Attacker modifies the application code calling the Intel <sup>®</sup> QuickAssist |  |
| Software                    | Technology API to exploit known vulnerabilities in the driver/hardware.           |  |
| Application Malware         | In an environment where an attacker may be able to run their own ap-              |  |
|                             | plication, separate from the main application software, they may invoke           |  |
|                             | the Intel® QuickAssist Technology API to exploit known vulnerabilities            |  |
|                             | in the driver/hardware.                                                           |  |
| Compromised IA driver soft- | Attacker modifies the IA driver to exploit known vulnerabilities in the           |  |
| ware                        | driver/hardware.                                                                  |  |
| Defect                      | It is also possible that the attack is not malicious, but rather an uninten-      |  |
|                             | tional defect.                                                                    |  |

#### Table 45: Attack Mechanism

# 9.1.3 Attacker Privilege

The following table describes the privileges that an attacker may have. The table describes the case of a non-virtualized system.

| Privilege                    | Comments                                                                                                                                                                                                              |
|------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Physical access              | There is no attempt to protect against threats, such as signal probes,<br>where the attacker has physical access to the system. Customers can<br>protect their systems using physical locks, tamper-proof enclosures, |
|                              | Faraday cages, and so on.                                                                                                                                                                                             |
| Logged in as privileged user | There is no attempt to protect against threats where the attacker is logged in as a privileged user. Customers can protect their systems using strong, frequently changed passwords, and so on.                       |

#### Table 46: Attacker Privilege

continues on next page

| Table 46 – continued from previous page |                                                                                                                                                                                                                               |  |
|-----------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|
| Privilege                               | Comments                                                                                                                                                                                                                      |  |
| Logged in as unprivileged<br>user       | If the attacker is logged into a platform as an unprivileged user, it is im-<br>portant to ensure that they cannot use the services of the PCH to ac-<br>cess (read or write) any data to which they would not otherwise have |  |
|                                         | access.                                                                                                                                                                                                                       |  |
| Ability to send packets                 | In almost all deployments, attackers have the ability to send arbitrary packets from the network into the system. It is assumed that threats (for example, denial of service attacks) may arrive in this way.                 |  |

# 9.1.4 Deployment Models

Some of the possible deployment models are given in the following table.

| Deployment Model                        | Examples                                                                   |
|-----------------------------------------|----------------------------------------------------------------------------|
| System with no untrusted users          | <ul><li>Network security appliance</li><li>Server in data center</li></ul> |
| System with potentially untrusted users | Server in data center                                                      |

#### Table 47: Deployment Models

# 9.2 Threat/Attack Vectors

A thorough analysis has been conducted by considering each of the threat categories, attack mechanisms, attacker privilege levels, and deployment models. As a result, the following threats have been identified. Also described are the steps a user of the PCH chipset can take to mitigate against each threat. Some general practices that mitigate many of the common threats are considered first. Thereafter, threats on specific services and mitigation against those threats are described.

# 9.2.1 General Mitigation

The following mitigation techniques are generic to different threats and attack vectors:

- Ensure that all software running on the platform that has access to Intel<sup>®</sup> Quick Assist Technology devices is within the trust boundary of the platform owner. This mitigation includes software running in virtual machines and containers.
- Intel<sup>®</sup> follows Secure Coding guidelines, including performing code reviews and running static analysis on its driver software and firmware, to ensure its compliance with security guidelines. It is

recommended that customers follow similar guidelines when developing application code. This should include the use of tools such as static analysis, fuzzing, and so on.

- Ensure each hardware component, including the PCH chipset, processor, and DRAM, is physically secured from attackers. This can include such examples as physical locks, tamper proofing, and Faraday cages (to prevent side-channel attacks via electromagnetic radiation).
- Ensure that network services not required on the module are not operating and that the corresponding network ports are locked down.
- Use strong passwords to protect against dictionary and other attacks on administrative and other login accounts.

# 9.2.2 General Threats

General threats include the following:

- DMA
- Intentional Modification of IA Driver
- Modification of the QAT Configuration File
- Malicious Application Code
- Denial of Service

## 9.2.2.1 DMA

**Threat:** The PCH can perform Direct Memory Access (DMA, the copying of data) between defined memory locations. Once an attacker has sufficient privilege to invoke the Intel<sup>®</sup> QuickAssist Technology API, or to write to/read from the hardware rings used by the driver to communicate with the device, they can send requests to the Intel<sup>®</sup> QuickAssist Accelerator to perform such DMA, passing arbitrary physical memory addresses as the source and/or destination addresses, thereby exposing or modifying regions of memory to which they would otherwise not have access.

**Mitigation 1:** Ensure that Intel<sup>®</sup> Input-Output Memory Management Unit (IOMMU) is enabled. This will force USDM to create QuickAssist IOMMU domain and all memory allocated by USDM will be mapped into this domain, hence malicious user or error in user application cannot read or write memory outside this domain which mitigates the risk. However because there is only single domain, there is no protection between individual Virtual Functions(VFs) or applications. This design is done for simplicity of memory manager and if needed, VFIO-PCI should be used to create individual domains per VF.

**Mitigation 2:** Ensure that only trusted users are granted permissions to access the Intel<sup>®</sup> QuickAssist Technology API, or to write to and read from the hardware rings. Specifically, the PCH configuration file describes logical instances of acceleration services and the set of hardware rings to be used for each such instance. User processes can ask the kernel driver to map these rings into their address spaces. To access a given device (identified by the number in the filenames below), the user must be granted read/write access to the following files, which may be in /dev:

- uio<0...N> (where <0...N> are the qat uio device numbers)
- qat\*
- usdm\_drv

# 9.2.2.2 Intentional Modification of IA Driver

**Threat**: An attacker can potentially modify the IA driver to behave maliciously. This may lead to a denial of service of Intel<sup>®</sup> Quick Assist Technology services.

**Mitigation**: The driver object/executable file on disk should be protected using the normal file protection mechanisms so that it is writable only by trusted users, for example, a privileged user or an administrator. Specifically, the Intel<sup>®</sup> QuickAssist Technology kernel objects and libraries should not be writeable by user. If the *qat* user group is being used to provide access to Intel<sup>®</sup> Quick Assist Technology services, then this group should not have write permission to the binaries.

# 9.2.2.3 Modification of the QAT Configuration File

**Threat**: The QAT configuration file is read at initialization time by the driver and specifies what instances of each service (cryptographic, data compression) should be created, and which rings each service instance will use. Modifying this file could lead to denial of service by deleting required instances or could be used to attempt to create additional instances that the attacker could subsequently attempt to access for malicious purposes.

**Mitigation**: The configuration file should be protected using the normal file protection mechanisms so that it is writable only by trusted users, for example, a privileged user or an administrator.

**Note:** By default, the configuration file is stored in the /etc directory and may be named something like, c6xxx\_dev0.conf. Its default permissions are that it is readable and writeable only by *root* user and *qat* group.

## 9.2.2.4 Malicious Application Code

**Threat**: An attacker who can gain access to the Intel<sup>®</sup> QuickAssist Technology API may be able to exploit the following features of the API:

- Buffers passed to the API have a specified length of up to 32 bits. By specifying excessive lengths, an attacker may be able to cause denial of service by overwriting data beyond the end of a buffer.
- Buffer lists passed to the API consist of a scatter gather list (array of buffers). An attacker may incorrectly specify the number of buffers, causing denial of service due to the reading or writing of incorrect buffers.

**Mitigation**: Platform management can include the Rate Limiting feature to mitigate against Noisy Neighbors. Only trusted users and applications should be allowed to access the Intel<sup>®</sup> QuickAssist Technology API, as described in General Mitigations.

## 9.2.2.5 Denial of Service

**Threat**: An attacker may construct a service request that does not conform to the specification, resulting in low of service due to service timeouts, halting of Quick Assist service or undesired platform level conditions.

**Mitigation**: The current generation of Intel<sup>®</sup> Quick Assist Technology has been designed for performance, providing direct access to hardware via PCIe\* MMIO space. Misuse of hardware registers is to be avoided, and the threat against intentional misuse must be mitigated by ensuring all software on the platform is trusted.

An attacker may attempt to contrive a packet stream that monopolizes the acceleration services, thereby denying service to legitimate users. This may consist of one or more of the following:

- Sending packets that are compressed (for example, using IPComp) or encrypted (for example, using IPsec), thereby reducing the availability of these services to legitimate traffic.
- Sending excessively large packets, causing some latency for legitimate packets.
- Sending small packets at a high packet rate, causing extra bandwidth utilization on the PCI Express\* bus connecting the device to the processor.

**Mitigation**: Proper monitoring of Device Usage (DU) and the construction of Service Level Agreements (SLA) are now available as part of the Rate Limiting feature.

# 9.2.3 Threats Specific to Cryptographic Service

Threats against the cryptographic service include:

## 9.2.3.1 Reading Cryptographic Keys

**Threat**: Cryptographic keys are stored in DRAM. An attacker who can determine where these are stored could read the DRAM to get access to the keys or could write the DRAM to use keys known by the attacker, thereby compromising the confidentiality of data protected by these keys. Some cryptographic keys have long lives. The impact of an attacker obtaining the key may exist for the lifetime of the key itself.

**Mitigation**: DRAM is considered inside the cryptographic boundary (as defined by FIPS 140-2). The normal memory protection schemes provided by the Intel<sup>®</sup> architecture processor and memory controller, and by the operating system, prevent unauthorized access to these memory regions.

# **10 Revision History**

| Document | Description                                            | Date          |
|----------|--------------------------------------------------------|---------------|
| Version  |                                                        |               |
| 007      | Updated SLA Units for PKE                              | December 2024 |
| 006      | Updated Integrity Checksum details                     | November 2024 |
| 005      | Removed S-IOV References                               | April 2024    |
| 004      | Updates for 1.1.40 Release                             | March 2024    |
| 003      | RSA-1024 added as Opt-in.                              | June 2023     |
| 002      | Note added about using SR-IOV and S-IOV simultaneously | May 2023      |
|          | on same PF (not supported).                            |               |
| 001      | Initial Release                                        | February 2023 |