Intel Media Stack, by 张新峰 (Intel)


Published 2019/07/08 in the Programming category


3. Agenda
• GPU overview
• GPU media
• Intel media stack
• MSDK vs VAAPI
• Customization
4. GPU overview
The Display interface and the Blitter (block image transferrer) are controlled primarily through direct CPU register accesses, while the 3D and Media pipelines and the parallel Video Codec Engine (VCE) are controlled primarily through instruction lists (command buffers) in memory.
5. Execution Units (EUs) subsystem
The subsystem contains an array of cores, or execution units, plus a number of "shared functions" that receive and process messages at the request of programs running on the cores. The shared functions perform critical tasks such as sampling textures and updating the render target (usually the frame buffer).
• Generally programmable: OpenCL kernels run here
• In-order, SIMD
• 128 x 8 x 32-bit registers per thread
• Up to 7 threads per EU
• Zero-cycle thread switching
• 8, 16, or 32 OpenCL work items per thread
Subslice
• OpenCL workgroups are assigned per subslice
• Multiple EUs
• Sampler (images), data port (buffers)
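The per-thread register figures above determine how much register state each EU holds on chip. A minimal C sketch of that arithmetic, assuming exactly the slide's numbers (128 registers of 8 x 32-bit lanes, up to 7 threads per EU, SIMD widths 8/16/32); the constant and function names are hypothetical and other Gen generations may differ.

```c
#include <stddef.h>

/* Figures taken from the slide; other generations may differ. */
enum {
    GRF_REGS_PER_THREAD = 128, /* general registers per hardware thread */
    LANES_PER_REG       = 8,   /* 8 lanes of 32 bits = 256-bit register */
    BYTES_PER_LANE      = 4,   /* one 32-bit lane */
    THREADS_PER_EU      = 7    /* up to 7 threads per EU */
};

/* Bytes of register state owned by one hardware thread. */
static size_t grf_bytes_per_thread(void)
{
    return (size_t)GRF_REGS_PER_THREAD * LANES_PER_REG * BYTES_PER_LANE;
}

/* Bytes of register state in one EU with all threads resident. Each thread
   keeps its own registers on chip, so switching threads needs no
   save/restore, which is why the switch can be zero-cycle. */
static size_t grf_bytes_per_eu(void)
{
    return grf_bytes_per_thread() * THREADS_PER_EU;
}

/* OpenCL work items resident in one EU: one work item per SIMD lane
   per thread. Returns 0 for widths the slide does not list. */
static size_t work_items_per_eu(unsigned simd_width)
{
    if (simd_width != 8 && simd_width != 16 && simd_width != 32)
        return 0;
    return (size_t)THREADS_PER_EU * simd_width;
}
```

With these numbers each thread owns 4 KB of registers and a fully loaded EU holds 28 KB; a SIMD16 kernel keeps 112 work items in flight per EU.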
6. GPU configuration
7. GPU configuration
• Unslice: fixed-function pipelines for 3D, GPGPU, and Media operations, and the interface to the outside world.
• The 3D Geometry / Fixed Function (Geom/FF) block, consisting of:
  1. 3D fixed function pipeline (CS, VFVS, HS, TE, DS, GS, SOL, SL, SFE, SVG)
  2. Video Front-End unit (VFE)
  3. Thread Spawner unit (TSG) and the global Thread Dispatcher unit (TDG)
  4. Unified Return Buffer Manager (URBM)
• Media fixed function assets:
  1. Video Decode (VD) Box
  2. Video Enhancement (VE) Box
  3. Scaler & Format Converter (SFC)
• The Global Assets (GA) block, the primary interface and memory stream gateway to the outside world, consisting of:
  1. GT Interface (GTI)
  2. State Variable Manager (SVM)
  3. Blitter (BLT)
  4. Graphics Arbiter (GAM)
8. GPU configuration
• Subslice (three shown): a compute unit with the supporting fixed- or shared-function assets sufficient for its EUs.
  o A bank of Execution Units (EUs), eight per subslice shown
  o Sampler, supporting both media and 3D functions
  o Gateway (GWY)
  o Instruction cache (IC)
  o Local Thread Dispatcher (TDL)
  o Barycentric Calculator (BC)
  o Pixel Shader Dispatcher (PSD)
  o Data Cluster (HDC)
  o Dataport Render Cache (DAPRC), two per subslice
• Slice Common: scalable fixed-function assets that support the compute horsepower provided by two or more subslices.
  o 3D Fixed Function:
    § Windower/Mask unit (WM)
    § Hi-Z (HZ) and Intermediate Z (IZ)
    § Setup Backend (SBE)
    § RCPFE, BE
    § 3D stream caches (RCC, MSC, STC, RCZ)
  o Media Fixed Functions:
    § DAPRSC
    § SVL
    § TDC
  o L3 Cache: backing L3 cache for certain memory streams emanating from the subslices.
    § L3 data cache with support for data, URB, and shared local memory (SLM)
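The configuration shown (three subslices, eight EUs each) can be turned into rough per-slice counts. A small sketch assuming the figure's numbers plus the earlier "up to 7 threads per EU"; `SliceConfig` and its helpers are hypothetical, and actual parts vary in how many EUs are enabled.

```c
/* Hypothetical description of one slice, using the figure's numbers. */
typedef struct {
    unsigned subslices;        /* subslices per slice (three shown) */
    unsigned eus_per_subslice; /* EUs per subslice (eight shown) */
    unsigned threads_per_eu;   /* hardware threads per EU (up to 7) */
} SliceConfig;

/* EUs in one slice. */
static unsigned slice_eus(const SliceConfig *c)
{
    return c->subslices * c->eus_per_subslice;
}

/* Hardware threads that can be resident in one slice at once. */
static unsigned slice_threads(const SliceConfig *c)
{
    return slice_eus(c) * c->threads_per_eu;
}
```

For `SliceConfig c = {3, 8, 7};` this gives 24 EUs and 168 resident hardware threads per slice.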
10. SFC (Scaler and Format Converter)
Power saving, EU-less usage: the SFC is a fixed-function engine designed to run concurrently with the VDBOX or VEBOX, i.e. decoding and scaling happen at the same time, or image enhancement and scaling happen at the same time. It saves power by offloading the scaling workload from the media render engine (EUs) to this dedicated, much smaller engine.
11. Intel media (ICL; features differ between platforms)
13. [Figure: CPU/GPU command flow] On the CPU side: APP → MSDK → UMD (builds commands) → OS scheduler → KMD → DMA ring buffer, whose MI_BATCH_BUFFER_START commands direct the command streamers (com1, com2, com3, …, com10, com11, com12) to batch buffers. On the GPU side: slices with EUs, sampler, IEF, AVS, VME and 3D; VDBOX (aka MFX) with decode (aka MFD), encode (aka MFC, aka PAK), VDENC and HuC; VEBOX with DN, DI, ProcAmp and TCC; SFC, the scaler and format conversion pipe (resize, IEF, CSC); WDBOX; and an ME → PAK → mux → MPEG2 TS data path.
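The ring-buffer hand-off in the figure (the driver writes commands at the tail, a command streamer consumes them from the head and jumps to the batch buffer named by MI_BATCH_BUFFER_START) is a classic producer/consumer ring. A toy C model, assuming a 16-dword ring and two-dword commands; the opcode value, sizes and all names are hypothetical placeholders, not the real Gen command encoding.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_DWORDS 16u /* hypothetical ring size, in dwords */

/* Placeholder opcode: NOT the real MI_BATCH_BUFFER_START encoding. */
enum { OP_BATCH_START = 0x31 };

typedef struct {
    uint32_t dw[RING_DWORDS];
    size_t head; /* next dword the command streamer reads */
    size_t tail; /* next dword the driver writes */
} Ring;

static size_t ring_used(const Ring *r)
{
    return (r->tail + RING_DWORDS - r->head) % RING_DWORDS;
}

/* Driver side: emit "start batch at batch_addr". One slot stays empty so
   that head == tail always means "empty". Returns 0 if the ring is full. */
static int ring_emit_batch_start(Ring *r, uint32_t batch_addr)
{
    if (RING_DWORDS - 1 - ring_used(r) < 2)
        return 0;
    r->dw[r->tail] = OP_BATCH_START;
    r->tail = (r->tail + 1) % RING_DWORDS;
    r->dw[r->tail] = batch_addr;
    r->tail = (r->tail + 1) % RING_DWORDS;
    return 1;
}

/* Command-streamer side: consume one command and return the batch address
   it would jump to; 0 means the ring is empty (so 0 is not a valid batch
   address in this toy). */
static uint32_t ring_fetch(Ring *r)
{
    if (ring_used(r) < 2)
        return 0;
    uint32_t op  = r->dw[r->head];
    uint32_t arg = r->dw[(r->head + 1) % RING_DWORDS];
    r->head = (r->head + 2) % RING_DWORDS;
    return op == OP_BATCH_START ? arg : 0;
}
```

The design point the figure makes survives in the toy: the CPU only appends small pointer commands to the ring, while the bulk of the work lives in batch buffers the GPU fetches on its own.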
14. libva [Figure: software stack] Application and middleware (with CMRT and customized calls) → libva (Encode, Decode, VP, Display) → VA backend (iHD driver) and display backend (glx/x11/wayland/drm…).
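In this stack libva is a thin front end: it loads a backend driver (iHD here; the real loader opens a `<name>_drv_video.so` module) and forwards each VA call through that driver's function table. A toy C model of the layering; `BackendVTable`, `ToyDisplay` and the helpers are hypothetical and far simpler than libva's real driver interface.

```c
#include <string.h>

/* Hypothetical per-driver function table (libva's real one is much larger). */
typedef struct {
    const char *name;
    int (*create_config)(int profile); /* returns a config id, or -1 */
} BackendVTable;

/* Hypothetical "iHD" backend that accepts profiles 0..2 only. */
static int ihd_create_config(int profile)
{
    return (profile >= 0 && profile <= 2) ? 100 + profile : -1;
}

static const BackendVTable ihd_backend = { "iHD", ihd_create_config };

/* Front-end handle: the application only ever sees this. */
typedef struct {
    const BackendVTable *drv;
} ToyDisplay;

/* Front-end initialize: select a backend by name (a lookup stands in
   for loading a shared module). */
static int toy_display_init(ToyDisplay *d, const char *driver_name)
{
    if (strcmp(driver_name, ihd_backend.name) == 0) {
        d->drv = &ihd_backend;
        return 0;
    }
    d->drv = NULL;
    return -1;
}

/* Front-end entry point forwards to the backend, as vaCreateConfig does. */
static int toy_create_config(const ToyDisplay *d, int profile)
{
    return d->drv ? d->drv->create_config(profile) : -1;
}
```

Because applications and middleware program only against the front end, the same binary can run on a different backend driver without recompilation.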
15. Media driver, MediaSDK
Driver: media-driver is an open source, hardware-accelerated video driver that supports Intel® HD Graphics starting from Broadwell.
Source code, development libraries and SDKs to access hardware capabilities:
• LibVA: an implementation of VA-API (Video Acceleration API), an open-source library that provides access to graphics hardware acceleration capabilities. LibVA-utils is a collection of utilities and examples that exercise VA-API in accordance with the libva project.
• Intel® Media SDK: provides a plain C API to access hardware-accelerated video decode, encode and filtering on Intel® Gen graphics hardware platforms. The implementation is written in C++11, with parts in C-for-Media (CM).
  o Supported video encoders: HEVC, AVC, MPEG-2, JPEG, VP9
  o Supported video decoders: HEVC, AVC, VP8, VP9, MPEG-2, VC1, JPEG
  o Supported video pre-processing filters: Color Conversion, Deinterlace, Denoise, Resize, Rotate, Composition, HDR
• 360 SDK: provides 360-degree stitching from 2 or 6 camera inputs; open-sourcing is planned.
• Capture SDK: provides a game capture solution for game streaming; open-sourcing is planned.
16. Source code components
• Intel® Graphics Media Driver
• Intel® Graphics Media SDK
• Intel® Graphics HDCP
• Intel® Graphics Libva
• Intel® Libva Sample Code
• Intel® C-for-Media Compiler
• Intel® Graphics Memory Management Library
• Intel® Graphics Hardware Composer for Android* OS
• Intel® Graphics 3D Graphics Library
• Intel® Compute Library for DNN
• Intel® Graphics Compute Runtime for OpenCL™ Driver
• Intel® Graphics Compiler for OpenCL
17. MSDK basic decode flow
Initialize → main loop: call DecodeFrameAsync with bitstream input while more input is available → drain loop: once input is finished, call DecodeFrameAsync with null input → finish (MFX_ERR_MORE_DATA in the drain loop indicates that all surfaces have been drained).
Expected return codes for DecodeFrameAsync:
• MFX_ERR_MORE_SURFACE: a new surface is required to proceed; this is where decode will write its output.
• MFX_ERR_MORE_DATA: more input bitstream data is required to proceed.
• MFX_WRN_DEVICE_BUSY: the HW device is unable to respond. This is expected during normal operation and should clear after a very short wait; if the state persists for more than a few milliseconds, it may indicate a problem.
• MFX_WRN_VIDEO_PARAM_CHANGED: the SDK decoder parsed a new sequence header. Decoding can continue with the existing frame buffers; the application can optionally retrieve the new video parameters by calling MFXVideoDECODE_GetVideoParam.
• Other error codes may indicate bugs. Please contact an Intel support representative for more information.
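The main-loop/drain-loop structure and its return codes can be simulated without the SDK. In this sketch a mock decoder stands in for MFXVideoDECODE_DecodeFrameAsync, and the `SIM_*` codes are local stand-ins for the mfxStatus values (their numbers do not match the SDK headers); the one-frame decoder latency and busy-call count are assumptions, and MFX_WRN_VIDEO_PARAM_CHANGED is not modeled.

```c
/* Local stand-ins for the slide's mfxStatus codes; values are arbitrary. */
typedef enum {
    SIM_ERR_NONE,
    SIM_ERR_MORE_DATA,    /* needs more bitstream; when draining: nothing left */
    SIM_ERR_MORE_SURFACE, /* needs a new free output surface */
    SIM_WRN_DEVICE_BUSY   /* HW busy: wait briefly, then retry */
} SimStatus;

#define SIM_DELAY 1 /* hypothetical decoder latency, in frames */

typedef struct {
    int frames_in_stream; /* frames present in the input bitstream */
    int frames_consumed;  /* frames already fed to the decoder */
    int buffered;         /* decoded frames not yet returned */
    int busy_calls;       /* number of calls that report DEVICE_BUSY */
    int surface_free;     /* 1 if the working surface may be written */
} SimDecoder;

/* Mock DecodeFrameAsync: has_input == 0 models passing a null bitstream. */
static SimStatus sim_decode_frame_async(SimDecoder *d, int has_input)
{
    if (d->busy_calls > 0) { d->busy_calls--; return SIM_WRN_DEVICE_BUSY; }
    if (!d->surface_free) return SIM_ERR_MORE_SURFACE;
    if (has_input) {
        d->frames_consumed++;
        d->buffered++;
        if (d->buffered <= SIM_DELAY) return SIM_ERR_MORE_DATA;
    } else if (d->buffered == 0) {
        return SIM_ERR_MORE_DATA; /* drain finished */
    }
    d->buffered--;
    d->surface_free = 0; /* the output surface is now in use */
    return SIM_ERR_NONE;
}

/* Main loop, then drain loop, as in the flow chart; returns frames output. */
static int sim_decode_all(SimDecoder *d)
{
    int out = 0;
    int draining = 0;
    for (;;) {
        if (!draining && d->frames_consumed == d->frames_in_stream)
            draining = 1; /* input finished: switch to the drain loop */
        SimStatus st = sim_decode_frame_async(d, !draining);
        if (st == SIM_ERR_NONE) {
            out++; /* real code: MFXVideoCORE_SyncOperation, then use the frame */
        } else if (st == SIM_ERR_MORE_SURFACE) {
            d->surface_free = 1; /* real code: pick an unlocked surface */
        } else if (st == SIM_ERR_MORE_DATA && draining) {
            break; /* all buffered frames drained */
        }
        /* MORE_DATA in the main loop: feed the next chunk on the next call.
           DEVICE_BUSY: real code sleeps about 1 ms and retries the call. */
    }
    return out;
}
```

For example, a 5-frame stream yields 5 output frames whether or not the device reports busy at first; the last frame only comes out of the drain loop because of the assumed one-frame latency.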
18. Calling flow: VAAPI for decode
• Setup: vaInitialize() → vaCreateConfig → vaCreateSurfaces → vaCreateContext
• Per picture: vaCreateBuffer for the parameter and data buffers (VAPictureParameterBufferType, VAIQMatrixBufferType, VASliceParameterBufferType, VASliceDataBufferType) → vaBeginPicture → vaRenderPicture → vaEndPicture → vaSyncSurface → vaDestroyBuffer
• Teardown: vaDestroySurfaces, vaDestroyContext, vaDestroyConfig
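The per-picture part of this flow is a small protocol. The toy checker below encodes it as a state machine; it does not call libva, its enum and function names are hypothetical, and it is deliberately strict (it requires vaSyncSurface before the next vaBeginPicture, whereas real applications may keep several surfaces in flight and synchronize later).

```c
/* Hypothetical tags for the per-picture VA calls on the slide. */
typedef enum { CALL_BEGIN, CALL_RENDER, CALL_END, CALL_SYNC } VaCall;

/* Returns 1 if calls[0..n) is a sequence of complete pictures:
   vaBeginPicture, one or more vaRenderPicture, vaEndPicture, vaSyncSurface. */
static int va_sequence_ok(const VaCall *calls, int n)
{
    enum { IDLE, BEGUN, RENDERED, ENDED } s = IDLE;
    for (int i = 0; i < n; i++) {
        switch (calls[i]) {
        case CALL_BEGIN:
            if (s != IDLE) return 0;
            s = BEGUN;
            break;
        case CALL_RENDER: /* e.g. picture params, then slice params/data */
            if (s != BEGUN && s != RENDERED) return 0;
            s = RENDERED;
            break;
        case CALL_END:
            if (s != RENDERED) return 0;
            s = ENDED;
            break;
        case CALL_SYNC:
            if (s != ENDED) return 0;
            s = IDLE;
            break;
        default:
            return 0;
        }
    }
    return s == IDLE; /* every picture must be finished and synced */
}
```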
19. Open source release workflow
• Versions are numbered <year>.<quarter>, e.g. 18.3 for the third quarter of 2018.
• Bi-weekly pre-releases are tagged <year>.<quarter>.pre<N>: 18.3.pre1, 18.3.pre2, …; 18.4.pre1 … 18.4.pre8; 19.1.pre1 … 19.1.pre7; 19.2.pre1, 19.2.pre2.
• Each quarter has a release branch (intel-media-18.3, intel-media-18.4, intel-media-19.1) from which the release version <year>.<quarter>.0 is published: 18.3.0, 18.4.0, 19.1.0. Earlier tags carry an intel-media- prefix (intel-media-18.2.pre1 … intel-media-18.2.0).
• The last number is incremented by 1 for an update, e.g. the 19.1.1 update on intel-media-19.1.
• intel-mediasdk-* uses the same scheme.
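A small parser makes the naming scheme concrete. This is a sketch: `MediaVersion`, `parse_version` and `version_cmp` are hypothetical helpers, and the ordering rule (all pre-releases of a quarter precede its .0 release, which precedes its updates) is an assumption drawn from the scheme above.

```c
#include <stdio.h>

/* Parsed "<year>.<quarter>.<update>" or "<year>.<quarter>.pre<N>". */
typedef struct {
    int year, quarter;
    int is_pre; /* 1 for a bi-weekly pre-release */
    int num;    /* pre-release number, or update number for releases */
} MediaVersion;

/* Returns 1 on success, 0 if s matches neither form exactly. */
static int parse_version(const char *s, MediaVersion *v)
{
    int used = 0;
    v->is_pre = 0;
    if (sscanf(s, "%d.%d.pre%d%n", &v->year, &v->quarter, &v->num, &used) == 3
        && s[used] == '\0') {
        v->is_pre = 1;
        return 1;
    }
    if (sscanf(s, "%d.%d.%d%n", &v->year, &v->quarter, &v->num, &used) == 3
        && s[used] == '\0')
        return 1;
    return 0;
}

/* Assumed order within a quarter: preN < .0 release < .1, .2 updates. */
static int version_cmp(const MediaVersion *a, const MediaVersion *b)
{
    if (a->year != b->year) return a->year < b->year ? -1 : 1;
    if (a->quarter != b->quarter) return a->quarter < b->quarter ? -1 : 1;
    if (a->is_pre != b->is_pre) return a->is_pre ? -1 : 1;
    if (a->num != b->num) return a->num < b->num ? -1 : 1;
    return 0;
}
```

Under this ordering, 18.4.pre8 < 19.1.pre7 < 19.1.0 < 19.1.1, and an incomplete string such as "19.1" is rejected.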
20. Gen decoder implementation (GPU engines: MFX (FF), HuC, SFC, EUs)
• FF
  o AVC/MPEG2/JPEG/VP8
• HuC + FF
  o HEVC (header parsing in HuC)
  o HWDRM (header decryption in HuC)
• FF + EU
  o VC-1 (decoding + OLP)
  o AVC (field) downsampling
• FF + SFC
  o AVC (frame) downsampling
21. Gen encoder implementation (GPU engines: HuC, MFX (FF), VDENC, EUs, VME)
• FF
  o JPEG encoder on Gen7+
• EU + VME + FF
  o AVC/MPEG-2 encoder on Gen7+
  o HEVC encoder on Gen9+
• HuC + VDENC
  o AVC VDENC encoder on Gen9+
  o HEVC/VP9 VDENC encoder on Gen11+
• EU + HuC + VDENC
  o HEVC 8b/10b on HSW/BDW/SKL
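The two slides above amount to a table from codec usage to the engines involved. A sketch of that table as a bitmask lookup, transcribed from the slides with platform (Gen) limits and the HWDRM and EU + HuC + VDENC rows left out; the usage strings, enum names and `engines_for` helper are hypothetical.

```c
#include <string.h>

/* Engine bits, named after the blocks on the slides. */
enum {
    ENG_MFX   = 1 << 0, /* fixed-function (FF) codec engine */
    ENG_HUC   = 1 << 1,
    ENG_EU    = 1 << 2,
    ENG_SFC   = 1 << 3,
    ENG_VME   = 1 << 4,
    ENG_VDENC = 1 << 5
};

typedef struct {
    const char *usage;
    unsigned engines;
} EngineRow;

/* Rows transcribed from the decoder and encoder slides. */
static const EngineRow kRows[] = {
    { "decode:AVC",                 ENG_MFX },
    { "decode:MPEG2",               ENG_MFX },
    { "decode:JPEG",                ENG_MFX },
    { "decode:VP8",                 ENG_MFX },
    { "decode:HEVC",                ENG_HUC | ENG_MFX },
    { "decode:VC1",                 ENG_MFX | ENG_EU },
    { "decode:AVC-field-downscale", ENG_MFX | ENG_EU },
    { "decode:AVC-frame-downscale", ENG_MFX | ENG_SFC },
    { "encode:JPEG",                ENG_MFX },
    { "encode:AVC",                 ENG_EU | ENG_VME | ENG_MFX },
    { "encode:MPEG2",               ENG_EU | ENG_VME | ENG_MFX },
    { "encode:HEVC",                ENG_EU | ENG_VME | ENG_MFX },
    { "encode:AVC-VDENC",           ENG_HUC | ENG_VDENC },
    { "encode:HEVC-VDENC",          ENG_HUC | ENG_VDENC },
    { "encode:VP9-VDENC",           ENG_HUC | ENG_VDENC },
};

/* Returns the engine mask for a usage string, or 0 if it is not listed. */
static unsigned engines_for(const char *usage)
{
    for (size_t i = 0; i < sizeof kRows / sizeof kRows[0]; i++)
        if (strcmp(kRows[i].usage, usage) == 0)
            return kRows[i].engines;
    return 0;
}
```

Reading the table this way makes the power story visible: rows without `ENG_EU` (VDENC encode, SFC downscaling) leave the EUs free for other work.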
22.
typedef struct _VAEncMiscParameterRateControl {
    uint32_t bits_per_second;
    uint32_t target_percentage;
    uint32_t window_size;
    uint32_t initial_qp;
    uint32_t min_qp;
    uint32_t basic_unit_size;
    union {
        struct {
            uint32_t reset : 1;
            uint32_t disable_frame_skip : 1;
            uint32_t disable_bit_stuffing : 1;
            uint32_t mb_rate_control : 4;
            uint32_t temporal_id : 8;
            uint32_t cfs_I_frames : 1;
            uint32_t enable_parallel_brc : 1;
            uint32_t enable_dynamic_scaling : 1;
            uint32_t frame_tolerance_mode : 2;
            uint32_t reserved : 12;
        } bits;
        uint32_t value;
    } rc_flags;
    uint32_t ICQ_quality_factor;
    uint32_t max_qp;
    uint32_t quality_factor;
    uint32_t va_reserved[VA_PADDING_MEDIUM - 3];
} VAEncMiscParameterRateControl;
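In libva's description of this struct, for VBR `bits_per_second` is the maximum bitrate, `target_percentage` gives the target bitrate as a percentage of that maximum, and `window_size` is the rate-control window in milliseconds. The sketch below works through that arithmetic on a hypothetical local mirror of those three fields (`RcSettings` and the helpers are not libva names); a real application fills the libva struct itself inside a VAEncMiscParameterBuffer.

```c
#include <stdint.h>

/* Hypothetical mirror of three VAEncMiscParameterRateControl fields. */
typedef struct {
    uint32_t bits_per_second;   /* maximum bitrate (VBR) */
    uint32_t target_percentage; /* target as a percentage of the maximum */
    uint32_t window_size;       /* rate-control window, in milliseconds */
} RcSettings;

/* Average bitrate the encoder aims for under VBR. */
static uint64_t vbr_target_bps(const RcSettings *rc)
{
    return (uint64_t)rc->bits_per_second * rc->target_percentage / 100u;
}

/* Approximate bit budget of one rate-control window at the target rate. */
static uint64_t bits_per_window(const RcSettings *rc)
{
    return vbr_target_bps(rc) * rc->window_size / 1000u;
}
```

For example, `{8000000, 75, 500}` (8 Mbps maximum, 75% target, 500 ms window) gives a 6 Mbps target and roughly 3,000,000 bits per window.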
23. Customization
• Quality
• Performance
• Compression rate
Customers can use these features for special use cases.