Linux 6.17 Released
The well-oiled wheel of the Linux release cycle has just produced another kernel: Linus Torvalds announced the release of v6.17, which contains a number of contributions from Igalia. This cycle, which started immediately after the release of kernel v6.16 on July 27 and ended on September 28 (63 days), includes 13089 new commits:
$ git log --oneline --no-merges v6.16..v6.17 | wc -l
13089
And, as usual, the list of new features and changes is huge and scattered all over the code. Here are some subjective highlights which may be worth looking into:
- A reorganisation of the CPU mitigation controls around attack vectors, making it easier to choose which CPU mitigations (x86 Spectre) are applied.
- Scheduler support for uniprocessor configurations has been dropped.
- There are two new system calls, file_getattr and file_setattr, used to get/set extended file attributes.
- The usual set of new Rust additions.
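As a rough illustration of the new file_getattr system call mentioned in the list above, here is a minimal user-space sketch. It assumes that the C library does not yet provide a wrapper, so it calls syscall() directly. The fallback syscall number (468), the argument order and the 24-byte structure layout are assumptions taken from our reading of the v6.17 sources and should be checked against the kernel headers in use. The names query_file_attr, file_attr_sketch and SKETCH_NR_FILE_GETATTR are hypothetical and exist only for this example.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>       /* AT_FDCWD */
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: 468 is the file_getattr number in the v6.17 syscall tables. */
#ifdef SYS_file_getattr
#define SKETCH_NR_FILE_GETATTR SYS_file_getattr
#else
#define SKETCH_NR_FILE_GETATTR 468
#endif

/* Local mirror of the uapi struct file_attr (assumed layout, 24 bytes). */
struct file_attr_sketch {
    uint64_t fa_xflags;     /* extended flags */
    uint32_t fa_extsize;    /* extent size hint */
    uint32_t fa_nextents;   /* number of extents (read-only) */
    uint32_t fa_projid;     /* project identifier */
    uint32_t fa_cowextsize; /* copy-on-write extent size hint */
};

/*
 * Query the attributes of `path`, resolved relative to the current
 * working directory. Returns 0 on success or a negative errno value,
 * for example -ENOSYS on kernels older than 6.17.
 */
int query_file_attr(const char *path, struct file_attr_sketch *out)
{
    long rc;

    memset(out, 0, sizeof(*out));
    /* Assumed argument order: dirfd, path, buffer, buffer size, AT_* flags. */
    rc = syscall(SKETCH_NR_FILE_GETATTR, AT_FDCWD, path, out, sizeof(*out), 0);
    return rc == 0 ? 0 : -errno;
}
```

A caller would pass a path and inspect the return value before reading any field, since older kernels and some filesystems are expected to reject the call.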
For the details on what has changed with this Linux 6.17 release, the best starting point is the awesome KernelNewbies ChangeLog.
As for the regular Igalia contributions, the full log is listed below, after a quick summary of the main changes.
GPU drivers in the Linux Kernel
In the DRM (Direct Rendering Manager) subsystem, we worked on creating and implementing the new dma-fence safe access rules and APIs, which enable drivers to safely export fences shared via mechanisms such as the Android sync_file framework. We then converted the Intel xe driver to comply with the new rules, which fixed an existing use-after-free condition.
On the TTM (Translation Table Manager) front, we made the shrinker more responsive by making it respect the contract expected by the kernel’s memory management layer, and by also shrinking the TTM pools more effectively.
In the DRM scheduler space, we continued with the code base clean-ups, both by contributing patches and by providing reviews.
Intel Xe driver
Going back to Intel’s xe driver, and on the road to improving support for older GPUs such as Alder Lake and Meteor Lake, we landed some refactoring work which will, in the future, make it possible to add some missing hardware workarounds and to support scanning out compressed surfaces.
GPU resets
Continuing our GPU reset efforts, we added new fields to the wedged event API: the PID and name of the task involved in the reset. This allows user-space tools, like compositors, to show the user some information about what just happened, for instance a message box saying “<Game name> caused a GPU error and was terminated”. User-space may also implement strict policies if a given task is causing resets too often, like preventing it from starting for a period of time. Prior to this work, there was no way for user-space tools to know which task was involved in a GPU reset.
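To make the user-space side more concrete, here is a small sketch of how a tool might turn the variables of a wedged event into the kind of message described above. It assumes the event arrives as KEY=VALUE strings and that the new fields are named PID and TASK next to the existing WEDGED key. Those key names, as well as the wedge_info structure and both functions, are assumptions made for this example and should be checked against the DRM documentation.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the fields of one wedged event. */
struct wedge_info {
    char method[64]; /* value of WEDGED=, e.g. a recovery method */
    long pid;        /* -1 when the event carries no PID */
    char task[32];   /* empty when the event carries no task name */
};

/* Parse KEY=VALUE strings; returns 0 if a WEDGED= entry was found, -1 otherwise. */
int parse_wedge_event(const char *const *env, size_t n, struct wedge_info *out)
{
    int seen_wedged = 0;

    out->method[0] = '\0';
    out->task[0] = '\0';
    out->pid = -1;

    for (size_t i = 0; i < n; i++) {
        if (strncmp(env[i], "WEDGED=", 7) == 0) {
            snprintf(out->method, sizeof(out->method), "%s", env[i] + 7);
            seen_wedged = 1;
        } else if (strncmp(env[i], "PID=", 4) == 0) {
            out->pid = strtol(env[i] + 4, NULL, 10);
        } else if (strncmp(env[i], "TASK=", 5) == 0) {
            snprintf(out->task, sizeof(out->task), "%s", env[i] + 5);
        }
    }
    return seen_wedged ? 0 : -1;
}

/* Build the user-facing text; falls back to a generic message without task info. */
int format_wedge_message(const struct wedge_info *info, char *buf, size_t size)
{
    if (info->task[0] == '\0')
        return snprintf(buf, size, "A GPU error occurred");
    return snprintf(buf, size, "%s caused a GPU error and was terminated",
                    info->task);
}
```

A compositor could feed the strings it receives from its uevent listener into parse_wedge_event() and also keep a per-task counter keyed on the name to implement the stricter policies mentioned above.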
In addition to providing more information about what triggered a GPU reset, we also worked to prevent GPU resets caused by false-positive hang detection on job timeouts. We started investigating this issue after noticing BO (buffer object) leaks on the Raspberry Pi 4. We found that the DRM scheduler could report a job timeout and trigger a GPU reset even though the GPU was not hung and the timed-out job was still running. To address this, we developed a mechanism that lets the timeout callback tell the DRM scheduler, via its return code, about a false-positive GPU hang, i.e., we added a new DRM scheduling status that allows a driver to skip the reset. This new status indicates that the job should be reinserted into the pending list, after which the driver will still signal its completion and free its resources.
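The control flow described above can be sketched as a toy user-space model. This is not kernel code: every toy_ and TOY_ name is hypothetical, and the real interface is the scheduler's timeout callback together with the DRM_GPU_SCHED_STAT_RESET and DRM_GPU_SCHED_STAT_NO_HANG statuses listed in the changelog below. The model assumes that the scheduler takes the job off the pending list before asking the driver what happened.

```c
#include <stdbool.h>

/* Toy stand-ins for the scheduler statuses. */
enum toy_sched_stat {
    TOY_STAT_RESET,   /* the GPU was hung and has been reset */
    TOY_STAT_NO_HANG, /* false positive: the job is still making progress */
};

struct toy_job {
    bool still_running;   /* what the hardware would report */
    bool on_pending_list; /* tracked by the scheduler */
};

struct toy_sched {
    int resets; /* number of GPU resets performed so far */
};

/* Driver side: decide whether the timeout reflects a real hang. */
enum toy_sched_stat toy_timedout_job(struct toy_sched *sched,
                                     const struct toy_job *job)
{
    if (job->still_running)
        return TOY_STAT_NO_HANG; /* skip the reset */

    sched->resets++; /* stands in for the real reset sequence */
    return TOY_STAT_RESET;
}

/* Scheduler side: handle an expired timeout for one job. */
void toy_handle_timeout(struct toy_sched *sched, struct toy_job *job)
{
    enum toy_sched_stat stat;

    /* Assumption: the job leaves the pending list before the driver is asked. */
    job->on_pending_list = false;

    stat = toy_timedout_job(sched, job);

    /* On a false positive the job goes back, so it is freed on completion. */
    if (stat == TOY_STAT_NO_HANG)
        job->on_pending_list = true;
}
```

In this model a job that is still running ends up back on the pending list with no reset counted, which is the behaviour that avoids leaking its resources.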
Another round of bug fixes
And, as usual, there was another round of bug fixes and cleanups in the DRM drivers. Some of these involved color management in the AMD graphics driver. Others aimed to simplify the code around mutexes and remove compilation warnings. Finally, we also made minor improvements to the MSM driver and the VKMS KUnit tests.
Igalia Changelog
Authored (48)
André Almeida
- drm: drm_auth: Convert mutex usage to guard(mutex)
- drm: amdgpu: Allow NULL pointers at amdgpu_vm_put_task_info()
- drm: amdgpu: Create amdgpu_vm_print_task_info()
- drm: Create a task info option for wedge events
- drm/doc: Add a section about “Task information” for the wedge API
- drm: amdgpu: Use struct drm_wedge_task_info inside of struct amdgpu_task_info
- drm/amdgpu: Make use of drm_wedge_task_info
- drm/amd: Do not include <linux/export.h> when unused
- drm/amd: Include <linux/export.h> when needed
- drm/doc: Fix title underline for “Task information”
- drm: Add missing struct drm_wedge_task_info kernel doc
- drm/doc: Fix grammar for “Task information”
- drm/amdgpu: Fix lifetime of struct amdgpu_task_info after ring reset
Maíra Canal
- drm/vkms: Compile all tests with CONFIG_DRM_VKMS_KUNIT_TEST
- drm/sched: Rename DRM_GPU_SCHED_STAT_NOMINAL to DRM_GPU_SCHED_STAT_RESET
- drm/sched: Allow drivers to skip the reset and keep on running
- drm/sched: Make timeout KUnit tests faster
- drm/sched: Add new test for DRM_GPU_SCHED_STAT_NO_HANG
- drm/v3d: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset
- drm/etnaviv: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset
- drm/xe: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset
- drm/panfrost: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset
- drm/msm: Update global fault counter when faulty process has already ended
Melissa Wen
Thadeu Lima de Souza Cascardo
Tvrtko Ursulin
- drm/i915: Use provided dma_fence_is_chain
- dma-fence: Change signature of __dma_fence_is_later
- drm/ttm: Respect the shrinker core free target
- drm/ttm: Increase pool shrinker batch target
- dma-fence: Use a flag for 64-bit seqnos
- dma-fence: Add helpers for accessing driver and timeline name
- sync_file: Use dma-fence driver and timeline name helpers
- drm/i915: Use dma-fence driver and timeline name helpers
- sync_file: Protect access to driver and timeline name
- drm/i915: Protect access to driver and timeline name
- dma-fence: Add safe access helpers and document the rules
- drm/xe: Make dma-fences compliant with the safe access rules
- drm/xe: Consolidate LRC offset calculations
- drm/sched: De-clutter drm_sched_init
- drm/sched: Consolidate drm_sched_rq_select_entity_rr
- drm/xe: Generalize wa bb emission code
- drm/xe: Pass wa bb setup arguments in a struct
- drm/xe: Rename utilization workaround emission function
- drm/xe: Track number of written dwords from workaround batch buffer emission
- drm/xe: Allow specifying number of extra dwords at the end of wa bb emission
- drm/xe: Add plumbing for indirect context workarounds
- drm/xe: Waste fewer instructions in emit_wa_job()
Reviewed (41)
André Almeida
- drm: Do not include <linux/export.h>
- drm: Include <linux/export.h>
- drm/bridge: Include <linux/export.h>
- drm/client: Include <linux/export.h>
- drm/display: Include <linux/export.h>
- drm/gem: Include <linux/export.h>
- drm/panel: Include <linux/export.h>
- drm/scheduler: Include <linux/export.h>
- drm/ttm: Include <linux/export.h>
Maíra Canal
Rodrigo Siqueira
- drm/amd/display: Don’t overwrite dce60_clk_mgr
- drm/amd/display: Fix DCE 6.0 and 6.4 PLL programming.
- drm/amd/display: Don’t overclock DCE 6 by 15%
- drm/amd/display: Adjust DCE 8-10 clock, don’t overclock by 15%
- drm/amd/display: Find first CRTC and its line time in dce110_fill_display_configs
- drm/amd/display: Fill display clock and vblank time in dce110_fill_display_configs
- drm/amd/display: Don’t warn when missing DCE encoder caps
- drm/amd/display: Don’t print errors for nonexistent connectors
- drm/amd/display: Fix fractional fb divider in set_pixel_clock_v3
- drm/amd/display: Fix DP audio DTO1 clock source on DCE 6.
Tvrtko Ursulin
- drm/sched: Fix outdated comments referencing thread
- drm/sched: Remove kthread header
- drm/sched: Cleanup gpu_scheduler trace events
- drm/sched: Trace dependencies for GPU jobs
- drm/sched: Cleanup event names
- drm/sched/tests: Use one lock for fence context
- drm/ttm: Fix build with CONFIG_DEBUG_FS=n
- drm/xe/lrc: Prepare WA BB setup for more users
- drm/amdgpu: Fix memory leak in amdgpu_ctx_mgr_entity_fini
- drm/xe/bo: add GPU memory trace points
- drm/sched/tests: Implement cancel_job() callback
- drm/sched/tests: Add unit test for cancel_job()
- drm/xe/lrc: Reduce scope of empty lrc data
- drm/xe: Count dwords before allocating
- drm/xe/gt: Extract emit_job_sync()
- drm/xe/lrc: Add table with LRC layout
- drm/sched: Make timeout KUnit tests faster
- drm/sched: Add new test for DRM_GPU_SCHED_STAT_NO_HANG
- drm/v3d: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset
- drm/sched: Fix racy access to drm_sched_entity.dependency
Acked (5)
Changwoo Min
- Revert “sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local()”
- sched_ext: idle: Handle migration-disabled tasks in BPF code