* Introduce Jit32 and JitCore32 objects
* Initialize JIT when launching 32bit executables
* Introduce kernel objects for 32bit processes
This commit introduces two new kernel thread types, `KNceThread` and `KJit32Thread`.
`KNceThread`s behave like the previous kernel thread object by setting up thread state and jumping into guest code.
`KJit32Thread`s need to run guest code on a `JitCore32` object, so after the necessary state setup they also set up the JIT core for executing guest code. Execution happens in a loop because the JIT might return when halted, either for an SVC or for preemption. In those cases the thread needs to wait to be scheduled before executing again.
The process object has also been updated to be able to create 32bit threads when running 32bit processes.
Additionally, NCE's ThreadContext has been removed from DeviceState, since a thread is no longer necessarily an NCE thread, and IPC code has been changed to retrieve the TLS region from the thread object.
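
The run loop described for `KJit32Thread` can be sketched roughly as follows. This is an illustration only: `HaltReason`, `FakeJitCore`, `RunStats` and `RunGuestThread` are hypothetical names invented for the sketch, and the fake core replays a scripted list of halts instead of executing guest code.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// All names below are hypothetical stand-ins, not the actual Skyline or Dynarmic API.
enum class HaltReason { Svc, Preempted, Exited };

// Stand-in for a JIT core: each Run() call "executes" until the next scripted halt.
struct FakeJitCore {
    std::vector<HaltReason> script; // halt reasons to return, in order
    std::size_t next{0};

    HaltReason Run() {
        return next < script.size() ? script[next++] : HaltReason::Exited;
    }
};

struct RunStats {
    int svcsHandled{0};
    int preemptions{0};
};

// Runs guest code until the thread exits, re-entering the JIT after every halt.
// `waitForSchedule` is assumed to block until the scheduler picks this thread.
RunStats RunGuestThread(FakeJitCore &core, const std::function<void()> &waitForSchedule) {
    RunStats stats;
    waitForSchedule(); // the thread must be scheduled before it first runs
    while (true) {
        HaltReason reason{core.Run()};
        if (reason == HaltReason::Exited)
            return stats;

        if (reason == HaltReason::Svc)
            stats.svcsHandled++; // an SVC handler would be dispatched here
        else
            stats.preemptions++;

        // After either kind of halt, wait to be scheduled before executing again
        waitForSchedule();
    }
}
```

Keeping the wait inside the loop means that the thread re-enters the JIT only once the scheduler has handed the core back to it, whichever kind of halt occurred.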
* Introduce a preemption handler for scheduling with JIT
Scheduler initialization has been delayed until process information is available, as it needs to differentiate between 32bit and 64bit processes.
* Support initializing VMM for 32bit address spaces
* Implement GetThreadContext3 SVC for 32bit processes
* Introduce a thread local pointer to the current guest thread
This also gives easier access to the current guest process structure via the thread structure, as most kernels do for their own internal structures.
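
As a rough illustration of the idea, a thread-local pointer lets any code running on a host thread find its guest thread, and through it the guest process. `GuestProcess`, `GuestThread`, `currentGuestThread`, `CurrentProcess` and `ProcessIdSeenBy` are hypothetical names for this sketch, not the identifiers used in the codebase.

```cpp
#include <thread>

// Hypothetical, minimal stand-ins for the kernel objects.
struct GuestProcess {
    int id;
};

struct GuestThread {
    GuestProcess *process; // owning process, reachable from the thread
    int id;
};

// One pointer per host thread; null on host threads that run no guest thread.
thread_local GuestThread *currentGuestThread{nullptr};

// The process is reached through the thread structure.
GuestProcess *CurrentProcess() {
    return currentGuestThread ? currentGuestThread->process : nullptr;
}

// Starts a host thread for `guest` and reports which process ID it observes.
int ProcessIdSeenBy(GuestThread &guest) {
    int seen{-1};
    std::thread host([&] {
        currentGuestThread = &guest; // set once when the host thread starts
        seen = CurrentProcess()->id;
    });
    host.join();
    return seen;
}
```

Because the pointer is thread-local, setting it on one host thread leaves every other host thread's value untouched.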
* Add a signal handler for JIT threads
* Implement coprocessor 15 accesses
* Implement exclusive memory writes and exclusive monitor
* Enable JIT fastmem
* Enable more JIT optimizations and log exceptions
* Fix incorrect logging call in QueryMemory
* Translate guest virtual addresses on direct accesses from SVCs
* Perform TLS page address translation for direct accesses
This allows the IPC code to work without modifications since `KThread::tlsRegion` now stores a host address that can be accessed directly.
* Add Dynarmic as a submodule
* Revert "Perform TLS page address translation for direct accesses"
This reverts commit 2e25b3f7e4f0687b038fa949648c74e3393da006.
* Revert "Translate guest virtual addresses on direct accesses from SVCs"
This reverts commit 7bec4e0902e6dbb6f06a2efac53a1e2127f44068.
* Add an option to change the CPU backend
* Fix
---------
Co-authored-by: lynxnb <niccolo.betto@gmail.com>
Starting from version 26, the NDK is based on LLVM 17 and comes with Clang 17 featuring full language and library C++20 support.
This means we can get rid of the massive LLVM submodule in the repo, which will be done in a following commit.
Co-authored-by: nickbeth <nickbeth>
AsyncLogger pushes messages into a message queue; a thread on the other side takes care of writing messages out to file and logcat. Logging macros use the `fmt::format` formatting syntax.
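
A minimal sketch of the queue-plus-writer-thread pattern described above, using only the standard library. `AsyncLoggerSketch` and its members are hypothetical names; the sink is a plain `std::ostream` standing in for the file and logcat outputs, and messages are assumed to arrive already formatted.

```cpp
#include <condition_variable>
#include <mutex>
#include <ostream>
#include <queue>
#include <sstream>
#include <string>
#include <thread>
#include <utility>

// Hypothetical sketch, not the actual AsyncLogger implementation.
class AsyncLoggerSketch {
  public:
    explicit AsyncLoggerSketch(std::ostream &sink) : sink(sink), writer([this] { Drain(); }) {}

    ~AsyncLoggerSketch() {
        {
            std::lock_guard<std::mutex> lock(mutex);
            stopping = true;
        }
        wake.notify_one();
        writer.join(); // Drain() flushes any remaining messages before returning
    }

    // Producer side: enqueue and return without touching the sink.
    void Write(std::string message) {
        {
            std::lock_guard<std::mutex> lock(mutex);
            pending.push(std::move(message));
        }
        wake.notify_one();
    }

  private:
    // Consumer side: runs on the writer thread.
    void Drain() {
        std::unique_lock<std::mutex> lock(mutex);
        while (true) {
            wake.wait(lock, [this] { return stopping || !pending.empty(); });
            while (!pending.empty()) {
                std::string message{std::move(pending.front())};
                pending.pop();
                lock.unlock(); // do the slow I/O without holding the queue lock
                sink << message << '\n';
                lock.lock();
            }
            if (stopping)
                return;
        }
    }

    std::ostream &sink;
    std::mutex mutex;
    std::condition_variable wake;
    std::queue<std::string> pending;
    bool stopping{false};
    std::thread writer; // declared last so the other members exist before it starts
};
```

Callers pay only for a short lock and a queue push, while the cost of writing stays on the writer thread.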
RequestSyncDeliveryCache service stub
Update app/src/main/cpp/skyline/services/bcat/IDeliveryCacheProgressService.cpp
Co-authored-by: Pablo González <71378035+PabloG02@users.noreply.github.com>
adding spaces at the end of the files
These are mostly implemented how you would expect; however, as opposed to copying out query pool results immediately, doing so is delayed until the RP end in order to avoid splits.
The yuzu audio_core code is mostly untouched, with a set of wrappers used to bridge it with skyline kernel primitives. Huge thanks to maide and their advice, without whom this wouldn't have been possible.
By only using what we need, and mirroring the descriptor structs to allow for much tighter packing (while keeping the same member names), we can reduce pipeline memory to about 1/3 of what it was before.
All writes are done async into a staging file, which is then merged into the main pipeline cache file at the time of the next launch. Upon encountering file corruption the cache can be trimmed up to the last-known-good entry to avoid any excessive loss of data from just one error.
Introduces the base abstractions that will be used for pipeline caching, with a 'PipelineStateBundle' that can be (de)serialised to/from disk and an abstract accessor class to allow switching between creating disk-cached pipelines and fresh ones.
Symbol hooking is required for future HLE implementations of certain features such as `nvdec`. It also enables more in-depth debugging of games, since we can inspect them at the SDK function level, which makes issues far easier to debug.
Previously, both I2M uploads and DMA copies would force GPU serialisation if they happened to hit a trap or were used to copy GPU dirty buffers. By using the buffer manager to implement them on the host GPU we can avoid such slowdowns entirely.
On top of the TIC cache from previous code, a simple index-based lookup has been added which vastly speeds things up by avoiding the need to hash the TIC structure every time.
gm20b performs instanced draws by repeating draw methods for each instance. The code to detect this, together with the cost of interpreting macros, took up around 6% of GPFIFO time in Metro Kingdom. By detecting these specific macros and performing an instanced draw directly, much of that cost can be avoided.
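
One way such an index-based fast path could sit in front of a hashed cache is sketched below. `Descriptor`, `DescriptorHash` and `ViewCacheSketch` are hypothetical names, the 8-word descriptor is a placeholder rather than the real TIC layout, and views are represented by plain integer IDs.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Placeholder descriptor; the real TIC structure is not reproduced here.
struct Descriptor {
    std::array<std::uint32_t, 8> words{};
    bool operator==(const Descriptor &other) const { return words == other.words; }
};

struct DescriptorHash {
    std::size_t operator()(const Descriptor &descriptor) const {
        std::size_t hash{0};
        for (std::uint32_t word : descriptor.words)
            hash = hash * 31 + word;
        return hash;
    }
};

class ViewCacheSketch {
  public:
    int hashLookups{0}; // counts how often the slow, hashed path ran

    // Returns the view ID for `descriptor`, which was read from table slot `index`.
    int Get(std::size_t index, const Descriptor &descriptor) {
        if (index >= byIndex.size())
            byIndex.resize(index + 1);

        Slot &slot{byIndex[index]};
        if (slot.valid && slot.descriptor == descriptor)
            return slot.view; // fast path: one comparison, no hashing

        hashLookups++;
        auto [it, inserted] = byContents.try_emplace(descriptor, nextView);
        if (inserted)
            nextView++;
        slot = Slot{descriptor, it->second, true}; // remember the result for this index
        return it->second;
    }

  private:
    struct Slot {
        Descriptor descriptor{};
        int view{0};
        bool valid{false};
    };

    std::vector<Slot> byIndex;
    std::unordered_map<Descriptor, int, DescriptorHash> byContents;
    int nextView{0};
};
```

The comparison against the stored descriptor keeps the fast path correct when the guest rewrites a slot; only then does the lookup fall back to hashing.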
gpu-new will use a monolithic pipeline object for each pipeline to store state, keyed by the PackedPipelineState contents. This allows for a greater level of per-pipeline optimisations and a reduction in the overall number of lookups in a draw compared to the previous system.
Adapted from the previous code to use dirty state tracking. The cache has also been removed, since with the new buffer view and GMMU optimisations it actually ended up slowing lookups down. Another result of the buffer view optimisations is that raw pointers are no longer used for buffer views, since destruction is now much cheaper.
Constant buffer updates result in a barrage of std::mutex calls that take a lot of time even under no contention (around 5%). Using a custom spinlock in cases like these allows the locking code to be inlined, reducing the cost of locks under no contention to almost zero.
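
For reference, a minimal test-and-set spinlock looks roughly like the sketch below. It is not the spinlock from this commit: `SpinLockSketch` and `CountWithSpinLock` are hypothetical names, and yielding while contended is just one possible back-off policy.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// A minimal test-and-set spinlock; illustrative only.
class SpinLockSketch {
  public:
    void lock() {
        // Uncontended case: a single atomic exchange that the compiler can inline
        while (flag.test_and_set(std::memory_order_acquire))
            std::this_thread::yield(); // contended: let another thread run
    }

    bool try_lock() {
        return !flag.test_and_set(std::memory_order_acquire);
    }

    void unlock() {
        flag.clear(std::memory_order_release);
    }

  private:
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
};

// Increments a shared counter from several threads, guarded by the spinlock.
int CountWithSpinLock(int threadCount, int incrementsPerThread) {
    SpinLockSketch lock;
    int counter{0};
    std::vector<std::thread> threads;
    for (int t{0}; t < threadCount; t++)
        threads.emplace_back([&] {
            for (int i{0}; i < incrementsPerThread; i++) {
                std::lock_guard<SpinLockSketch> guard(lock);
                counter++;
            }
        });
    for (auto &thread : threads)
        thread.join();
    return counter;
}
```

Since the class provides `lock`, `try_lock` and `unlock`, it works with `std::lock_guard` and `std::unique_lock` as a drop-in for `std::mutex` in short critical sections.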