diff --git a/graphics.tex b/graphics.tex
new file mode 100644
index 0000000..aff263a
--- /dev/null
+++ b/graphics.tex
@@ -0,0 +1,407 @@
+% Beamer Presentation
+% LaTeX Template
+% Version 1.0 (10/11/12)
+% This template has been downloaded from:
+% License:
+% CC BY-NC-SA 3.0 (
+\mode<presentation> {
+ % Replaces footer line with a simple slide count.
+ \setbeamertemplate{footline}[page number]
+ % Remove navigation symbols.
+ \setbeamertemplate{navigation symbols}{}
+% Allows the use of \toprule, \midrule and \bottomrule in tables
+%T1 fontenc does not work on Parabola
+% Use symbols instead of numerals for footnotes.
+% Reset footnote counter every section.
+% Short title appears at the bottom of every slide.
+% Full title only on the title page.
+\title[graphics]{Graphics acceleration on Replicant}
+\author{David Ludovino (@dllud) \and Ricardo Cabrita (@GrimKriegor)\thanks{\footnotesize with great support from Joonas Kylmälä (@Putti)}}
+% Your institution as it will appear on the bottom of every slide.
+% Your institution for the title page.
+NLnet - NGI0 PET Fund\\
+ % Print the title page as the first slide.
+ \titlepage
+ \frametitle{Motivation}
+ All supported devices lack a free software GPU driver.\\\bigskip
+ Replicant 6 relies on libAGL, which uses the libpixelflinger software renderer (both deprecated since 2013).
+ \frametitle{Motivation}
+ The lack of GLES 2.0 causes some critical applications to crash (e.g. Firefox).\\\bigskip
+ Rendering performance has degraded throughout Android versions.\\\bigskip
+ Replicant relies on patches to the Android framework to make things like the camera application work.
+ \frametitle{Objectives}
+ Put together a graphics stack:\\
+ \begin{itemize}
+ \item Compatible with Android 9's HALs.
+ \item Provides at least GLES 2.0.
+ \item Flexible enough to do rendering with both Mesa and SwiftShader.
+ \item Uses hardware rendering on devices with a free GPU driver.
+ \end{itemize}
+\section{Graphics hardware architecture}
+ \Huge{\centerline{Graphics hardware architecture}}
+\subsection{Exynos 4412 SoC components}
+ \frametitle{Graphics hardware architecture — Exynos 4412 SoC components}
+ \begin{center}
+ \includegraphics[width=0.9\textwidth]{img/odroid-u_block_diagram.jpg}
+ \footnote{Source: Hardkernel Co., Ltd.}
+ \end{center}
+\section{Graphics software architecture}
+ \Huge{\centerline{Graphics software architecture}}
+\subsection{Android 9 graphics architecture}
+ \frametitle{Graphics software architecture — Android 9}
+ \begin{center}
+ \includegraphics[height=0.8\textheight]{img/ape_fwk_graphics.png}
+ \footnote{Source: Android Open Source Project under CC BY 4.0}
+ \end{center}
+\subsection{Replicant 9 graphics components}
+\subsubsection{Hardware Composer HAL}
+ \frametitle{Graphics software architecture — Replicant 9 HWC HAL}
+ Hardware Composer HAL: drm\_hwcomposer
+ \begin{itemize}
+ \item Supports HWC2 HAL.
+ \item Works on top of DRM (can use hardware composing acceleration).
+ \item Under active maintenance (hosted by
+ \item Also used by Android-x86.\\\bigskip
+ \end{itemize}
+\subsubsection{Gralloc HAL}
+ \frametitle{Graphics software architecture — Replicant 9 Gralloc HAL}
+ Gralloc HAL: gbm\_gralloc
+ \begin{itemize}
+ \item Implements Android Gralloc HAL API versions 0 and 1.
+ \item Compatible with drm\_hwcomposer.
+ \item Compatible with Mesa.
+ \item Uses Mesa's GBM (Generic Buffer Management) for buffer allocation through libgbm. GBM then calls DRM.
+ \item Supports PRIME fd.
+ \item Originally by Rob Herring, now maintained by Android-x86.
+ \end{itemize}
+\subsubsection{OpenGL ES renderer}
+ \frametitle{Graphics software architecture — Replicant 9 GLES}
+ OpenGL ES renderer: Mesa
+ \begin{itemize}
+ \item Support for both software and hardware rendering.
+ \item Big and active community (maintained for years to come).\\\bigskip
+ \end{itemize}
+ Mesa driver: kms\_swrast
+ \begin{itemize}
+ \item Uses any Gallium software renderer as backend (softpipe or llvmpipe).
+ \item Does mode setting through the kernel (KMS).\\\bigskip
+ \end{itemize}
+ Alternative GLES renderer: SwiftShader
+ \begin{itemize}
+ \item Optimized for ARM CPUs.
+ \item Has Vulkan software rendering.
+ \end{itemize}
+ \Huge{\centerline{Implementation}}
+\subsection{drm\_hwcomposer + gbm\_gralloc}
+ \frametitle{Implementation — drm\_hwcomposer + gbm\_gralloc}
+ Initially both required the use of the drm/exynos master node
+ \begin{enumerate}
+ \item DRM Auth hack (both on /dev/dri/card0)
+ \item DRM vGEM inclusion (gbm\_gralloc on /dev/dri/card1)
+ \item DRM allow dumb buffers (gbm\_gralloc on /dev/dri/renderD128)\\\bigskip
+ \end{enumerate}
+ At the time we had some graphical glitches that we thought were due to inter-driver memory synchronization.\\\bigskip
+ Running on the same driver does not require memory synchronization.\\\bigskip
+ Allows drm/exynos to allocate memory where adequate according to the type of plane (primary, overlay or cursor).
+\subsection{Allow kms\_swrast to use drm/exynos}
+ \frametitle{Implementation — Allow kms\_swrast to use drm/exynos}
+ Small tweak: Add exynos to the kms\_swrast list on external\_mesa3d.\\\bigskip
+ How to upstream this?
+\subsection{HW planes + devfreq}
+ \frametitle{Implementation — HW planes + devfreq}
+ We were then using kms\_swrast with the softpipe backend.\\\bigskip
+ Enabling DRM hardware planes was another attempt at squeezing some extra performance out of the hardware.\\\bigskip
+ However, this led to some interesting shenanigans.
+ \frametitle{Implementation — HW planes + devfreq}
+ \begin{center}
+ \includegraphics[width=\textwidth]{img/glitches.jpg}
+ \end{center}
+ \frametitle{Implementation — HW planes + devfreq}
+ Tentative explanation by ahajda:
+ \begin{enumerate}
+ \item devfreq lowers display clock frequencies too aggressively.
+ \item DMA transfers of overlays are too slow and result in screen corruption.\\\bigskip
+ \end{enumerate}
+ Temporary fix: disable devfreq.
+\subsection{Testing software renderers}
+ \frametitle{Implementation — llvmpipe}
+ kms\_swrast with softpipe was unbearably slow, even with DRM HW planes enabled.\\\bigskip
+ Required:
+ \begin{itemize}
+ \item Finding out what Android-x86 had previously done.
+ \item Porting it to Android 9.
+ \end{itemize}
+ \frametitle{Implementation — llvmpipe}
+ android: Enable llvmpipe when using the swrast driver\\
+ android: Fix build with LLVM for Android 9\\
+ \frametitle{Implementation — SwiftShader}
+ Required:
+ \begin{itemize}
+ \item UDIV and SDIV instruction emulation (in the kernel).
+ \item Android emulator composer: ranchu.
+ \item Default Android gralloc.\\\bigskip
+ \end{itemize}
+ Proved to be 1.5 to 2 times faster than llvmpipe.
+ \frametitle{Performance}
+ \centerline{SwiftShader \textgreater{} llvmpipe \textgreater{} softpipe}
+\subsubsection{SwiftShader with LLVM}
+ \frametitle{Performance — SwiftShader with LLVM}
+ We managed to find a SwiftShader revision that uses LLVM as a backend instead of SubZero and is still compatible with our frameworks\_native.\\
+ \lstset{language=C}
+ \begin{lstlisting}
+Lineage 16 / Android 9 / Replicant 9
+SurfaceFlinger: OpenGL ES 2.0 SwiftShader
+Android Q
+Default to LLVM 7.0 JIT in Android build
+SurfaceFlinger: OpenGL ES 3.0 SwiftShader
+ \end{lstlisting}
+ No noticeable performance difference.
+\subsection{Why is Replicant 6 much faster?}
+ \frametitle{Performance — Why is Replicant 6 much faster?}
+ Emulator switches? NO\\
+ ro.kernel.qemu=1\\\bigskip
+ High end graphics options? NO\\
+ ro.config.avoid\_gfx\_accel=1\\\bigskip
+ Pixel format (RGB565)? Paul says YES (very hardware dependent)
+ \Huge{\centerline{Future}}
+\subsection{RGB565 across entire stack}
+ \frametitle{Future — RGB565 across entire stack}
+ \begin{itemize}
+ \item gbm\_gralloc
+ \item drm\_hwcomposer
+ \item drm/exynos\smallskip
+ \end{itemize}
+ All using RGB565.\\~\\
+ Potential performance breakthrough.\\\smallskip
+ If so, how to future-proof this?
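+ To make the RGB565 trade-off concrete, here is a minimal C sketch. It is not taken from gbm\_gralloc, drm\_hwcomposer or drm/exynos; the helper names and the 720x1280 panel size are assumptions chosen only for illustration. It packs 8-bit channels into a 16-bit pixel and compares the size of one frame against a 32-bit format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack 8-bit R, G, B into one RGB565 pixel
 * (5 bits red, 6 bits green, 5 bits blue). The low bits of each
 * channel are dropped, so the conversion is lossy. */
static uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Hypothetical helper: bytes needed for one full frame. */
static size_t frame_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}
```

+ For the assumed 720x1280 panel, \texttt{frame\_bytes(720, 1280, 2)} is half of \texttt{frame\_bytes(720, 1280, 4)}, so every composited frame moves half as many bytes. Whether that translates into the speed-up seen on Replicant 6 remains to be measured.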
+\subsection{devfreq: which device needs clock boost? enable devfreq}
+ \frametitle{Future — devfreq: which device needs clock boost?}
+ \begin{enumerate}
+ \item Test each device independently through sysfs.
+ \item Identify which one is causing the corruption (tip: FIMD/LCD path).
+ \item Boost clock/voltage from userspace or in the kernel config.
+ \item Re-enable devfreq.
+ \item Work out a patch to fix this upstream.
+ \end{enumerate}
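+ Step 1 could be scripted along these lines. This is a rough sketch under assumptions: it relies on the kernel's generic devfreq sysfs layout (\texttt{/sys/class/devfreq/<device>/governor}), the helper names are hypothetical, and the actual device names on the Exynos 4412 still have to be read from that directory. Writing the file requires root.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: build "/sys/class/devfreq/<dev>/governor" in buf.
 * Returns 0 on success, -1 if the path does not fit. */
static int devfreq_governor_path(char *buf, size_t len, const char *dev)
{
    int n = snprintf(buf, len, "/sys/class/devfreq/%s/governor", dev);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: pin one devfreq device to a governor such as
 * "performance" so its clock stays at the maximum while testing.
 * Returns 0 on success, -1 on any error. */
static int devfreq_set_governor(const char *dev, const char *governor)
{
    char path[256];
    if (devfreq_governor_path(path, sizeof path, dev) != 0)
        return -1;

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;

    int written = fputs(governor, f) >= 0;
    /* fclose flushes the write, so its result matters too. */
    return (fclose(f) == 0 && written) ? 0 : -1;
}
```

+ Pinning one device at a time to \texttt{performance} and checking whether the corruption disappears would narrow down which clock is being lowered too far.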
+\subsection{SwiftShader + drm\_hwcomposer}
+ \frametitle{Future — SwiftShader + drm\_hwcomposer}
+ Advantages (vs ranchu):\\
+ \begin{itemize}
+ \item hardware planes
+ \item DRM node instead of direct framebuffer
+ \end{itemize}
+\subsection{Profiling, benchmarks and conformance}
+ \frametitle{Future — Profiling, benchmarks and conformance}
+ Profiling: turn on profiling switch on Mesa + simpleperf?\\\bigskip
+ Benchmarks: ask Android-x86 (proprietary?)\\\bigskip
+ Conformance: dEQP (drawElements Quality Program) and piglit
+\subsection{2D acceleration on drm\_hwcomposer}
+ \frametitle{Future — 2D acceleration on drm\_hwcomposer}
+ Software-based: Pixman (has ARM NEON fast path)\\\bigskip
+ Hardware-based: Exynos FIMG2D (Fully Integrated Mobile Graphics 2D)
+\subsection{SDIV/UDIV on compiler-rt}
+ \frametitle{Future — SDIV/UDIV on compiler-rt}
+ \begin{itemize}
+ \item Patch with kernel emulation of SDIV/UDIV is not optimized.\\\bigskip
+ \item Try compiler-rt's builtins instead.
+ \end{itemize}
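+ For context, compiler-rt's builtins include division routines such as \texttt{\_\_udivsi3} and \texttt{\_\_divsi3}, which the compiler can call when the target has no hardware divide. The C sketch below is not compiler-rt's code; it is a plain shift-and-subtract division with hypothetical function names, meant only to show the kind of work such a routine does in userspace, without trapping into the kernel on every division.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an unsigned division builtin:
 * restoring (shift-and-subtract) division, one quotient bit per step. */
static uint32_t soft_udiv32(uint32_t n, uint32_t d)
{
    assert(d != 0);
    uint32_t q = 0;
    uint64_t r = 0; /* 64-bit so the shift below cannot overflow */
    for (int bit = 31; bit >= 0; bit--) {
        r = (r << 1) | ((n >> bit) & 1u);
        if (r >= d) {
            r -= d;
            q |= 1u << bit;
        }
    }
    return q;
}

/* Hypothetical stand-in for a signed division builtin: divide the
 * magnitudes, then restore the sign (truncation toward zero). */
static int32_t soft_sdiv32(int32_t n, int32_t d)
{
    assert(d != 0);
    uint32_t un = n < 0 ? 0u - (uint32_t)n : (uint32_t)n;
    uint32_t ud = d < 0 ? 0u - (uint32_t)d : (uint32_t)d;
    uint32_t uq = soft_udiv32(un, ud);
    uint32_t res = ((n < 0) != (d < 0)) ? 0u - uq : uq;
    return (int32_t)res;
}
```

+ A real builtin is considerably more optimized than this loop. The point is only that the division can be resolved at link time rather than through an undefined-instruction trap.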
+\subsection{ARM NEON on llvmpipe}
+ \frametitle{Future — ARM NEON on llvmpipe}
+ ARM NEON: SIMD instructions\\\bigskip
+ How to use:
+ \begin{itemize}
+ \item<1-> Tune \textbf{auto-vectorization} on LLVM: easy to try; possible to upstream.
+ \item<3-> Ne10 library: easy to use; difficult to upstream (requires new deps).
+ \item<4-> \textbf{Neon intrinsics}: nice compromise between performance and code complexity; possible to upstream.
+ \lstset{language=C}
+ \begin{lstlisting}
+ #include <arm_neon.h>
+ uint8x8_t va, vb, vr;  /* 8 lanes of uint8_t each */
+ vr = vadd_u8(va, vb);  /* lane-wise addition */
+ \end{lstlisting}
+ \item<2-> Neon assembly: too cumbersome (e.g. manual register allocation).\\\bigskip
+ \end{itemize}
+ Borrow ideas from Pixman, Skia and libyuv (all these have NEON fast paths).
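+ A "fast path" in the sense used above usually looks like the following sketch: a NEON loop guarded by a compile-time check, with a scalar loop for the tail and for non-NEON builds. This is an illustration only; the function name is hypothetical and nothing here is taken from Pixman, Skia, libyuv or llvmpipe.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] (wrapping) for n bytes. */
static void add_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* NEON fast path: eight 8-bit lanes per iteration. */
    for (; i + 8 <= n; i += 8) {
        uint8x8_t va = vld1_u8(a + i);
        uint8x8_t vb = vld1_u8(b + i);
        vst1_u8(dst + i, vadd_u8(va, vb));
    }
#endif
    /* Scalar tail; also the whole loop when NEON is unavailable. */
    for (; i < n; i++)
        dst[i] = (uint8_t)(a[i] + b[i]);
}
```

+ Because the scalar loop produces the same result, the function can be checked on any machine and the NEON branch only changes how fast it runs.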
+ \frametitle{Future — ARM NEON on llvmpipe}
+ \begin{center}
+ \includesvg[width=0.8\textwidth,inkscapeformat=png]{img/Mesa_layers_of_crap_2016.svg}
+ \footnote{Source: ScotXW on Wikimedia under CC0}
+ \end{center}
+ How to use intrinsics when llvmpipe must output LLVM IR?\\\medskip
+ Can LLVM IR contain ARM NEON assembly code?
+ \frametitle{Future — Lima}
+ The holy grail.\\\bigskip
+ Quite active now. New commits every week.\\
+ No idea of current compliance (asked devs to update \texttt{features.txt}).\\\bigskip
+ Planned approach: offload implemented GL operations to Lima.
+ \begin{itemize}
+ \item Where in the stack should we intercept GL operations? GLSL IR? TGSI?\\
+ \item Won't the overhead of interception, introspection and dispatch kill any performance gains?
+ \end{itemize}
+%with software render fallback for missing GLES functions
+ \Huge{\centerline{Questions?\footnote{Ask Putti the hard ones. xD}}}
