Evil Eval – PHP Scripting in .NET with PeachPie¶
PHP is a language well known for its many dynamic features, which are occasionally somewhat overused. Whether it is dynamic typing, type juggling or dynamic code evaluation, these features are very demanding on the underlying runtime. The last of these, the so-called ‘eval’, is no less interesting than the others. Let’s examine it further.
Yes, PeachPie supports 'eval'¶
… and create_function. Before we describe all of the scripting mechanisms, we would like to announce that it is now possible to use eval with Peachpie-compiled programs. Nevertheless, it is not something we necessarily recommend using. This is why we output a warning message if you do use it, stating that eval is dangerous and may cause high memory and CPU usage. The reasons for our caution are described in the following sections.
Check out this PHP snippet.
Let’s take a look at the above sample snippet in more detail. The dynamic runtime provided by Peachpie offers a number of benefits here. For instance, a program can use classes and functions that have not been defined at compile time. The use of X is compiled as a dynamic instantiation of a type resolved at run time. The actual type depends on what is declared in the current Runtime Context, which represents the life of a single request to a PHP page. While the code seems clean and short, the compiler-generated instructions handle a complete dynamic evaluation and allow a dynamically compiled assembly from eval to be injected. So how does all of this work?
.NET Roslyn representation¶
Before we get into how Peachpie handles eval(), let's have a look at Roslyn and scripting in C# (Microsoft.CodeAnalysis.CSharp.Scripting). These scripting APIs enable .NET applications to instantiate a C# engine and execute code snippets against host-supplied objects. This is exactly what we are talking about: an API that parses a piece of C# source code and allows you to run it. The Roslyn scripting library is an exemplary implementation of this idea in .NET. So how does this mechanism work with PHP?
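To make the idea of running code "against host-supplied objects" concrete, here is a minimal sketch using the Roslyn C# scripting package (Microsoft.CodeAnalysis.CSharp.Scripting). The type names `Globals` and `ScriptingDemo` are made up for this illustration; only `CSharpScript.EvaluateAsync` and `ScriptOptions` come from Roslyn.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis.CSharp.Scripting;
using Microsoft.CodeAnalysis.Scripting;

// Host-supplied object: its public members are visible to the script as globals.
// The type has to be public so that the generated script code can access it.
public class Globals
{
    public int X;
    public int Y;
}

public static class ScriptingDemo
{
    // Parses, compiles and runs a C# expression against the given host object.
    public static Task<int> EvaluateAsync(string expression, Globals globals) =>
        CSharpScript.EvaluateAsync<int>(expression, ScriptOptions.Default, globals);

    public static async Task Main()
    {
        int sum = await EvaluateAsync("X + Y", new Globals { X = 2, Y = 3 });
        Console.WriteLine(sum); // prints 5
    }
}
```

The script text is an ordinary string, so it could just as well come from a file, a database or user input, which is precisely what makes the approach both powerful and risky.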
Problem: Dynamic Evaluation of PHP code in C#¶
So what if we would like to programmatically process a piece of PHP code and run it within a C# application? This is extremely useful, for example, for extending a C# app or a game so that users can write scripts in PHP. The point is to side-load a piece of PHP code and run it efficiently, optionally producing a set of diagnostics that list errors and warnings.
We would like for something like the following piece of pseudo code to work in C#:
Solution: Compile PHP dynamically to .NET in-memory¶
Since this is one of the major advantages of the whole Roslyn platform, we have also implemented the scripting ability of the PHP language in Peachpie. The process consists of steps that simulate how the compiler works – but all of this in-memory:
[Image: test snippet, ensuring that create_function works.]
In-memory compilation of PHP¶
How is this possible in .NET? Roslyn defines how an obedient compiler should behave:
- First, the source code is parsed. For the purposes of eval, we handle the missing opening tag (<?php) implicitly, so it is as if it were there. The helper is called PhpSyntaxTree and is subject to change. In any case, its purpose is obvious. As a result, we have a SyntaxTree object representing the source code.
- Then the magic happens. We instantiate the compiler (just like a new PhpCompilation()) with the appropriate options. Since we may compile more than one script, we choose a unique name for the resulting assembly (using an auto-increment variable) and specify that we are going to generate a dynamically linked library (not an exe). The interesting parameter is the references: we have to collect all DLLs needed by the compiled script, including the .NET runtime itself. This is how the compiler works. For the complete sample, please see this page.
- The compilation object now provides a simple-looking method, Emit. If you look closely, its parameters take not an output file but a plain System.IO.Stream. After calling the method with MemoryStream instances, we have the raw bytes of the resulting DLL file (assembly).
- The magic continues by making use of the System.Runtime.Loader NuGet package. Its helper class System.Runtime.Loader.AssemblyLoadContext allows us to load the assembly from bytes in memory into the current application. This also gives us a reference to the Assembly object instance needed later. When running on the full .NET Framework 4.0, there is an older API with the same effect, System.Reflection.Assembly.Load.
- Now, the declared types and methods from the assembly are available to the running application. All that remains is to run the script compiled within the assembly. After looking into the Peachpie internals (please see here for your reference), we find out that the global code is compiled into a specially named static method Main within a special namespace. The method can be found using System.Reflection, converted to a .NET delegate, and called.
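The steps above can be sketched with the C# compiler from Roslyn, which exposes the same parse, compile, emit and load pipeline. This is only an analogy under stated assumptions: CSharpSyntaxTree and CSharpCompilation stand in for their PHP counterparts, and the names `InMemoryCompiler`, `Script` and the `eval_` prefix are invented for the example. It also assumes .NET Core or later, where the runtime publishes its assembly list as `TRUSTED_PLATFORM_ASSEMBLIES`.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Runtime.Loader;
using System.Threading;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

public static class InMemoryCompiler
{
    static int _counter; // auto-increment suffix that keeps assembly names unique

    // Compiles source declaring "public static class Script { public static int Main() }"
    // and returns a delegate bound to that Main method.
    public static Func<int> Compile(string source)
    {
        // 1. Parse the source code into a syntax tree.
        SyntaxTree tree = CSharpSyntaxTree.ParseText(source);

        // 2. Collect references: every assembly known to the host runtime.
        var paths = ((string)AppContext.GetData("TRUSTED_PLATFORM_ASSEMBLIES"))
            .Split(Path.PathSeparator);
        var references = paths.Select(p => MetadataReference.CreateFromFile(p));

        // 3. Create the compilation: unique name, DLL output kind.
        var compilation = CSharpCompilation.Create(
            "eval_" + Interlocked.Increment(ref _counter),
            new[] { tree },
            references,
            new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

        // 4. Emit the assembly into a stream instead of a file.
        using var peStream = new MemoryStream();
        var result = compilation.Emit(peStream);
        if (!result.Success)
        {
            var messages = result.Diagnostics.Select(d => d.ToString());
            throw new InvalidOperationException(string.Join(Environment.NewLine, messages));
        }

        // 5. Load the raw bytes into the running application.
        peStream.Position = 0;
        Assembly assembly = AssemblyLoadContext.Default.LoadFromStream(peStream);

        // 6. Locate the entry point by reflection and wrap it in a delegate.
        MethodInfo main = assembly.GetType("Script")
            .GetMethod("Main", BindingFlags.Public | BindingFlags.Static);
        return (Func<int>)main.CreateDelegate(typeof(Func<int>));
    }

    public static void Main()
    {
        Func<int> run = Compile(
            "public static class Script { public static int Main() => 6 * 7; }");
        Console.WriteLine(run()); // prints 42
    }
}
```

Returning a delegate rather than invoking the method through reflection each time keeps repeated calls cheap once the assembly is loaded.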
Advantages¶
At first glance, it may seem as if this is the same as compiling the script on the command line and running it. However, this approach implies a plethora of new possibilities. In addition to the ability to compile code dynamically within your C# (or any .NET) app, you have complete control over the compilation.
The compilation object can be cached. The compilation may take some time since all the referenced assembly metadata has to be parsed and represented internally as a Roslyn Symbol. This approach gives the caller complete control and the option to re-use one compilation object with already parsed metadata for compiling more separate assemblies without this overhead. The reduced time is significant, allowing for repetitious compilations or the compilation of hundreds of separate scripts to be achieved in seconds or less.
Subsequent compilation. Upping the hardcore-ness level by one more notch, the caller can compile scripts that reference other in-memory assemblies. For huge projects like the ones usually coded in PHP and those that are frequently being modified, we can compile files separately and subsequently, efficiently and in-memory. Every script can be compiled lazily, only if needed, and the compilation can reference previously compiled scripts that are also in-memory only. The compilation API doesn’t care and we can make strong references between in-memory assemblies. The .NET assembly loader then resolves the referenced assemblies within an AssemblyLoadContext that is fully managed by the caller (see here for more details), giving instances of Assembly for AssemblyName. That is how in-memory assemblies (i.e. ones that do not exist on disk) are referenced. It is the same as the old approach of using the AppDomain.AssemblyResolve event.
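A caller-managed load context of this kind might look like the following sketch. The class `InMemoryLoadContext` and its `AddFromStream` method are hypothetical names chosen for the illustration; `AssemblyLoadContext`, `LoadFromStream`, `LoadFromAssemblyName` and the `Load` override are the actual System.Runtime.Loader API. The demo re-loads the host's own assembly file purely to obtain some assembly bytes, which assumes the application is not published as a single file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;
using System.Runtime.Loader;

// A load context that remembers assemblies loaded from memory and hands them
// back when another assembly in the same context references them by name.
public sealed class InMemoryLoadContext : AssemblyLoadContext
{
    readonly Dictionary<string, Assembly> _loaded =
        new Dictionary<string, Assembly>(StringComparer.OrdinalIgnoreCase);

    // Loads an assembly image from a stream and registers it under its simple name.
    public Assembly AddFromStream(Stream peImage)
    {
        Assembly assembly = LoadFromStream(peImage);
        _loaded[assembly.GetName().Name] = assembly;
        return assembly;
    }

    // Called by the runtime to resolve a reference: gives an Assembly for an AssemblyName.
    // Returning null lets the runtime fall back to the default context.
    protected override Assembly Load(AssemblyName assemblyName) =>
        _loaded.TryGetValue(assemblyName.Name, out var assembly) ? assembly : null;
}

public static class InMemoryLoadContextDemo
{
    // Loads some assembly bytes from memory, then resolves them again by name.
    public static bool RoundTrip()
    {
        // Assumption: Location is a real file path (not a single-file deployment).
        byte[] image = File.ReadAllBytes(typeof(InMemoryLoadContextDemo).Assembly.Location);

        var context = new InMemoryLoadContext();
        Assembly loaded = context.AddFromStream(new MemoryStream(image));
        Assembly resolved = context.LoadFromAssemblyName(loaded.GetName());
        return ReferenceEquals(loaded, resolved);
    }

    public static void Main() => Console.WriteLine(RoundTrip());
}
```

Because resolution is entirely in the caller's hands, the same dictionary could be filled lazily, compiling a script only when something first asks for its assembly.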
Disadvantages¶
There are two major drawbacks. The first is, obviously, security. Scripting at run time makes it possible to inject potentially dangerous code that originates in an environment the user does not control, while any warning diagnostics are usually ignored.
Secondly, keep in mind that loaded assemblies are not subject to garbage collection and therefore will never be freed from memory. Without proper caching of eval’ed code (which Peachpie does implicitly), it is possible to end up with an OutOfMemoryException.
Consequences of using (eval)¶
With all of this in mind, the implementation of the evil eval functionality is straightforward. Peachpie has supported eval() since version 0.6.0, provided that an optional dependency on Peachpie.Library.Scripting is specified. A call to eval is forwarded to the implementation in this package. The code is compiled in-memory if it cannot be found in the cache already, and its entry point is called. The caching mechanism reduces memory and CPU requirements.
Furthermore, the mechanism takes advantage of compilation caching. This is noticeable when running eval more than once: every subsequent call to eval is significantly faster than the very first one.
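The caching of eval’ed code can be pictured with a small sketch like the one below. It is not Peachpie's implementation: `EvalCache` is a hypothetical class, the cache key is assumed to be the full source text, and the actual compiler is replaced by a delegate passed in by the caller so the example stays self-contained.

```csharp
using System;
using System.Collections.Concurrent;

// Caches compiled entry points by source text, so that identical code
// is compiled (and its assembly loaded) only once.
public sealed class EvalCache
{
    readonly ConcurrentDictionary<string, Lazy<Func<int>>> _entries =
        new ConcurrentDictionary<string, Lazy<Func<int>>>();
    readonly Func<string, Func<int>> _compile;

    // compile: turns source code into a callable entry point (the expensive step).
    public EvalCache(Func<string, Func<int>> compile) => _compile = compile;

    public int Eval(string code)
    {
        // Lazy<T> guarantees the compile step runs at most once per distinct source,
        // even if several threads ask for the same code at the same time.
        var entry = _entries.GetOrAdd(code, c => new Lazy<Func<int>>(() => _compile(c)));
        return entry.Value();
    }
}

public static class EvalCacheDemo
{
    // Evaluates the same code twice and reports how many compilations happened.
    public static int CountCompilations()
    {
        int compilations = 0;
        // Stand-in "compiler": counts invocations and returns a trivial entry point.
        var cache = new EvalCache(code =>
        {
            compilations++;
            int length = code.Length;
            return () => length;
        });

        cache.Eval("return 1;");
        cache.Eval("return 1;"); // served from the cache
        return compilations;
    }

    public static void Main() => Console.WriteLine(CountCompilations()); // prints 1
}
```

Since loaded assemblies stay in memory, a cache like this bounds memory growth by the number of distinct code strings rather than the number of eval calls.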
Note that as of now (version 0.6.0), the eval implementation is for demo purposes only. It is subject to change and it is definitely not recommended for production use, since its use is frequently an indication of bad programming practice. At the very least, we would like to discourage its use and suggest avoiding it if at all possible.