Sunday, 1 March 2015

LambdaDays 2015 report

Info and insight gathered during a single slice of LambdaDays 2015

LambdaDays, the two-day functional programming conference in Cracow with four tracks and lots of interesting people, has just wrapped up.

The most popular languages among speakers, and probably also the audience, were Scala, F# and Erlang. There was also a minority of Haskell, JavaScript, Clojure, OCaml, Idris and Elm here and there.

Here is a report from the front.

Day 1


The conference was opened by a keynote from Garrett Smith of CloudBees. Due to overcrowding, the talk was also streamed to the second hall (with some difficulties). Garrett talked about functional programming in general and presented some interesting patterns found in it.

Then I decided to begin with the research track. Professor Kevin Hammond gave a very enthusiastic talk there on megacore computing: a philosophical upgrade over multicore, and one happening in the near future. Functional programming will make grasping it a reality, bringing unlimited parallelism and abstracting over the processing units executing it, whether CPU or GPU. Although the actor model went completely unmentioned as a solution, this was overall a pretty insightful talk, centered around the concepts of the ParaPhrase project.

Torben Hoffmann of Erlang Solutions then gave a very engaging talk on the Erlang process model, showing concepts not unfamiliar to Akka users. Processes, just like actors, are very lightweight and one should be created for every task needed. Torben also talked about processes being restarted independently by a supervisor as part of a general approach to system resiliency.
The next talk I attended was a very research-oriented one by Evelina Gabasova. It was a quite interesting talk about cancer research, and about R pains followed by F# joy. She showed some code examples of performing functional transformations on data sets downloaded from an online cancer research database and looking for correlations in them.

Lars Hupel then gave a talk, in Haskell, on a functional approach to mocking. It was quite heavy and I think I will need to watch this one again…

The next presentation, given by Andrea Magnorsky, was about games and using functional programming (F#) in game development. Although quite interesting, the language was used in only about 10% of the code base, in places where performance was not a problem.

The last talk I attended that day was by Jacek Głodek, about practical problems and the misuse of typing and functional programming when writing typical applications. It emphasized readability and proper expression of the domain model or business logic. My quite rich take-away from it was: be wary of tuples, create more case classes, and use helper monads like Option only in places where they express the intent well.

Now off for the after-party with free beer and pizza.

Day 2


The day was opened by a keynote given by Kinga Panasiewicz, which gave everyone much positive energy for the start of the day. The topic centered around the statement that watching a computer screen for long periods is harmful to the brain and can be one of the causes of developing brain diseases like schizophrenia. Quite a brave statement at a programming conference, I think. The latter part of the talk focused on memory (and the negative impact of Google Search on it), and Kinga engaged the audience in some memory games.

Then there was a talk on Big Data by Nilanjan Raychaudhuri, showing some general architecture and approaches to building data processing pipelines, centered a bit on Spark.

The next presentation, by André van Delft, was concerned with algebras describing reactive systems, in particular one called the Algebra of Communicating Processes. However, as I was absent for part of it, I cannot say more.

Just before lunch there was a second keynote, by Rúnar Bjarnason, telling the tale of the joys of functional programming. It was a very interesting talk on the essence of functional thinking, with lots of examples in Scala, although rather introductory. Someone asked why one would use Scala and not some pure functional language. It may be a matter of taste, but the next presentation, by Konrad, addressed the question further.

Konrad Malawski gave a very interesting talk about latency, speed and low-level considerations when building high-performance systems. Detailing many different concepts, it emphasized taking an interest in, and having knowledge of, the internal implementations of the code being used. The bottom line was that although functional programming is great and should be used where possible, someone has to think about those mutable, impure internals to make it a solid foundation for cleaner design. When using Scala, we can, and should, take a step down through the abstraction layers and use a more imperative style in places where performance requires it. The slides, with an impressive bibliography, can be found on SlideShare.

The next presentation, by Adam Warski, was perhaps the most practical one. With high speed and precision, he implemented an Akka application conforming to the Reactive Manifesto principles. Using streams, persistence and clustering, the app contained an actor with journaled storage that was location transparent and immune to some of the nodes going down. It was very insightful to see the different modules of Akka smoothly interoperating. The project can be found on GitHub (https://github.com/adamw/reactive-akka-pres).

It wouldn’t be a functional conference without monads. Noel Markham gave a very fast-paced talk about abstracting over certain values when implementing a real-world application concerned with streaming tweets. It showed the use of monads like Reader, Writer and OptionT, and abstracting over Monad itself. Then he used ScalaCheck with a Shapeless contrib library to test the application, easily swapping the Futures for some more test-friendly monads.

I decided to end the day with even more monads, courtesy of Luc Duponcheel. He showed a simple DSL renaming the concepts of unit, bind and map2, and made use of applicative functors.



Overall, the conference was a very nice experience with great speakers and organization. It was often very hard to choose a track because of the many interesting, conflicting topics. Luckily, all of the talks were recorded.

Also, what I missed were polyglot, language-oriented topics; generally, each presentation floated around a single language. It would be great to have a comparison of some functional patterns or features across different languages.

Anyway, see you next time!

This report on GitHub

Sunday, 4 January 2015

Creating Reactive Streams components on Akka Streams

Creating custom publishers and subscribers on Akka Streams

Reactive Streams is an ongoing common effort to create a JVM-wide standard for stream processing with backpressure. It tries to solve the problem of limited resources across multiple machine nodes taking part in a data processing chain or graph. Such a chain might consist of two or more nodes - senders and receivers - communicating with each other via messaging. Some of the nodes might be slower than others: they might not be able to process all incoming messages at the rate the senders (producers) transfer them. Any such receiving node (a consumer) is then forced to skip messages or risk running out of memory.

To avoid flooding slower nodes with data, Reactive Streams introduces backpressure, in the form of an upstream (receiver-to-sender) information flow: the consumer periodically informs the producer of the maximum number of messages it can receive, and the producer sends data only on demand. In general, this mode of operation, where the receiver asks the sender for data, is called pull-based. In Reactive Streams it specifically refers to the situation where the consumer is slower and slows down the producer. It is in contrast to the push-based mode, where the producer is the slower side and can send data as soon as it is available, effectively without caring about demand. In Reactive Streams, the processing line between two nodes can dynamically switch between push and pull mode, depending on which side is slower at a given moment.

Reactive Streams specification consists of several rather simple Java interfaces and an extensive set of rules every implementation must conform to. Let's look deeper at the interfaces representing different components of the protocol:

A Publisher is the producer of data. It declares only one method:
  • void subscribe(Subscriber<? super T> s) - invoked on a Subscriber's (a consumer's) connection attempt. After the method is called, the Publisher must in turn call the onSubscribe method on the Subscriber, passing a Subscription it creates. The stream is then formed, allowing for the exchange of data.

A Subscriber defines 4 methods that are invoked by the Publisher:
  • void onSubscribe(Subscription s) - to store the Subscription created by the Publisher
  • void onNext(T t) - to handle an element of data sent by the Publisher
  • void onComplete() - to handle the end of stream signaled by the Publisher
  • void onError(Throwable t) - to handle an error that occurred on the publishing side

A Subscription consists of 2 methods invoked by the Subscriber:
  • void request(long n) - to signal demand to the Publisher - the number of messages the Subscriber is able to handle
  • void cancel() - to indicate a desire to cancel the subscription and let the Publisher know not to send more data

A Processor is an interface that is simply both a Publisher and a Subscriber.

During the stream's lifetime, the receiving side repeatedly requests more elements using the request method. The publisher responds by invoking the onNext method on the subscriber a number of times less than or equal to the number of requested elements. Streaming can end either with normal completion (the publisher calls onComplete) or with abnormal termination (the publisher calls onError, passing an exception). After that, no more elements may be transferred.

Using the Reactive Streams interfaces it is possible to implement many different approaches to exchanging data. However, while powerful, this approach is relatively low level. Moreover, correctly implementing the raw Publisher and Subscriber interfaces can be quite tricky. For example, the Publisher must keep track of connected Subscribers and monitor the demand signaled through any Subscription it created. The Subscriber must in turn balance the rate of elements received against the demand for future elements. The implementation choices for these aspects would also be similar across many different subscribers and publishers.
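
To get a feel for that bookkeeping, here is a minimal sketch of a raw Subscriber that requests elements in batches of 8 and prints them (the class name and batch size are invented for illustration):

import org.reactivestreams.{Subscriber, Subscription}

class PrintingSubscriber extends Subscriber[String] {

   private var subscription: Subscription = _
   private var remaining = 0L

   def onSubscribe(s: Subscription): Unit = {
      subscription = s
      remaining = 8
      s.request(8)                  // signal the initial demand
   }

   def onNext(element: String): Unit = {
      println(element)
      remaining -= 1
      if (remaining == 0) {         // batch exhausted - request the next one
         remaining = 8
         subscription.request(8)
      }
   }

   def onComplete(): Unit = println("stream completed")

   def onError(t: Throwable): Unit = t.printStackTrace()
}

Even this toy version has to juggle mutable demand state, and it still ignores thread safety and most of the specification's corner cases.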

Fortunately, we can use Akka Streams to let it manage some of the complexity for us and make the implementation much more straightforward.

Creating an ActorPublisher 

Akka Streams is one of the implementations of Reactive Streams, building a high-level data processing DSL on top of the protocol. Besides giving a convenient way of transforming flows of data, Akka Streams defines two traits for implementing custom publishers and subscribers based on actors: ActorPublisher[T] and ActorSubscriber (as of akka-stream-experimental version 1.0-M2; subject to change).

A skeleton implementation of a publisher may look like the following:

import akka.stream.actor.{ActorPublisher, ActorPublisherMessage}
import scala.util.{Failure, Success, Try}

class StringActorPublisher extends ActorPublisher[String] {

   // returns Some(element), None when the source is exhausted, Failure on a fatal error
   def generateElement(): Try[Option[String]] = ???
   def cleanupResources() = ???

   def receive = {
      case ActorPublisherMessage.Request(n) =>
         // produce while the stream is active and demand remains; onNext decrements
         // totalDemand, while onComplete/onError deactivate the stream
         while (isActive && totalDemand > 0) {
            generateElement() match {
               case Success(valueOpt) =>
                  valueOpt
                    .map(element => onNext(element))
                    .getOrElse(onComplete())
               case Failure(ex) =>
                  onError(ex)
            }
         }
      case ActorPublisherMessage.Cancel =>
         cleanupResources()
      case ActorPublisherMessage.SubscriptionTimeoutExceeded =>
         cleanupResources()
   }
}

Producing data 

After the stream is established, the ActorPublisher starts receiving Request messages from the subscriber it is connected to, each message carrying the demand: the number of elements being requested. Internally, ActorPublisher keeps track of this information in a variable, adding each received demand to it and decrementing it on each call to onNext. The current aggregated demand is exposed via the always up-to-date totalDemand method, making it easy to react to.

Let's imagine we have a side-effecting data-producing method called generateElement that does some computation to generate the next element of a sequence. Suppose it normally returns Success(Some(element)), but returns Success(None) if there are no more elements to generate, or Failure in case of a fatal error. That models the three ways of passing information downstream, as described by Reactive Streams. Each call to onNext transfers one element to the subscriber. However, only one of onComplete and onError can be signaled, and either of them ends streaming, putting the Publisher in the non-active state.
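
To make it concrete, a hypothetical generateElement fitting the skeleton above could lazily read lines from a file (the file name is, of course, made up):

   val lines: Iterator[String] = scala.io.Source.fromFile("data.txt").getLines()

   def generateElement(): Try[Option[String]] = Try {
      if (lines.hasNext) Some(lines.next())   // one more element to publish
      else None                               // exhausted - will trigger onComplete
   }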

Subscription management

Apart from the SubscriptionTimeoutExceeded message (received when no subscriber subscribes within a certain time of the publisher actor being created), there is no concept of a subscription, or subscriptions, to manage. All the logic related to the subscribe method from Reactive Streams is hidden inside the ActorPublisher trait implementation. The Request and Cancel messages, corresponding to the methods on the Subscription, are delivered to the actor itself. ActorPublisher thus relieves us from manually wiring up the subscriber and managing the subscription afterwards.

It is also apparent from the messages it receives that it can handle only one subscriber. This is the approach the Akka team has chosen: Akka Streams has other means of connecting a producer to multiple subscribers (a fan-out component in the form of a Broadcast or Balance). The pattern of connecting multiple subscribers to one publisher is in fact a special case, which can be abstracted away from typical publisher implementations - the algorithm for distributing elements can be decoupled from producing them.

Creating an ActorSubscriber

The ActorSubscriber implementation is even more straightforward. Here, too, handling the subscription is already taken care of, and what remains is mostly handling the stream of data and lifecycle changes:

import akka.stream.actor.{ActorSubscriber, ActorSubscriberMessage, WatermarkRequestStrategy}

class StringActorSubscriber extends ActorSubscriber {

   // keeps outstanding demand topped up in batches of up to 10
   protected def requestStrategy = WatermarkRequestStrategy(10)

   def processElement(element: String) = ???
   def handleError(ex: Throwable) = ???
   def streamFinished() = ???

   def receive = {
      case ActorSubscriberMessage.OnNext(element) =>
         processElement(element.asInstanceOf[String])   // OnNext carries Any, hence the cast
      case ActorSubscriberMessage.OnError(ex) =>
         handleError(ex)
      case ActorSubscriberMessage.OnComplete =>
         streamFinished()
   }
}

Request strategy

The ActorSubscriber trait also declares a requestStrategy method that must return an instance of RequestStrategy. This is where the upstream demand generation logic lives - deciding when to request more elements from the publisher. Manually invoking the request method on a subscription is abstracted away. Akka Streams already defines several strategies, like WatermarkRequestStrategy (signaling demand in batches) or OneByOneRequestStrategy (requesting the next element only after the previous one arrives), and it is also simple to create a custom one.
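
As a sketch, a custom strategy could top the demand up to 16 elements whenever fewer than 4 remain outstanding (requestDemand is the single method of RequestStrategy in 1.0-M2; the numbers here are arbitrary):

import akka.stream.actor.RequestStrategy

object BatchingRequestStrategy extends RequestStrategy {
   def requestDemand(remainingRequested: Int): Int =
      if (remainingRequested < 4) 16 - remainingRequested   // refill the batch
      else 0                                                // enough elements in flight
}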

Actor lifecycle

The stream lifecycle is actually not tied to the actor lifecycle. Internally, both traits use external state storage to preserve state across actor restarts. Also, when streaming ends, the actor will live on until stopped by custom code. Therefore, any cleanup or release of resources must be done explicitly.
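
For example, a subscriber variant could stop itself once streaming ends and release its resources in postStop (releaseResources is a hypothetical helper):

import akka.stream.actor.{ActorSubscriber, ActorSubscriberMessage, WatermarkRequestStrategy}

class ManagedStringSubscriber extends ActorSubscriber {
   protected def requestStrategy = WatermarkRequestStrategy(10)

   def releaseResources() = ???

   def receive = {
      case ActorSubscriberMessage.OnNext(element) =>
         println(element)
      case ActorSubscriberMessage.OnError(_) =>
         context.stop(self)      // abnormal end - the actor must stop itself
      case ActorSubscriberMessage.OnComplete =>
         context.stop(self)      // normal end - same story
   }

   override def postStop(): Unit = {
      releaseResources()         // explicit cleanup, as the actor outlives the stream
      super.postStop()
   }
}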

Exposing publishers and subscribers

After creating a custom ActorPublisher or ActorSubscriber, what remains is to convert them into regular publishers or subscribers that can be used by any library implementing the Reactive Streams protocol. This is done by wrapping their ActorRefs in calls to ActorPublisher.apply[T] or ActorSubscriber.apply[T]:

import akka.actor.{ActorSystem, Props}
import akka.stream.actor.{ActorPublisher, ActorSubscriber}
import org.reactivestreams.{Publisher, Subscriber}

val system = ActorSystem()

val subscriber: Subscriber[String] = ActorSubscriber[String](system.actorOf(
                     Props(classOf[StringActorSubscriber])))

val publisher: Publisher[String] = ActorPublisher[String](system.actorOf(
                     Props(classOf[StringActorPublisher])))
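
The two can then be connected like any other Reactive Streams components:

publisher.subscribe(subscriber)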

Conclusion

Akka Streams takes the low-level SPI of Reactive Streams and exposes a more programmer-friendly API for managing streams. One side of it is the stream transformation DSL (not covered in this article) used for consuming existing streams. But when the need arises for a new stream producer or consumer, the ActorPublisher and ActorSubscriber traits make implementing it more straightforward and help bridge other components into Reactive Streams.

Wednesday, 28 May 2014

Language Oriented Programming

Recently I discovered the fascinating idea of language oriented programming and language workbenches. Centered mostly around the Meta Programming System by JetBrains and the somewhat secretive project of Intentional Software, the topic has been around for over a decade but is not yet mature (unfortunately it has not developed as fast as Martin Fowler predicted). Language oriented programming is focused on bringing some major improvements to software development, the most ambitious being narrowing the programmer's involvement in the development process. It seeks to enable programmers to give domain experts the tools to build the software by themselves, where they would be best suited to the task. Programmers would then be able to focus on their specialty, which is the abstract world of programming itself (and finding bugs). They would create simple-to-use, error-avoiding domain specific languages for the experts to use, each designed for its specific domain.

But besides this breakthrough idea, the concepts of language oriented programming and their implementation may also revolutionize the development of existing programming languages and typical applications. It all begins with dropping the text-based code representation and basing all development on direct changes to the abstract syntax tree. However, manually editing the AST would not be very user-friendly, so it is done through a projection. In essence, the AST is projected into some editable representation (most often text-like) that interprets any changes made and updates the AST accordingly. The AST is thus the single, unambiguous form being edited, and the representation is derivative - not the other way around, as in the current state of the art.

After playing with Jetbrains MPS and reading some blogs, I am very much enthusiastic about this concept:
  • The projectional editor makes error checking and code completion instantaneous. It can also make typing faster after some practice, if you are not a text maniac and are willing to believe it. I have to admit it takes some time to get used to and can be unintuitive. BUT: the style wars will be gone forever - every developer may create their own looks for the language and use any style or representation they like.
  • Implementing any language becomes much simpler, I would argue. Instead of writing a compiler from scratch, as is done now, a language workbench abstracts away all the boring work (like writing a look-ahead buffer for parsing). In an ideal world, this would become the standardized way of creating any language, and all programmers would unite and cherish each other! As you can see, this is the second way - along with the end of the style wars - in which Language Oriented Programming strives for world peace.
  • Implementing new features in a language becomes much simpler, too. No more backward compatibility issues, no more careful, hard work of hunting down all the edge cases of interaction with existing features. A new feature becomes just an extension to the language, integrating seamlessly with the language it wraps. Getting it involves just downloading a module from a web repository once some other team member starts using the fancy feature in their code. Ordinary human programmers can create any syntactic sugar they like.
  • Version control becomes intelligent. No more comparing text and trying to deduce the logical changes from it. The version control system would be an integral part of the project being developed, seeing all changes ever made. A commit would become just a grouping of some changes. Even more importantly, changes could be grouped automatically based on logical, higher-level differences, like the creation of a new class, or renaming a method along with all its references. Reordering class members could be made transparent to users, presenting no changes. Viewing other team members' changes would be easier, and merge conflicts would be simpler to resolve.
I think these are fundamental improvements to the way of programming, making writing code much easier, if one is able to trade the familiar text representation for them. While the projectional representation is not text, the MPS team has done a very good job of making it feel more like text. In the future, the difference might become even smaller.

Unfortunately, although these features would make creating code and collaborating easier, other aspects of software development remain mostly unchanged, like testing and debugging - and those are what make programming hard in general. That is mostly another topic, but Jonathan Edwards has some very interesting (and controversial) ideas about how to restart the programming profession from scratch.

Saturday, 3 May 2014

Unit testing the functional way


Recently I stumbled upon an interesting (although old) blog post by Christian Sunesson. He has a point in saying that in OO programming the good design practices that make testability possible are very close to some functional programming principles. Unit testing and TDD encourage cleanly separating operations, asserting on value objects, and treating the verification capabilities of mocks as a last resort, or not using them at all. The purer a method is, the easier it is to test. John Sonmez also touches on this topic.

But how do we unit test with mostly pure functions and no mocks? How do we design a system of small and large components and avoid dependencies between them? After all, we need to test in isolation. This brings to mind a picture like the one below:


We unit test only the blue parts of the system and bind them together using the large green one. That does not look very scalable. However, the green part - the application layer - might in turn be composed of a few subsystems (communicating with other such component groups), wrapping all the hard-to-unit-test logic, like concurrency. Then only end-to-end tests would be used to verify it.

Each of the non-dependent blue leaf components could consist of purely functional logic. But what about a leaf component that is overly complicated and should be refactored into several smaller ones? It is still a leaf from the outside point of view, but now it has dependencies inside.

In order to avoid mocks, the test isolation principle has to change: the root component is not tested in isolation - it uses the actual smaller parts.

But the isolation paradigm exists to let us reason quickly about failed tests. If a test fails, we can be sure the faulty logic is the logic it invokes directly, in isolation. This is not the case when testing dependent components without mocks: the fault might lie inside some sub-part invoked by the test of the compound object. It looks like it would be harder to find the bug.

But let's assume that all the sub-parts are also covered by tests. Each component is tested on its own (though not in isolation), forming a test "onion" - every higher-level, more complex component's test adds an extra layer to it. Then the failed tests form a red cross-section through this onion. Finding a bug in such a test suite is just a matter of getting to the core of the cross-section - locating the simplest failing component test.
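
A toy example of the idea in Scala, with all names invented - a compound pure component is tested together with its real sub-parts, and each onion layer gets its own test:

object Pricing {
   def netPrice(items: Seq[BigDecimal]): BigDecimal = items.sum
   def tax(net: BigDecimal): BigDecimal = net * BigDecimal("0.23")

   // the compound component uses the real sub-parts - no mocks involved
   def grossPrice(items: Seq[BigDecimal]): BigDecimal = {
      val net = netPrice(items)
      net + tax(net)
   }
}

// a bug in tax() fails both the tax and grossPrice tests; the innermost
// failing test points at the culprit
assert(Pricing.netPrice(Seq(BigDecimal(10), BigDecimal(20))) == BigDecimal(30))
assert(Pricing.tax(BigDecimal(100)) == BigDecimal(23))
assert(Pricing.grossPrice(Seq(BigDecimal(100))) == BigDecimal(123))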

This kind of approach clearly calls for tooling support. If the dependency graph of components could somehow be visualized hierarchically, locating such bugs would be simple. And many mocks could be avoided.

Friday, 21 March 2014

Scoping Hierarchical Dependency Injection

Dependency injection is by now a widely known and used pattern for structuring an application (and it is superior to NOT using it and new-ing objects at arbitrary points). It makes the application easily testable and, just as importantly, makes reasoning about it simpler (every dependency used by a component is clearly defined).

Let's take a look at a relatively simple DI case in a desktop application. We have only constructor injection available and two possible creation modes for a class: singleton and new-instance. The former makes the DI container manage a single instance of the class for all objects requesting it. The latter leads to the creation of a new instance for each request.

On its own, this is not a very useful tool. We can create the object graph only at a single point in time - by getting the root of the dependency hierarchy at application initialization (with no cheating - using the container directly - afterwards). We need some way of getting objects from the container dynamically.

This is where factories come in handy. By declaring a factory dependency, we state that at some point in time the current class will create a new object (factories make no sense for singleton-scoped objects), and with it a subsystem of its dependencies. This is an elegant way of requesting instances dynamically from the container without using it directly in the application code (which would make the class doing so harder to test and reason about). In Java's Guice, the idea of a factory is called a 'provider' and is described at https://code.google.com/p/google-guice/wiki/InjectingProviders . In .NET, the Ninject framework has an extension with an excellent description of the concept: https://github.com/ninject/ninject.extensions.factory/wiki .

Getting an object from a factory will, under the hood, use the DI container to create the new object and configure its dependency tree (by injecting singletons, new object instances or other factories).
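
A sketch of the provider idea with Guice, written in Scala to match the rest of this blog (the classes are made up):

import com.google.inject.{AbstractModule, Guice, Inject, Provider}

class Connection                                     // a new-instance-scoped dependency

class ReportGenerator @Inject() (connections: Provider[Connection]) {
   def generate(): Unit = {
      val connection = connections.get()             // a fresh Connection, wired by the container
      // ... use the connection to build a report ...
   }
}

object ProviderDemo extends App {
   val injector = Guice.createInjector(new AbstractModule {
      def configure(): Unit = {}                     // Connection is resolved just-in-time
   })
   injector.getInstance(classOf[ReportGenerator]).generate()
}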

[As Google Guice points out, a provider might not always create a new object. It can be made to return an object scoped wider than new-instance and narrower than singleton - one of which at most a single instance exists at the time it is requested, for example bound to the current web session. However, I think this might be less readable than letting the current object constructor-inject the session-scoped service. Later in this discussion we will talk about multiple scopes co-existing simultaneously.]

Using factories is a great boost to the freedom and possibilities in designing an application. Now some services might be created on demand, and they can in turn use dependencies from the container. But factories defined this way may only use global singleton objects or create new objects in new-instance mode. We can still improve on this idea.

Imagine a case where some objects should have the same lifespan (or scope) as others. They in turn could create further objects dynamically, with separate scopes. Such a created scope would be in a parent-child relationship with the scope of the object that created it. What is even more interesting, the new-instance and singleton scopes described earlier now take on a different meaning. New-instance corresponds to creating a child scope. And the newly created object is now a singleton from the point of view of all objects created along with it (they are singletons to each other) and from the point of view of all objects in any child scopes created subsequently.

Multiple child scopes in an application

This approach of hierarchical dependency injection is useful for stateful applications, where entire subtrees of components and services might be created on demand, living and storing state for a limited time. They can easily be discarded by services in parent scopes when no longer needed.

The practical implementation of this idea is possible using child injectors (called child kernels in Ninject). Each child injector can access objects defined for itself and for its parent injector. That allows the application to create separate scopes for object sub-trees, even multiple copies of object groups based on the same classes, as sketched below.
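
A sketch in Guice could look as follows (the modules and the DocumentEditor are illustrative):

import com.google.inject.{AbstractModule, Guice, Singleton}

class DocumentEditor                       // stateful service living in one document's scope

class ApplicationModule extends AbstractModule { def configure(): Unit = {} }
class DocumentModule extends AbstractModule {
   def configure(): Unit =
      bind(classOf[DocumentEditor]).in(classOf[Singleton])   // singleton per child scope
}

object ChildScopeDemo extends App {
   // application-wide scope
   val appInjector = Guice.createInjector(new ApplicationModule)

   // a narrower scope, e.g. created when a document is opened; the child
   // injector sees its own bindings plus everything from the parent
   val documentInjector = appInjector.createChildInjector(new DocumentModule)

   // the editor is a singleton within this child scope only; discarding the
   // child injector discards the whole object sub-tree
   val editor = documentInjector.getInstance(classOf[DocumentEditor])
}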

The whole idea has some similarities to the actor model, where an actor can be created to do a specific task and be stopped afterwards by its parent. Every service higher in the hierarchy then appears stable and eternal to that actor/component. Hierarchical dependency injection might, after some refinement, make applications easier to comprehend and make binding declarations less flat and more intuitive.

Wednesday, 6 November 2013

Imagining program's API

I have to say I'm not exactly a fan of command-line interfaces (CLIs). A graphical UI should be the standard way for almost all applications to communicate with a human being. Eyes are our most-used gateway to the world, and through them we process information the fastest. We should seize that opportunity and design the simplest, most elegant GUIs, easy to learn and use.

But what if the user is not an ordinary human, but a technical person with a computer?

Programmers and power users like to automate things and build larger constructs from smaller blocks and elements that work together. They want to take a program and use it in some automated, complex task. That's why almost every program should have a command-line interface alongside the GUI. Or a blend of both?

The principles of command-line interface design should be similar to those of graphical interfaces, but it looks like they are not. CLIs have not changed much over the last two decades of Linux and Windows (the latter did not even improve the almost unusable cmd.exe over the years), and they are usually not easy to learn or use. Chris Bernard points out some other flaws. However, I do not agree with him on the solution (XML), because the short names of programs and options are, once you learn them, very fast to type and often get the job done faster than a GUI. And they should stay that way.

I think something can be done to improve the overall state of affairs without hurting the typing speed of the command line. In fact, Windows PowerShell brought some significant changes to thinking about the CLI. Most of all, it is designed around passing objects instead of text between different programs and commands. It also has the concept of cmdlets, which offer a bit of categorization for commands and make them more readable by using prefixes like Get-, List-, etc.

But it is still far from the usability of managed programming languages. Why can't we simply call a method on a program?


Instead of passing control parameters, a program could be invoked by calling methods on it and chaining them. Each program would carry a manifest describing its API, allowing for method completion and for typing just as fast as with short control parameters.
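
A purely hypothetical sketch, in Scala, of how such an invocation might look next to its classic equivalent - every name here is invented:

// the manifest-described API of an imaginary 'git' program
trait Git {
   def log(since: String = ""): GitLog
}
trait GitLog {
   def byAuthor(author: String): GitLog
   def oneline: Seq[String]
}

// a shell with method completion could then accept:
//    git.log(since = "1.week").byAuthor("me").oneline
// instead of the classic string-based:
//    git log --since=1.week --author=me --oneline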

Even Scala and C#, despite being statically typed languages, already have this covered: a console with type and method completion. Of course, a programming language is not a shell script, and the target audience is somewhat different. But the principles of UI design should apply to all alike. I think there is a need for a new approach to CLI usability, to make it more readable, faster to learn and more appealing to power users.