Make sure you know which Unicode version is supported by your programming language version

While enhancing CATS I recently added a feature to send requests that include single and multi code point emojis. This is a single code point emoji: 🥶, which can be represented in Java as the \uD83E\uDD76 string. The test case is simple: inject emojis within strings and expect that the REST endpoint will sanitize the input and remove them entirely (I appreciate this might not be a valid case for all APIs, which is why the behaviour is configurable in CATS, but that is not the focus of this article).

I usually recommend that any REST endpoint should sanitize input before validating it and remove special characters. A typical regex for this would be [\p{C}\p{Z}\p{So}]+ (although you should enhance it to allow spaces between words), which means:

  • \p{C} - matches Unicode invisible control chars (\u000D - carriage return for example)
  • \p{Z} - matches Unicode whitespace and invisible separators (\u2028 - line separator for example)
  • \p{So} - matches various symbols that are not math symbols, currency signs, or combining characters; this also includes emojis

I have a test service I use for testing new CATS fuzzers. The idea was to simply use the String’s replaceAll() method to remove all these characters from the String.

So let’s take the following simple code which aims to sanitize a given input:

    public static void main(String... args) {
        String input = "this is a great \uD83E\uDD76 article";
        String output = input.replaceAll("[\\p{C}\\p{So}]+", "");

        System.out.println("input = " + input);
        System.out.println("output = " + output);
    }

While running this with Java 11, I get the following output:

input = this is a great 🥶 article
output = this is a great  article

This works as expected: the 🥶 emoji was removed from the String.

Even though CATS is compiled for Java 8, I mainly use JDK 11+ for development. At some point I had CATS running in a CD pipeline with JRE 8. The emoji test cases generated by the CATS Fuzzers started to fail, even though they were passing on my local box (and on other CD pipelines). I went through the log files: the request payloads were initially constructed and displayed correctly, with the emoji properly printed, but after running some pattern matching on the string the result was printed as sometext?andanother. The ? is where the emoji was supposed to be. Further investigation led to the conclusion that the JRE version caused the mishandling of the emoji (which might be obvious to 99.999% of Java devs out there). This is actually expected, as Java 8 supports Unicode 6.2, while 🥶 is part of Unicode 11.

Going back to the previous example, if I run it with Java 8, I get the following output:

input = this is a great 🥶 article
output = this is a great ? article

Conclusions:

  • Even though a Java version can receive, write/store and forward the latest Unicode characters, any attempt to manipulate them might result in weird ? symbols if the Unicode char is not part of the Unicode version supported by your JRE
  • Regardless of how you compile the code, it’s the JRE that decides how the Unicode chars are handled, i.e. a Java program compiled for Java 8 will behave differently on JRE 8 vs JRE 14
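
One way to see what the running JRE knows about a given character is to ask it directly. The snippet below is a minimal sketch (the class and method names are made up for this illustration and are not part of CATS) that reports whether a code point is assigned in the Unicode tables shipped with the current runtime:

```java
final class UnicodeSupportCheck {

    /** True if the running JRE has this code point in its Unicode tables. */
    static boolean isKnownToRuntime(int codePoint) {
        return Character.isDefined(codePoint);
    }

    /** Summary line: code point, runtime version and whether the runtime knows it. */
    static String describe(int codePoint) {
        return String.format("U+%04X on Java %s: %s",
                codePoint,
                System.getProperty("java.version"),
                isKnownToRuntime(codePoint) ? "assigned" : "unassigned for this runtime");
    }

    public static void main(String[] args) {
        // The cold face emoji from the example above, written as a surrogate pair.
        int coldFace = "\uD83E\uDD76".codePointAt(0);
        System.out.println(describe(coldFace));
        System.out.println(describe('A'));
    }
}
```

Running it on each JRE you deploy to shows whether they disagree about a character before that disagreement shows up in a pipeline.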

An incomplete list of practices to improve security of your (micro)services

Software security is hard and complex. Many people think about it as something separate from the typical development process. It’s usually seen as the responsibility of some security people who only care about security and don’t understand that we need to deliver business value fast in an already complicated microservices-event-driven-api-first-ha-cloud ecosystem. I could add a lot more dashes to microservices-event-driven-api-first-ha-cloud. And I think this is the main reason it might seem overwhelming to think early about security and all the possible ways something or someone can break your system. It’s yet another complex thing in an already complex environment. It’s not just all the technical complexities of modern architectures, it’s also all the additional stuff: the need to go to market fast, hard deadlines, team chemistry issues, underperformers, too many processes, meetings, etc.

And it’s a complex thing that will not break your system on day 1. It might take months or years until someone finds a vulnerability. Why focus on this from day 1? Well, you might be right. The chances of something happening on day 1 are low. And it’s very tempting to focus on something with immediate value (actual functional features), rather than mitigating some future possibilities. The thing is, when a security issue happens, it can bring your entire system down. And this will be very bad for you and your users.

I see it similar to airport security. We do all these checks, we scan people, we forbid them to take things onboard and so on, although 99.99..9% of people don’t plan to hijack the plane. It’s for the 0.00..1% of the cases that we have all these measures in place. Because the consequences are big.

So how do you strike a balance between over-engineering security and being paranoid about everything, and still focusing on the business value? You make it a mindset, rather than a separate concern. I’m not saying that everyone needs to become a security expert and know everything. I’m saying that people should develop secure software just like they develop software: in a way that minimizes the probability of introducing vulnerabilities.

The best way to instill this mindset is through a set of standards and practices that will create habits. Going back to airport security, you don’t leave all the decisions to each individual security person. “This person looks nice. Let them have the scissors and a knife in their hand luggage”. “You sir look very dehydrated, you can take your big bottle of liquid with you on the plane!” You create a set of rules and procedures (i.e., standards) that apply equally to everyone. And you also create a set of guidelines (i.e., practices) on how to handle specific situations: if you see something suspicious in a piece of hand luggage, you inspect it separately.

In the next sections I’ll detail standards and practices that cover the entire SDLC. They are not meant to be exhaustive for each section (i.e., you might add a lot more to cover that section from a general good practices perspective). But they will make you question things and think about cases that are maybe not that obvious.

Where is Security focused

I’ll do a simple split of Security concerns into two main areas:

  • infrastructure security: anything related to how the application is being deployed and operating in production
  • application security: anything related to how the application is being implemented, with the specifics of the business context

There are plenty of resources on how to tackle both.

They are lengthy, comprehensive and include a lot of details and practices on how to tackle security in the SDLC. It would be great if every developer went through all of them periodically in order to keep their knowledge fresh. But in practice, this doesn’t happen very often. I’ll try to summarize below the most important things to consider, agnostic of the business domain. It’s not a full list, nor a silver bullet. But it will establish a solid foundation which will minimize the possibility of security issues happening.

Tackling Security

Infrastructure Security is more predictable to address, mainly due to the use of products or cloud services. They already have the security features built in and implemented well, i.e., if you use a Web Application Firewall, you trust the product to do its job; you won’t actually implement its logic. I’m not saying it’s easier, but you have more control.

Application Security is less predictable. You mainly rely on people’s skills to implement stuff securely. You need to make sure they don’t do stupid things like storing clear text passwords in source files.

Below is a list of the most important practices which I think will help you build a security mindset. It’s intended for the regular developer. When I say regular, I just mean people actually implementing, rather than all the others focused on designing, planning or managing. The practices are all focused on Application Security for building (REST) APIs. At first glance they might not all seem directly related to security. But in the end, they will minimize the probability of introducing security issues.

The majority of examples will use Java.

Standards

As mentioned above, the usage of standards is the main mechanism to build a mindset. All projects should have a set of standards. Not everyone is a fan of standards; some feel they limit people’s choices and creativity. But I think it’s an easy way to get consistency, especially when having many teams working on the same platform. It allows easier onboarding for new joiners and limits the possibility of introducing bugs or inconsistencies or arguing over stupid things (spaces vs tabs ;)). It gives you more time for meaningful discussions and debates. Standards do not have to be very detailed, at least not in all areas. The majority of the standards should state principles and choices you’ve made based on existing sets of good practices.

Documentation

Key things to consider:

  • document your code interfaces and API contracts
  • define your documentation strategy:
    • what is your overall documentation strategy?
    • what do you put in the README.md file of the project?
    • do you need to update a wider documentation?
    • what tool do you use for diagrams?
    • do you use lightweight architecture decision records?
    • do you store the documentation along with the project in Git? or maybe use a separate tool?
    • if you store it within the project, what is the recommended folder structure?

General (micro)services design guidelines

Key things to consider:

  • use a blueprint/template/archetype as a starting point for all your (micro)services
  • have the blueprint already bundled with all the common libraries, plugins, etc. and aligned to the standards
  • each (micro)service must start with one command
  • (micro)services will process data only through APIs/events; there is no back-door
  • (micro)services are self-contained
  • all (micro)services are 12 factor apps and even more

Code formatting/styling

Just choose one and apply it consistently. Auto-format before commit if possible.

Naming conventions

Just choose one and apply it consistently.

API standards

Key things to consider:

  • follow REST naming practices (nouns, plurals, the usual stuff) - pick one, the internet is full of guidelines, but be consistent
  • be consistent with the naming; this applies for everything, not only the endpoints: payload object naming, properties etc. camelCase, snake_case, kebab-case/hyphen-case etc. Again, just choose something, but be consistent
  • make POST, PUT, PATCH return bodies with meaningful responses
  • use meaningful HTTP status codes, rather than 400 for everything that goes wrong
  • all endpoints must return meaningful error cases
  • use an error catalogue (more details in the Error Handling section)
  • consider something like OpenAPI and consider also doing contract-first development i.e., write the OpenAPI contract initially, socialize it with your (internal) consumers; this also enables better parallel development
  • document your OpenAPI contracts with meaningful descriptions and examples
  • all (internal) APIs must use CorrelationId/TraceId headers
  • all API inputs must be very restrictive by default
  • all APIs (internal or external) must be authenticated and ideally also with authorization in place
  • all APIs must re-use the same common data structures; either generic ones like Address, Person, Country, etc, but also define business specific ones
  • all APIs (internal or external) are exposed over HTTPS only
  • for the relevant APIs consider returning security headers within the response like: Cache-Control: no-store, Content-Security-Policy: frame-ancestors 'none', Content-Type, Strict-Transport-Security, X-Content-Type-Options: nosniff, X-Frame-Options: DENY
  • internal APIs do not communicate with each other via the internet (unless this is something deliberate or required by the architecture)
  • do not expose management endpoints over the internet; if this is something required, use authentication
  • make sure all APIs are enforcing strict validation for the received requests: do not allow undocumented JSON fields, reject malformed JSONs, etc
  • make proper use of data types; don’t have everything as a String
  • use enumerated values whenever possible
  • add length restrictions for strings and min/max for numbers
  • add patterns restricting input for each string
  • for some properties it’s easier to find patterns as they have clear definitions; a country code will always follow [A-Z]+; for others, it’s a bit more difficult; a lastName property needs to be quite loose, considering all names in all languages; the recommendation is at least to prevent strange characters like the Unicode control chars, separators or symbols; a recommended pattern is the following: ^[^\p{C}\p{Z}\p{So}]*[^\p{C}\p{So}]+[^\p{C}\p{Z}\p{So}]*$; this doesn’t mean that you are now protected from any type of injection; you still need to have a good understanding of where the data goes and how it is processed, but at least you won’t get an emoji breaking your system
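
To make the last few points concrete, here is a minimal sketch of a validator that combines a length restriction with the pattern above. The class name and the 100 character limit are assumptions made for this example; pick limits that fit your own data.

```java
import java.util.regex.Pattern;

final class NameValidator {

    // Pattern from the guideline above: rejects control chars (\p{C}) and symbols (\p{So}).
    private static final Pattern ALLOWED = Pattern.compile(
            "^[^\\p{C}\\p{Z}\\p{So}]*[^\\p{C}\\p{So}]+[^\\p{C}\\p{Z}\\p{So}]*$");

    // Hypothetical upper bound; not taken from any standard.
    private static final int MAX_LENGTH = 100;

    /** True if the value is non-empty, short enough and free of control chars and symbols. */
    static boolean isValid(String value) {
        return value != null
                && value.length() <= MAX_LENGTH
                && ALLOWED.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("Anna Maria"));       // letters and a space
        System.out.println(isValid("Anna\u0007Maria"));  // contains a control char (BEL)
        System.out.println(isValid("Anna \u00A9"));      // contains a symbol (copyright sign)
    }
}
```

The same pattern can usually be placed in the OpenAPI contract as the `pattern` of the string property, so that consumers see the restriction before they hit it.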

Logging standards

Key things to consider:

  • logging format: comma separated key=value pairs? json objects? choose something which is friendly to your tooling
  • always include the CorrelationId/TraceId in each log line; this will make it easier for tools to create dashboards
  • include information in logs that will make it easier to understand what’s happening: for which entity? business area? is it success? failure?
  • some good practices:
    • use an abstraction over the actual logging implementation; for example in Java: slf4j with logback as implementation
    • treat logging as a cross-cutting concern; leverage Aspects; log within methods only exceptionally; this will limit people from logging sensitive stuff
    • don’t approach logging with a “let’s log everything and see if we need it afterwards” attitude, dumping full requests/responses; be deliberate in what you log, even when logging at debug or lower levels
  • more on Logging Data

Data standards

Key things to consider:

  • use existing ISO standards for widely known objects: Currencies, Dates, Amounts just to name a few
  • define business specific objects to be re-used
  • apply these standards for API objects, database entities and events

Processing Data

Key things to consider:

  • sanitize data before processing it; this is a good sanitization regex: ^[^\p{C}\p{Z}\p{So}]*[^\p{C}\p{So}]+[^\p{C}\p{Z}\p{So}]*$; it won’t prevent all problems, but it will strip weird chars that can cause your system to crash
  • make sure that you don’t transmit data from input towards internal elevated access operations like database queries, command line execution etc.; use parametrized queries for DB, be very specific around what you get and what you pass forward
  • favor whitelisting instead of blacklisting when you need to make decisions or when you plan to restrict processing for specific input
  • overall favor defensive programming practices
  • make sure you use efficient XML parsers that are not vulnerable to XXE or similar attacks; ideally do not accept XML as input unless forced by the context
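
If you do have to accept XML, the sketch below shows one common way to harden the JDK’s built-in DOM parser against XXE by refusing DOCTYPE declarations altogether. The class name is made up for this example, and the feature URI is the one understood by the default JDK parser; other parser implementations may need different settings.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

final class SafeXml {

    /** Parses XML with DOCTYPE declarations disabled; rejected input ends up as IllegalArgumentException. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            // No DOCTYPE means no entity declarations, which is what XXE payloads rely on.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            factory.setXIncludeAware(false);
            factory.setExpandEntityReferences(false);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Deliberately generic message: do not echo parser internals to the caller.
            throw new IllegalArgumentException("XML input rejected", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<order><id>1</id></order>");
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```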

Logging Data

Key things to consider:

  • don’t log sensitive data; if you still need it for some reason, mask/obfuscate the data; what sensitive means depends on your business and regulations
  • create/use a library that masks by default the most sensitive data within your platform; for example if you’re processing payments, card numbers must be masked by default; you shouldn’t leave this decision to each individual
  • consider extending the library each time new sensitive data is added; you must also balance performance when adding too much data
  • the logging library must also allow specific configuration so that each individual service can mask additional data without extending the library
  • the logging library must provide on-demand sanitization (i.e., by calling specific methods); this will make sure the same sanitization techniques are applied for all cases
  • the logging library must sanitize data before logging it (for example by removing all the characters matching \p{Z}\p{C}\p{So})
  • the logging library must also remove CR and LF characters in order to prevent CRLF injection
  • have a clear log archiving strategy
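
A minimal sketch of the on-demand sanitization such a library could expose is shown below. Everything in it is an assumption made for illustration: the class name, the "13 to 19 digits" heuristic for card numbers and the choice to keep the last four digits. It strips \p{C} (which already covers CR and LF) and \p{So}, but leaves \p{Z} alone so that ordinary spaces survive in the log message.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LogSanitizer {

    // Rough, hypothetical heuristic: 13 to 19 consecutive digits look like a card number.
    private static final Pattern CARD_NUMBER = Pattern.compile("\\b\\d{13,19}\\b");

    // Control characters (including CR and LF) and symbols such as emojis.
    private static final Pattern UNSAFE = Pattern.compile("[\\p{C}\\p{So}]+");

    /** Removes unsafe characters first, then masks anything that looks like a card number. */
    static String sanitize(String message) {
        return maskCardNumbers(UNSAFE.matcher(message).replaceAll(""));
    }

    /** Replaces all but the last four digits of each card-like number with '*'. */
    static String maskCardNumbers(String message) {
        Matcher matcher = CARD_NUMBER.matcher(message);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (matcher.find()) {
            out.append(message, last, matcher.start());
            String digits = matcher.group();
            for (int i = 0; i < digits.length() - 4; i++) {
                out.append('*');
            }
            out.append(digits, digits.length() - 4, digits.length());
            last = matcher.end();
        }
        out.append(message, last, message.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(sanitize("payment ok card=4111111111111111"));
        System.out.println(sanitize("login ok\r\nlevel=ERROR forged entry"));
    }
}
```

Because CR and LF are removed before the message reaches the appender, the forged second line in the last example ends up glued to the first one instead of appearing as a separate log entry.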

Storing Data

Key things to consider:

  • data must not be stored “just in case you need it”; you must only store data that is relevant in the current context or the foreseeable future
  • storing data introduces compliance obligations; make sure you are aware of those
  • some data cannot be stored in clear (one example is credit card numbers); use hardware or software HSM for encryption
  • don’t store secrets (passwords, encryption keys, ssh keys, private keys) in version control in plain text files; use dedicated products or services for this like Vaults, HSMs
  • use salt and/or pepper when encrypting or hashing sensitive data; this makes precomputed (rainbow table) attacks impractical and brute-force attacks harder
  • consider building (or using) a centralized service that will tokenize sensitive data
  • you should tokenize any data that is under some sort of regulation: card data, PII data, etc.; use tokens instead of the actual data in all (micro)services and detokenize only when needed; this will minimize the compliance footprint and will also give better control around the data
  • enhance the security of the tokenization solution; do not allow external access to its APIs
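
The idea behind tokenization can be sketched in a few lines. The class below is a toy, in-memory illustration with made-up names; a real tokenization service would sit behind its own authenticated API, store the mapping encrypted in a persistent store and audit every detokenize call.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

final class TokenVault {

    // token -> sensitive value; in-memory only, for illustration.
    private final Map<String, String> valuesByToken = new ConcurrentHashMap<>();

    /** Stores the sensitive value and returns a random token that reveals nothing about it. */
    String tokenize(String sensitiveValue) {
        String token = "tok_" + UUID.randomUUID();
        valuesByToken.put(token, sensitiveValue);
        return token;
    }

    /** Returns the original value, or empty if the token is unknown. */
    Optional<String> detokenize(String token) {
        return Optional.ofNullable(valuesByToken.get(token));
    }

    public static void main(String[] args) {
        TokenVault vault = new TokenVault();
        String token = vault.tokenize("4111111111111111");
        // Other (micro)services only ever see and store the token.
        System.out.println("token passed around: " + token);
        System.out.println("detokenized when needed: " + vault.detokenize(token).isPresent());
    }
}
```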

Events/messaging standards

Key things to consider:

  • create an event catalogue so that everyone is aware of the purpose of each event
  • use event schemas for validation
  • avoid using generic events where you dump everything; you might leak sensitive information without wanting it
  • consider exchanging Tokens instead of the actual data for sensitive information

Configuration handling

Key things to consider:

  • avoid hardcoding configuration in source files
  • consider using centralized configuration management
  • segregate configuration by environment
  • do not store secrets (passwords, api keys, ssh keys, private keys, etc) in source files or in version control; use proper Secrets Vault systems
  • do not leave default credentials for any deployable unit (either cloud service, off the shelf products, or your own (micro)services)
  • do not put test-only code or configuration in production
  • don’t build test only backdoors inside your (micro)service
  • use version control to track configuration changes
  • have mechanisms in place for configuration integrity checking

Error handling

Key things to consider:

  • consider treating exceptions and errors as a cross-cutting concern; leverage Aspects, use something like ControllerAdvices or similar
  • consider embedding the logic for the most common exceptions/errors (validation issues, resource not found, malformed messages) into a shared library; this will make the interaction between (micro)services predictable and with less friction
  • use an error catalogue
  • use error codes (e.g. MICRO-4221 - bad request due to structural validation, MICRO-4222 - bad request due to business validation)
  • do not leak internal state in responses; avoid passing e.getMessage(); each error returned must be deliberately created from the root cause, but without leaking internal data
  • use a catch-all mechanism in order to avoid leaking internal state for unexpected exceptions; you can just catch Exception in the global error handler and return a 500
  • return the same object for all errors to enable a consistent experience
  • document all error cases in your API documentation with the appropriate HTTP Status code; if you use OpenAPI, document all possible HTTP status codes, even if they return the same OpenAPI object
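
A framework-free sketch of how an error catalogue, a single error object and a catch-all can fit together is shown below. The MICRO-4221 and MICRO-4222 codes come from the list above; the other two codes, the HTTP statuses, the messages and the mapping from exception types to catalogue entries are assumptions made for this example. In a Spring application the toApiError logic would typically live in a ControllerAdvice.

```java
import java.util.NoSuchElementException;

final class ErrorCatalogue {

    /** Catalogue entries: stable code, HTTP status and a message that is safe to return. */
    enum ErrorCode {
        STRUCTURAL_VALIDATION("MICRO-4221", 400, "The request is not well formed"),
        BUSINESS_VALIDATION("MICRO-4222", 400, "The request breaks a business rule"),
        NOT_FOUND("MICRO-4041", 404, "The resource does not exist"),        // hypothetical code
        UNEXPECTED("MICRO-5001", 500, "Something went wrong on our side");  // hypothetical code

        final String code;
        final int httpStatus;
        final String publicMessage;

        ErrorCode(String code, int httpStatus, String publicMessage) {
            this.code = code;
            this.httpStatus = httpStatus;
            this.publicMessage = publicMessage;
        }
    }

    /** The single error shape returned for every failure. */
    static final class ApiError {
        final String code;
        final int httpStatus;
        final String message;

        ApiError(ErrorCode entry) {
            this.code = entry.code;
            this.httpStatus = entry.httpStatus;
            this.message = entry.publicMessage;
        }
    }

    /** Maps any exception to a catalogue entry; the exception message is never copied over. */
    static ApiError toApiError(Throwable error) {
        if (error instanceof IllegalArgumentException) {
            return new ApiError(ErrorCode.STRUCTURAL_VALIDATION);
        }
        if (error instanceof NoSuchElementException) {
            return new ApiError(ErrorCode.NOT_FOUND);
        }
        // Catch-all: anything unexpected becomes a generic 500.
        return new ApiError(ErrorCode.UNEXPECTED);
    }

    public static void main(String[] args) {
        ApiError error = toApiError(new IllegalStateException("jdbc:postgresql://db-internal:5432 refused"));
        System.out.println(error.httpStatus + " " + error.code + " " + error.message);
    }
}
```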

Branching strategy and commits

Key things to consider:

  • use a simple branching strategy; trunk-based, github-flow, etc.; just pick one
  • use meaningful names for your repos and branches
  • use descriptive commits; it will make it easier to trace changes in the future
  • use small commits to better isolate changes
  • use smart commits i.e., provide a link to the task from the task management system
  • consider using pre-commit hooks to validate the commits
  • do not include sensitive information in commit messages
  • pay attention when enabling remote access to your repos; especially when repos are hosted in cloud

Code review

Key things to consider:

  • do code reviews (be kind, assertive, specific, all the good stuff)
  • leave the boring stuff to the tools and focus on the functional aspects and alignment to standards and practices
  • if you find the same issue repeated over and over, add it within the standards
  • consider using checklists, at least initially until people make it a habit on focusing on the same stuff

Tooling and 3rd party libraries

Key things to consider:

  • have a process in place for introducing new tooling; do a trade-off analysis and present it in a wider group to get acceptance/agreement and make sure you address wider cases
  • when selecting open source software pay attention to the license(s)
  • create a list with licenses that can be used without asking, licenses that need to be discussed and licenses which are not allowed to be used
  • don’t take the first (or latest) shiny tool/library/product you find; consider things like: is it stable?, is it maintained? does it have a track record?
  • consider using tools such as OWASP Dependency Check, License Plugin or even more complex tools such as Black Duck
  • create a list with the agreed tooling/libraries where people can choose from
  • update your dependencies frequently

Code Analysis

Key things to consider:

  • use one or multiple tools to analyze your code
  • you must have (at least) one tool focused on the general coding practices and (at least) one focused on security practices
  • some good tools for general code analysis (Java): Sonarqube, PMD, SpotBugs
  • some good tools for security code analysis: Veracode, Checkmarx, Sonarqube
  • you don’t need to agree with all the practices that are part of the standard rule sets of these tools (although usually they are aligned with industry recommendations); you can create a subset of rules tailored to your context

Testing

Key things to consider:

  • automate testing at all levels: unit, integration, component, API, end-to-end, etc.
  • focus on negative and boundary testing, not only on happy scenarios; CATS is a good option for API testing
  • don’t ignore failing tests, even those failing intermittently; they might hide serious underlying issues
  • tests must be resilient and self-sufficient
  • tests must use a similar and predictable approach
  • tests must not depend on complicated external setup; they must either be self-sufficient by mocking dependencies, using in-memory setups or testcontainers or just depend on the (micro)service being deployed; any other steps will just complicate the setup and introduce complexity
  • consider adding some security testing inside the pipeline
  • consider mutation testing

CI/CD

Key things to consider:

  • include Quality Gates for the most important stuff; they must act as checkpoints and fail the build if they are not met
  • Quality Gates must be in line with these standards and automate the process of checking that each (micro)service is aligned
  • a sample CI/CD pipeline might look like this:
    • compile and build
    • check formatting
    • run tests and check coverage
    • run mutation testing
    • run code analysis
    • run secure code analysis
    • check 3rd party libraries for vulnerabilities
    • check 3rd party library licenses
    • deploy
    • run API tests
    • run other types of testing
  • this might seem too much (or lengthy), but for a microservice this is quite fast
  • script your pipeline
  • don’t couple the pipeline to the (micro)services
  • use a template pipeline for all (micro)services

Authentication and Authorisation

Key things to consider:

  • don’t roll your own authentication and authorisation; use standard products and services
  • authenticate all your APIs, internal and external; just pick something proven
  • use separate authentication and authorisation mechanism for external and internal calls i.e., use one set of credentials/mechanism to authenticate external calls and a separate one for internal calls
  • credentials are always encrypted both in-flight and at-rest
  • use HTTPS for all APIs, internal or external
  • do not accept authentication credentials via HTTP GET; use only HTTP headers or HTTP POST/PUT
  • do not log credentials, not even when debug is on; have your logging library also act as a catch-all for credentials
  • make sure your authorisation and authentication mechanism allows granular control and management i.e., you can restrict number of calls per operation, revoke access, issue additional credentials, etc.
  • consider using a centralized Identity Provider and common libraries
  • use enhanced security controls for highly sensitive APIs/services (mutual TLS for APIs, MFA for access to services)
  • use nonces to prevent replay attacks
  • always design and build with the least privilege principle in mind
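
As an illustration of the nonce point, here is a minimal in-memory sketch with made-up names. It assumes that each request carries a client-generated nonce and that the request timestamp is validated separately against the same time window, otherwise a nonce could be replayed once it has expired. With several service instances the registry would have to live in a shared store rather than in local memory.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class NonceRegistry {

    // nonce -> moment after which it can be forgotten
    private final Map<String, Instant> expiryByNonce = new ConcurrentHashMap<>();
    private final Duration timeToLive;

    NonceRegistry(Duration timeToLive) {
        this.timeToLive = timeToLive;
    }

    /** Returns true the first time a nonce is seen within the time-to-live, false on a replay. */
    boolean accept(String nonce, Instant now) {
        // Forget expired nonces so the map does not grow forever.
        expiryByNonce.values().removeIf(expiry -> !expiry.isAfter(now));
        return expiryByNonce.putIfAbsent(nonce, now.plus(timeToLive)) == null;
    }

    public static void main(String[] args) {
        NonceRegistry registry = new NonceRegistry(Duration.ofMinutes(5));
        Instant now = Instant.now();
        System.out.println(registry.accept("a1b2c3", now));                 // first use
        System.out.println(registry.accept("a1b2c3", now.plusSeconds(10))); // replay
    }
}
```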

General Security Practices

Key things to consider:

  • don’t ever roll your own encryption; you cannot reinvent the wheel in this space
  • use industry recommended algorithms: AES 256; RSA 2048+, SHA-2 512.
  • use TLS 1.3+ for transport security
  • use salt and/or pepper when encrypting or hashing sensitive data; this makes precomputed (rainbow table) attacks impractical and brute-force attacks harder
  • check your programming language practices for dealing with sensitive information; for example in Java you must use byte[] rather than String to handle passwords, card numbers, social security numbers, etc.; you must minimize the time the data stays in memory and clear the objects after use
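
The last two points can be sketched together using only the JDK. The example below hashes a password with PBKDF2 and a random salt, and clears the in-memory copies after use. It takes a char[] because that is what the JDK’s PBEKeySpec expects; the same idea applies to byte[]. The class name, iteration count and key length are assumptions made for this example, not recommendations.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

final class PasswordHasher {

    private static final int ITERATIONS = 210_000;   // assumption; tune to your hardware
    private static final int KEY_LENGTH_BITS = 256;  // assumption
    private static final SecureRandom RANDOM = new SecureRandom();

    /** A fresh random salt, to be stored next to the hash. */
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        RANDOM.nextBytes(salt);
        return salt;
    }

    /** Derives a salted hash; the copy of the password held by the key spec is wiped afterwards. */
    static byte[] hash(char[] password, byte[] salt) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_LENGTH_BITS);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec)
                    .getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2 is not available on this runtime", e);
        } finally {
            spec.clearPassword();
        }
    }

    /** Compares without bailing out at the first differing byte. */
    static boolean verify(char[] candidate, byte[] salt, byte[] expectedHash) {
        return MessageDigest.isEqual(hash(candidate, salt), expectedHash);
    }

    public static void main(String[] args) {
        char[] password = {'s', '3', 'c', 'r', 'e', 't'};
        byte[] salt = newSalt();
        byte[] stored = hash(password, salt);
        System.out.println(verify(password, salt, stored));
        Arrays.fill(password, '\0'); // clear the caller's copy as soon as it is no longer needed
    }
}
```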

Quality attributes

As we’ve seen above, SDLC standards and practices are not always directly related to security. Same applies for quality attributes. Shortcomings in current design and approach can cause your application to go down, even if it is not caused by a true security problem.

Key things to consider for Performance:

  • use pooling for connections to expensive resources like DB, APIs, etc.
  • use thread pools
  • use caching
  • use proper collections when manipulating data
  • use parallel programming if applicable
  • make sure you understand how your ORM generates queries
  • avoid loading big resources in memory, use data streams
  • baseline your performance per (micro)service instance so that you know when to scale
  • do regular load and performance testing

Key things to consider for Resilience:

  • use circuit breakers, retries, timeouts, rate-limiting
  • have clear fallback strategies when dependent APIs are not available
  • some great resources on the topic: Resilient Systems Part 1 and Resilient System Part 2
  • make all APIs idempotent
  • don’t store state within one (micro)service instance; use a distributed cache for that
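
To give an idea of what a retry looks like, here is a minimal hand-rolled sketch with exponential backoff (the class and method names are made up for this example). In practice you would more likely take this from a resilience library, combined with timeouts and a circuit breaker. Note that retrying is only safe when the operation being retried is idempotent.

```java
import java.util.function.Supplier;

final class Retry {

    /** Calls the supplier up to maxAttempts times, doubling the wait after each failure. */
    static <T> T withBackoff(Supplier<T> call, int maxAttempts, long initialDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                lastFailure = e;
                if (attempt < maxAttempts) {
                    sleep(delay);
                    delay *= 2;
                }
            }
        }
        // All attempts failed: surface the last failure to the caller.
        throw lastFailure;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = withBackoff(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("dependency not available");
            }
            return "ok";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```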

Key things to consider for Availability and Scalability:

  • don’t make your (micro)services design limit horizontal scaling
  • plan for failure, have automated mechanisms in place for auto-scaling based on load
  • consider sharding, read-only replicas
  • use multi-region deployments

Key things to consider for Observability and Monitoring:

  • all (micro)services must expose health endpoints covering both application and the underlying container
  • the health endpoint must return information about all its dependencies: db, encryption service, APIs it connects to, event bus, etc.
  • leverage the standardized logging to create meaningful operational dashboards

Automate

Automate everything. Automation makes it predictable and consistent. The CI/CD pipeline should be the place where you automate all checks that will assess your (micro)service from a quality perspective. Tools like Semgrep can bring automation with less effort for standards not obviously suited for automation.

Conclusion

This isn’t a final list, it’s more like a brain dump. It’s a starting point for building a security mindset. Once you apply all these, you are ready to deep dive. Applying all these practices won’t give you only security benefits, but also more structure and alignment. This is particularly important in systems developing too fast, either brand new or legacy. You don’t need to go with all these from day 1, it might seem overwhelming especially if you are not used to following common standards and think it will limit your options. But maybe you can try it for a while and see what happens!

Small tips for improving the GitHub API

Disclaimer

!!! WARNING !!! If you choose to run the steps in this article, please note that you will end up with a significant number of dummy GitHub repos under your username. Over 1k in my case. There is a script at the end of the article that you can use to delete them. Be careful not to delete your real repos though!

Repos

TLDR

This is a concrete example of how you can use CATS and a personal view on how I would improve the GitHub API. Don’t expect critical bugs/issues. The findings are small things that will make the GitHub API more usable, predictable and consistent.

The Beginning

Building good APIs is hard. There are plenty of resources out there with plenty of good advice on how to achieve this. While some of the things are a must, like following the OWASP REST Security Practices, others might be debatable and based on preference, like using snake_case or camelCase for naming JSON objects. (I plan to write a more detailed article in the next weeks on what I consider good practices.) As the number of APIs usually grows significantly, even when dealing with simple systems, it’s very important to have quick ways to make sure the APIs are meeting good practices consistently.

I’ve shown in a previous article how easy it is to use a tool like CATS to quickly verify OpenAPI endpoints while covering a significant number of test cases. But that was a purely didactic showcase using the OpenAPI demo petstore app, which was obviously not built as a production-ready service. Today I’ll pick a real-life API, specifically the GitHub API, which recently published their OpenAPI specs.

I’ve downloaded the 2.22 version and saved the file locally as github.yml. Before we start, we need to create an access token in order to be able to call the APIs. Make sure it has proper scopes for repo creation (and deletion when using the script at the end of the article). Also, as the API is quite rich (the file has 59k lines), I’ve only selected the /user/repos path for this showcase. You’ll see that there are plenty of findings only using this endpoint.

You can run CATS as a blackbox testing tool and incrementally add minimal bits of context until you end up with consistent issues or a green suite.

As shown in the previous article, running CATS is quite simple:

./cats.jar --contract=github.yml --server=https://api.github.com --paths="/user/repos" --headers=headers_github.yml

With the headers_github.yml having the following content:

all:
  Authorization: token XXXXXXXXXXXXX

Let’s see what we get on a first run:

First Run

We have 42 warnings and 156 errors. Let’s go through the errors first. Looking at the result of Test 118 we see that a request failed due to the name of the repository not being unique. Indeed, CATS, for each Fuzzer, preserves an initial payload that will be used to fuzz each of the request fields. This means that we need a way to force CATS to send unique names for the name field. Noted!

Test 118

Test 426 says: “If you specify visibility or affiliation, you cannot specify type.” Let’s note this down also.

Test 426

Considering the above 2 problems are reported consistently, let’s give it another go with a Reference Data File. This is the refData_github.yml file that will be used:

/user/repos:
  name: "T(org.apache.commons.lang3.RandomStringUtils).random(5,true,true)"
  type: "cats_remove_field"

CATS supports dynamic values in property values via the Spring Expression Language. Using the above refData file, CATS will now generate a new random name every time it executes a request to the GitHub API. Also, because of the cats_remove_field value, CATS will remove the type field from all requests before sending them to the endpoint. More details on this feature here.
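To make the dynamic value a bit more concrete, here is a rough plain-Java sketch of what the name expression amounts to: a fresh 5-character alphanumeric string on every evaluation. The class and method names are hypothetical, and the JDK's ThreadLocalRandom stands in for commons-lang3's RandomStringUtils, so this is an approximation and not how CATS itself evaluates the expression.

```java
import java.util.concurrent.ThreadLocalRandom;

class RandomNameSketch {
    private static final String ALPHANUMERIC =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    // Hypothetical helper: builds a random string made of ASCII letters and digits.
    static String randomAlphanumeric(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            int index = ThreadLocalRandom.current().nextInt(ALPHANUMERIC.length());
            sb.append(ALPHANUMERIC.charAt(index));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Each call yields a new value, so repeated requests rarely reuse a repository name.
        System.out.println(randomAlphanumeric(5));
        System.out.println(randomAlphanumeric(5));
    }
}
```

With only 5 characters a collision is unlikely but not impossible, so a longer name may be worth considering for large runs.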

Running CATS again:

./cats.jar --contract=github.yml --server=https://api.github.com --paths="/user/repos" --headers=headers_github.yml --refData=refData_github.yml

We now get 17 warnings and 90 errors. Again, looking through the test failures/warnings, some tests are failing because since and before are not sent in ISO 8601 timestamp format (more on this inconsistency in the Findings section).

Second Run

We’ll now update the refData file to look as follows:

/user/repos:
  before: "T(java.time.Instant).now().toString()"
  since: "T(java.time.Instant).now().minusSeconds(86400).toString()"
  name: "T(org.apache.commons.lang3.RandomStringUtils).random(5,true,false)"
  type: "cats_remove_field"

and run CATS again:

./cats.jar --contract=github.yml --server=https://api.github.com --paths="/user/repos" --headers=headers_github.yml --refData=refData_github.yml

Third Run

We now get 5 warnings and 83 errors. Looking through the errors and warnings, a significant number are what I consider legitimate issues, while some are debatable points, depending on the preferences/standards being followed. Let’s go through the findings.

Findings

Invalid values for boolean fields implicitly converted to false

One of the Fuzzers that CATS has is the BooleanFieldsFuzzer. This Fuzzer works on the assumption that if you send an invalid value into a boolean field, the service should return a validation error. The GitHub API does not do this, but rather silently converts the value to false. It’s true this is a consistent behaviour, applying to all boolean fields like auto_init, allow_merge_commits, etc., but I would personally choose to return a validation error in these cases.

This is in contradiction with the OWASP recommendation around strong input validation and data type enforcing.
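As an illustration of the difference, the sketch below contrasts lenient and strict boolean parsing in plain Java. The JDK's Boolean.parseBoolean happens to map anything other than "true" to false, which mirrors the behaviour observed above; parseStrict is a hypothetical validator, and nothing here is meant to describe how the GitHub API is actually implemented.

```java
class StrictBooleanSketch {
    // Hypothetical validator: accepts only the literals "true" and "false".
    static boolean parseStrict(String raw) {
        if ("true".equals(raw)) {
            return true;
        }
        if ("false".equals(raw)) {
            return false;
        }
        throw new IllegalArgumentException("Invalid boolean value: " + raw);
    }

    public static void main(String[] args) {
        // Lenient parsing silently maps anything that is not "true" to false.
        System.out.println(Boolean.parseBoolean("cats")); // prints false
        try {
            parseStrict("cats");
        } catch (IllegalArgumentException e) {
            // Strict parsing surfaces the problem so the service can answer with a validation error.
            System.out.println(e.getMessage()); // prints Invalid boolean value: cats
        }
    }
}
```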

Invalid values for enumerated fields implicitly converted to the default value (?)

The InvalidValuesInEnumsFieldsFuzzer will send invalid values in enum fields. It expects a validation error in return. The GitHub API does not seem to reject invalid values, but rather converts them to a default value and responds successfully to the request.

This is in contradiction with the OWASP recommendation around strong input validation and data type enforcing.

Integer fields accepting decimal or large negative or positive values

The DecimalValuesInIntegerFieldsFuzzer expects an error when it sends a decimal value inside an integer field. The GitHub API seems to accept these invalid values in the team_id field without returning any error, but rather resulting in a successful processing of the request. Same applies for ExtremePositiveValueInIntegerFieldsFuzzer and ExtremeNegativeValueIntegerFieldsFuzzer which will send values such as 9223372036854775807 or -9223372036854775808 in the team_id field. Strings also seem to be accepted in the team_id field.

This is in contradiction with the OWASP recommendation around strong input validation and data type enforcing.
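A minimal sketch of the kind of check these Fuzzers expect, assuming team_id is meant to be a 32-bit integer (the contract may define it differently). The parseTeamId helper is hypothetical: it rejects decimals, out-of-range values, and free text instead of accepting them.

```java
class StrictIntegerSketch {
    // Hypothetical validator for an integer field such as team_id (32-bit is an assumption).
    static int parseTeamId(String raw) {
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("team_id must be a 32-bit integer: " + raw);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseTeamId("42")); // prints 42
        // A decimal, an extreme value, and a plain string are all rejected.
        for (String bad : new String[] {"1.5", "9223372036854775807", "cats"}) {
            try {
                parseTeamId(bad);
            } catch (IllegalArgumentException e) {
                System.out.println(e.getMessage());
            }
        }
    }
}
```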

Accepts unsupported or dummy Content-Type headers

The GitHub API seems to successfully accept and process requests containing unsupported (according to the OpenAPI contract) Content-Type headers such as: image/gif, multipart/related, etc. or dummy ones such as application/cats.

This is in contradiction with the OWASP recommendation on validating content types.

Accepts duplicate headers

The GitHub API does not reject requests that contain duplicate headers. The HTTP standard itself allows duplicate headers for specific cases, but allowing duplicate headers might lead to hidden bugs.

Spaces are not trimmed from values

If request fields are prefixed or suffixed with spaces, they are rejected as invalid values. For example, sending a space-prefixed Haskell value in the gitignore_template field will cause the service to return an error. This is inconsistent with the since and before fields: if you prefix or suffix those with spaces, the values get trimmed successfully and converted to dates. As a common approach, I think services should consistently trim spaces by default for all request fields (maybe with some business-driven special cases) and perform the validation afterwards.
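The trim-then-validate order can be sketched as follows. The TEMPLATES set is a made-up subset used only for illustration, and normalizeTemplate is a hypothetical helper rather than anything taken from the GitHub API.

```java
import java.util.Set;

class TrimThenValidateSketch {
    // Hypothetical subset of supported templates, for illustration only.
    private static final Set<String> TEMPLATES = Set.of("Haskell", "Java", "Python");

    // Trims surrounding whitespace first, then validates the cleaned value.
    static String normalizeTemplate(String raw) {
        String trimmed = raw.trim();
        if (!TEMPLATES.contains(trimmed)) {
            throw new IllegalArgumentException("Unknown gitignore_template: " + trimmed);
        }
        return trimmed;
    }

    public static void main(String[] args) {
        // The leading space is removed before the value is checked.
        System.out.println(normalizeTemplate(" Haskell")); // prints Haskell
    }
}
```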

Accepting new fields in the request

The GitHub API seems to allow injection of new JSON fields inside the requests. The NewFieldsFuzzer adds a new catsField inside a request, but the GitHub API accepts it as valid. This is again in contradiction with the OWASP recommendation which suggests rejecting unexpected content.
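One simple way to reject unexpected content is to compare the incoming field names against the ones declared in the contract. The sketch below assumes the payload has already been parsed into a map; the ALLOWED set and the unexpectedFields helper are hypothetical and cover only a few fields for brevity.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class UnknownFieldsSketch {
    // Hypothetical subset of the fields a contract might declare for this request.
    private static final Set<String> ALLOWED = Set.of("name", "description", "homepage");

    // Returns the field names present in the payload but not declared in ALLOWED.
    static Set<String> unexpectedFields(Map<String, Object> payload) {
        Set<String> unexpected = new TreeSet<>(payload.keySet());
        unexpected.removeAll(ALLOWED);
        return unexpected;
    }

    public static void main(String[] args) {
        Map<String, Object> payload = Map.of("name", "demo", "catsField", "injected");
        // A non-empty result would translate into a 400 Bad Request.
        System.out.println(unexpectedFields(payload)); // prints [catsField]
    }
}
```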

Doesn’t make proper use of enumerated values

There are cases when it makes sense to use enums rather than free text. Some examples are the gitignore_template field or the license_template field, which reject invalid values. They obviously have a pre-defined list of values, but the contract does not enforce this in any way. Having them listed as enums would also make it easier to understand which templates are supported.

Fields are not making proper use of the format

There are 2 fields called since and before which seem to actually be date-time values, although in the OpenAPI contract they are only marked as string without any additional format information. The description of the fields states that they are actually ISO dates, but it would also be good to leverage the features of OpenAPI and mark them accordingly.

Date-Time Error
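For reference, this is roughly what a date-time check looks like on the Java side. It is a sketch under the assumption that the values are UTC instants such as 2020-11-01T10:15:30Z; isIsoInstant is a hypothetical helper, and Instant.parse accepts only a subset of what the OpenAPI date-time format allows.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;

class DateTimeFormatSketch {
    // Returns true when the value parses as an ISO 8601 instant, false otherwise.
    static boolean isIsoInstant(String raw) {
        try {
            Instant.parse(raw);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isIsoInstant("2020-11-01T10:15:30Z")); // prints true
        System.out.println(isIsoInstant("yesterday"));            // prints false
        // The same style of value the refData expressions produce: now, and 24 hours ago.
        System.out.println(Instant.now());
        System.out.println(Instant.now().minusSeconds(86400));
    }
}
```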

Mismatch of validation between front-end driven calls and direct API calls

The homepage field seems to accept any values when doing a direct API call, but when you try to set this from the UI, you will get an error saying that you need to enter a valid URL.

Validation Error from the UI

Having the same level of validation for backend and frontend is another good practice for making sure you don’t end up with inconsistent data.

Additional findings

There are some other failures which might seem debatable or not applicable:

  • description accepts very large strings (50k characters sent by CATS), although the GitHub API doesn’t actually have constraint information in the contract; again this is not necessarily a problem, but it’s important for the contract to enforce constraints
  • The RecommendedHeadersFuzzer expects a CorrelationId/TraceId to be defined in the headers, but this being a public API, it’s not actually applicable
  • The CheckSecurityHeadersFuzzer expects a Cache-Control: no-store as per the OWASP recommendations, but the endpoint does not handle critical data, so allowing caching of the information is fine

Cleaning Up

Before proceeding, please be careful to not delete your real repos. This is the script I’ve used to delete the repos. Run it incrementally and please check the repos file before deleting.

# get the latest 100 repos (by creation date)
curl -s "https://api.github.com/users/ludovicianul/repos?sort=created&direction=desc&per_page=100" | jq -r '.[].name' >> repos

# Maybe check them a bit before deleting them to make sure you don't delete real repos
for URL in `cat repos`; do echo $URL;  curl -X DELETE -H 'Authorization: token PERSONAL_TOKEN' https://api.github.com/repos/ludovicianul/$URL; done

# delete the file
rm repos

#start over until you delete everything

You need to run it several times as it deletes in batches of 100.

Final Conclusions

CATS can be very powerful and can save lots of time while testing APIs. Because it tests every single field within a request, it’s easy to spot inconsistencies or cases when the APIs are not as explicit as they should be. Although I only tested a single endpoint from the GitHub API, I suspect most of the findings will apply to all the other endpoints. Why not give it a try for your API?

How to write self-healing functional API tests with no coding effort

Context

APIs are everywhere, and it’s critical to have ways to efficiently test that they work correctly. There are several tools out there that are used for API testing, and the vast majority will require you to write the tests from scratch, even though in essence you test similar scenarios and apply the same testing techniques as you did in other apps/contexts. This article will show you a better way to tackle API testing that requires less effort on the “similar scenarios” part and allows you to focus on the more complex part of the activities.

Testing APIs

When thinking about testing APIs, from a functional perspective, there are several dimensions which come to mind:

  • verifying the structure of the exchanged data i.e. structural validations (length, data type, patterns, etc). This is where you send a 2 character string when the API says the minimum is 3, you put a slightly invalid email address, a slightly invalid format for a date field and so on.

  • boundary testing and negative scenarios, a bit similar to the above, but focusing on “breaking” the application. This is where you go to the extreme: you send extremely large values in Strings, extremely positive or negative values in Integers and so on.

  • behaviour according to documentation i.e. APIs are responding as expected in terms of HTTP status codes or response payloads. This is where you check that the service responds as documented: with a proper HTTP response code (2XX for a success, 400 for a bad request, etc) and a proper response body.

  • functional scenarios i.e. APIs are working as expected in terms of expected business behaviour. This is where you check that the response body contains the right business information: when you perform an action (like creating an entity, altering its state) the service correctly responds that the action was performed and with relevant information.

  • linked functional scenarios i.e. you create an entity and you get its details after to check they are the same. This is where you get a bit more end-to-end: you perform an action (like creating an entity) and with the returned identifier you go and check the existence (you do a GET based on the identifier).

Ideally all the above scenarios must be automated. And it’s achievable to do this for the last 2 categories, but when having complex APIs with lots and lots of fields within the request payload, it becomes very hard to make sure you create negative and boundary testing scenarios for all fields.

What are the options for API test automation

There are several frameworks and tools on the market which can help to automate all these. Just to name a few:

They are all great tools and frameworks. You start writing test cases for the above categories. The tools/frameworks will provide different sets of facilities which might reduce specific effort during implementation, but ultimately you end up writing the actual tests to be executed (i.e. code). Even if you’ve done this before, and you know exactly what to test, even if you create a mini-framework that provides facilities for the common elements, you still need to write a considerable amount of code in order to automate all your testing scenarios.

And what happens when the API changes a bit? Some fields might be more restrictive in terms of validation, some might change the type, some might get renamed and so on. And then the API changes a bit more? This is usually not very rewarding work. Software engineers are usually creative creatures, and they like challenges and solving problems, not doing boring stuff. There are cases when one might choose to leave the API as is in order to prevent changing too many test cases.

Is there a better way?

But what if there is an alternative to this, where the first 3 categories above can be fully automated, including the actual writing of the test cases? And what if the next 2 categories become extremely simple to write and maintain? This is the reason I wrote CATs.

CATs has 3 main goals in mind:

  • remove the boring activities when testing APIs by automating the entire testing lifecycle: write, execute and report of test cases
  • auto-heal when APIs change
  • make writing functional scenarios as simple as possible by entirely removing the need to write code, while still leveraging the first 2 points

CATs has built-in Fuzzers which are actually pre-defined test cases with expected results that will validate if the API responds as expected for a bunch of scenarios:

  • send string values in numeric fields
  • send outside the boundary values where constraints exist
  • send values not matching pre-defined regex patterns
  • remove fields from requests
  • add new fields inside the requests
  • and so on…

You can find a list of all the available Fuzzers here: https://github.com/Endava/cats#available-fuzzers.
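To give a feel for what “outside the boundary” means, here is a toy sketch that derives two invalid string values from a length constraint. It is not taken from the CATs code base: outsideBoundary is a hypothetical helper, and it assumes a minLength of at least 1.

```java
import java.util.List;

class BoundarySketch {
    // Hypothetical helper: builds one string shorter than minLength and one longer than maxLength.
    static List<String> outsideBoundary(int minLength, int maxLength) {
        String tooShort = "a".repeat(Math.max(0, minLength - 1));
        String tooLong = "a".repeat(maxLength + 1);
        return List.of(tooShort, tooLong);
    }

    public static void main(String[] args) {
        // For a field constrained to 3..10 characters, both values should trigger a 4XX response.
        for (String value : outsideBoundary(3, 10)) {
            System.out.println(value.length() + " characters: " + value);
        }
    }
}
```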

There is one catch though. Your API must provide an OpenAPI contract/spec. But this shouldn’t be an issue in my opinion. It’s good practice to have your API documented in a tools-friendly format. Many companies are building OpenAPI specs for their APIs for easier tools integration.

Using CATs

Let’s take an example to show exactly how CATs works. I’ve chosen the Pet-Store application from the OpenAPITools example section. The OpenAPI spec is available in the same repo: https://github.com/OpenAPITools/openapi-petstore/blob/master/src/main/resources/openapi.yaml.

Looking at the contract, how long would you estimate it would take to create an automation suite to properly test this API? 1 day? 2 days? Several days? Using CATs this will probably be a couple of hours.

Let’s start the pet-store application on the local box first (you need to have Docker installed):

docker pull openapitools/openapi-petstore
docker run -d -e OPENAPI_BASE_PATH=/v3 -p 80:8080 openapitools/openapi-petstore

This will make the app available at http://localhost for the Swagger UI, and the API will be available at http://localhost/v3.

Download the latest CATs version. Download also the openapi.yml from above.

Before running CATs, please read how it works in order to better interpret the results.

Running the built-in test cases

Assuming both cats.jar and the openapi.yml are in the same folder, you can now run:

./cats.jar --server=http://localhost/v3 --contract=openapi.yml

You will get something like this: first_run.png

Not bad! We just generated 437 test cases, out of which 78 were already successful, and we’ve potentially found 181 bugs. To view the entire report, just open test-report/index.html. You will see something like this:

test_report_first_run.png

Now let’s check if the reported errors are actually bugs or just configuration issues. We will deal with the warnings after. Checking the first failing test (just click on the table row) we can see that the reason for failing is actually due to the fact that we didn’t send any form of authentication along with the requests. Looking in the openapi.yml we can see that some endpoints require authentication while some others require an api_key.

auth_required.png

CATs supports passing any type of headers with the requests. Let’s create the following headers.yml file:

all:
   Authorization: Bearer 5cc40680-4b7d-4b81-87db-fd9e750b060b
   api_key: special-key

You can obtain the Bearer token by authenticating in the Swagger UI. The api_key has a fixed value as stated in the pet-store documentation.

Let’s do another run, including now the headers.yml file:

./cats.jar --contract=openapi.yml --server=http://localhost/v3 --headers=headers.yml

A bit better now. 93 successful and 140 potential bugs.

second_run.png

Again, let’s check if there are any configuration issues, or they are valid errors. Looking at test 1253 we can see that the target endpoint has a parameter inside the url.

test_1253.png

This is a limitation of CATs as it cannot both fuzz the URL and the payloads for POST, PUT and PATCH requests. And looking at the contract, indeed, there are several other endpoints that are in the same situation. For these particular cases CATs uses the urlParams argument in order to pass some fixed values for these parameters. The provided values must result in existing HTTP resources so that the fuzzing works as expected.
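Conceptually, fixed URL parameters boil down to replacing the {name} placeholders in a path with the supplied values. The sketch below shows that idea only; it assumes a comma-separated list of name:value pairs, and both parse and resolve are hypothetical helpers rather than CATs internals.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class UrlParamsSketch {
    // Hypothetical helper: parses "name:value,name:value" into an ordered map.
    static Map<String, String> parse(String urlParams) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : urlParams.split(",")) {
            String[] parts = pair.split(":", 2);
            params.put(parts[0].trim(), parts[1].trim());
        }
        return params;
    }

    // Replaces each {name} placeholder in the path with its fixed value.
    static String resolve(String path, Map<String, String> params) {
        String resolved = path;
        for (Map.Entry<String, String> entry : params.entrySet()) {
            resolved = resolved.replace("{" + entry.getKey() + "}", entry.getValue());
        }
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, String> params = parse("username:username,petId:10");
        System.out.println(resolve("/pet/{petId}", params));     // prints /pet/10
        System.out.println(resolve("/user/{username}", params)); // prints /user/username
    }
}
```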

We run CATs again, considering also the urlParams:

./cats.jar --contract=openapi.yml --server=http://localhost/v3 --headers=headers.yml --urlParams="username:username,petId:10,orderId:1"

91 passed test cases and “just” 81 potential bugs.

third_run.png

Let’s check again if these are actual errors. This time, they all seem to be valid bugs:

  • Test 429: the server returns a 500 Internal Error instead of a normal 200 empty response
  • Test 520: CATs injects a new field inside the request, but the server still responds with 200 instead of returning a 400 Bad Request
  • Test 523: CATs sends a null value in a non-required field (according to the contract) and expects a 2xx response, while the server replies with 500 evend ID cannot be null
  • and so on…

We now truly have 81 bugs which must be fixed in order to have a stable service.

Now, going into the warnings zone, these are usually tests which fail “just a bit”. The server usually responds within the expected HTTP response code family, but either the HTTP status code is not documented inside the contract, or the response body doesn’t match the one documented in the contract. Either way, these also must be fixed as it shows that the OpenAPI spec is incomplete or that the actual implementation deviated from it.

Looking at what we’ve got at the end, with a small effort investment we were able to “write”, run and report 430 test cases which otherwise would have taken significantly more effort to implement and run. We got validation that the service works well in certain areas, a list of places where we must either update the OpenAPI specs or tweak the service to respond as expected, and a list of bugs for behaviour which is not working properly.

Running custom test cases

But we can do even more. As mentioned previously, CATs also supports running custom tests with minimal effort. This is done using the CustomFuzzer. The CustomFuzzer will run tests configured within a customFuzzer file that has a straightforward syntax. The cool thing about the custom tests is that you don’t need to fill in all the details within the request, just the fields that you care about. Below is an example of running 2 simple test cases that create a Pet and verify the details of a Pet. This is saved as customFuzzer-pet.yml. (Ideally we should correlate these 2, but it seems that the pet-store API is not returning an ID when you create a Pet).

/pet:
  test1:
    description: Create a new Pet
    name: "CATs"
    expectedResponseCode: 200
/pet/{petId}:
  test2:
    description: Get the details of a given Pet
    petId: 11
    expectedResponseCode: 200
    verify:
      id: 11
      category#id: 7

We now run CATs again:

./cats.jar  --contract=openapi.yml --server=http://localhost/v3 --headers=header_pet.yml --fuzzers=CustomFuzzer --customFuzzerFile=customFuzzer-pet.yml

And we get the following result:

custom

The 2 warnings are also due to the fact that the response codes are not properly documented in the OpenAPI contract.

And you get the same benefits as when writing the test cases using actual code:

  • you can choose the elements to verify in the responses, using the verify section
  • you can check for expected HTTP status codes using expectedResponseCode
  • and you can even pass values from one test to another using output

As mentioned above, ideally, when getting the Pet details, you should pass an ID received when initially creating the Pet to make sure that is the same Pet. The customFuzzer-pet.yml will look like this:

 /pet:
   test1:
     description: Create a new Pet
     name: "CATs"
     expectedResponseCode: 200
     output:
        pet_id: id
 /pet/{petId}:
   test2:
     description: Get the details of a given Pet
     petId: ${pet_id}
     expectedResponseCode: 200
     verify:
       name: "CATs"

What about auto-healing

Auto-healing is built in. As CATs generates all its tests from the OpenAPI contract, when the API changes (i.e. the OpenAPI contract is updated as well), CATs will generate the tests accordingly.

Conclusions

CATs is not intended to replace your existing tooling (although it might do it to a considerable extent). It’s a simple way to remove some of the boring parts of API testing by providing built-in test cases which can be run out of the box. It also makes sure it covers all fields from request bodies and has a good tolerance for API changes.

At the same time, it might be appealing for people who want to automate API test cases, but might not have the necessary skills to use tools that require significant coding.

Why not give it a try?

Visual Cultures Part 1 - XFDs

My team (when I say team, I actually mean 14 individual teams at the moment) has been using big visible dashboards since before they were a thing. In fact, not only dashboards, but our entire culture is very visual. We like things to be transparent and easily accessible with a simple look. This applies to everything: sprint/iteration progress, code quality, system test environments health, release readiness (I know, but not all of us are doing fully-automated-kubernetes-driven-continuous-delivery-with-10-deploys-per-day), build stability and so forth. Even if everything has a digital form that displays similar information, we prefer to also have the physical/visual stuff. It’s instant and meaningful.

One of the first things we used was an eXtreme Feedback Device (XFD). There are many geeks within the team. What better way to express your geeky skills than DIY techy projects? And an XFD seemed like a very good candidate. For those unfamiliar with what an XFD is, a short definition would be: an instant and visual way to monitor the health of a Continuous Integration build.

And so, we’ve built one :)

XFD Version 1

XFD 1.0

The first version of the XFD was built using:

  • an Arduino board
  • an RGB LED
  • a Java agent that monitored the CI server’s builds and was sending bytes according to the health on a serial port to the Arduino board
  • some Arduino code that was interpreting the bytes from the Java agent and setting the LED lights accordingly

And it worked very well. Build failed! Bang!!! XFD was turning 🔴. Build in progress! Bang! XFD was pulsing 🔵. System test environment was unhealthy! Bang! XFD was turning yellow. You get the point. We now had instant visual feedback for things very important for the team.
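For the curious, the agent side of such a setup can be sketched roughly like this. The status names and byte values are hypothetical (the real protocol is not described here), and a ByteArrayOutputStream stands in for the serial port.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class XfdSignalSketch {
    // Hypothetical one-byte protocol: each status maps to the byte the board would interpret.
    enum Status {
        BUILD_FAILED((byte) 'R'),           // solid red
        BUILD_IN_PROGRESS((byte) 'B'),      // pulsing blue
        ENVIRONMENT_UNHEALTHY((byte) 'Y');  // yellow

        final byte signal;

        Status(byte signal) {
            this.signal = signal;
        }
    }

    // Writes the byte for the given status to the port (any OutputStream works here).
    static void send(Status status, OutputStream port) throws IOException {
        port.write(status.signal);
        port.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream fakePort = new ByteArrayOutputStream();
        send(Status.BUILD_FAILED, fakePort);
        System.out.println((char) fakePort.toByteArray()[0]); // prints R
    }
}
```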

The Java agent was configurable so you could accommodate multiple CI builds or different team events (like daily stand-ups) based on specific time frames.

Over time, as we grew into more teams, the need for diversity increased. This is why we’ve created different flavors of the XFD. They were working in the same way but had different aesthetics.

More minimalistic XFD

Christmas version

XFD 2.0

We used the 1.0 solution for almost 6 years. During an office move last year, we noticed that everything was fine as long as you left the XFD in one place. Once you needed to move it around, it was a bit difficult for the 2018 geeks to cope with having so many wires (power-in, power-out, USB) and bulky designs.

Welcome XFD 2.0! We didn’t actually build one. We bought Xiaomi Yeelight RGB light bulbs. Quite cheap, portable and API driven :) Very geek friendly. We’ve re-written the Java agent from scratch in order to communicate with the bulb. And everything was working like before, but in a cleaner way…

… until the need for diversity struck again. After all, what each team had was just a light bulb. So, we’ve launched a competition between teams. Who creates the best looking XFD (while still being powered by a light bulb)? I must say that the results are impressive. We now have 12 really cool concepts that at the same time capture the uniqueness of each team. All pictures below.

Simpsons, Jedi, Smurfs, Fight Club, Transformers, Black Sheep, Wolfpack, X-Men, Florence, Monsters, Incredibles, Lucky 13

XFDs are so early 00’s - Do they really work?

I strongly believe they do. But they don’t “just work”. You must create a specific mindset. Cultures do not emerge because things are there - that’s it; people need to behave in a certain way. They emerge if they are part of a bigger picture. From a mindset that considers that visual items are important. And that visual ways to show stuff are key for transparency and instant feedback.

The code for XFD 2.0

You can also start your own XFD in 2 minutes (after you buy a Xiaomi Yeelight RGB LED). The code is on GitHub: https://github.com/ludovicianul/yeextreme

Be visual!